Abstract
Multi-document summarization techniques aim to reduce a set of documents to a small set of words or paragraphs that conveys the main meaning of the originals. Many approaches to multi-document summarization use probability-based methods and machine learning techniques to summarize, simultaneously, multiple documents that share a common topic. However, these techniques fail to semantically analyze proper nouns and newly coined words, because most of them depend on conventional dictionaries or thesauri. To overcome these drawbacks, we propose a novel multi-document summarization technique that employs tag clusters from Flickr, a folksonomy system, to detect key sentences in multiple documents. We first create a word frequency table and use the HITS algorithm to analyze the semantics and contribution of words. Then, by exploiting tag clusters, we analyze the semantic relationships between the words in the word frequency table. Experimental results on the TAC 2008 and TAC 2009 data sets demonstrate the improvement of our proposed framework over existing summarization systems.
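The abstract does not spell out how the HITS algorithm is applied to the word frequency table, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes a bipartite graph in which sentences act as hubs and the words they contain act as authorities; the function name `hits_word_scores`, the whitespace tokenizer, and the iteration count are all hypothetical choices made for illustration. The tag-cluster step is not modeled here.

```python
import math
from collections import Counter


def hits_word_scores(sentences, iterations=50):
    """Score words and sentences with HITS on a sentence-word graph.

    Assumption (not from the paper): each sentence is a hub that
    links to every distinct word it contains, and each word is an
    authority.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [s.lower().split() for s in sentences]

    # Word frequency table over all sentences.
    freq = Counter(w for tokens in tokenized for w in tokens)

    hub = [1.0] * len(tokenized)
    auth = {w: 1.0 for w in freq}

    for _ in range(iterations):
        # Authority update: a word collects the hub scores of the
        # sentences that contain it.
        new_auth = {w: 0.0 for w in freq}
        for i, tokens in enumerate(tokenized):
            for w in set(tokens):
                new_auth[w] += hub[i]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {w: v / norm for w, v in new_auth.items()}

        # Hub update: a sentence collects the authority scores of
        # the words it contains.
        new_hub = [sum(auth[w] for w in set(tokens)) for tokens in tokenized]
        norm = math.sqrt(sum(v * v for v in new_hub)) or 1.0
        hub = [v / norm for v in new_hub]

    return freq, auth, hub


if __name__ == "__main__":
    docs = [
        "flickr tags cluster photos",
        "tags describe photos",
        "weather is nice",
    ]
    freq, auth, hub = hits_word_scores(docs)
    # Print words from highest to lowest authority score.
    for word in sorted(auth, key=auth.get, reverse=True):
        print(word, freq[word], round(auth[word], 4))
```

In this reading, words shared by several strongly connected sentences receive high authority scores, and sentences containing such words receive high hub scores, which could then serve as a signal for key-sentence selection.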
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Heu, JU., Jeong, JW., Qasim, I., Joo, YD., Cho, JM., Lee, DH. (2013). Multi-document Summarization Exploiting Semantic Analysis Based on Tag Cluster. In: Li, S., et al. Advances in Multimedia Modeling. Lecture Notes in Computer Science, vol 7733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35728-2_46
DOI: https://doi.org/10.1007/978-3-642-35728-2_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35727-5
Online ISBN: 978-3-642-35728-2
eBook Packages: Computer Science (R0)