Abstract
The PageRank model has been successfully exploited for multi-document summarization by making use of the link relationships between sentences in the document set, under the assumption that all the sentences are indistinguishable from each other. However, different documents in the set are usually not equally important, and the sentences in an important document are deemed more salient than the sentences in a trivial document. This paper proposes the document-based HITS model (DocHITS) to fully leverage the document-level information by considering documents as hubs and sentences as authorities. Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of the proposed model.
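The abstract does not give the update rules, so the following is only a minimal sketch of the standard HITS mutual-reinforcement iteration (Kleinberg, 1999) applied to a bipartite document-sentence graph, with documents as hubs and sentences as authorities. The function name `doc_hits`, the weight matrix `weights`, and the example values are assumptions made for illustration. The way the paper actually computes document-sentence weights is not stated here.

```python
import math


def _normalize(vector):
    """Scale a vector to unit Euclidean length (left unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def doc_hits(weights, iterations=100, tol=1e-10):
    """Run a HITS-style iteration on a document-sentence weight matrix.

    weights[d][s] is an assumed non-negative affinity between document d
    (hub) and sentence s (authority), e.g. a similarity value.
    Returns (hub_scores, authority_scores), each with unit Euclidean norm.
    """
    n_docs = len(weights)
    n_sents = len(weights[0])
    hub = [1.0] * n_docs
    auth = [1.0] * n_sents

    for _ in range(iterations):
        # A sentence is authoritative if it is strongly tied to good hubs.
        new_auth = _normalize([
            sum(weights[d][s] * hub[d] for d in range(n_docs))
            for s in range(n_sents)
        ])
        # A document is a good hub if it is strongly tied to good authorities.
        new_hub = _normalize([
            sum(weights[d][s] * new_auth[s] for s in range(n_sents))
            for d in range(n_docs)
        ])
        change = max(abs(a - b) for a, b in zip(new_auth, auth))
        change += max(abs(a - b) for a, b in zip(new_hub, hub))
        hub, auth = new_hub, new_auth
        if change < tol:
            break

    return hub, auth


# Hypothetical toy input: 2 documents, 3 sentences.
example_weights = [
    [0.9, 0.8, 0.1],  # document 0 is strongly tied to sentences 0 and 1
    [0.2, 0.1, 0.3],  # document 1 is weakly tied to all sentences
]
hub_scores, authority_scores = doc_hits(example_weights)

# Sentences ranked by authority score, highest first.
ranking = sorted(range(len(authority_scores)),
                 key=lambda s: authority_scores[s], reverse=True)
```

In this sketch a summary would be built from the top-ranked sentences in `ranking`. Any redundancy removal or length limit that a full summarizer needs is outside what the abstract describes and is omitted.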
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wan, X. (2008). Document-Based HITS Model for Multi-document Summarization. In: Ho, TB., Zhou, ZH. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_42
DOI: https://doi.org/10.1007/978-3-540-89197-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89196-3
Online ISBN: 978-3-540-89197-0
eBook Packages: Computer Science, Computer Science (R0)