Document-Based HITS Model for Multi-document Summarization

  • Conference paper
PRICAI 2008: Trends in Artificial Intelligence (PRICAI 2008)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5351)

Abstract

The PageRank model has been successfully exploited for multi-document summarization by making use of the link relationships between sentences in the document set, under the assumption that all the sentences are indistinguishable from each other. However, the documents in a set are usually not equally important, and the sentences in an important document are deemed more salient than the sentences in a trivial document. This paper proposes the document-based HITS model (DocHITS), which fully leverages document-level information by treating documents as hubs and sentences as authorities. Experimental results on the DUC2001 and DUC2002 datasets demonstrate the effectiveness of the proposed model.
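
The hub-and-authority idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify how document-sentence edges are weighted, so the sketch assumes cosine similarity between raw term-frequency vectors, whitespace tokenization, documents as hubs, sentences as authorities, and a fixed iteration count. The names `cosine`, `normalize`, and `doc_hits` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def normalize(scores):
    """Scale a score vector to unit L2 norm (left unchanged if all zero)."""
    length = math.sqrt(sum(s * s for s in scores))
    return [s / length for s in scores] if length else scores


def doc_hits(documents, iterations=50):
    """Run HITS on a bipartite document-sentence graph.

    documents: list of documents, each a list of sentence strings.
    Returns (sentences, authority_scores, hub_scores), where authorities
    are sentences and hubs are documents.
    """
    sentences = [s for doc in documents for s in doc]
    sent_vecs = [Counter(s.lower().split()) for s in sentences]
    doc_vecs = [Counter(" ".join(doc).lower().split()) for doc in documents]

    # ASSUMPTION: edge weight = cosine similarity of sentence and document.
    weights = [[cosine(sv, dv) for dv in doc_vecs] for sv in sent_vecs]

    auth = [1.0] * len(sentences)
    hub = [1.0] * len(documents)
    for _ in range(iterations):
        # A sentence is authoritative if heavily linked to strong hubs.
        auth = [sum(w * h for w, h in zip(row, hub)) for row in weights]
        # A document is a strong hub if it links to authoritative sentences.
        hub = [
            sum(weights[i][j] * auth[i] for i in range(len(sentences)))
            for j in range(len(documents))
        ]
        auth = normalize(auth)
        hub = normalize(hub)
    return sentences, auth, hub


docs = [
    ["the storm hit the coast", "the storm caused flooding on the coast"],
    ["flooding followed the storm", "the coast was damaged"],
    ["stock prices rose today"],
]
sentences, auth, hub = doc_hits(docs)
# Print sentences from most to least salient under this sketch.
for score, sentence in sorted(zip(auth, sentences), reverse=True):
    print(f"{score:.3f}  {sentence}")
```

In this toy input the third document shares no terms with the other two, so its hub score and its sentence's authority score shrink toward zero as the iteration proceeds. A summarizer built on this sketch would then pick the top-ranked sentences, possibly with a redundancy filter, which is outside what the abstract describes.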



Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wan, X. (2008). Document-Based HITS Model for Multi-document Summarization. In: Ho, T.-B., Zhou, Z.-H. (eds) PRICAI 2008: Trends in Artificial Intelligence. PRICAI 2008. Lecture Notes in Computer Science, vol 5351. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89197-0_42

  • DOI: https://doi.org/10.1007/978-3-540-89197-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89196-3

  • Online ISBN: 978-3-540-89197-0

  • eBook Packages: Computer Science, Computer Science (R0)
