Document Summarization via Self-Present Sentence Relevance Model

Li, Xiaodong; Zhu, Shanfeng; Xie, Haoran; Li, Qing

doi:10.1007/978-3-642-37450-0_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7826)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Automatic document summarization has long attracted computer science researchers. This paper proposes a novel approach to the problem, focusing mainly on the summarization of plain documents. Conventional summarization methods do not fully exploit inter-sentence relevance, because it is not preserved during processing. To address this problem and incorporate the latent relations among sentences, our approach constructs sentence-level relevance structures for plain documents and assigns each sentence a significance score. Important sentences thus “present” themselves automatically, and the summary paragraph is generated by selecting the top-k scored sentences. Convergence of the algorithm is proved, and experiments on two data sets (DUC 2006 and DUC 2007) show that the proposed model gives convincing results.
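The abstract does not spell out the relevance model itself, so the following is only a rough sketch of the general pipeline it describes: build a sentence-level relevance structure, score every sentence iteratively until the scores converge, and keep the top-k. The cosine similarity over raw term counts and the damped power iteration are assumptions borrowed from generic graph-based rankers, not the authors' formulation, and the names `tokenize`, `cosine`, `score_sentences`, and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_sentences(sentences, damping=0.85, tol=1e-8, max_iter=200):
    """Score sentences by damped power iteration over a similarity graph.

    Assumption: this stands in for the paper's relevance model, which the
    abstract does not define.
    """
    n = len(sentences)
    if n == 0:
        return []
    vectors = [Counter(tokenize(s)) for s in sentences]
    # Pairwise relevance matrix; self-similarity is excluded.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalise into transition probabilities; a sentence with no
    # related neighbours spreads its weight uniformly.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n
               + damping * sum(scores[i] * trans[i][j] for i in range(n))
               for j in range(n)]
        delta = sum(abs(x - y) for x, y in zip(new, scores))
        scores = new
        if delta < tol:  # stop once the scores have stabilised
            break
    return scores


def summarize(sentences, k=2):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(sentences, k=3)` on a list of sentence strings returns three of them in document order. Because the transition matrix is row-stochastic and the damping factor is below 1, the iteration in this sketch is a contraction and the scores settle to a fixed point; the convergence proof in the paper concerns its own algorithm, not this stand-in.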

Author information

Authors and Affiliations

Department of Computer Science, City University of Hong Kong, Hong Kong
Xiaodong Li, Haoran Xie & Qing Li
Shanghai Key Lab of Intelligent Information Processing and School of Computer Science, Fudan University, Shanghai, 200433, China
Shanfeng Zhu

Editor information

Editors and Affiliations

Department of Computer Science, Binghamton University, 13902, Binghamton, NY, USA
Weiyi Meng
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Ling Feng
Department of Computer Science, National University of Singapore, 117417, Singapore
Stéphane Bressan
Research Group Data Analytics and Computing, University of Vienna, 1090, Vienna, Austria
Werner Winiwarter
School of Computer, Wuhan University, 430072, Wuhan, China
Wei Song

About this paper

Cite this paper

Li, X., Zhu, S., Xie, H., Li, Q. (2013). Document Summarization via Self-Present Sentence Relevance Model. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37450-0_24

DOI: https://doi.org/10.1007/978-3-642-37450-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37449-4
Online ISBN: 978-3-642-37450-0
eBook Packages: Computer Science, Computer Science (R0)
