Automatic Text Summarization

Chapter in Text Data Mining

Abstract

With the rapid popularization of the Internet and the explosive growth of information, there is an urgent need for text compression technology that can distill condensed information for readers. Automatic text summarization, also known as document summarization, is such an information compression technology: it automatically converts a document (or a collection of documents) into a short summary. This chapter first introduces the main tasks in automatic text summarization and then details different approaches to single-document and multi-document summarization. After that, we introduce query-based summarization as well as cross-lingual and multilingual summarization. Finally, we introduce the evaluation methods for automatic text summarization.
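
The chapter body is not included in this preview, so the sketch below only illustrates two of the ideas the abstract names: extractive summarization as simple frequency-based sentence scoring (in the spirit of early extractive approaches, not the chapter's own methods) and evaluation as a toy ROUGE-1 recall. The function names, stopword list, and example text are illustrative assumptions.

```python
# Minimal sketch: frequency-based extractive summarization plus a toy ROUGE-1
# recall. Everything here (names, stopwords, sample text) is illustrative only.
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "are", "for", "on"}

def split_sentences(text):
    # Naive splitter; a real system would use a trained sentence tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(text):
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def extractive_summary(document, max_sentences=2):
    """Score each sentence by the average document frequency of its content
    words, then return the top-scoring sentences in their original order."""
    sentences = split_sentences(document)
    word_freq = Counter(tokenize(document))
    scored = []
    for idx, sent in enumerate(sentences):
        tokens = tokenize(sent)
        if tokens:
            scored.append((sum(word_freq[t] for t in tokens) / len(tokens), idx))
    top = sorted(sorted(scored, reverse=True)[:max_sentences], key=lambda x: x[1])
    return " ".join(sentences[idx] for _, idx in top)

def rouge1_recall(candidate, reference):
    """Unigram recall: overlapping word count / word count of the reference."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / max(sum(ref.values()), 1)

if __name__ == "__main__":
    doc = ("Automatic text summarization compresses a document into a short summary. "
           "Extractive methods select salient sentences from the original document. "
           "Abstractive methods generate new sentences that convey the main content. "
           "Evaluation typically compares system summaries with human references.")
    reference = ("Text summarization compresses a document; "
                 "extractive methods select salient sentences.")
    summary = extractive_summary(doc, max_sentences=2)
    print("Summary:", summary)
    print("ROUGE-1 recall:", round(rouge1_recall(summary, reference), 3))
```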

Notes

  1. https://duc.nist.gov./.

  2. https://tac.nist.gov/about/index.html.

  3. http://jamesclarke.net/research/resources.

  4. Concepts and facts have different definitions. Generally, concepts correspond to entities (people, objects, institutions, etc.), and facts correspond to actions.

  5. In practical applications, the scope of candidates for concepts and facts can be expanded or contracted appropriately according to specific needs.

  6. For example, the open-source toolkit released by Stanford University can perform coreference resolution for English entities: https://nlp.stanford.edu/projects/coref.shtml.

  7. http://www.statmt.org/moses/giza/GIZA++.html.

  8. http://tcci.ccf.org.cn/conference/2015/pages/page05_evadata.html. http://tcci.ccf.org.cn/conference/2017/taskdata.php.

  9. http://tcci.ccf.org.cn/conference/2016/pages/page05_evadata.html.



Copyright information

© 2021 Tsinghua University Press

About this chapter

Cite this chapter

Zong, C., Xia, R., Zhang, J. (2021). Automatic Text Summarization. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_11

  • DOI: https://doi.org/10.1007/978-981-16-0100-2_11

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0099-9

  • Online ISBN: 978-981-16-0100-2

  • eBook Packages: Computer Science, Computer Science (R0)
