Abstract
With the rapid popularization of the Internet and the resulting explosion of information, there is an urgent need for text compression technology that extracts condensed information for readers. Automatic text summarization, also known as document summarization, is such a technology: it automatically converts a document (or a collection of documents) into a short summary. This chapter first introduces the main tasks in automatic text summarization and then details different approaches to single-document and multi-document summarization. After that, we introduce query-based summarization as well as cross-lingual and multilingual summarization. Finally, we introduce evaluation methods for automatic text summarization.
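One classic family of approaches the chapter covers is extractive summarization: scoring sentences by the frequency of their content words and returning the top-scoring ones, a heuristic dating back to Luhn's early work. The sketch below is only an illustration, not the chapter's method; the sentence splitter and the tiny stopword list are ad hoc assumptions.

```python
import re
from collections import Counter

def extractive_summary(text, num_sentences=2):
    """Luhn-style extractive summarizer: rank sentences by the average
    document-level frequency of their content words, then emit the top
    sentences in their original order."""
    # Naive sentence splitting on end punctuation (an assumption, not a
    # robust sentence segmenter).
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", text.strip())
                 if s.strip()]
    # A toy stopword list, purely for illustration.
    stopwords = {"the", "a", "an", "is", "are", "of", "to", "in",
                 "and", "that", "it", "for"}
    tokens = [w for w in re.findall(r"[a-z']+", text.lower())
              if w not in stopwords]
    freq = Counter(tokens)

    def score(sentence):
        words = [w for w in re.findall(r"[a-z']+", sentence.lower())
                 if w not in stopwords]
        return sum(freq[w] for w in words) / (len(words) or 1)

    ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
    # Preserve the original sentence order in the output summary.
    return " ".join(s for s in sentences if s in ranked)
```

Real systems refine each step (TF-IDF or graph-based scoring, redundancy control such as MMR, learned sentence rankers), but the pipeline shape — segment, score, select — is the same.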
Notes
- 4. Concepts and facts have different definitions. Generally, concepts correspond to entities (people, objects, institutions, etc.), and facts correspond to actions.
- 5. In practical applications, the scope of candidates for concepts and facts can be expanded or narrowed as specific needs dictate.
- 6. For example, the open-source toolkit released by Stanford University can perform coreference resolution for English entities: https://nlp.stanford.edu/projects/coref.shtml.
Copyright information
© 2021 Tsinghua University Press
Cite this chapter
Zong, C., Xia, R., Zhang, J. (2021). Automatic Text Summarization. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_11
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0099-9
Online ISBN: 978-981-16-0100-2