Textual Entailment (TE) is an established indicator of text connectedness that captures semantic relationships between texts. Recently, it has been used successfully to determine sentence salience in many text summarization methods. However, previous work has reported that standard TE is not ideal for measuring sentence salience, because entailment relationships between sentences are quite rare in real-world texts. We therefore suggest using partial TE for the task of recognizing standard TE. We formulate single-document summarization as an optimization problem, which is solved using a weighted Minimum Set Cover (wMSC) algorithm. In this method, sentences are broken into fragments, and partial TE is used to form sets of fragments. Finally, wMSC is applied to these sets to obtain the minimum set cover, which corresponds to the summary of the document. Results on the DUC 2002 dataset, using ROUGE and other quality metrics, show that the proposed method outperforms the state of the art.
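To make the set-cover view concrete, the following is a minimal sketch of the standard greedy approximation for weighted set cover, not necessarily the solver used in the paper. It assumes each sentence has already been mapped to the set of fragments it covers (in the paper this mapping comes from partial TE, which is not reproduced here). The sentence names, fragment names, and weights are hypothetical.

```python
from typing import Dict, List, Set


def greedy_weighted_set_cover(
    universe: Set[str],
    subsets: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Greedy approximation: repeatedly pick the subset with the lowest
    weight per newly covered element until the universe is covered."""
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        for name, elems in subsets.items():
            gain = len(elems & uncovered)
            if gain == 0:
                continue  # this subset adds nothing new
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("the given subsets cannot cover the universe")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical toy input: fragments f1..f5 and four candidate sentences.
fragments = {"f1", "f2", "f3", "f4", "f5"}
sentence_cover = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f3", "f4"},
    "s3": {"f4", "f5"},
    "s4": {"f1"},
}
sentence_weight = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 0.5}

summary = greedy_weighted_set_cover(fragments, sentence_cover, sentence_weight)
print(summary)
```

In this toy input the selected sentences together cover every fragment, which is the property the abstract identifies with a summary. The greedy rule yields an approximate rather than a guaranteed minimum-weight cover.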
Levy et al. introduced the term complete TE for standard TE. Standard TE is therefore referred to as Complete TE in the rest of the paper.
As mentioned, that is a key difference between single- and multiple-document summarization.
ROUGE version (1.5.5) runs with the same parameters as mentioned on the DUC website (ROUGE-1.5.5.pl -n 2 -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0 -l 100 -d).
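As a rough illustration of what the n-gram part of this evaluation measures, the sketch below computes a simplified ROUGE-N recall with the candidate truncated to its first 100 words. It is an assumption-laden approximation: it uses whitespace tokenization, omits stemming, skip-bigrams, and confidence intervals, and is not a substitute for the ROUGE-1.5.5 script. The function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1,
                   max_words: int = 100) -> float:
    """Simplified ROUGE-N recall: clipped n-gram overlap divided by the
    number of n-grams in the reference. The candidate is truncated to
    max_words tokens before counting."""
    cand_tokens = candidate.lower().split()[:max_words]
    ref_tokens = reference.lower().split()
    cand = ngram_counts(cand_tokens, n)
    ref = ngram_counts(ref_tokens, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences.
cand_text = "the cat sat on the mat"
ref_text = "the cat lay on the mat"
r1 = rouge_n_recall(cand_text, ref_text, n=1)
r2 = rouge_n_recall(cand_text, ref_text, n=2)
print(r1, r2)
```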
Baralis E, Cagliero L, Fiori A, Jabeen S (2011) Pattexsum: A pattern-based text summarizer. In: Proceedings of the workshop on mining complex patterns, p 14
Barzilay R, Elhadad M (1999) Using lexical chains for text summarization. In: Mani I, Mark TM (eds) Advances in automatic text summarization. The MIT Press, London, pp 111–121
Broder AZ, Glassman SC, Manasse MS, Zweig G (1997) Syntactic clustering of the web. Comput Networks ISDN Syst 29(8):1157–1166
Cheng J, Lapata M (2016) Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252
Cornuéjols G (2001) Combinatorial optimization: packing and covering. SIAM, Philadelphia
Dagan I, Dolan B, Magnini B, Roth D (2009) Recognizing textual entailment: rational, evaluation and approaches. Nat Lang Eng 15(4):1–17
Filatova E, Hatzivassiloglou V (2004) A formal model for information selection in multi-sentence text extraction. In: Proceedings of the 20th international conference on computational linguistics, association for computational linguistics, COLING ’04
Gupta A, Kathuria M, Singh A, Sachdeva A, Bhati S (2012) Analog textual entailment and spectral clustering (atesc) based summarization. In: International conference on big data analytics. Springer, pp 101–110
Gupta A, Kaur M, Singh A, Goel A, Mirkin S (2014) Text summarization through entailment-based minimum vertex cover. Lexical and Computational Semantics (*SEM 2014), p 75
He Z, Chen C, Bu J, Wang C, Zhang L, Cai D, He X (2012) Document summarization based on data reconstruction. In: AAAI
Hirao T, Sasaki Y, Isozaki H, Maeda E (2002) Ntt’s text summarization system for duc-2002. In: Proceedings of the document understanding conference 2002. Citeseer, pp 104–107
Hirao T, Yoshida Y, Nishino M, Yasuda N, Nagata M (2013) Single-document summarization as a tree knapsack problem. EMNLP 13:1515–1520
Hoey M (1991) Patterns of lexis in text
Jones KS (2007) Automatic summarizing: the state of the art. Inf Process Manag 43:1449–1481
Jones KS, Galliers JR (1995) Evaluating natural language processing systems: an analysis and review, vol 1083. Springer, Berlin
Kikuchi Y, Hirao T, Takamura H, Okumura M, Nagata M (2014) Single document summarization based on nested tree structure. In: ACL (2), pp 315–320
Korte B, Vygen J (2002) Combinatorial optimization. Springer, Berlin
Lapata M, Barzilay R (2005) Automatic evaluation of text coherence: models and representations. IJCAI 5:1085–1090
Levy O, Zesch T, Dagan I, Gurevych I (2013) Recognizing partial textual entailment. In: ACL (2), pp 451–455
Lin CY (1995) Knowledge-based automatic topic identification. In: Proceedings of the 33rd annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp 308–310
Lin CY (2004) Rouge: A package for automatic evaluation of summaries. In: Text summarization branches out: proceedings of the ACL-04 workshop, Barcelona, Spain, vol 8
Lin CY, Hovy E (2003) Automatic evaluation of summaries using n-gram co-occurrence statistics. In: Proceedings of the 2003 conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol 1, Edmonton, Canada, pp 71–78
Mani I (2001) Summarization evaluation: an overview
Marcu D (1999) Discourse trees are good indicators of importance in text. Advances in automatic text summarization, pp 123–136
Marcu D (2008) From discourse structure to text summaries. In: Proceedings of the ACL/EACL ’97, workshop on intelligent scalable text summarization, Madrid, Spain, pp 82–88
Martins AF, Smith NA (2009) Summarization with a joint model for sentence extraction and compression. In: Proceedings of the workshop on integer linear programming for natural language processing. Association for Computational Linguistics, pp 1–9
McDonald R (2007) A study of global inference algorithms in multi-document summarization. Springer, Berlin
Mihalcea R, Tarau P (2004) Textrank: Bringing order into texts. Proc EMNLP Barcelona Spain 4(4):275
Miller GA (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39–41
Monz C, de Rijke M (2001) Light-weight entailment checking for computational semantics. In: Proceedings of the 3rd workshop on inference in computational semantics (ICoS-3)
Nenkova A, McKeown K et al (2011) Automatic summarization. Found Trends Inf Retriev 5(2–3):103–233
Nielsen RD, Ward W, Martin JH (2009) Recognizing entailment in intelligent tutoring systems. Nat Lang Eng 15(04):479–501
Nishino M, Yasuda N, Hirao T, Suzuki J, Nagata M (2013) Text summarization while maximizing multiple objectives with lagrangian relaxation. In: Advances in Information Retrieval. Springer, pp 772–775
Ono K, Sumita K, Miike S (1994) Abstract generation based on rhetorical structure extraction. In: Proceedings of the 15th conference on Computational linguistics, vol 1. Association for Computational Linguistics, pp 344–348
Parveen D, Mesgar M, Strube M (2016) Generating coherent summaries of scientific articles using coherence patterns. In: EMNLP, pp 772–783
Salton G, Singhal A, Mitra M, Buckley C (1997) Automatic text structuring and summarization. Inf Process Manag 33:193–207
Skorochod’ko EF (1971) Adaptive method of automatic abstracting and indexing. In: IFIP Congress (2), vol 71, pp 1179–1182
Steinberger J, Poesio M, Kabadjov MA, Ježek K (2007) Two uses of anaphora resolution in summarization. Inf Process Manag 43(6):1663–1680
Takamura H, Okumura M (2009) Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th conference of the european chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp 781–789
Tatar D, Tamaianu-Morita E, Mihis A, Lupsa D (2008) Summarization by logic segmentation and text entailment. Adv Nat Lang Process Appl 15:26
Wan X (2010) Towards a unified approach to simultaneous single-document and multi-document summarizations. In: Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, pp 1137–1145
Wt Yih, Goodman J, Vanderwende L, Suzuki H (2007) Multi-document summarization by maximizing informative content-words. IJCAI 7:1776–1782
Young NE (2008) Greedy set-cover algorithms. In: Encyclopedia of algorithms. Springer, pp 379–381
Cite this article
Gupta, A., Kaur, M., Mittal, S. et al. PE-MSC: partial entailment-based minimum set cover for text summarization. Knowl Inf Syst (2021). https://doi.org/10.1007/s10115-020-01537-1
- Text Summarization
- Minimum set cover
- Information retrieval
- Natural Language Processing