Abstract
The notion of Textual Entailment (TE) is an established indicator of text connectedness: it captures semantic relationships between texts. Recently, it has been used successfully to determine sentence salience in many text summarization methods. However, previous work has reported that standard TE is not ideal for measuring sentence salience, because textual entailment relationships between sentences are quite rare in real-world texts. Therefore, we suggest using partial TE to accomplish the task of recognizing standard TE. We cast single-document summarization as an optimization problem, which is solved using a weighted Minimum Set Cover (wMSC) algorithm. In this method, sentences are broken into fragments and partial TE is used to form sets of fragments. wMSC is then applied to these sets to obtain the minimum set cover, which corresponds to the summary of the document. Results on the DUC 2002 dataset, measured with ROUGE and other quality metrics, show that the proposed method outperforms the state of the art.
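The set-cover formulation in the abstract can be illustrated with a minimal sketch. It assumes that each sentence has already been mapped to the set of fragments it covers (the abstract says partial TE is used for this step, which is not reproduced here) and that each sentence carries a cost. The fragment names, sentence names, weights, and the function `greedy_weighted_set_cover` are all hypothetical, and the greedy cost-per-new-element rule is a standard approximation for weighted set cover, not necessarily the solver the authors use.

```python
from typing import Dict, Hashable, List, Set


def greedy_weighted_set_cover(
    universe: Set[Hashable],
    subsets: Dict[str, Set[Hashable]],
    weights: Dict[str, float],
) -> List[str]:
    """Greedy approximation for weighted minimum set cover.

    Repeatedly picks the subset with the lowest cost per newly
    covered element until every element of `universe` is covered.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best_name = None
        best_ratio = float("inf")
        for name, members in subsets.items():
            gain = len(members & uncovered)
            if gain == 0:
                continue  # this subset adds nothing new
            ratio = weights[name] / gain
            if ratio < best_ratio:
                best_name, best_ratio = name, ratio
        if best_name is None:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best_name)
        uncovered -= subsets[best_name]
    return chosen


# Hypothetical toy document: fragments f1..f5, sentences s1..s4.
# In the paper's setting the sets would come from partial TE; here
# they are hard-coded assumptions for illustration only.
fragments = {"f1", "f2", "f3", "f4", "f5"}
covers = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f3", "f4"},
    "s3": {"f4", "f5"},
    "s4": {"f5"},
}
costs = {"s1": 1.0, "s2": 1.0, "s3": 1.0, "s4": 1.0}

summary = greedy_weighted_set_cover(fragments, covers, costs)
print(summary)
```

In this toy instance the selected sentences play the role of the summary: together they cover every fragment at low total cost. An exact solver (for example, an integer linear program) could replace the greedy loop when optimality matters more than speed.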

Notes
- 1.
Levy et al. [19] introduced the term complete TE for standard TE. Standard TE is therefore referred to as complete TE in the rest of the paper.
- 2.
As mentioned, that is a key difference between single- and multiple-document summarization.
- 3.
- 4.
- 5.
ROUGE version (1.5.5) runs with the same parameters as mentioned on the DUC website (ROUGE-1.5.5.pl -n 2 -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0 -l 100 -d).
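As a rough illustration of what the ROUGE settings in note 5 measure, the sketch below computes ROUGE-N recall with clipped n-gram counts and truncates the candidate summary to its first 100 words, mirroring the `-l 100` option. It is a simplified stand-in, not the ROUGE-1.5.5 script: stemming (`-m`), skip-bigrams (`-2 4 -u`), and bootstrap confidence intervals (`-c 95 -r 1000`) are not implemented, tokenization is plain whitespace splitting, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: str, reference: str, n: int = 1,
                   max_words: int = 100) -> float:
    """Simplified ROUGE-N recall against a single reference.

    The candidate is truncated to `max_words` tokens before scoring.
    Matches are clipped: each reference n-gram can be matched at most
    as many times as it occurs in the candidate.
    """
    cand_tokens = candidate.lower().split()[:max_words]
    ref_tokens = reference.lower().split()
    cand = ngram_counts(cand_tokens, n)
    ref = ngram_counts(ref_tokens, n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical example sentences, for illustration only.
candidate_summary = "the cat sat on the mat"
reference_summary = "the cat lay on the mat"
r1 = rouge_n_recall(candidate_summary, reference_summary, n=1)
r2 = rouge_n_recall(candidate_summary, reference_summary, n=2)
print(r1, r2)
```

Scores reported in the paper should be reproduced with the original ROUGE-1.5.5 script and the parameters listed in note 5; this sketch only conveys the core overlap computation.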
References
- 1.
Baralis E, Cagliero L, Fiori A, Jabeen S (2011) Pattexsum: A pattern-based text summarizer. In: Proceedings of the workshop on mining complex patterns, p 14
- 2.
Barzilay R, Elhadad M (1999) Using lexical chains for text summarization. In: Mani I, Maybury MT (eds) Advances in automatic text summarization. The MIT Press, London, pp 111–121
- 3.
Broder AZ, Glassman SC, Manasse MS, Zweig G (1997) Syntactic clustering of the web. Comput Networks ISDN Syst 29(8):1157–1166
- 4.
Cheng J, Lapata M (2016) Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252
- 5.
Cornuéjols G (2001) Combinatorial optimization: packing and covering. SIAM, Philadelphia
- 6.
Dagan I, Dolan B, Magnini B, Roth D (2009) Recognizing textual entailment: rational, evaluation and approaches. Nat Lang Eng 15(4):1–17
- 7.
Filatova E, Hatzivassiloglou V (2004) A formal model for information selection in multi-sentence text extraction. In: Proceedings of the 20th international conference on computational linguistics, association for computational linguistics, COLING ’04
- 8.
Gupta A, Kathuria M, Singh A, Sachdeva A, Bhati S (2012) Analog textual entailment and spectral clustering (atesc) based summarization. In: International conference on big data analytics. Springer, pp 101–110
- 9.
Gupta A, Kaur M, Singh A, Goel A, Mirkin S (2014) Text summarization through entailment-based minimum vertex cover. Lexical and Computational Semantics (*SEM 2014), p 75
- 10.
He Z, Chen C, Bu J, Wang C, Zhang L, Cai D, He X (2012) Document summarization based on data reconstruction. In: AAAI
- 11.
Hirao T, Sasaki Y, Isozaki H, Maeda E (2002) Ntt’s text summarization system for duc-2002. In: Proceedings of the document understanding conference 2002. Citeseer, pp 104–107
- 12.
Hirao T, Yoshida Y, Nishino M, Yasuda N, Nagata M (2013) Single-document summarization as a tree knapsack problem. EMNLP 13:1515–1520
- 13.
Hoey M (1991) Patterns of lexis in text
- 14.
Jones KS (2007) Automatic summarizing: the state of the art. Inf Process Manag 43:1449–1481
- 15.
Jones KS, Galliers JR (1995) Evaluating natural language processing systems: an analysis and review, vol 1083. Springer, Berlin
- 16.
Kikuchi Y, Hirao T, Takamura H, Okumura M, Nagata M (2014) Single document summarization based on nested tree structure. In: ACL (2), pp 315–320
- 17.
Korte B, Vygen J (2002) Combinatorial optimization. Springer, Berlin
- 18.
Lapata M, Barzilay R (2005) Automatic evaluation of text coherence: models and representations. IJCAI 5:1085–1090
- 19.
Levy O, Zesch T, Dagan I, Gurevych I (2013) Recognizing partial textual entailment. In: ACL (2), pp 451–455
- 20.
Lin CY (1995) Knowledge-based automatic topic identification. In: Proceedings of the 33rd annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp 308–310
- 21.
Lin CY (2004) Rouge: A package for automatic evaluation of summaries. In: Text summarization branches out: proceedings of the ACL-04 workshop, Barcelona, Spain, vol 8
- 22.
Lin CY, Hovy E (2003) Automatic evaluation of summaries using n-gram co-occurrence statistics. In: Proceedings of the 2003 conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol 1, Edmonton, Canada, pp 71–78
- 23.
Mani I (2001) Summarization evaluation: an overview
- 24.
Marcu D (1999) Discourse trees are good indicators of importance in text. Advances in automatic text summarization, pp 123–136
- 25.
Marcu D (1997) From discourse structure to text summaries. In: Proceedings of the ACL/EACL ’97, workshop on intelligent scalable text summarization, Madrid, Spain, pp 82–88
- 26.
Martins AF, Smith NA (2009) Summarization with a joint model for sentence extraction and compression. In: Proceedings of the workshop on integer linear programming for natural language processing. Association for Computational Linguistics, pp 1–9
- 27.
McDonald R (2007) A study of global inference algorithms in multi-document summarization. Springer, Berlin
- 28.
Mihalcea R, Tarau P (2004) Textrank: Bringing order into texts. Proc EMNLP Barcelona Spain 4(4):275
- 29.
Miller GA (1995) Wordnet: a lexical database for english. Commun ACM 38(11):39–41
- 30.
Monz C, de Rijke M (2001) Light-weight entailment checking for computational semantics. In: Proceedings of the 3rd workshop on inference in computational semantics (ICoS-3)
- 31.
Nenkova A, McKeown K et al (2011) Automatic summarization. Found Trends Inf Retriev 5(2–3):103–233
- 32.
Nielsen RD, Ward W, Martin JH (2009) Recognizing entailment in intelligent tutoring systems. Nat Lang Eng 15(04):479–501
- 33.
Nishino M, Yasuda N, Hirao T, Suzuki J, Nagata M (2013) Text summarization while maximizing multiple objectives with lagrangian relaxation. In: Advances in Information Retrieval. Springer, pp 772–775
- 34.
Ono K, Sumita K, Miike S (1994) Abstract generation based on rhetorical structure extraction. In: Proceedings of the 15th conference on Computational linguistics, vol 1. Association for Computational Linguistics, pp 344–348
- 35.
Parveen D, Mesgar M, Strube M (2016) Generating coherent summaries of scientific articles using coherence patterns. In: EMNLP, pp 772–783
- 36.
Salton G, Singhal A, Mitra M, Buckley C (1997) Automatic text structuring and summarization. Inf Process Manag 33:193–207
- 37.
Skorochod’ko EF (1971) Adaptive method of automatic abstracting and indexing. In: IFIP Congress (2), vol 71, pp 1179–1182
- 38.
Steinberger J, Poesio M, Kabadjov MA, Ježek K (2007) Two uses of anaphora resolution in summarization. Inf Process Manag 43(6):1663–1680
- 39.
Takamura H, Okumura M (2009) Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th conference of the european chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp 781–789
- 40.
Tatar D, Tamaianu-Morita E, Mihis A, Lupsa D (2008) Summarization by logic segmentation and text entailment. Adv Nat Lang Process Appl 15:26
- 41.
Wan X (2010) Towards a unified approach to simultaneous single-document and multi-document summarizations. In: Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, pp 1137–1145
- 42.
Yih WT, Goodman J, Vanderwende L, Suzuki H (2007) Multi-document summarization by maximizing informative content-words. IJCAI 7:1776–1782
- 43.
Young NE (2008) Greedy set-cover algorithms. In: Encyclopedia of algorithms. Springer, pp 379–381
Cite this article
Gupta, A., Kaur, M., Mittal, S. et al. PE-MSC: partial entailment-based minimum set cover for text summarization. Knowl Inf Syst (2021). https://doi.org/10.1007/s10115-020-01537-1
Keywords
- Text summarization
- Minimum set cover
- Information retrieval
- Natural language processing