PE-MSC: partial entailment-based minimum set cover for text summarization

Abstract

Textual Entailment (TE) is an established indicator of text connectedness that captures semantic relationships between texts. It has recently been used successfully to determine sentence salience in many text summarization methods. However, previous work has reported that standard textual entailment is not ideal for measuring sentence salience, because textual entailment relationships between sentences are quite rare in real-world texts. Therefore, we propose using partial TE to accomplish the task of recognizing standard TE. We formulate single-document summarization as an optimization problem that is solved using a weighted Minimum Set Cover (wMSC) algorithm. In this method, sentences are broken into fragments and partial TE is used to form sets of fragments. wMSC is then applied to these sets to obtain the minimum set cover, which corresponds to the summary of the document. Results on the DUC 2002 dataset, measured with ROUGE and other quality metrics, show that the proposed method outperforms the state of the art.
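The set-cover step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard greedy approximation for weighted set cover rather than an exact solver, and the fragment ids, the sentence-to-fragment sets (which stand in for the output of a partial TE recognizer) and the sentence weights are all hypothetical values chosen for illustration.

```python
from typing import Dict, List, Set


def greedy_weighted_set_cover(
    universe: Set[str],
    covers: Dict[str, Set[str]],
    weights: Dict[str, float],
) -> List[str]:
    """Greedy approximation of weighted minimum set cover.

    universe: all fragment ids that the summary must cover.
    covers:   sentence id -> fragments it covers (assumed here to come
              from a partial-TE step that is not implemented).
    weights:  sentence id -> cost of selecting that sentence (assumption).
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        best, best_ratio = None, float("inf")
        # Iterate in sorted order so that ties are broken deterministically.
        for name in sorted(covers):
            if name in chosen:
                continue
            gain = len(covers[name] & uncovered)
            if gain == 0:
                continue
            ratio = weights[name] / gain  # cost per newly covered fragment
            if ratio < best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            raise ValueError("some fragments are not covered by any sentence")
        chosen.append(best)
        uncovered -= covers[best]
    return chosen


# Hypothetical toy document: five fragments, four candidate sentences.
fragments = {"f1", "f2", "f3", "f4", "f5"}
sentence_covers = {
    "s1": {"f1", "f2", "f3"},
    "s2": {"f2", "f4"},
    "s3": {"f3", "f4", "f5"},
    "s4": {"f5"},
}
sentence_weights = {"s1": 2.0, "s2": 2.0, "s3": 3.0, "s4": 2.0}

summary = greedy_weighted_set_cover(fragments, sentence_covers, sentence_weights)
```

In this toy instance the selected sentences jointly cover every fragment. The greedy rule gives only an approximation; an exact minimum would require an integer programming formulation handed to a solver.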

Notes

  1. Levy et al. [19] introduced the term complete TE for standard TE. Standard TE is therefore referred to as complete TE in the rest of the paper.
  2. As mentioned, that is a key difference between single- and multiple-document summarization.
  3. https://www-nlpir.nist.gov/projects/duc/data/2002_data.html.
  4. https://pypi.python.org/pypi/PuLP.
  5. ROUGE version 1.5.5 is run with the same parameters as mentioned on the DUC website (ROUGE-1.5.5.pl -n 2 -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0 -l 100 -d).

References

  1. Baralis E, Cagliero L, Fiori A, Jabeen S (2011) Pattexsum: A pattern-based text summarizer. In: Proceedings of the workshop on mining complex patterns, p 14
  2. Barzilay R, Elhadad M (1999) Using lexical chains for text summarization. In: Mani I, Maybury MT (eds) Advances in automatic text summarization. The MIT Press, London, pp 111–121
  3. Broder AZ, Glassman SC, Manasse MS, Zweig G (1997) Syntactic clustering of the web. Comput Networks ISDN Syst 29(8):1157–1166
  4. Cheng J, Lapata M (2016) Neural summarization by extracting sentences and words. arXiv preprint arXiv:1603.07252
  5. Cornuéjols G (2001) Combinatorial optimization: packing and covering. SIAM, Philadelphia
  6. Dagan I, Dolan B, Magnini B, Roth D (2009) Recognizing textual entailment: rational, evaluation and approaches. Nat Lang Eng 15(4):1–17
  7. Filatova E, Hatzivassiloglou V (2004) A formal model for information selection in multi-sentence text extraction. In: Proceedings of the 20th international conference on computational linguistics (COLING ’04). Association for Computational Linguistics
  8. Gupta A, Kathuria M, Singh A, Sachdeva A, Bhati S (2012) Analog textual entailment and spectral clustering (ATESC) based summarization. In: International conference on big data analytics. Springer, pp 101–110
  9. Gupta A, Kaur M, Singh A, Goel A, Mirkin S (2014) Text summarization through entailment-based minimum vertex cover. Lexical and Computational Semantics (*SEM 2014), p 75
  10. He Z, Chen C, Bu J, Wang C, Zhang L, Cai D, He X (2012) Document summarization based on data reconstruction. In: AAAI
  11. Hirao T, Sasaki Y, Isozaki H, Maeda E (2002) NTT’s text summarization system for DUC-2002. In: Proceedings of the document understanding conference 2002. Citeseer, pp 104–107
  12. Hirao T, Yoshida Y, Nishino M, Yasuda N, Nagata M (2013) Single-document summarization as a tree knapsack problem. EMNLP 13:1515–1520
  13. Hoey M (1991) Patterns of lexis in text
  14. Jones KS (2007) Automatic summarizing: the state of the art. Inf Process Manag 43:1449–1481
  15. Jones KS, Galliers JR (1995) Evaluating natural language processing systems: an analysis and review, vol 1083. Springer, Berlin
  16. Kikuchi Y, Hirao T, Takamura H, Okumura M, Nagata M (2014) Single document summarization based on nested tree structure. In: ACL (2), pp 315–320
  17. Korte B, Vygen J (2002) Combinatorial optimization. Springer, Berlin
  18. Lapata M, Barzilay R (2005) Automatic evaluation of text coherence: models and representations. IJCAI 5:1085–1090
  19. Levy O, Zesch T, Dagan I, Gurevych I (2013) Recognizing partial textual entailment. In: ACL (2), pp 451–455
  20. Lin CY (1995) Knowledge-based automatic topic identification. In: Proceedings of the 33rd annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, pp 308–310
  21. Lin CY (2004) ROUGE: A package for automatic evaluation of summaries. In: Text summarization branches out: proceedings of the ACL-04 workshop, Barcelona, Spain, vol 8
  22. Lin CY, Hovy E (2003) Automatic evaluation of summaries using n-gram co-occurrence statistics. In: Proceedings of the 2003 conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, vol 1, Edmonton, Canada, pp 71–78
  23. Mani I (2001) Summarization evaluation: an overview
  24. Marcu D (1999) Discourse trees are good indicators of importance in text. Advances in automatic text summarization, pp 123–136
  25. Marcu D (2008) From discourse structure to text summaries. In: Proceedings of the ACL/EACL ’97, workshop on intelligent scalable text summarization, Madrid, Spain, pp 82–88
  26. Martins AF, Smith NA (2009) Summarization with a joint model for sentence extraction and compression. In: Proceedings of the workshop on integer linear programming for natural language processing. Association for Computational Linguistics, pp 1–9
  27. McDonald R (2007) A study of global inference algorithms in multi-document summarization. Springer, Berlin
  28. Mihalcea R, Tarau P (2004) TextRank: Bringing order into texts. Proc EMNLP Barcelona Spain 4(4):275
  29. Miller GA (1995) WordNet: a lexical database for English. Commun ACM 38(11):39–41
  30. Monz C, de Rijke M (2001) Light-weight entailment checking for computational semantics. In: Proceedings of the 3rd workshop on inference in computational semantics (ICoS-3)
  31. Nenkova A, McKeown K et al (2011) Automatic summarization. Found Trends Inf Retriev 5(2–3):103–233
  32. Nielsen RD, Ward W, Martin JH (2009) Recognizing entailment in intelligent tutoring systems. Nat Lang Eng 15(04):479–501
  33. Nishino M, Yasuda N, Hirao T, Suzuki J, Nagata M (2013) Text summarization while maximizing multiple objectives with Lagrangian relaxation. In: Advances in Information Retrieval. Springer, pp 772–775
  34. Ono K, Sumita K, Miike S (1994) Abstract generation based on rhetorical structure extraction. In: Proceedings of the 15th conference on Computational linguistics, vol 1. Association for Computational Linguistics, pp 344–348
  35. Parveen D, Mesgar M, Strube M (2016) Generating coherent summaries of scientific articles using coherence patterns. In: EMNLP, pp 772–783
  36. Salton G, Singhal A, Mitra M, Buckley C (1997) Automatic text structuring and summarization. Inf Process Manag 33:193–207
  37. Skorochod’ko EF (1971) Adaptive method of automatic abstracting and indexing. In: IFIP Congress (2), vol 71, pp 1179–1182
  38. Steinberger J, Poesio M, Kabadjov MA, Ježek K (2007) Two uses of anaphora resolution in summarization. Inf Process Manag 43(6):1663–1680
  39. Takamura H, Okumura M (2009) Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th conference of the European chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp 781–789
  40. Tatar D, Tamaianu-Morita E, Mihis A, Lupsa D (2008) Summarization by logic segmentation and text entailment. Adv Nat Lang Process Appl 15:26
  41. Wan X (2010) Towards a unified approach to simultaneous single-document and multi-document summarizations. In: Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, pp 1137–1145
  42. Yih Wt, Goodman J, Vanderwende L, Suzuki H (2007) Multi-document summarization by maximizing informative content-words. IJCAI 7:1776–1782
  43. Young NE (2008) Greedy set-cover algorithms. In: Encyclopedia of algorithms. Springer, pp 379–381

Author information

Corresponding author

Correspondence to Anand Gupta.

About this article

Cite this article

Gupta, A., Kaur, M., Mittal, S. et al. PE-MSC: partial entailment-based minimum set cover for text summarization. Knowl Inf Syst (2021). https://doi.org/10.1007/s10115-020-01537-1

Keywords

  • Text summarization
  • Minimum set cover
  • Information retrieval
  • Natural language processing