Text Entailment for Logical Segmentation and Summarization

  • Conference paper
Natural Language and Information Systems (NLDB 2008)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5039)

Abstract

Summarization is the process of condensing a source text into a shorter version while preserving its information content ([2]). This paper presents original methods for extractive summarization of a single source document, based on an intuition that has not been explored until now: the logical structure of a text. The summarization relies on an original linear segmentation algorithm, which we call logical segmentation (LTT) because the score of a sentence is the number of sentences in the text that it entails.
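The scoring idea can be illustrated with a minimal Python sketch. The abstract does not specify how entailment is decided, so `overlap_entails` below is a hypothetical word-overlap stand-in, not the entailment test used in the paper; the 0.75 threshold and the choice not to count a sentence as entailing itself are likewise assumptions made only for illustration.

```python
from typing import Callable, List, Set


def tokens(sentence: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in sentence.split()} - {""}


def overlap_entails(text: str, hypothesis: str, threshold: float = 0.75) -> bool:
    """Hypothetical entailment proxy: most words of `hypothesis` occur in `text`."""
    hyp = tokens(hypothesis)
    if not hyp:
        return False
    return len(hyp & tokens(text)) / len(hyp) >= threshold


def logical_scores(
    sentences: List[str],
    entails: Callable[[str, str], bool] = overlap_entails,
) -> List[int]:
    """Score of sentence i = number of other sentences that sentence i entails."""
    return [
        sum(1 for j, other in enumerate(sentences) if j != i and entails(s, other))
        for i, s in enumerate(sentences)
    ]


sentences = [
    "The cat sat on the mat in the kitchen.",
    "The cat sat on the mat.",
    "The cat sat.",
    "Stock prices fell sharply.",
]
scores = logical_scores(sentences)
print(scores)  # [2, 1, 0, 0]
```

Any other entailment test can be passed through the `entails` parameter; only the counting step reflects the definition given in the abstract.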

The summary is obtained by one of three methods: selecting the first sentence(s) from a segment, selecting the best-scored sentence(s) from a segment, or selecting the most informative sentence(s) (relative to those previously selected) from a segment. Moreover, our methods allow the size of the derived summary to be adjusted dynamically, independently of the number of segments.
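The three selection strategies can be sketched as follows, assuming segments are given as lists of sentence indices and each sentence already has a score. The informativeness measure is not defined in the abstract, so `most_informative` uses a hypothetical one (the number of words not yet covered by previously selected sentences); the function names and the per-segment parameter `k` are illustrative as well.

```python
from typing import List, Set


def words(sentence: str) -> Set[str]:
    """Lowercased word set with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in sentence.split()} - {""}


def first_sentences(segments: List[List[int]], k: int = 1) -> List[int]:
    """Method 1: the first k sentences of every segment."""
    return [i for seg in segments for i in seg[:k]]


def best_scored(segments: List[List[int]], scores: List[int], k: int = 1) -> List[int]:
    """Method 2: the k highest-scored sentences of every segment (ties keep text order)."""
    picked = []
    for seg in segments:
        top = sorted(seg, key=lambda i: -scores[i])[:k]
        picked.extend(sorted(top))
    return picked


def most_informative(segments: List[List[int]], sentences: List[str], k: int = 1) -> List[int]:
    """Method 3 (assumed measure): greedily pick the sentence adding the most new words."""
    covered: Set[str] = set()
    picked = []
    for seg in segments:
        remaining = list(seg)
        chosen = []
        for _ in range(min(k, len(remaining))):
            best = max(remaining, key=lambda i: len(words(sentences[i]) - covered))
            chosen.append(best)
            covered |= words(sentences[best])
            remaining.remove(best)
        picked.extend(sorted(chosen))
    return picked


sentences = [
    "Cats sleep a lot.",
    "Cats sleep a lot during the day and hunt at night.",
    "Dogs need daily walks.",
    "Dogs also need fresh water.",
]
segments = [[0, 1], [2, 3]]   # hypothetical segmentation
scores = [0, 1, 0, 0]         # hypothetical logical scores

print(first_sentences(segments))              # [0, 2]
print(best_scored(segments, scores))          # [1, 2]
print(most_informative(segments, sentences))  # [1, 3]
```

Raising `k` lengthens the summary without changing the segmentation, which is one simple way to decouple summary size from the number of segments.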

Alternatively, we propose a Dynamic Programming (DP) method, based on the continuity principle and applied to the sentences logically scored as above. This method first obtains the summary and then determines the segments.
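The abstract does not define the continuity principle or the DP recurrence, so the following is only a generic sketch of the summary-first idea under stated assumptions: exactly `k` sentences are chosen to maximize the sum of their scores plus a hypothetical `link` bonus between consecutive chosen sentences, and a new segment is then assumed to start at every chosen sentence after the first. None of these choices is taken from the paper.

```python
from typing import Callable, List


def dp_summary(scores: List[float], link: Callable[[int, int], float], k: int) -> List[int]:
    """Pick k indices i1 < ... < ik maximizing sum(scores) + sum(link(prev, next))."""
    n = len(scores)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of sentences")
    neg_inf = float("-inf")
    # best[t][i]: best value of a selection of t + 1 sentences ending at sentence i
    best = [[neg_inf] * n for _ in range(k)]
    back = [[-1] * n for _ in range(k)]
    for i in range(n):
        best[0][i] = scores[i]
    for t in range(1, k):
        for i in range(t, n):
            for j in range(t - 1, i):
                cand = best[t - 1][j] + link(j, i) + scores[i]
                if cand > best[t][i]:
                    best[t][i] = cand
                    back[t][i] = j
    end = max(range(k - 1, n), key=lambda i: best[k - 1][i])
    chosen = [end]
    for t in range(k - 1, 0, -1):
        end = back[t][end]
        chosen.append(end)
    return chosen[::-1]


def segments_from_summary(chosen: List[int], n: int) -> List[List[int]]:
    """Assumed rule: a new segment starts at each chosen sentence after the first."""
    starts = [0] + list(chosen[1:])
    ends = starts[1:] + [n]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


scores = [3, 0, 1, 2, 0]                          # hypothetical logical scores
gap_penalty = lambda j, i: -0.5 * (i - j - 1)     # hypothetical continuity term
summary = dp_summary(scores, gap_penalty, k=2)
print(summary)                                    # [0, 3]
print(segments_from_summary(summary, len(scores)))  # [[0, 1, 2], [3, 4]]
```

The point of the sketch is the order of operations: the summary is computed first, and the segment boundaries are derived from it afterwards.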

Our segmentation methods are applied to the text “I spent the first 19 years” and evaluated against the segmentation of that text by Morris and Hirst ([17]). The original text is reproduced at [26]. We give some statistics on the informativeness of summaries of different lengths, obtained with the above methods, relative to the original (summarized) text. These statistics indicate that segmentation preceding summarization can improve the quality of the resulting summaries.

References

  1. Barzilay, R., Elhadad, M.: Using lexical chains for text summarization. In: Mani, I., Maybury, M. (eds.) Advances in Automated Text Summarization, pp. 111–121. MIT Press, Cambridge (1999)

  2. Barzilay, R., Lapata, M.: Modelling local coherence: an entity based approach. In: 43rd Annual Meeting of the ACL, pp. 141–148 (2005)

  3. Beeferman, D., Berger, A.: Statistical models of text segmentation. Machine Learning 34(1-3), 177–210 (1999)

  4. Boguraev, B., Neff, M.: Salience-based content characterization of text document. In: Mani, I., Maybury, M. (eds.) Advances in Automated Text Summarization, pp. 99–110. MIT Press, Cambridge (1999)

  5. Choi, F.Y.: Advances in domain independent linear text segmentation. In: 6th Applied Natural Language Processing Conference, NAACL, pp. 26–33 (2000)

  6. Corley, C., Mihalcea, R.: Measuring the semantic similarity of texts. In: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pp. 13–18 (2005)

  7. Dagan, I., Glickman, O., Magnini, B.: The PASCAL Recognising Textual Entailment Challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006)

  8. Ferret, O., Grau, B.: A topic segmentation based on semantic domains. In: Horn, W. (ed.) ECAI, pp. 426–430. IOS Press, Amsterdam (2000)

  9. Grosz, B., Sidner, C.: Attention, intentions and the structure of discourse. Computational Linguistics 12(3), 175–204 (1986)

  10. Hearst, M.: TextTiling: A Quantitative Approach to Discourse Segmentation. Technical Report 93/24, University of California, Berkeley (1993)

  11. Hearst, M.: TextTiling: Segmenting Text into Multi-Paragraph Subtopic Passages. Computational Linguistics 23, 33–64 (1997)

  12. Hearst, M.: Multi-paragraph segmentation of expository text. In: 32nd Annual Meeting of ACL, pp. 9–16. ACL (1994)

  13. Hovy, E.: Text summarization. In: Mitkov, R. (ed.) The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford University Press, Oxford (2003)

  14. Kaufmann, S.: Cohesion and collocation: using context vectors in Text Segmentation. In: 37th Annual Meeting of the ACL, pp. 591–599. ACL (1999)

  15. Mani, I.: Automatic summarization. John Benjamins Publishing Comp., Amsterdam (2001)

  16. Marcu, D.: From discourse structure to text summaries. In: Mani, I., Maybury, M. (eds.) ACL/EACL Workshop on Intelligent Scalable TS, pp. 82–88 (1997)

  17. Morris, J., Hirst, G.: Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text. Computational Linguistics 17(1), 21–48 (1991)

  18. Orasan, C.: Comparative evaluation of modular automatic summarization systems using CAST. PhD Thesis. University of Wolverhampton, UK (2006)

  19. Pevzner, L., Hearst, M.: A critique and improvement of an Evaluation Metric for Text segmentation. Computational Linguistics 28(1), 19–36 (2002)

  20. Radev, D., Hovy, E., McKeown, K.: Introduction to the Special Issues on Summarization. Computational Linguistics 28, 399–408 (2002)

  21. Silber, H.G., McCoy, K.: Efficiently computed lexical chains, as an intermediate representation for automatic text summarization. Computational Linguistics 28, 487–496 (2002)

  22. Tatar, D., Serban, G., Mihis, A., Mihalcea, R.: Text Entailment as directional relation. In: Orasan, C., Kuebler, S. (eds.) CALP Workshop at RANLP, pp. 53–58. Incoma Ltd, Bulgaria (2007)

  23. Tatar, D., Tamaianu-Morita, E., Mihis, A., Lupsa, D.: Summarization by logic segmentation and text entailment. Research in Computing Science 33, 15–26 (2007)

  24. Journal of Machine Learning Research, http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop

  25. The OpenNLP CCG Library, http://opennlp.sourceforge.net/

  26. Babes-Bolyai University, http://www.cs.ubbcluj.ro/~dtatar/nlp/Hirst.txt

Editor information

Epaminondas Kapetanios, Vijayan Sugumaran, Myra Spiliopoulou

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tatar, D., Mihis, A.D., Lupsa, D. (2008). Text Entailment for Logical Segmentation and Summarization. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_24

  • DOI: https://doi.org/10.1007/978-3-540-69858-6_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-69857-9

  • Online ISBN: 978-3-540-69858-6

  • eBook Packages: Computer Science, Computer Science (R0)
