Text Entailment for Logical Segmentation and Summarization

Tatar, Doina; Mihis, Andreea Diana; Lupsa, Dana

doi:10.1007/978-3-540-69858-6_24

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5039)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Summarization is the process of condensing a source text into a shorter version while preserving its information content ([2]). This paper presents original methods for extractive summarization of a single source document, based on an intuition that has not been explored until now: the logical structure of a text. The summarization relies on an original linear segmentation algorithm, which we call logical segmentation (LTT) because the score of a sentence is the number of sentences of the text that it entails.
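The logical score described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the entailment test `overlap_entails` is a hypothetical word-overlap stand-in for a real textual entailment recognizer, its threshold is arbitrary, and cutting segments at strict local minima of the score curve is an assumption made here for illustration, since the abstract does not state the boundary rule.

```python
from typing import Callable, List, Set


def tokens(sentence: str) -> Set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in sentence.split()} - {""}


def overlap_entails(text: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for entailment: the text 'entails' the hypothesis
    if at least `threshold` of the hypothesis words occur in the text."""
    h = tokens(hypothesis)
    if not h:
        return False
    return len(h & tokens(text)) / len(h) >= threshold


def logical_scores(sentences: List[str],
                   entails: Callable[[str, str], bool] = overlap_entails) -> List[int]:
    """Score of a sentence = number of other sentences of the text it entails."""
    return [
        sum(1 for j, other in enumerate(sentences) if j != i and entails(s, other))
        for i, s in enumerate(sentences)
    ]


def segment_at_minima(scores: List[int]) -> List[List[int]]:
    """Assumed boundary rule: a sentence scoring strictly lower than both
    neighbours closes the current segment."""
    segments, start = [], 0
    for i in range(1, len(scores) - 1):
        if scores[i - 1] > scores[i] < scores[i + 1]:
            segments.append(list(range(start, i + 1)))
            start = i + 1
    if start < len(scores):
        segments.append(list(range(start, len(scores))))
    return segments


sentences = [
    "The cat sat on the mat.",
    "The cat sat.",
    "Dogs bark loudly at night.",
    "Dogs bark.",
]
scores = logical_scores(sentences)
segments = segment_at_minima(scores)
```

Any directional entailment judge with the same `(text, hypothesis) -> bool` shape could be passed as `entails` in place of the overlap heuristic.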

The summary is obtained by one of three methods: selecting the first sentence(s) of each segment, selecting the best-scored sentence(s) of each segment, or selecting the most informative sentence(s) of each segment (relative to those previously selected). Moreover, our methods permit dynamically adjusting the size of the derived summary, independently of the number of segments.
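The three selection strategies can be sketched as below, assuming segments are given as lists of sentence indices and each sentence already has a logical score. The informativeness measure used in `select_most_informative` (the number of words not yet covered by earlier picks) is an assumption made for illustration; the abstract does not define it. The parameter `k`, the number of sentences taken per segment, is likewise a hypothetical way of varying summary length.

```python
from typing import List, Set


def words(sentence: str) -> Set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,;:!?\"'()").lower() for w in sentence.split()} - {""}


def select_first(segments: List[List[int]], k: int = 1) -> List[int]:
    """Method 1: the first k sentences of each segment."""
    return [i for seg in segments for i in seg[:k]]


def select_best_scored(segments: List[List[int]], scores: List[int],
                       k: int = 1) -> List[int]:
    """Method 2: the k highest-scored sentences of each segment,
    returned in text order (ties go to the earlier sentence)."""
    chosen = []
    for seg in segments:
        best = sorted(seg, key=lambda i: (-scores[i], i))[:k]
        chosen.extend(sorted(best))
    return chosen


def select_most_informative(segments: List[List[int]], sentences: List[str],
                            k: int = 1) -> List[int]:
    """Method 3: in each segment, greedily pick the sentence adding the most
    words not already present in previously selected sentences."""
    chosen, seen = [], set()
    for seg in segments:
        picked = []
        for _ in range(min(k, len(seg))):
            candidates = [i for i in seg if i not in picked]
            best = max(candidates,
                       key=lambda i: (len(words(sentences[i]) - seen), -i))
            picked.append(best)
            seen |= words(sentences[best])
        chosen.extend(sorted(picked))
    return chosen


sentences = [
    "Cats purr.",
    "Cats purr and sleep all day.",
    "Dogs bark.",
    "Dogs bark and run fast.",
]
segments = [[0, 1], [2, 3]]
scores = [0, 1, 0, 1]

first = select_first(segments)
best = select_best_scored(segments, scores)
informative = select_most_informative(segments, sentences)
```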

Alternatively, we propose a Dynamic Programming (DP) method, based on the continuity principle and applied to sentences scored logically as above. This method first obtains the summary and then determines the segments.
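A summary-first procedure of this kind might look like the sketch below. The abstract gives neither the objective nor the rule for deriving segments, so everything here is an assumption: the DP picks exactly `k` sentences maximizing the sum of their logical scores plus a hypothetical continuity bonus `link(j, i)` between consecutive summary sentences, and each segment is then taken to start at a selected sentence (the first segment starts at the beginning of the text).

```python
from typing import Callable, List


def dp_summary(scores: List[float], link: Callable[[int, int], float],
               k: int) -> List[int]:
    """Choose k sentence indices (in text order) maximizing
    sum(scores) + sum(link between consecutive chosen sentences)."""
    n = len(scores)
    if k <= 0 or k > n:
        return []
    neg_inf = float("-inf")
    # best[m][i]: best value of a summary of m + 1 sentences ending at sentence i
    best = [[neg_inf] * n for _ in range(k)]
    back = [[-1] * n for _ in range(k)]
    for i in range(n):
        best[0][i] = scores[i]
    for m in range(1, k):
        for i in range(m, n):
            for j in range(m - 1, i):
                value = best[m - 1][j] + link(j, i) + scores[i]
                if value > best[m][i]:
                    best[m][i] = value
                    back[m][i] = j
    # Recover the chosen sentences by following back-pointers.
    end = max(range(k - 1, n), key=lambda i: best[k - 1][i])
    chosen = [end]
    for m in range(k - 1, 0, -1):
        end = back[m][end]
        chosen.append(end)
    return chosen[::-1]


def segments_from_summary(chosen: List[int], n: int) -> List[List[int]]:
    """Assumed rule: every summary sentence after the first opens a new segment."""
    starts = [0] + chosen[1:]
    ends = chosen[1:] + [n]
    return [list(range(s, e)) for s, e in zip(starts, ends)]


scores = [3, 0, 1, 4, 0]
summary = dp_summary(scores, lambda j, i: 0.0, k=2)
segments = segments_from_summary(summary, len(scores))
```

With a zero bonus the DP reduces to picking the `k` best-scored sentences; a nonzero `link` trades score against continuity between neighbouring summary sentences.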

Our segmentation methods are applied to the text “I spent the first 19 years” and evaluated against the segmentation of that text given by Morris and Hirst ([17]). The original text is reproduced at [26]. We give statistics on the informativeness, relative to the original text, of summaries of different lengths obtained with the above methods. These statistics show that segmentation preceding summarization could improve the quality of the obtained summaries.

References

1. Barzilay, R., Elhadad, M.: Using lexical chains for Text summarization. In: Mani, I., Maybury, M. (eds.) Advances in Automated Text Summarization, pp. 111–121. MIT Press, Cambridge (1999)
2. Barzilay, R., Lapata, M.: Modelling local coherence: an entity based approach. In: 43rd Annual Meeting of the ACL, pp. 141–148 (2005)
3. Beeferman, D., Berger, A.: Statistical models of text segmentation. Machine Learning 34(1-3), 177–210 (1999)
4. Boguraev, B., Neff, M.: Salience-based content characterization of text document. In: Mani, I., Maybury, M. (eds.) Advances in Automated Text Summarization, pp. 99–110. MIT Press, Cambridge (1999)
5. Choi, F.Y.: Advances in domain independent linear text segmentation. In: 6th Applied Natural Language Processing Conference, NAACL, pp. 26–33 (2000)
6. Corley, C., Mihalcea, R.: Measuring the semantic similarity of texts. In: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pp. 13–18 (2005)
7. Dagan, I., Glickman, O., Magnini, B.: The PASCAL Recognising Textual Entailment Challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006)
8. Ferret, O., Grau, B.: A topic segmentation based on semantic domains. In: Horn, W. (ed.) ECAI, pp. 426–430. IOS Press, Amsterdam (2000)
9. Grosz, B., Sidner, C.: Attention, intentions and the structure of discourse. Computational Linguistics 12(3), 175–204 (1986)
10. Hearst, M.: TextTiling: A Quantitative Approach to Discourse Segmentation. Technical Report 93/24, University of California, Berkeley (1993)
11. Hearst, M.: TextTiling: Segmenting Text into Multi-Paragraph Subtopic Passages. Computational Linguistics 23, 33–64 (1997)
12. Hearst, M.: Multi-paragraph segmentation of expository text. In: 32nd Annual Meeting of ACL, pp. 9–16. ACL (1994)
13. Hovy, E.: Text summarization. In: Mitkov, R. (ed.) The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford University Press, Oxford (2003)
14. Kaufmann, S.: Cohesion and collocation: using context vectors in Text Segmentation. In: 37th Annual Meeting of the ACL, pp. 591–599. ACL (1999)
15. Mani, I.: Automatic summarization. John Benjamins Publishing Comp., Amsterdam (2001)
16. Marcu, D.: From discourse structure to text summaries. In: Mani, I., Maybury, M. (eds.) ACL/EACL Workshop on Intelligent Scalable TS, pp. 82–88 (1997)
17. Morris, J., Hirst, G.: Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text. Computational Linguistics 17(1), 21–48 (1991)
18. Orasan, C.: Comparative evaluation of modular automatic summarization systems using CAST. PhD Thesis. University of Wolverhampton, UK (2006)
19. Pevzner, L., Hearst, M.: A critique and improvement of an Evaluation Metric for Text segmentation. Computational Linguistics 28(1), 19–36 (2002)
20. Radev, D., Hovy, E., McKeown, K.: Introduction to the Special Issues on Summarization. Computational Linguistics 28, 399–408 (2002)
21. Silber, H.G., McCoy, K.: Efficiently computed lexical chains, as an intermediate representation for automatic text summarization. Computational Linguistics 28, 487–496 (2002)
22. Tatar, D., Serban, G., Mihis, A., Mihalcea, R.: Text Entailment as directional relation. In: Orasan, C., Kuebler, S. (eds.) CALP Workshop at RANLP, pp. 53–58. Incoma Ltd, Bulgaria (2007)
23. Tatar, D., Tamaianu-Morita, E., Mihis, A., Lupsa, D.: Summarization by logic segmentation and text entailment. Research in Computing Science 33, 15–26 (2007)
24. Journal of Machine Learning Research, http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
25. The OpenNLP CCG Library, http://opennlp.sourceforge.net/
26. Babes-Bolyai University, http://www.cs.ubbcluj.ro/~dtatar/nlp/Hirst.txt

Author information

Authors and Affiliations

University “Babes-Bolyai”, Cluj-Napoca, Romania
Doina Tatar, Andreea Diana Mihis & Dana Lupsa

Editor information

Epaminondas Kapetanios, Vijayan Sugumaran, Myra Spiliopoulou

Cite this paper

Tatar, D., Mihis, A.D., Lupsa, D. (2008). Text Entailment for Logical Segmentation and Summarization. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_24

DOI: https://doi.org/10.1007/978-3-540-69858-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69857-9
Online ISBN: 978-3-540-69858-6
eBook Packages: Computer Science, Computer Science (R0)
