Abstract
Summarization is the process of condensing a source text into a shorter version that preserves its information content ([2]). This paper presents original methods for extractive summarization of a single source document, based on an intuition that has not been explored until now: the logical structure of a text. The summarization relies on an original linear segmentation algorithm, which we call logical segmentation (LTT) because the score of a sentence is the number of sentences of the text that are entailed by it.
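The scoring rule stated above (the score of a sentence is the number of sentences it entails) can be illustrated with a minimal sketch. The abstract does not specify the entailment test, so `toy_entails` below is a hypothetical word-overlap stand-in, and both the 0.8 threshold and the choice not to count a sentence as entailing itself are assumptions made for illustration only.

```python
from typing import Callable, List


def toy_entails(text: str, hypothesis: str, threshold: float = 0.8) -> bool:
    """Hypothetical stand-in for a textual entailment test (an assumption,
    not the paper's method): the hypothesis counts as entailed when at least
    `threshold` of its distinct words also occur in the text."""
    text_words = set(text.lower().split())
    hyp_words = set(hypothesis.lower().split())
    if not hyp_words:
        return False
    return len(hyp_words & text_words) / len(hyp_words) >= threshold


def logical_scores(
    sentences: List[str],
    entails: Callable[[str, str], bool] = toy_entails,
) -> List[int]:
    """score[i] = number of other sentences of the text entailed by sentence i."""
    return [
        sum(1 for j, hyp in enumerate(sentences) if j != i and entails(sent, hyp))
        for i, sent in enumerate(sentences)
    ]


example = ["the cat sat on the mat", "the cat sat", "dogs bark loudly"]
scores = logical_scores(example)
```

Any real entailment recognizer with the same two-argument signature could be passed as `entails` in place of the toy predicate.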
The summary is obtained by one of three methods: selecting the first sentence(s) of a segment, selecting the best-scored sentence(s) of a segment, or selecting the most informative sentence(s) of a segment (relative to those previously selected). Moreover, our methods allow the size of the derived summary to be adjusted dynamically, independently of the number of segments.
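The three selection strategies can be sketched as follows, assuming each segment is given as a list of sentence indices in text order and that per-sentence scores are already available. The function names, the parameter `k` (sentences taken per segment), and the informativeness measure (number of words not yet covered by previously selected sentences) are illustrative assumptions; the abstract does not define them.

```python
from typing import List, Sequence


def first_sentences(segment: Sequence[int], k: int = 1) -> List[int]:
    """Select the first k sentences of a segment."""
    return list(segment[:k])


def best_scored(segment: Sequence[int], scores: Sequence[float], k: int = 1) -> List[int]:
    """Select the k highest-scored sentences of a segment, returned in text order.
    sorted() is stable, so ties are broken in favour of earlier sentences."""
    ranked = sorted(segment, key=lambda i: -scores[i])
    return sorted(ranked[:k])


def most_informative(
    segment: Sequence[int],
    sentences: Sequence[str],
    selected: Sequence[int],
    k: int = 1,
) -> List[int]:
    """Greedily select k sentences that add the most words not yet covered by
    previously selected sentences (an assumed proxy for informativeness)."""
    covered = set()
    for i in selected:
        covered |= set(sentences[i].lower().split())
    candidates = list(segment)
    chosen = []
    for _ in range(min(k, len(candidates))):
        # max() returns the first maximal element, so ties favour earlier sentences.
        best = max(candidates, key=lambda i: len(set(sentences[i].lower().split()) - covered))
        chosen.append(best)
        candidates.remove(best)
        covered |= set(sentences[best].lower().split())
    return sorted(chosen)


sentences = ["a b c", "a b", "d e f g", "a d"]
segments = [[0, 1], [2, 3]]
scores = [1, 3, 0, 2]
summary = [i for seg in segments for i in best_scored(seg, scores)]
```

Because `k` is chosen per segment, the summary length is not tied to the number of segments, which is one simple way to read the size adjustment mentioned above.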
Alternatively, we propose a Dynamic Programming (DP) method, based on the continuity principle and applied to the sentences logically scored as above. This method obtains the summary first and then determines the segments.
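The abstract does not give the DP objective, so the following is only a sketch under a stated assumption: a summary of fixed size is chosen to maximize the sum of the sentence scores plus a similarity term between consecutive selected sentences, as one possible reading of the continuity principle. The Jaccard similarity and the names `jaccard` and `dp_summary` are hypothetical, and the subsequent step of deriving segments from the summary is not shown.

```python
from typing import Callable, List, Sequence


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity, used here as an assumed continuity measure."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def dp_summary(
    sentences: Sequence[str],
    scores: Sequence[float],
    size: int,
    sim: Callable[[str, str], float] = jaccard,
) -> List[int]:
    """Return indices of `size` sentences maximizing
    sum(scores) + sum(sim between consecutive selected sentences)."""
    n = len(sentences)
    if not 1 <= size <= n:
        raise ValueError("size must be between 1 and the number of sentences")
    neg_inf = float("-inf")
    # best[j][i]: best value of a j-sentence summary whose last sentence is i.
    best = [[neg_inf] * n for _ in range(size + 1)]
    back = [[-1] * n for _ in range(size + 1)]
    for i in range(n):
        best[1][i] = scores[i]
    for j in range(2, size + 1):
        for i in range(j - 1, n):
            for p in range(j - 2, i):
                cand = best[j - 1][p] + sim(sentences[p], sentences[i]) + scores[i]
                if cand > best[j][i]:
                    best[j][i] = cand
                    back[j][i] = p
    # Pick the best final sentence, then follow the back-pointers.
    i = max(range(size - 1, n), key=lambda e: best[size][e])
    chosen = []
    for j in range(size, 0, -1):
        chosen.append(i)
        i = back[j][i]
    return chosen[::-1]


sentences = ["a b", "a b c", "x y", "x y z"]
scores = [0, 2, 0, 2]
summary = dp_summary(sentences, scores, size=2)
```

The table has `size × n` cells and each cell scans at most `n` predecessors, so the sketch runs in O(size · n²) time.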
Our segmentation methods are applied to the text “I spent the first 19 years” and evaluated against the segmentation of that text by Morris and Hirst ([17]). The original text is reproduced at [26]. We give some statistics on the informativeness of summaries of different lengths, obtained with the above methods, relative to the original (summarized) text. These statistics show that segmentation preceding summarization can improve the quality of the obtained summaries.
References
Barzilay, R., Elhadad, M.: Using lexical chains for text summarization. In: Mani, I., Maybury, M. (eds.) Advances in Automated Text Summarization, pp. 111–121. MIT Press, Cambridge (1999)
Barzilay, R., Lapata, M.: Modelling local coherence: an entity based approach. In: 43rd Annual Meeting of the ACL, pp. 141–148 (2005)
Beeferman, D., Berger, A.: Statistical models of text segmentation. Machine Learning 34(1-3), 177–210 (1999)
Boguraev, B., Neff, M.: Salience-based content characterization of text document. In: Mani, I., Maybury, M. (eds.) Advances in Automated Text Summarization, pp. 99–110. MIT Press, Cambridge (1999)
Choi, F.Y.: Advances in domain independent linear text segmentation. In: 6th Applied Natural Language Processing Conference, NAACL, pp. 26–33 (2000)
Corley, C., Mihalcea, R.: Measuring the semantic similarity of texts. In: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pp. 13–18 (2005)
Dagan, I., Glickman, O., Magnini, B.: The PASCAL Recognising Textual Entailment Challenge. In: Quiñonero-Candela, J., Dagan, I., Magnini, B., d’Alché-Buc, F. (eds.) MLCW 2005. LNCS (LNAI), vol. 3944, pp. 177–190. Springer, Heidelberg (2006)
Ferret, O., Grau, B.: A topic segmentation based on semantic domains. In: Horn, W. (ed.) ECAI, pp. 426–430. IOS Press, Amsterdam (2000)
Grosz, B., Sidner, C.: Attention, intentions and the structure of discourse. Computational Linguistics 12(3), 175–204 (1986)
Hearst, M.: TextTiling: A Quantitative Approach to Discourse Segmentation. Technical Report 93/24, University of California, Berkeley (1993)
Hearst, M.: TextTiling: Segmenting Text into Multi-Paragraph Subtopic Passages. Computational Linguistics 23, 33–64 (1997)
Hearst, M.: Multi-paragraph segmentation of expository text. In: 32nd Annual Meeting of ACL, pp. 9–16. ACL (1994)
Hovy, E.: Text summarization. In: Mitkov, R. (ed.) The Oxford Handbook of Computational Linguistics, pp. 583–598. Oxford University Press, Oxford (2003)
Kaufmann, S.: Cohesion and collocation: using context vectors in Text Segmentation. In: 37th Annual Meeting of the ACL, pp. 591–599. ACL (1999)
Mani, I.: Automatic summarization. John Benjamins Publishing Comp., Amsterdam (2001)
Marcu, D.: From discourse structure to text summaries. In: Mani, I., Maybury, M. (eds.) ACL/EACL Workshop on Intelligent Scalable TS, pp. 82–88 (1997)
Morris, J., Hirst, G.: Lexical Cohesion Computed by Thesaural Relations as an Indicator of the Structure of Text. Computational Linguistics 17(1), 21–48 (1991)
Orasan, C.: Comparative evaluation of modular automatic summarization systems using CAST. PhD Thesis. University of Wolverhampton, UK (2006)
Pevzner, L., Hearst, M.: A critique and improvement of an Evaluation Metric for Text segmentation. Computational Linguistics 28(1), 19–36 (2002)
Radev, D., Hovy, E., McKeown, K.: Introduction to the Special Issues on Summarization. Computational Linguistics 28, 399–408 (2002)
Silber, H.G., McCoy, K.: Efficiently computed lexical chains, as an intermediate representation for automatic text summarization. Computational Linguistics 28, 487–496 (2002)
Tatar, D., Serban, G., Mihis, A., Mihalcea, R.: Text Entailment as directional relation. In: Orasan, C., Kuebler, S. (eds.) CALP Workshop at RANLP, pp. 53–58. Incoma Ltd, Bulgaria (2007)
Tatar, D., Tamaianu-Morita, E., Mihis, A., Lupsa, D.: Summarization by logic segmentation and text entailment. Research in Computing Science 33, 15–26 (2007)
Journal of Machine Learning Research, http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop
The OpenNLP CCG Library, http://opennlp.sourceforge.net/
Babes-Bolyai University, http://www.cs.ubbcluj.ro/~dtatar/nlp/Hirst.txt
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tatar, D., Mihis, A.D., Lupsa, D. (2008). Text Entailment for Logical Segmentation and Summarization. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_24
DOI: https://doi.org/10.1007/978-3-540-69858-6_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69857-9
Online ISBN: 978-3-540-69858-6
eBook Packages: Computer Science, Computer Science (R0)