Evaluation of Sentence Compression Techniques against Human Performance

Perera, Prasad; Kosseim, Leila

doi:10.1007/978-3-642-54903-8_46

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper compares several sentence compression techniques with human-compressed sentences in the context of text summarization. Sentence compression is useful in text summarization because it removes redundant and irrelevant information, freeing space for more relevant content. We evaluate recent state-of-the-art sentence compression techniques based on syntax alone, on a mixture of relevancy and syntax, on machine learning with part-of-speech features, and on keywords alone, together with a naïve random word removal baseline. Results show that syntax-based techniques complemented by relevancy measures outperform all other techniques at preserving content in the text summarization task. However, further analysis of the human-compressed sentences shows that human compression relies on world knowledge, which none of the automatic techniques captures.
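
To make the naïve random word removal baseline concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not say how words are tokenized or how many are removed, so the whitespace tokenization, the function name `random_word_removal`, and the `compression_rate` and `seed` parameters are all assumptions made for illustration.

```python
import random


def random_word_removal(sentence, compression_rate=0.5, seed=None):
    """Keep about `compression_rate` of the words, chosen at random.

    Assumptions (not taken from the paper): words are split on whitespace,
    at least one word is always kept, and surviving words keep their
    original order.
    """
    rng = random.Random(seed)
    words = sentence.split()
    if not words:
        return ""
    # Number of words to keep, never fewer than one.
    keep = max(1, round(len(words) * compression_rate))
    # Sample distinct positions, then restore the original word order.
    kept_positions = sorted(rng.sample(range(len(words)), keep))
    return " ".join(words[i] for i in kept_positions)


if __name__ == "__main__":
    example = "the quick brown fox jumps over the lazy dog near the river"
    print(random_word_removal(example, compression_rate=0.5, seed=0))
```

Because the kept words are chosen without regard to syntax or relevance, the output is usually ungrammatical, which is what makes this kind of baseline a useful lower bound when comparing the other techniques.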

Author information

Authors and Affiliations

Dept. of Computer Science & Software Engineering, Concordia University, Montreal, Canada
Prasad Perera & Leila Kosseim

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, Av. Juan Dios Bátiz, Col. Nueva Industrial Vallejo, 07738, Mexico D.F, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Perera, P., Kosseim, L. (2014). Evaluation of Sentence Compression Techniques against Human Performance. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_46

DOI: https://doi.org/10.1007/978-3-642-54903-8_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science (R0)