Abstract
This paper compares various sentence compression techniques against human-compressed sentences in the context of text summarization. Sentence compression is useful in text summarization because it removes redundant and irrelevant information, freeing space for more relevant information. In this paper, we evaluate recent state-of-the-art sentence compression techniques based on syntax alone, on a mixture of relevancy and syntax, on part-of-speech feature-based machine learning, and on keywords alone, as well as a naïve random word removal baseline. Results show that syntax-based techniques complemented by relevancy measures outperform all other techniques at preserving content in the task of text summarization. However, further analysis of human-compressed sentences also shows that human compression relies on world knowledge, which no automatic technique captures.
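The abstract names the compressors but not their implementations. The following is a minimal, hypothetical sketch of the two simplest ones it lists, a naïve random word removal baseline and a keyword-only compressor, together with a simplified ROUGE-1-style unigram recall for checking how much of a human compression a candidate preserves. It is not the authors' code. The function names, the example sentence, the human compression and the keyword set are all invented for illustration, and the paper's actual keyword selection and evaluation setup may differ.

```python
import random
from collections import Counter


def random_removal(tokens, ratio, seed=0):
    """Naive baseline: keep a random subset of about ratio * len(tokens) words, in original order."""
    if not tokens:
        return []
    rng = random.Random(seed)  # fixed seed so the baseline is reproducible
    keep = min(len(tokens), max(1, round(len(tokens) * ratio)))
    kept_idx = sorted(rng.sample(range(len(tokens)), keep))
    return [tokens[i] for i in kept_idx]


def keyword_compression(tokens, keywords):
    """Keyword-only heuristic: keep only tokens whose lowercase form is in the keyword set."""
    return [t for t in tokens if t.lower() in keywords]


def unigram_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap divided by reference length."""
    cand = Counter(t.lower() for t in candidate)
    ref = Counter(t.lower() for t in reference)
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0


# Hypothetical example data (not from the paper).
sentence = "The committee finally approved the new budget after a long debate".split()
human = "The committee approved the new budget".split()
keywords = {"committee", "approved", "budget"}  # hypothetical keyword set

kw = keyword_compression(sentence, keywords)
# Give the random baseline the same length budget as the human compression.
rnd = random_removal(sentence, ratio=len(human) / len(sentence))
print(kw, unigram_recall(kw, human))
print(rnd, unigram_recall(rnd, human))
```

Here the keyword-only compression keeps three of the six reference words, for a recall of 0.5, while the random baseline's score depends on the seed. Matching the random baseline's length to the human compression keeps the comparison fair, since recall-oriented scores otherwise reward longer outputs.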
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Perera, P., Kosseim, L. (2014). Evaluation of Sentence Compression Techniques against Human Performance. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_46
DOI: https://doi.org/10.1007/978-3-642-54903-8_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54902-1
Online ISBN: 978-3-642-54903-8
eBook Packages: Computer Science, Computer Science (R0)