Evaluation of Sentence Compression Techniques against Human Performance

  • Conference paper
Computational Linguistics and Intelligent Text Processing (CICLing 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8404)

Abstract

This paper compares various sentence compression techniques with human-compressed sentences in the context of text summarization. Sentence compression is useful in text summarization because it removes redundant and irrelevant information, thereby freeing space for more relevant information. We evaluate recent state-of-the-art sentence compression techniques based on syntax alone, on a mixture of relevancy and syntax, on machine learning with part-of-speech features, and on keywords alone, together with a naïve random word removal baseline. Results show that syntax-based techniques complemented by relevancy measures outperform all other techniques at preserving content in the text summarization task. However, further analysis of human-compressed sentences also shows that human compression relies on world knowledge, which is not captured by any automatic technique.
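
As a rough illustration of the weakest technique named above, the naïve random word removal baseline might look like the sketch below. The abstract does not specify how the baseline is configured, so the function name `random_word_removal`, the `keep_ratio` parameter, and the whitespace tokenization are all assumptions made for illustration, not the authors' implementation.

```python
import random


def random_word_removal(sentence, keep_ratio=0.7, seed=None):
    """Hypothetical baseline: keep a random subset of words in original order.

    keep_ratio is an assumed parameter (fraction of words to retain);
    the paper's abstract does not state the compression rate used.
    """
    words = sentence.split()  # naive whitespace tokenization (assumption)
    if not words:
        return ""
    rng = random.Random(seed)
    # Keep at least one word and never more than the sentence contains.
    n_keep = min(len(words), max(1, round(len(words) * keep_ratio)))
    # Sample positions, then sort so the surviving words keep their order.
    kept = sorted(rng.sample(range(len(words)), n_keep))
    return " ".join(words[i] for i in kept)
```

Because words are dropped without regard to syntax or relevance, a baseline of this kind gives a floor against which the syntax-based, relevancy-based, and learned techniques can be compared.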

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Perera, P., Kosseim, L. (2014). Evaluation of Sentence Compression Techniques against Human Performance. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54903-8_46

  • DOI: https://doi.org/10.1007/978-3-642-54903-8_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-54902-1

  • Online ISBN: 978-3-642-54903-8

  • eBook Packages: Computer Science, Computer Science (R0)