
On the Limits of Sentence Compression by Deletion

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5790)

Abstract

Data-driven approaches to sentence compression define the task as dropping any subset of words from the input sentence while retaining important information and grammaticality. We show that only 16% of the observed compressed sentences in the domain of subtitling can be accounted for in this way. We argue that this is partly due to the lack of appropriate evaluation material and estimate that a deletion model is in fact compatible with approximately 55% of the observed data. We analyse the remaining cases, in which deletion alone failed to provide the required level of compression, and conclude that word order changes and paraphrasing are crucial in those cases. We therefore argue for more elaborate sentence compression models that include paraphrasing and word reordering. We report preliminary results of applying a recently proposed, more powerful compression model in the context of subtitling for Dutch.
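As an illustration of the deletion-only definition used in the abstract, the following minimal sketch tests whether a compressed sentence could have been produced by dropping words from the source without reordering or substituting any of them, that is, whether its tokens form a subsequence of the source tokens. The function names, the whitespace tokenisation, and the English example sentences are assumptions made for this sketch; they are not taken from the chapter, whose data are Dutch subtitles.

```python
def is_deletion_compression(source_tokens, compressed_tokens):
    """Return True if compressed_tokens can be obtained from source_tokens
    by deleting words only (no reordering, no substitution)."""
    remaining = iter(source_tokens)
    # `tok in remaining` consumes the iterator up to the first match, so each
    # compressed token must occur after the previous one in the source.
    return all(tok in remaining for tok in compressed_tokens)


def compression_ratio(source_tokens, compressed_tokens):
    """Length of the compression relative to the source, in tokens."""
    return len(compressed_tokens) / len(source_tokens)


# Hypothetical example sentences, tokenised on whitespace (an assumption).
source = "the minister said yesterday that the plan will be postponed".split()
by_deletion = "the minister said the plan will be postponed".split()
by_paraphrase = "the minister postponed the plan".split()

print(is_deletion_compression(source, by_deletion))    # words dropped only
print(is_deletion_compression(source, by_paraphrase))  # needs reordering
print(compression_ratio(source, by_deletion))
```

Under these assumptions, the first candidate is compatible with a deletion model, while the second is not, because it moves "postponed" in front of "the plan". A check of this kind only establishes compatibility with deletion; it says nothing about whether the result is grammatical or preserves the important information.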




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Marsi, E., Krahmer, E., Hendrickx, I., Daelemans, W. (2010). On the Limits of Sentence Compression by Deletion. In: Krahmer, E., Theune, M. (eds) Empirical Methods in Natural Language Generation. EACL ENLG 2009. Lecture Notes in Computer Science (LNAI), vol 5790. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15573-4_3

  • DOI: https://doi.org/10.1007/978-3-642-15573-4_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15572-7

  • Online ISBN: 978-3-642-15573-4

  • eBook Packages: Computer Science, Computer Science (R0)
