Application 2: Machine Translation

Multiword Expressions Acquisition

Abstract

Throughout the previous chapters, we have demonstrated that MWEs are a source of errors for machine translation (MT) systems and for human non-native speakers of a language. As Manning and Schütze (1999, p. 184) point out, “a nice way to test whether a combination is a collocation [MWE] is to translate it into another language. If we cannot translate the combination word by word, then there is evidence that we are dealing with a collocation”. In Sect. 2.3.2, we argue that the impossibility of translating MWEs word for word is a consequence of their limited syntactic and semantic compositionality. Adequate solutions for the variable syntactic/semantic fixedness of MWEs are not easy to find, especially in the context of statistical MT models. However, for high-quality MT, it is important to detect MWEs, to disambiguate them semantically and to treat them appropriately in order to avoid generating unnatural translations or losing information.

Notes

  1. Experiments on the integration of PVs into English-Portuguese MT are described in the original version of the thesis. However, since these experiments do not show conclusive results yet, we prefer to report in this book the use of the mwetoolkit as a tool for studying and evaluating MT quality rather than integrating MWEs into MT systems.

  2. Some languages, however, require some linguistic preprocessing. This is the case of Chinese word segmentation, for instance.

  3. The term phrase is used here to denote any sequence of words, in contrast to its standard use in linguistics, where it denotes a well-formed linguistic constituent.

  4. http://www.statmt.org/moses/

  5. Work reported in this section was previously published in the paper "How hard is it to automatically translate phrasal verbs from English to French?" (Ramisch et al. 2013). It was carried out with the collaboration of Laurent Besacier and Alexander Kobzar.

  6. Available at the Web Inventory of Transcribed and Translated Talks: https://wit3.fbk.eu/

  7. In total, 4 MT systems were built.

  8. These were further cleaned, as described in Sect. 7.2.2.3.

  9. Problematic source sentences were removed manually, but a small number of such cases accidentally remained in the test data.

  10. The guidelines, labels and datasets discussed here are available at http://cameleon.imag.fr/xwiki/bin/view/Main/Phrasal_verbs_annotation

  11. Statistical significance was calculated using a two-tailed t test for the difference in means.
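
As an illustration of the test mentioned in note 11, the following is a minimal sketch of a t statistic for the difference in means. The note does not say which variant of the t test was used, so Welch's unequal-variance form is an assumption here. The function name `welch_t` and the score lists are hypothetical and invented for the example; they are not data from the chapter.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t test.

    Assumption: unequal variances (Welch's form); the source only says
    "two-tailed t test for the difference in means".
    """
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean (sample variance uses n - 1).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical per-sentence scores for two MT systems (invented data).
scores_a = [3.0, 4.0, 5.0, 4.0, 4.0]
scores_b = [2.0, 3.0, 3.0, 2.0, 3.0]
t_stat, dof = welch_t(scores_a, scores_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

The two-tailed p-value is twice the upper-tail probability of |t| under a t distribution with `dof` degrees of freedom. The standard library has no t distribution, so that final step would need a statistics package or a lookup table.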

References

  • Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL 2005 workshop on intrinsic and extrinsic evaluation measures for MT and/or summarization, Ann Arbor. Association for Computational Linguistics, pp 65–72. http://www.aclweb.org/anthology/W/W05/W05-0909

  • Bolinger D (1971) The phrasal verb in English. Harvard University Press, Harvard, 187p

  • Briscoe T, Carroll J, Watson R (2006) The second release of the RASP system. In: Curran J (ed) Proceedings of the COLING/ACL 2006 interactive presentation sessions, Association for Computational Linguistics, Sydney, pp 77–80. http://www.aclweb.org/anthology/P/P06/P06-4020

  • Carpuat M, Diab M (2010) Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. In: Proceedings of human language technology: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics (NAACL 2010), Los Angeles. Association for Computational Linguistics, pp 242–245. http://www.aclweb.org/anthology/N10-1029

  • Cettolo M, Girardi C, Federico M (2012) WIT3: web inventory of transcribed and translated talks. In: Proceedings of the 16th conference of the European association for machine translation (EAMT), Trento, pp 261–268

  • Fraser B (1976) The verb-particle combination in English. Academic, New York

  • Gale WA, Church K (1993) A program for aligning sentences in bilingual corpora. Comput Linguist 19(1):75–102

  • Knight K (1999) Decoding complexity in word-replacement translation models. Comput Linguist 25(4):607–615

  • Knight K, Koehn P (2003) What’s new in statistical machine translation. In: Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on human language technology (NAACL 2003), Edmonton. Association for Computational Linguistics, p 5

  • Koehn P (2010) Statistical machine translation. Cambridge University Press, Cambridge, 488p

  • Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on human language technology (NAACL 2003), Edmonton. Association for Computational Linguistics, pp 48–54

  • Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics (ACL 2007), Prague. Association for Computational Linguistics, pp 177–180

  • Lohse B, Hawkins JA, Wasow T (2004) Domain minimization in English verb-particle constructions. Language 80(2):238–261

  • Lopez A (2008) Statistical machine translation. ACM Comput Surv 40(3):1–49

  • Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT, Cambridge, 620p

  • Och FJ, Ney H (2000) Improved statistical alignment models. In: Proceedings of the 38th annual meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong. Association for Computational Linguistics, pp 440–447

  • Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51

  • Och FJ, Ney H (2004) The alignment template approach to statistical machine translation. Comput Linguist 30(4):417–449

  • Papineni K, Roukos S, Ward T, Zhu W (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia. Association for Computational Linguistics, pp 311–318

  • Ramisch C, Besacier L, Kobzar O (2013) How hard is it to automatically translate phrasal verbs from English to French? In: Mitkov R, Monti J, Pastor GC, Seretan V (eds) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice, pp 53–61. http://www.mtsummit2013.info/workshop4.asp

  • Shinozaki T, Ostendorf M (2008) Cross-validation and aggregated EM training for robust parameter estimation. Comput Speech Lang 22(2):185–195

  • Sinclair J (ed) (1989) Collins COBUILD dictionary of phrasal verbs. Collins COBUILD, London, 512p

  • Snover M, Dorr BJ, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th conference of the Association for Machine Translation in the Americas, Cambridge. Association for Machine Translation in the Americas, pp 223–231

  • Stolcke A (2002) SRILM – an extensible language modeling toolkit. In: Hansen JHL, Pellom B (eds) Proceedings of the seventh international conference on spoken language processing, third INTERSPEECH event (ICSLP 2001 – INTERSPEECH 2002), Denver. International Speech Communication Association, pp 901–904

  • Stymne S (2009) A comparison of merging strategies for translation of German compounds. In: Proceedings of the student research workshop at EACL 2009, Athens, pp 61–69

  • Stymne S (2011a) Blast: a tool for error analysis of machine translation output. In: Proceedings of the ACL 2011 system demonstrations, Portland. Association for Computational Linguistics, pp 56–61. http://www.aclweb.org/anthology/P11-4010

  • Stymne S (2011b) Pre- and postprocessing for statistical machine translation into Germanic languages. In: Proceedings of the ACL 2011 student research workshop, Portland. Association for Computational Linguistics, pp 12–17. http://www.aclweb.org/anthology/P11-3003

  • Tillmann C, Ney H (2003) Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Comput Linguist 29(1):97–133

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Ramisch, C. (2015). Application 2: Machine Translation. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_7

  • DOI: https://doi.org/10.1007/978-3-319-09207-2_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09206-5

  • Online ISBN: 978-3-319-09207-2

  • eBook Packages: Computer Science (R0)
