Application 2: Machine Translation

Ramisch, Carlos

doi:10.1007/978-3-319-09207-2_7

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

Throughout the previous chapters, we have demonstrated that MWEs are a source of errors for machine translation (MT) systems and for human non-native speakers of a language. As Manning and Schütze (1999, p. 184) point out, “a nice way to test whether a combination is a collocation [MWE] is to translate it into another language. If we cannot translate the combination word by word, then there is evidence that we are dealing with a collocation”. In Sect. 2.3.2, we argue that the fact that MWEs cannot be translated word-for-word is a consequence of their limited syntactic and semantic compositionality. Adequate solutions for the variable syntactic/semantic fixedness of MWEs are not easy to find, especially in the context of statistical MT models. However, for high-quality MT, it is important to detect MWEs, to disambiguate them semantically and to treat them appropriately in order to avoid generating unnatural translations or losing information.
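
The translation test quoted above can be illustrated with a minimal sketch. It is not the chapter's method: the lexicon entries, the function names and the bag-of-words comparison (used so that simple reordering is not counted as a mismatch) are all assumptions made for illustration.

```python
# Toy English-French lexicon. Every entry is an illustrative assumption,
# not data taken from the chapter.
TOY_LEXICON = {
    "give": "donner",
    "up": "haut",
    "a": "une",
    "red": "rouge",
    "car": "voiture",
}


def word_by_word(phrase, lexicon):
    """Translate each token independently; unknown tokens are kept unchanged."""
    return [lexicon.get(token, token) for token in phrase.lower().split()]


def looks_like_mwe(phrase, reference, lexicon):
    """Return True when the reference translation is not a reordering
    of the literal, word-by-word translation of the phrase."""
    literal = sorted(word_by_word(phrase, lexicon))
    return literal != sorted(reference.lower().split())


# "give up" is rendered by a single French verb, so the literal translation fails.
print(looks_like_mwe("give up", "abandonner", TOY_LEXICON))
# "a red car" differs from its French translation only in word order.
print(looks_like_mwe("a red car", "une voiture rouge", TOY_LEXICON))
```

A mismatch is only evidence, as the quotation says: with a realistic lexicon, ambiguity and inflection would also cause mismatches for expressions that are fully compositional.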

Notes

1. Experiments on the integration of PVs into English-Portuguese MT are described in the original version of the thesis. However, since these experiments do not yet show conclusive results, we prefer to report in this book on the use of the mwetoolkit as a tool for studying and evaluating MT quality rather than for integrating MWEs into MT systems.
2. Some languages, however, require some linguistic preprocessing. This is the case of Chinese word segmentation, for instance.
3. The term phrase is used here to denote any sequence of words, as opposed to its standard use in linguistics, where it denotes a well-formed linguistic constituent.
4. http://www.statmt.org/moses/
5. Work reported in this section was previously published in the paper “How hard is it to automatically translate phrasal verbs from English to French?” (Ramisch et al. 2013). It was carried out with the collaboration of Laurent Besacier and Alexander Kobzar.
6. Available at the Web Inventory of Transcribed and Translated Talks: https://wit3.fbk.eu/
7. In total, four MT systems were built.
8. These were further cleaned, as described in Sect. 7.2.2.3.
9. Problematic source sentences were removed manually, but a small number of such cases accidentally remained in the test data.
10. The guidelines, labels and datasets discussed here are available at http://cameleon.imag.fr/xwiki/bin/view/Main/Phrasal_verbs_annotation
11. Statistical significance was calculated using a two-tailed t test for the difference in means.

References

Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL 2005 workshop on intrinsic and extrinsic evaluation measures for MT and/or summarization, Ann Arbor. Association for Computational Linguistics, pp 65–72. http://www.aclweb.org/anthology/W/W05/W05-0909
Bolinger D (1971) The phrasal verb in English. Harvard University Press, Harvard, 187p
Briscoe T, Carroll J, Watson R (2006) The second release of the RASP system. In: Curran J (ed) Proceedings of the COLING/ACL 2006 interactive presentation sessions, Association for Computational Linguistics, Sydney, pp 77–80. http://www.aclweb.org/anthology/P/P06/P06-4020
Carpuat M, Diab M (2010) Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. In: Proceedings of human language technology: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics (NAACL 2010), Los Angeles. Association for Computational Linguistics, pp 242–245. http://www.aclweb.org/anthology/N10-1029
Cettolo M, Girardi C, Federico M (2012) WIT³: web inventory of transcribed and translated talks. In: Proceedings of the 16th conference of the European association for machine translation (EAMT), Trento, pp 261–268
Fraser B (1976) The verb-particle combination in English. Academic, New York
Gale WA, Church K (1993) A program for aligning sentences in bilingual corpora. Comput Linguist 19(1):75–102
Knight K (1999) Decoding complexity in word-replacement translation models. Comput Linguist 25(4):607–615
Knight K, Koehn P (2003) What’s new in statistical machine translation. In: Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on human language technology (NAACL 2003), Edmonton. Association for Computational Linguistics, p 5
Koehn P (2010) Statistical machine translation. Cambridge University Press, Cambridge, 488p
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on human language technology (NAACL 2003), Edmonton. Association for Computational Linguistics, pp 48–54
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics (ACL 2007), Prague. Association for Computational Linguistics, pp 177–180
Lohse B, Hawkins JA, Wasow T (2004) Domain minimization in English verb-particle constructions. Language 80(2):238–261
Lopez A (2008) Statistical machine translation. ACM Comput Surv 40(3):1–49
Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT, Cambridge, 620p
Och FJ, Ney H (2000) Improved statistical alignment models. In: Proceedings of the 38th annual meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong. Association for Computational Linguistics, pp 440–447
Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51
Och FJ, Ney H (2004) The alignment template approach to statistical machine translation. Comput Linguist 30(4):417–449
Papineni K, Roukos S, Ward T, Zhu W (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia. Association for Computational Linguistics, pp 311–318
Ramisch C, Besacier L, Kobzar O (2013) How hard is it to automatically translate phrasal verbs from English to French? In: Mitkov R, Monti J, Pastor GC, Seretan V (eds) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice, pp 53–61. http://www.mtsummit2013.info/workshop4.asp
Shinozaki T, Ostendorf M (2008) Cross-validation and aggregated EM training for robust parameter estimation. Comput Speech Lang 22(2):185–195
Sinclair J (ed) (1989) Collins COBUILD dictionary of phrasal verbs. Collins COBUILD, London, 512p
Snover M, Dorr BJ, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th conference of the Association for Machine Translation in the Americas, Cambridge. Association for Machine Translation in the Americas, pp 223–231
Stolcke A (2002) SRILM – an extensible language modeling toolkit. In: Hansen JHL, Pellom B (eds) Proceedings of the seventh international conference on spoken language processing, third INTERSPEECH event (ICSLP 2002 – INTERSPEECH 2002), Denver. International Speech Communication Association, pp 901–904
Stymne S (2009) A comparison of merging strategies for translation of German compounds. In: Proceedings of the student research workshop at EACL 2009, Athens, pp 61–69
Stymne S (2011a) Blast: a tool for error analysis of machine translation output. In: Proceedings of the ACL 2011 system demonstrations, Portland. Association for Computational Linguistics, pp 56–61. http://www.aclweb.org/anthology/P11-4010
Stymne S (2011b) Pre- and postprocessing for statistical machine translation into Germanic languages. In: Proceedings of the ACL 2011 student research workshop, Portland. Association for Computational Linguistics, pp 12–17. http://www.aclweb.org/anthology/P11-3003
Tillmann C, Ney H (2003) Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Comput Linguist 29(1):97–133

Author information

Authors and Affiliations

Aix Marseille University, Marseille, France
Carlos Ramisch

About this chapter

Cite this chapter

Ramisch, C. (2015). Application 2: Machine Translation. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_7

DOI: https://doi.org/10.1007/978-3-319-09207-2_7
Published: 05 August 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09206-5
Online ISBN: 978-3-319-09207-2
eBook Packages: Computer Science, Computer Science (R0)