Abstract
Throughout the previous chapters, we have demonstrated that MWEs are a source of errors for machine translation (MT) systems and for non-native speakers of a language. As Manning and Schütze (1999, p. 184) point out, “a nice way to test whether a combination is a collocation [MWE] is to translate it into another language. If we cannot translate the combination word by word, then there is evidence that we are dealing with a collocation”. In Sect. 2.3.2, we argued that MWEs cannot be translated word for word because of their limited syntactic and semantic compositionality. Adequate solutions for the variable syntactic and semantic fixedness of MWEs are not easy to find, especially in the context of statistical MT models. However, for high-quality MT, it is important to detect MWEs, to disambiguate them semantically and to treat them appropriately in order to avoid generating unnatural translations or losing information.
Notes
- 1.
Experiments on the integration of PVs into English-Portuguese MT are described in the original version of the thesis. However, since these experiments do not show conclusive results yet, we prefer to report in this book the use of the mwetoolkit as a tool for studying and evaluating MT quality rather than integrating MWEs into MT systems.
- 2.
Some languages, however, require some linguistic preprocessing. This is the case for Chinese, for instance, which requires word segmentation.
- 3.
The term phrase is used here to denote any sequence of words, as opposed to its standard use in linguistics to denote a well-formed linguistic constituent.
- 4.
- 5.
Work reported in this section was previously published in the paper “How hard is it to automatically translate phrasal verbs from English to French?” (Ramisch et al. 2013). It was carried out in collaboration with Laurent Besacier and Alexander Kobzar.
- 6.
Available at the Web Inventory of Transcribed and Translated Talks: https://wit3.fbk.eu/
- 7.
In total, 4 MT systems were built.
- 8.
These were further cleaned, as described in Sect. 7.2.2.3.
- 9.
Problematic source sentences were removed manually, but a small number of such cases accidentally remained in the test data.
- 10.
The guidelines, labels and datasets discussed here are available at http://cameleon.imag.fr/xwiki/bin/view/Main/Phrasal_verbs_annotation
- 11.
Statistical significance was calculated using a two-tailed t test for the difference in means.
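Note 11 does not say which variant of the t test was used, nor how the samples were paired. As an illustration only, the following is a minimal sketch of a two-tailed test for a difference in means, assuming Welch's unequal-variance two-sample variant. The function names (t_pdf, two_tailed_p, welch_t_test) and the sample values are hypothetical and do not come from the chapter. The p-value is obtained by numerically integrating the Student t density, so only the Python standard library is needed.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """Two-tailed p-value: 1 - 2 * P(0 <= T <= |t|), via Simpson's rule.

    steps must be even for Simpson's rule.
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * central_mass)


def welch_t_test(a, b):
    """Return (t, df, p) for Welch's two-sample test (an assumed variant).

    Assumes each sample has at least two values and non-zero variance
    in at least one of the samples.
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df, two_tailed_p(t, df)


if __name__ == "__main__":
    # Invented scores for two hypothetical systems, for illustration only.
    system_a = [3.1, 2.8, 3.5, 3.0, 2.9, 3.3]
    system_b = [2.6, 2.9, 2.4, 2.7, 2.5, 2.8]
    t, df, p = welch_t_test(system_a, system_b)
    print(f"t = {t:.3f}, df = {df:.1f}, two-tailed p = {p:.4f}")
```

If the compared scores come from the same sentences judged under two conditions, a paired test on the per-sentence differences would be the more natural choice; the sketch above makes no such pairing assumption.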
References
Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL 2005 workshop on intrinsic and extrinsic evaluation measures for MT and/or summarization, Ann Arbor. Association for Computational Linguistics, pp 65–72. http://www.aclweb.org/anthology/W/W05/W05-0909
Bolinger D (1971) The phrasal verb in English. Harvard University Press, Harvard, 187p
Briscoe T, Carroll J, Watson R (2006) The second release of the RASP system. In: Curran J (ed) Proceedings of the COLING/ACL 2006 interactive presentation sessions, Association for Computational Linguistics, Sydney, pp 77–80. http://www.aclweb.org/anthology/P/P06/P06-4020
Carpuat M, Diab M (2010) Task-based evaluation of multiword expressions: a pilot study in statistical machine translation. In: Proceedings of human language technology: the 2010 annual conference of the North American chapter of the Association for Computational Linguistics (NAACL HLT 2010), Los Angeles. Association for Computational Linguistics, pp 242–245. http://www.aclweb.org/anthology/N10-1029
Cettolo M, Girardi C, Federico M (2012) WIT3: web inventory of transcribed and translated talks. In: Proceedings of the 16th conference of the European association for machine translation (EAMT), Trento, pp 261–268
Fraser B (1976) The verb-particle combination in English. Academic, New York
Gale WA, Church K (1993) A program for aligning sentences in bilingual corpora. Comput Linguist 19(1):75–102
Knight K (1999) Decoding complexity in word-replacement translation models. Comput Linguist 25(4):607–615
Knight K, Koehn P (2003) What’s new in statistical machine translation. In: Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on human language technology (NAACL 2003), Edmonton. Association for Computational Linguistics, p 5
Koehn P (2010) Statistical machine translation. Cambridge University Press, Cambridge, 488p
Koehn P, Och FJ, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of the 2003 conference of the North American chapter of the Association for Computational Linguistics on human language technology (NAACL 2003), Edmonton. Association for Computational Linguistics, pp 48–54
Koehn P, Hoang H, Birch A, Callison-Burch C, Federico M, Bertoldi N, Cowan B, Shen W, Moran C, Zens R, Dyer C, Bojar O, Constantin A, Herbst E (2007) Moses: open source toolkit for statistical machine translation. In: Proceedings of the 45th annual meeting of the Association for Computational Linguistics (ACL 2007), Prague. Association for Computational Linguistics, pp 177–180
Lohse B, Hawkins JA, Wasow T (2004) Domain minimization in English verb-particle constructions. Language 80(2):238–261
Lopez A (2008) Statistical machine translation. ACM Comput Surv 40(3):1–49
Manning CD, Schütze H (1999) Foundations of statistical natural language processing. MIT, Cambridge, 620p
Och FJ, Ney H (2000) Improved statistical alignment models. In: Proceedings of the 38th annual meeting of the Association for Computational Linguistics (ACL 2000), Hong Kong. Association for Computational Linguistics, pp 440–447
Och FJ, Ney H (2003) A systematic comparison of various statistical alignment models. Comput Linguist 29(1):19–51
Och FJ, Ney H (2004) The alignment template approach to statistical machine translation. Comput Linguist 30(4):417–449
Papineni K, Roukos S, Ward T, Zhu W (2002) BLEU: a method for automatic evaluation of machine translation. In: Proceedings of the 40th annual meeting of the Association for Computational Linguistics (ACL 2002), Philadelphia. Association for Computational Linguistics, pp 311–318
Ramisch C, Besacier L, Kobzar O (2013) How hard is it to automatically translate phrasal verbs from English to French? In: Mitkov R, Monti J, Pastor GC, Seretan V (eds) Proceedings of the MT summit 2013 workshop on multi-word units in machine translation and translation technology (MUMTTT 2013), Nice, pp 53–61. http://www.mtsummit2013.info/workshop4.asp
Shinozaki T, Ostendorf M (2008) Cross-validation and aggregated EM training for robust parameter estimation. Comput Speech Lang 22(2):185–195
Sinclair J (ed) (1989) Collins COBUILD dictionary of phrasal verbs. Collins COBUILD, London, 512p
Snover M, Dorr BJ, Schwartz R, Micciulla L, Makhoul J (2006) A study of translation edit rate with targeted human annotation. In: Proceedings of the 7th conference of the Association for Machine Translation in the Americas, Cambridge. Association for Machine Translation in the Americas, pp 223–231
Stolcke A (2002) SRILM – an extensible language modeling toolkit. In: Hansen JHL, Pellom B (eds) Proceedings of the seventh international conference on spoken language processing, third INTERSPEECH event (ICSLP 2001 – INTERSPEECH 2002), Denver. International Speech Communication Association, pp 901–904
Stymne S (2009) A comparison of merging strategies for translation of German compounds. In: Proceedings of the student research workshop at EACL 2009, Athens, pp 61–69
Stymne S (2011a) Blast: a tool for error analysis of machine translation output. In: Proceedings of the ACL 2011 system demonstrations, Portland. Association for Computational Linguistics, pp 56–61. http://www.aclweb.org/anthology/P11-4010
Stymne S (2011b) Pre- and postprocessing for statistical machine translation into Germanic languages. In: Proceedings of the ACL 2011 student research workshop, Portland. Association for Computational Linguistics, pp 12–17. http://www.aclweb.org/anthology/P11-3003
Tillmann C, Ney H (2003) Word reordering and a dynamic programming beam search algorithm for statistical machine translation. Comput Linguist 29(1):97–133
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Ramisch, C. (2015). Application 2: Machine Translation. In: Multiword Expressions Acquisition. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-09207-2_7
DOI: https://doi.org/10.1007/978-3-319-09207-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09206-5
Online ISBN: 978-3-319-09207-2
eBook Packages: Computer Science, Computer Science (R0)