Quality Expectations of Machine Translation

  • Andy Way
Chapter
Part of the Machine Translation: Technologies and Applications book series (MATRA, volume 1)

Abstract

Machine Translation (MT) is being deployed for a range of use-cases by millions of people on a daily basis. There should, therefore, be no doubt as to the utility of MT. However, not everyone is convinced that MT can be useful, especially as a productivity enhancer for human translators. In this chapter, I address this issue, describing how MT is currently deployed, how its output is evaluated and how this could be enhanced, especially as MT quality itself improves. Central to these issues is the acceptance that there is no longer a single ‘gold standard’ measure of quality, such that the situation in which MT is deployed needs to be borne in mind, especially with respect to the expected ‘shelf-life’ of the translation itself.

Keywords

Translation quality assessment; Translation metrics; Neural machine translation; Translator productivity; Translation users

Notes

Acknowledgments

This work has been supported by the ADAPT Centre for Digital Content Technology which is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. ADAPT Centre/School of Computing, Dublin City University, Dublin, Ireland
