Machine Translation, Volume 28, Issue 2, pp 127–150

Translation project adaptation for MT-enhanced computer assisted translation

  • Mauro Cettolo
  • Nicola Bertoldi
  • Marcello Federico
  • Holger Schwenk
  • Loïc Barrault
  • Christophe Servan

Abstract

The effective integration of MT technology into computer-assisted translation tools is a challenging topic for both academic research and the translation industry. In particular, professional translators consider it crucial that MT systems be able to adapt to the feedback translators provide. In this paper, we propose an adaptation scheme that tunes a statistical MT system to a translation project using small amounts of post-edited text, such as the amount a single user generates in just one day of work. The same scheme can be applied on a larger scale to focus general-purpose models on the specific domain of interest. We assess our method on two domains, namely information technology and legal, and four translation directions, from English to French, Italian, Spanish and German. The main outcome is that our adaptation strategy can be very effective provided that the seed data used for adaptation is ‘close enough’ to the remaining text to be translated; otherwise, MT quality neither improves nor worsens, which shows the robustness of our method.

Keywords

  • Statistical machine translation
  • Self-tuning MT
  • Domain adaptation
  • Project adaptation
  • Computer-assisted translation

Notes

Acknowledgments

This work was supported by the MateCAT project, which is funded by the EC under the 7th Framework Programme.

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  • Mauro Cettolo (1)
  • Nicola Bertoldi (1)
  • Marcello Federico (1)
  • Holger Schwenk (2)
  • Loïc Barrault (2)
  • Christophe Servan (2, 3)

  1. FBK, Fondazione Bruno Kessler, Povo, Trento, Italy
  2. LIUM, University of Le Mans, Le Mans cedex 9, France
  3. Xerox Research Centre Europe, Meylan, France
