Controlled Ascent: Imbuing Statistical MT with Linguistic Knowledge

  • William D. Lewis
  • Chris Quirk
  • Qin Gao
Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

We explore the intersection of rule-based and statistical approaches in machine translation, with a particular focus on past and current work at Microsoft Research. Until about 10 years ago, the only machine translation systems worth using were rule-based and linguistically informed. Then came statistical approaches, which use large corpora to directly guide translations toward expressions people would actually say. Rather than relying on local decisions made when writing and conditioning rules, these approaches modeled goodness of translation numerically and selected free parameters to optimize that goodness. This led to huge improvements in translation quality as more and more data was consumed. By necessity, the pendulum is now swinging back towards the inclusion of linguistic features in MT systems. We describe some of our statistical and non-statistical attempts to incorporate linguistic insights into machine translation systems, showing what is currently working well and what isn’t. We also look at trade-offs in using linguistic knowledge (“rules”) in pre- or post-processing by language pair, with a particular eye on the return on investment as training data increases in size.

Keywords

Machine Translation; Source Language; Human Evaluation; Statistical Machine Translation; Sentence Pair
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Microsoft Research, One Microsoft Way, Redmond, USA
