Controlled Ascent: Imbuing Statistical MT with Linguistic Knowledge

Abstract

We explore the intersection of rule-based and statistical approaches in machine translation, with a particular focus on past and current work at Microsoft Research. Until about 10 years ago, the only machine translation systems worth using were rule-based and linguistically-informed. Along came statistical approaches, which use large corpora to directly guide translations toward expressions people would actually say. Rather than making local decisions when writing and conditioning rules, goodness of translation was modeled numerically and free parameters were selected to optimize that goodness. This led to huge improvements in translation quality as more and more data was consumed. By necessity, the pendulum is swinging back towards the inclusion of linguistic features in MT systems. We describe some of our statistical and non-statistical attempts to incorporate linguistic insights into machine translation systems, showing what is currently working well, and what isn’t. We also look at trade-offs in using linguistic knowledge (“rules”) in pre- or post-processing by language pair, with a particular eye on the return on investment as training data increases in size.

Notes

  1. For the original 1949 Translation memorandum by Weaver, see Weaver (1955).

  2. These parsers were developed with a strong focus on corpora, though. George Heidorn, Karen Jensen, and the NLP research group developed a tool chain for quickly parsing a large bank of test sentences and comparing the output against the last best result. The improvements and regressions resulting from a change to the grammar could be manually evaluated, and the changes refined until the results were satisfactory. The end result was a data-driven but not statistical approach to parser development.

  3. These results were not published, but were provided to the authors in a personal conversation with Xiaodong He. In a related paper (He et al. 2008), He and colleagues showed significant improvements in BLEU for a system-combination system, but no differences in human evaluation. Upon analysis, the researchers were able to show that the biggest benefit to BLEU was in short content, but the human evaluators did not exhibit the same preference on that content. In other words, the improvements observed in the short content that BLEU favored had little impact on the overall impressions of the human evaluators.

  4. A sizable portion of the data for each was scraped from the Web, but other sources were used as well, such as Europarl, data from TAUS, MS internal localization data, UN content, WMT news content, etc.

  5. Clearly, the sample is very small, so the regression line should be taken with a grain of salt. We would need a lot more data to be able to draw any strong conclusions.

  6. The bump up at 40+ on English-Spanish and German-English is unexplained, but may be attributable to the difficulty that either decoder has in processing such long content. There is also likely an interaction with statistical noise caused by such small sample sizes.

  7. Note: The English-Spanish and English-German systems shown in Table 4 are trained on the same data as the “full” systems discussed in Sect. 4.2.

  8. The word error rate of the test set is 17.09.

  9. The English-French Gigaword corpus is described in Callison-Burch et al. (2009).

  10. For a complete description of TextCorrector, please see Hassan and Menezes (2013). Also, TextCorrector is directly available through our API. See the following for more details: http://www.microsoft.com/en-us/translator/developers.aspx.

References

  • Axelrod, Amittai, Xiaodong He, and Jianfeng Gao. 2011. Domain adaptation via pseudo in-domain data selection. In Proceedings of EMNLP, 355–362.

  • Brants, Thorsten, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, June, 858–867. Association for Computational Linguistics.

  • Callison-Burch, Chris, Philipp Koehn, Christof Monz, and Josh Schroeder. 2009. Findings of the 2009 workshop on statistical machine translation. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, March, 1–28. Association for Computational Linguistics.

  • Callison-Burch, Chris, Philipp Koehn, Christof Monz, Kay Peterson, Mark Przybocki, and Omar Zaidan. 2010. Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, Uppsala, July, 17–53. Association for Computational Linguistics.

  • Carpuat, Marine, and Michel Simard. 2012. The trouble with SMT consistency. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montréal, June, 442–449. Association for Computational Linguistics.

  • Chahuneau, Victor, Noah A. Smith, and Chris Dyer. 2013. Knowledge-rich morphological priors for Bayesian language models. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, June, 1206–1215. Association for Computational Linguistics.

  • Coughlin, Deborah A. 2003. Correlating automated and human assessments of machine translation quality. In Proceedings of MT Summit IX, New Orleans, LA, September. The Association for Machine Translation in the Americas (AMTA).

  • Dalrymple, Mary. 2001. Lexical functional grammar. Syntax and semantics series, vol. 42. New York: Academic.

  • Denkowski, Michael, Greg Hanneman, and Alon Lavie. 2012. The CMU-Avenue French-English translation system. In Proceedings of the NAACL 2012 Workshop on Statistical Machine Translation.

  • Farrús, Mireia, Marta R. Costa-Jussá, and Maja Popovic. 2012. Study and correlation analysis of linguistic, perceptual and automatic machine translation evaluations. Journal of the American Society for Information Science and Technology 63(1):174–84.

  • Gimpel, Kevin, Nathan Schneider, Brendan O’Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. 2011. Part-of-speech tagging for Twitter. In Proceedings of ACL, Portland, OR.

  • Hassan, Hany, and Arul Menezes. 2013. Social text normalization using contextual graph random walks. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, August. Association for Computational Linguistics.

  • He, Xiaodong, Mei Yang, Jianfeng Gao, Patrick Nguyen, and Robert Moore. 2008. Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In Proceedings of EMNLP.

  • Hopkins, Mark, and Jonathan May. 2011. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, July, 1352–1362. Association for Computational Linguistics.

  • Jensen, Karen, George E. Heidorn, and Stephen D. Richardson. 1992. Natural language processing: The PLNLP approach. Boston: Kluwer Academic Publishers.

  • Jeong, Minwoo, Kristina Toutanova, Hisami Suzuki, and Chris Quirk. 2010. A discriminative lexicon model for complex morphology. In The Ninth Conference of the Association for Machine Translation in the Americas (AMTA-2010).

  • Knight, Kevin. 2013. Tutorial on decipherment. In ACL 2013, Sofia, August.

  • Menezes, Arul, and Stephen D. Richardson. 2001. A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora. Stroudsburg: Association for Computational Linguistics. doi:10.3115/1118037.1118043.

  • Mi, Haitao, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proceedings of ACL-08: HLT, Columbus, OH, June, 192–199. Association for Computational Linguistics.

  • Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, June, 746–751. Association for Computational Linguistics.

  • Moore, Robert C., and William D. Lewis. 2010. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, Uppsala, July.

  • Och, Franz Josef, and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics 30(4):417–449.

  • Och, Franz Josef. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st ACL, Sapporo.

  • Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th ACL, Philadelphia, PA.

  • Quirk, Chris, and Simon Corston-Oliver. 2006. The impact of parse quality on syntactically-informed statistical machine translation. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, July, 62–69. Association for Computational Linguistics.

  • Quirk, Chris, and Arul Menezes. 2006. Dependency treelet translation: The convergence of statistical and example-based machine translation? Machine Translation 20:43–65.

  • Riezler, Stefan, and John T. Maxwell. 2005. On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, June, 57–64. Association for Computational Linguistics.

  • Ringger, Eric, Robert C. Moore, Eugene Charniak, Lucy Vanderwende, and Hisami Suzuki. 2004. Using the Penn Treebank to evaluate non-treebank parsers. In Proceedings of LREC, May. European Language Resources Association.

  • Ritter, Alan, Mausam, Oren Etzioni, and Sam Clark. 2012. Open domain event extraction from Twitter. In Proceedings of the 18th International Conference on Knowledge Discovery and Data Mining (KDD), Beijing.

  • Wang, Wei, Klaus Macherey, Wolfgang Macherey, Franz Och, and Peng Xu. 2012. Improved domain adaptation for statistical machine translation. In Proceedings of AMTA.

  • Weaver, Warren. 1955. Translation. In Machine translation of languages, eds. William N. Locke, and A. Donald Booth, 15–23. Cambridge, MA: MIT Press.

  • Zhang, Hao, Licheng Fang, Peng Xu, and Xiaoyun Wu. 2011. Binarized forest to string translation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, June, 835–845. Association for Computational Linguistics.

Author information

Corresponding author

Correspondence to William D. Lewis.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Lewis, W.D., Quirk, C., Gao, Q. (2016). Controlled Ascent: Imbuing Statistical MT with Linguistic Knowledge. In: Costa-jussà, M., Rapp, R., Lambert, P., Eberle, K., Banchs, R., Babych, B. (eds) Hybrid Approaches to Machine Translation. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21311-8_2

  • DOI: https://doi.org/10.1007/978-3-319-21311-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-21310-1

  • Online ISBN: 978-3-319-21311-8

  • eBook Packages: Computer Science, Computer Science (R0)