Controlled Ascent: Imbuing Statistical MT with Linguistic Knowledge

Lewis, William D.; Quirk, Chris; Gao, Qin

doi:10.1007/978-3-319-21311-8_2

Chapter
First Online: 13 July 2016

Part of the book series: Theory and Applications of Natural Language Processing (NLP)

Abstract

We explore the intersection of rule-based and statistical approaches in machine translation, with a particular focus on past and current work at Microsoft Research. Until about 10 years ago, the only machine translation systems worth using were rule-based and linguistically-informed. Along came statistical approaches, which use large corpora to directly guide translations toward expressions people would actually say. Rather than making local decisions when writing and conditioning rules, goodness of translation was modeled numerically and free parameters were selected to optimize that goodness. This led to huge improvements in translation quality as more and more data was consumed. By necessity, the pendulum is swinging back towards the inclusion of linguistic features in MT systems. We describe some of our statistical and non-statistical attempts to incorporate linguistic insights into machine translation systems, showing what is currently working well, and what isn’t. We also look at trade-offs in using linguistic knowledge (“rules”) in pre- or post-processing by language pair, with a particular eye on the return on investment as training data increases in size.

Notes

1. For the original 1949 Translation memorandum by Weaver, see Weaver (1955).
2. These parsers were developed with a strong focus on corpora, though. George Heidorn, Karen Jensen, and the NLP research group developed a tool chain for quickly parsing a large bank of test sentences and comparing the output against the last best result. The improvements and regressions resulting from a change to the grammar could be manually evaluated, and the changes refined accordingly. The end result was a data-driven but not statistical approach to parser development.
3. These results were not published, but were provided to the authors in a personal conversation with Xiaodong He. In a related paper (He et al. 2008), He and colleagues showed significant improvements in BLEU on a system combination system, but no differences in human evaluation. Upon analysis, the researchers were able to show that the biggest benefit to BLEU was in short content, but the human evaluators did not exhibit the same preference on the same content. In other words, the improvements observed in the short content that BLEU favored had little impact on the overall impressions of the human evaluators.
4. A sizable portion of the data for each was scraped from the Web, but other sources were used as well, such as Europarl, data from TAUS, MS internal localization data, UN content, WMT news content, etc.
5. Clearly, the sample is very small, so the regression line should be taken with a grain of salt. We would need a lot more data to be able to draw any strong conclusions.
6. The bump up at 40+ on English-Spanish and German-English is unexplained, but may be attributable to the difficulty that either decoder has in processing such long content. There is also likely an interaction with statistical noise caused by such small sample sizes.
7. Note: The English-Spanish and English-German systems shown in Table 4 are trained on the same data as the “full” systems discussed in Sect. 4.2.
8. The word error rate of the test set is 17.09.
9. The English-French Gigaword corpus is described in Callison-Burch et al. (2009).
10. For a complete description of TextCorrector, please see Hassan and Menezes (2013). TextCorrector is also directly available through our API. See the following for more details: http://www.microsoft.com/en-us/translator/developers.aspx.

References

Axelrod, Amittai, Xiaodong He, and Jianfeng Gao. 2011. Domain adaptation via pseudo in-domain data selection. In Proceedings of EMNLP, 355–362.
Brants, Thorsten, Ashok C. Popat, Peng Xu, Franz J. Och, and Jeffrey Dean. 2007. Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, June, 858–867. Association for Computational Linguistics.
Callison-Burch, Chris, Philipp Koehn, Christof Monz, and Josh Schroeder. 2009. Findings of the 2009 workshop on statistical machine translation. In Proceedings of the Fourth Workshop on Statistical Machine Translation, Athens, March, 1–28. Association for Computational Linguistics.
Callison-Burch, Chris, Philipp Koehn, Christof Monz, Kay Peterson, Mark Przybocki, and Omar Zaidan. 2010. Findings of the 2010 joint workshop on statistical machine translation and metrics for machine translation. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, Uppsala, July, 17–53. Association for Computational Linguistics.
Carpuat, Marine, and Michel Simard. 2012. The trouble with SMT consistency. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montréal, June, 442–449. Association for Computational Linguistics.
Chahuneau, Victor, Noah A. Smith, and Chris Dyer. 2013. Knowledge-rich morphological priors for Bayesian language models. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, June, 1206–1215. Association for Computational Linguistics.
Coughlin, Deborah A. 2003. Correlating automated and human assessments of machine translation quality. In Proceedings of MT Summit IX, New Orleans, LA, September. The Association for Machine Translation in the Americas (AMTA).
Dalrymple, Mary. 2001. Lexical functional grammar. Syntax and semantics series, vol. 42. New York: Academic.
Denkowski, Michael, Greg Hanneman, and Alon Lavie. 2012. The CMU-Avenue French-English translation system. In Proceedings of the NAACL 2012 Workshop on Statistical Machine Translation.
Farrús, Mireia, Marta R. Costa-Jussá, and Maja Popovic. 2012. Study and correlation analysis of linguistic, perceptual and automatic machine translation evaluations. Journal of the American Society for Information Science and Technology 63(1):174–84.
Gimpel, Kevin, Nathan Schneider, Brendan O’Connor, Dipanjan Das, Daniel Mills, Jacob Eisenstein, Michael Heilman, Dani Yogatama, Jeffrey Flanigan, and Noah A. Smith. 2011. Part-of-speech tagging for twitter. In Proceedings of ACL, Portland, OR.
Hassan, Hany, and Arul Menezes. 2013. Social text normalization using contextual graph random walks. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, August. Association for Computational Linguistics.
He, Xiaodong, Mei Yang, Jianfeng Gao, Patrick Nguyen, and Robert Moore. 2008. Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. In Proceedings of EMNLP.
Hopkins, Mark, and Jonathan May. 2011. Tuning as ranking. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, Edinburgh, July, 1352–1362. Association for Computational Linguistics.
Jensen, Karen, George E. Heidorn, and Stephen D. Richardson. 1992. Natural language processing: The PLNLP approach. Boston: Kluwer Academic Publishers.
Jeong, Minwoo, Kristina Toutanova, Hisami Suzuki, and Chris Quirk. 2010. A discriminative lexicon model for complex morphology. In The Ninth Conference of the Association for Machine Translation in the Americas (AMTA-2010).
Knight, Kevin. 2013. Tutorial on decipherment. In ACL 2013, Sofia, August.
Menezes, Arul, and Stephen D. Richardson. 2001. A best-first alignment algorithm for automatic extraction of transfer mappings from bilingual corpora. Stroudsburg: Association for Computational Linguistics. doi:10.3115/1118037.1118043.
Mi, Haitao, Liang Huang, and Qun Liu. 2008. Forest-based translation. In Proceedings of ACL-08: HLT, Columbus, OH, June, 192–199. Association for Computational Linguistics.
Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. 2013. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Atlanta, GA, June, 746–751. Association for Computational Linguistics.
Moore, Robert C., and William D. Lewis. 2010. Intelligent selection of language model training data. In Proceedings of the ACL 2010 Conference Short Papers, Uppsala, July.
Och, Franz Josef, and Hermann Ney. 2004. The alignment template approach to statistical machine translation. Computational Linguistics 30(4):417–449.
Och, Franz Josef. 2003. Minimum error rate training in statistical machine translation. In Proceedings of the 41st ACL, Sapporo.
Papineni, Kishore, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th ACL, Philadelphia, PA.
Quirk, Chris, and Simon Corston-Oliver. 2006. The impact of parse quality on syntactically-informed statistical machine translation. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, Sydney, July, 62–69. Association for Computational Linguistics.
Quirk, Chris, and Arul Menezes. 2006. Dependency Treelet translation: The convergence of statistical and example-based machine translation? Machine Translation 20:43–65.
Riezler, Stefan, and John T. Maxwell. 2005. On some pitfalls in automatic evaluation and significance testing for MT. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, June, 57–64. Association for Computational Linguistics.
Ringger, Eric, Robert C. Moore, Eugene Charniak, Lucy Vanderwende, and Hisami Suzuki. 2004. Using the Penn Treebank to evaluate non-treebank parsers. In Proceedings of LREC, May. European Language Resources Association.
Ritter, Alan, Mausam, Oren Etzioni, and Sam Clark. 2012. Open domain event extraction from twitter. In Proceedings of the 18th International Conference on Knowledge Discovery and Data Mining (KDD), Beijing.
Wang, Wei, Klaus Macherey, Wolfgang Macherey, Franz Och, and Peng Xu. 2012. Improved domain adaptation for statistical machine translation. In Proceedings of AMTA.
Weaver, Warren. 1955. Translation. In Machine translation of languages, eds. William N. Locke, and A. Donald Booth, 15–23. Cambridge, MA: MIT Press.
Zhang, Hao, Licheng Fang, Peng Xu, and Xiaoyun Wu. 2011. Binarized forest to string translation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Portland, OR, June, 835–845. Association for Computational Linguistics.

Author information

Authors and Affiliations

Microsoft Research, One Microsoft Way, Redmond, WA, 98052, USA
William D. Lewis, Chris Quirk & Qin Gao

Corresponding author

Correspondence to William D. Lewis.

Editor information

Editors and Affiliations

Universitat Politècnica de Catalunya, Barcelona, Spain
Marta R. Costa-jussà
University of Aix-Marseille and University of Mainz, Marseille, France
Reinhard Rapp
Pompeu Fabra University, Barcelona, Barcelona, Spain
Patrik Lambert
Lingenio GmbH, Heidelberg, Baden-Württemberg, Germany
Kurt Eberle
Institute for Infocomm Research, Singapore, Singapur, Singapore
Rafael E. Banchs
Centre for Translation Studies, University of Leeds School of Modern Languages & Cultures, Leeds, United Kingdom
Bogdan Babych

About this chapter

Cite this chapter

Lewis, W.D., Quirk, C., Gao, Q. (2016). Controlled Ascent: Imbuing Statistical MT with Linguistic Knowledge. In: Costa-jussà, M., Rapp, R., Lambert, P., Eberle, K., Banchs, R., Babych, B. (eds) Hybrid Approaches to Machine Translation. Theory and Applications of Natural Language Processing. Springer, Cham. https://doi.org/10.1007/978-3-319-21311-8_2

DOI: https://doi.org/10.1007/978-3-319-21311-8_2
Published: 13 July 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-21310-1
Online ISBN: 978-3-319-21311-8
eBook Packages: Computer Science, Computer Science (R0)
