Machine Translation, Volume 31, Issue 4, pp 225–249

Induction of latent domains in heterogeneous corpora: a case study of word alignment

  • Hoang Cuong
  • Khalil Sima’an


This paper focuses on the insensitivity of existing word alignment models to domain differences, which often yields suboptimal results on large heterogeneous data. A novel latent domain word alignment model is proposed, which induces domain-focused lexical and alignment statistics. We propose to train the model on a heterogeneous corpus under partial supervision, using a small number of seed samples from different domains. The seed samples allow estimating sharper, domain-focused word alignment statistics for sentence pairs. Our experiments show that the derived domain-focused statistics, once combined, produce significant improvements both in word alignment accuracy and in the translation accuracy of the resulting SMT systems. Going beyond these findings, we surmise that virtually any large corpus (e.g., Europarl, Hansards, Common Crawl) harbors an arbitrary diversity of hidden domains, unknown in advance. We address the novel challenge of unsupervised induction of hidden domains in parallel corpora, applied within a domain-focused word alignment modeling framework. On the technical side, we contrast flat estimation for the unsupervised induction of domains with a simple form of hierarchical estimation, consisting of two steps that aim to avoid bad local maxima. Extensive experiments, conducted over seven different language pairs with fully unsupervised induction of domains for word alignment, demonstrate significant improvements in alignment accuracy.
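The partially supervised training idea can be illustrated with a toy sketch. The abstract does not spell out the model's equations, so everything below is an assumption made for illustration: a mixture of IBM Model 1 lexicons, one per latent domain, trained with EM, where seed sentence pairs keep a fixed domain label. The function name `train_latent_domain_ibm1`, the `alpha` smoothing constant, and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict

NULL = "<null>"  # empty source word, as in IBM Model 1


def train_latent_domain_ibm1(pairs, seeds, n_domains, iterations=10, alpha=1e-3):
    """EM for a mixture of IBM Model 1 lexicons (illustrative assumption only).

    pairs: list of (source_tokens, target_tokens)
    seeds: dict {pair index: known domain id}, the partial supervision
    Returns (prior, t, posteriors); t[d][(f, e)] approximates P(f | e, domain d).
    """
    pairs = [([NULL] + list(es), list(fs)) for es, fs in pairs]
    cooc = {(f, e) for es, fs in pairs for f in fs for e in es}
    vocab_size = len({f for f, _ in cooc})
    # Uniform start; the seed labels break the symmetry between domains.
    t = [{key: 1.0 / vocab_size for key in cooc} for _ in range(n_domains)]
    prior = [1.0 / n_domains] * n_domains
    posteriors = []

    for _ in range(iterations):
        counts = [defaultdict(float) for _ in range(n_domains)]
        totals = [defaultdict(float) for _ in range(n_domains)]
        posteriors = []
        for idx, (es, fs) in enumerate(pairs):
            # E-step, part 1: posterior over domains for this sentence pair.
            if idx in seeds:
                post = [1.0 if d == seeds[idx] else 0.0 for d in range(n_domains)]
            else:
                logp = [
                    math.log(prior[d])
                    + sum(math.log(sum(t[d][(f, e)] for e in es)) for f in fs)
                    for d in range(n_domains)
                ]
                top = max(logp)
                weights = [math.exp(lp - top) for lp in logp]
                z = sum(weights)
                post = [w / z for w in weights]
            posteriors.append(post)
            # E-step, part 2: expected alignment counts, weighted by domain posterior.
            for d in range(n_domains):
                if post[d] == 0.0:
                    continue
                for f in fs:
                    norm = sum(t[d][(f, e)] for e in es)
                    for e in es:
                        c = post[d] * t[d][(f, e)] / norm
                        counts[d][(f, e)] += c
                        totals[d][e] += c
        # M-step with additive smoothing so no probability collapses to zero.
        for d in range(n_domains):
            for f, e in cooc:
                t[d][(f, e)] = (counts[d][(f, e)] + alpha) / (
                    totals[d][e] + alpha * vocab_size
                )
        prior = [
            (sum(p[d] for p in posteriors) + alpha) / (len(pairs) + alpha * n_domains)
            for d in range(n_domains)
        ]
    return prior, t, posteriors


# Toy corpus: "bank" is financial in domain 0 and a riverside in domain 1.
toy_pairs = [
    (["the", "bank", "lends", "money"], ["la", "banque", "prête", "argent"]),
    (["the", "river", "bank", "floods"], ["la", "rive", "fleuve", "déborde"]),
    (["bank", "money"], ["banque", "argent"]),
    (["river", "bank"], ["rive", "fleuve"]),
]
toy_seeds = {0: 0, 1: 1}  # only the first two pairs carry a domain label
prior, t, posteriors = train_latent_domain_ibm1(toy_pairs, toy_seeds, n_domains=2)
```

In this sketch the two unlabeled pairs end up leaning toward the domain of the seed pair whose vocabulary they share, which is the sense in which seeds yield domain-focused lexical statistics. The length normalisation constant of IBM Model 1 is omitted because it is identical across domains and cancels in the domain posterior.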


  • Statistical machine translation
  • Word alignment models
  • Latent domain model



We thank the anonymous reviewers and Ivan Titov for their input. The second author is supported by VICI Grant Nr 277-89-002 from the Netherlands Organization for Scientific Research (NWO).



Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. ILLC, University of Amsterdam, Amsterdam, The Netherlands
