Monte Carlo techniques for phrase-based translation

Machine Translation

Abstract

Recent advances in statistical machine translation have used approximate beam search for NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum risk training and decoding.
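To make the idea concrete, here is a minimal sketch of sampling-based decoding on a toy problem. It is not the sampler defined in the article: the two-word option table, the feature weights in `score`, the Hamming loss, and all function names are assumptions made up for illustration. It only shows the general pattern of drawing Gibbs samples from an unnormalised log-linear model and then using the samples to estimate the most probable translation and the minimum-risk translation.

```python
import math
import random
from collections import Counter

# Hypothetical toy model: each of two source positions picks one target option.
OPTIONS = [["the", "a"], ["house", "home", "building"]]

# Made-up feature weights, for illustration only.
UNIGRAM = {"the": 0.5, "a": 0.0, "house": 1.0, "home": 0.6, "building": 0.1}
BIGRAM = {("the", "house"): 0.8, ("a", "home"): 0.7}


def score(translation):
    """Unnormalised log-linear score of a candidate translation."""
    s = sum(UNIGRAM[w] for w in translation)
    return s + BIGRAM.get(tuple(translation), 0.0)


def gibbs_sample(n_samples, burn_in=100, seed=0):
    """Resample one position at a time, conditioned on the other position."""
    rng = random.Random(seed)
    state = [opts[0] for opts in OPTIONS]
    samples = []
    for it in range(burn_in + n_samples):
        for i, opts in enumerate(OPTIONS):
            # Conditional distribution over options for position i.
            weights = [
                math.exp(score(state[:i] + [w] + state[i + 1:])) for w in opts
            ]
            state[i] = rng.choices(opts, weights=weights, k=1)[0]
        if it >= burn_in:
            samples.append(tuple(state))
    return samples


def max_probability(samples):
    """Estimate the most probable translation as the most frequent sample."""
    return Counter(samples).most_common(1)[0][0]


def min_risk(samples):
    """Pick the sampled translation with the lowest expected Hamming loss."""
    counts = Counter(samples)
    total = sum(counts.values())

    def loss(hyp, ref):
        return sum(h != r for h, r in zip(hyp, ref))

    def risk(hyp):
        return sum(c / total * loss(hyp, ref) for ref, c in counts.items())

    return min(counts, key=risk)


samples = gibbs_sample(5000)
best = max_probability(samples)
safest = min_risk(samples)
```

In this toy setting the posterior can be enumerated exactly, so sampling is unnecessary; the point of the sketch is only that both decision rules are computed from the same set of samples rather than from a beam.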

References

  • Arun A, Dyer C, Haddow B, Blunsom P, Lopez A, Koehn P (2009) Monte Carlo inference and maximization for phrase-based translation. In: Proceedings of CoNLL, Association for Computational Linguistics, Boulder, Colorado, pp 102–110

  • Blunsom P, Cohn T, Osborne M (2008) A discriminative latent variable model for statistical machine translation. In: Proceedings of ACL-08: HLT, Association for Computational Linguistics, Columbus, Ohio, pp 200–208

  • Callison-Burch C, Koehn P, Monz C, Schroeder J (2009) Findings of the 2009 workshop on statistical machine translation. In: Proceedings of the fourth workshop on statistical machine translation, Association for Computational Linguistics, Athens, Greece, pp 1–28

  • Casacuberta F, Higuera CDL (2000) Computational complexity of problems on probabilistic grammars and transducers. Springer-Verlag, London, UK

  • DeNero J, Bouchard-Côté A, Klein D (2008) Sampling alignment structure under a Bayesian translation model. In: Proceedings of the 2008 conference on empirical methods in natural language processing, Association for Computational Linguistics, Honolulu, Hawaii, pp 314–323

  • Eisner J, Tromble RW (2006) Local search with very large-scale neighborhoods for optimal permutations in machine translation. In: Proceedings of the HLT-NAACL workshop on computationally hard problems and joint inference in speech and language processing, New York, pp 57–75

  • Finkel JR, Manning CD, Ng AY (2006) Solving the problem of cascading errors: approximate Bayesian inference for linguistic annotation pipelines. In: Proceedings of the 2006 conference on empirical methods in natural language processing, Association for Computational Linguistics, Sydney, Australia, pp 618–626

  • Geman S, Geman D (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans Pattern Anal Mach Intell 6: 721–741

  • Germann U, Jahr M, Knight K, Marcu D, Yamada K (2001) Fast decoding and optimal decoding for machine translation. In: Proceedings of 39th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Toulouse, France, pp 228–235

  • Green PJ (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732

  • Johnson H, Martin J, Foster G, Kuhn R (2007a) Improving translation quality by discarding most of the phrasetable. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), Association for Computational Linguistics, Prague, Czech Republic, pp 967–975

  • Johnson M, Griffiths T, Goldwater S (2007b) Bayesian inference for PCFGs via Markov Chain Monte Carlo. In: Human language technologies 2007: the conference of the North American chapter of the Association for Computational Linguistics, Proceedings of the Main Conference, Association for Computational Linguistics, Rochester, New York, pp 139–146

  • Koehn P, Hoang H (2007) Factored translation models. In: Proceedings of EMNLP, Association for Computational Linguistics, Prague, Czech Republic, pp 868–876

  • Koehn P, Och F, Marcu D (2003) Statistical phrase-based translation. In: Proceedings of HLT-NAACL. Morristown, NJ, USA, pp 48–54

  • Kumar S, Byrne W (2004) Minimum Bayes-risk decoding for statistical machine translation. In: Dumais S, Marcu D, Roukos S (eds) HLT-NAACL 2004: main proceedings, Association for Computational Linguistics, Boston, Massachusetts, USA, pp 169–176

  • Langlais P, Gotti F, Patry A (2007) A greedy decoder for phrase-based statistical machine translation. In: 11th international conference on theoretical and methodological issues in machine translation (TMI 2007), Skövde, Sweden, pp 104–113

  • Li Z, Eisner J, Khudanpur S (2009) Variational decoding for statistical machine translation. In: Proceedings of the joint conference of the 47th annual meeting of the ACL and the 4th international joint conference on natural language processing of the AFNLP, pp 593–601

  • Liu DC, Nocedal J (1989) On the limited memory BFGS method for large scale optimization. Math Program 45(3): 503–528

  • Marcu D, Wong W (2002) A phrase-based, joint probability model for statistical machine translation. In: EMNLP ’02: Proceedings of the ACL-02 conference on Empirical methods in natural language processing, Association for Computational Linguistics, Morristown, NJ, USA, pp 133–139

  • Metropolis N, Ulam S (1949) The Monte Carlo method. J Am Stat Assoc 44(247): 335–341

  • Och FJ (2003) Minimum error rate training in statistical machine translation. In: Proceedings of the 41st annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Sapporo, Japan, pp 160–167

  • Papineni K, Roukos S, Ward T, Zhu W-J (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of 40th annual meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Philadelphia, Pennsylvania, USA, pp 311–318

  • Schraudolph NN (1999) Local gain adaptation in stochastic gradient descent. Technical Report IDSIA-09-99, IDSIA

  • Smith DA, Eisner J (2006) Minimum risk annealing for training log-linear models. In: Proceedings of the COLING/ACL 2006 main conference poster sessions, Sydney, Australia, pp 787–794

  • Zens R, Hasan S, Ney H (2007) A systematic comparison of training criteria for statistical machine translation. In: Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL), pp 524–532

Author information

Corresponding author

Correspondence to Abhishek Arun.

Additional information

This paper extends work presented in Arun et al. (2009).

About this article

Cite this article

Arun, A., Haddow, B., Koehn, P. et al. Monte Carlo techniques for phrase-based translation. Machine Translation 24, 103–121 (2010). https://doi.org/10.1007/s10590-010-9080-7

  • DOI: https://doi.org/10.1007/s10590-010-9080-7
