MBR Phrase Scoring and Pruning for SMT

Conference paper in: Advances in Computer Science and Education

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 140)

Abstract

One of the major causes of translation errors in phrase-based SMT systems is incorrect phrases induced from inaccurately word-aligned parallel data. In this paper, we propose a novel approach that uses the minimum Bayes-risk (MBR) principle to improve the accuracy of phrase extraction. Our approach works as a four-stage pipeline: first, bilingual phrases are extracted from a parallel corpus using a standard phrase induction method; then, the phrases are separated into groups under specific constraints and scored using an MBR model; next, word alignment links contained in phrases whose MBR scores fall below a certain threshold are pruned from the parallel data; last, a new phrase table is learned from the link-pruned parallel data and used in SMT decoding. We evaluate our approach on Chinese-English SMT tasks and show significant improvements on parallel data sets of different scales.
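The four stages can be illustrated with a minimal Python sketch. The abstract does not specify the grouping constraints, the posterior, or the gain function of the MBR model, so everything concrete here is an assumption made for illustration: phrases are grouped by their source side, the posterior is a relative frequency within the group, and the gain is a Dice word overlap between target sides. The function names (`extract_phrases`, `mbr_scores`, `prune_links`, `run_pipeline`) are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict

def extract_phrases(src_len, links, max_len=3):
    """Stages 1 and 4: spans ((i1, i2), (j1, j2)) consistent with the links."""
    spans = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            tgt_pos = [j for (i, j) in links if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 >= max_len:
                continue
            # Reject spans whose target side links to a source word outside the span.
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in links):
                continue
            spans.append(((i1, i2), (j1, j2)))
    return spans

def phrase_key(src, tgt, span):
    (i1, i2), (j1, j2) = span
    return tuple(src[i1:i2 + 1]), tuple(tgt[j1:j2 + 1])

def count_phrases(corpus):
    """Build a phrase table (pair -> count) from (src, tgt, links) triples."""
    counts = Counter()
    for src, tgt, links in corpus:
        for span in extract_phrases(len(src), links):
            counts[phrase_key(src, tgt, span)] += 1
    return counts

def overlap_gain(a, b):
    """Assumed gain function: Dice overlap of the two target phrases' word sets."""
    sa, sb = set(a), set(b)
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def mbr_scores(counts):
    """Stage 2: expected gain of each pair within its source-side group."""
    groups = defaultdict(dict)
    for (s, t), c in counts.items():
        groups[s][t] = c
    scores = {}
    for s, tgts in groups.items():
        total = sum(tgts.values())
        for t in tgts:
            scores[(s, t)] = sum(c / total * overlap_gain(t, t2)
                                 for t2, c in tgts.items())
    return scores

def prune_links(corpus, scores, threshold):
    """Stage 3: drop links that lie inside any phrase scoring below threshold."""
    pruned = []
    for src, tgt, links in corpus:
        bad = set()
        for span in extract_phrases(len(src), links):
            if scores[phrase_key(src, tgt, span)] < threshold:
                (i1, i2), (j1, j2) = span
                bad |= {(i, j) for (i, j) in links
                        if i1 <= i <= i2 and j1 <= j <= j2}
        pruned.append((src, tgt, links - bad))
    return pruned

def run_pipeline(corpus, threshold):
    counts = count_phrases(corpus)                    # stage 1
    scores = mbr_scores(counts)                       # stage 2
    pruned = prune_links(corpus, scores, threshold)   # stage 3
    return count_phrases(pruned)                      # stage 4

# Toy data: "x" is aligned to "house" three times and to "cat" once.
corpus = [(["x"], ["house"], {(0, 0)})] * 3 + [(["x"], ["cat"], {(0, 0)})]
new_table = run_pipeline(corpus, threshold=0.5)
```

In this toy corpus the rare pair scores 0.25 against 0.75 for the frequent one, so a threshold of 0.5 prunes its alignment link and the re-learned table no longer contains it. A real system would use a much larger corpus, the paper's own grouping constraints and MBR model, and a tuned threshold.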



Author information

Correspondence to Nan Duan.

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this paper

Cite this paper

Duan, N. (2012). MBR Phrase Scoring and Pruning for SMT. In: Xie, A., Huang, X. (eds) Advances in Computer Science and Education. Advances in Intelligent and Soft Computing, vol 140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27945-4_31

  • DOI: https://doi.org/10.1007/978-3-642-27945-4_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-27944-7

  • Online ISBN: 978-3-642-27945-4

  • eBook Packages: Engineering, Engineering (R0)
