Abstract
A major source of translation errors in phrase-based SMT systems is incorrect phrases induced from inaccurately word-aligned parallel data. In this paper, we propose a novel approach that uses the minimum Bayes-risk (MBR) principle to improve the accuracy of phrase extraction. Our approach operates as a four-stage pipeline: first, bilingual phrases are extracted from a parallel corpus using a standard phrase induction method; then, phrases are separated into groups under specific constraints and scored using an MBR model; next, word alignment links contained in phrases whose MBR scores fall below a certain threshold are pruned from the parallel data; last, a new phrase table is learned from the link-pruned parallel data and used in SMT decoding. We evaluate our approach on the SMT Chinese-English MT tasks, and show significant improvements on parallel data sets of different scales.
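The scoring and thresholding stages can be illustrated with a minimal Python sketch. The abstract does not specify the grouping constraints, the MBR model, or the threshold, so everything here is an assumption made for illustration: phrase pairs are grouped by identical source phrase, the gain function is a Dice overlap on target tokens (`token_overlap`, a hypothetical choice), and the posterior is the phrase probability renormalized within each group. The sketch only flags low-scoring phrase pairs; pruning the alignment links they contain and re-extracting the phrase table are not shown.

```python
from collections import defaultdict


def token_overlap(a, b):
    """Hypothetical gain function: Dice overlap between the token sets of two phrases."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def mbr_scores(phrase_pairs):
    """Score each (source, target) pair by its expected gain within its group.

    phrase_pairs: iterable of (source, target, probability) triples.
    Assumption: a group is the set of targets sharing one source phrase,
    and probabilities are renormalized inside the group.
    """
    groups = defaultdict(list)
    for src, tgt, prob in phrase_pairs:
        groups[src].append((tgt, prob))

    scores = {}
    for src, candidates in groups.items():
        total = sum(prob for _, prob in candidates)
        if total <= 0:
            continue  # no usable posterior for this group
        for tgt, _ in candidates:
            # Expected gain of tgt against every candidate in the group,
            # weighted by the renormalized probability of that candidate.
            scores[(src, tgt)] = sum(
                (prob / total) * token_overlap(tgt, other)
                for other, prob in candidates
            )
    return scores


def low_scoring_pairs(scores, threshold):
    """Return the phrase pairs whose MBR score falls below the threshold."""
    return {pair for pair, score in scores.items() if score < threshold}


if __name__ == "__main__":
    # Toy data, invented for this example only.
    pairs = [
        ("fangzi", "the house", 0.6),
        ("fangzi", "the home", 0.3),
        ("fangzi", "xyz", 0.1),
    ]
    scores = mbr_scores(pairs)
    print(low_scoring_pairs(scores, threshold=0.5))
```

In this toy group, the outlier target shares no tokens with the other candidates, so its expected gain comes only from its own small probability mass and it is the one flagged for pruning.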
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this paper
Cite this paper
Duan, N. (2012). MBR Phrase Scoring and Pruning for SMT. In: Xie, A., Huang, X. (eds) Advances in Computer Science and Education. Advances in Intelligent and Soft Computing, vol 140. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27945-4_31
DOI: https://doi.org/10.1007/978-3-642-27945-4_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27944-7
Online ISBN: 978-3-642-27945-4
eBook Packages: Engineering, Engineering (R0)