Abstract
Neural Machine Translation (NMT) models based on Recurrent Neural Networks (RNNs) under an encoder-decoder framework have recently shown significant improvements in translation quality. Given the encoded representations of the source sentence, an NMT system generates the translation word by word, conditioned on the hidden states of the decoder. These hidden states are updated at each decoding step and determine the next word to be generated, so the transitions between successive hidden states contribute to the decision on the next token of the translation; this aspect has drawn little attention in previous work. In this work, we propose an explicit supervised objective on the transitions of the decoder hidden states, aiming to help the model learn transitional patterns better. We first model the increment of the transition with a subtraction operation, and then require this increment to be predictive of the word being translated. The proposed approach strengthens the relationship between the decoder transitions and the translation. Empirical evaluation shows considerable improvements on Chinese-English, German-English, and English-German translation tasks, demonstrating the effectiveness of our approach.
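The abstract only sketches the mechanism, so below is a minimal PyTorch sketch of how such a transition objective could look: the increment is modeled by subtracting successive decoder hidden states, and an auxiliary softmax is required to predict the target word from that increment. The names `TransitionPredictor`, `transition_loss`, and `lambda_trans` are illustrative and introduced here for exposition; this is an assumption-laden sketch, not the authors' implementation.

```python
# Minimal sketch of an auxiliary transition objective (assumed form, not
# the paper's actual code): subtract successive decoder states and require
# the increment to predict the target word.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransitionPredictor(nn.Module):
    """Predicts the target word from the increment of the decoder state."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, h_prev: torch.Tensor, h_curr: torch.Tensor) -> torch.Tensor:
        # Model the transition increment with an explicit subtraction.
        delta = h_curr - h_prev                      # (batch, hidden_size)
        return F.log_softmax(self.out(delta), dim=-1)


def transition_loss(predictor: TransitionPredictor,
                    h_prev: torch.Tensor,
                    h_curr: torch.Tensor,
                    target_ids: torch.Tensor) -> torch.Tensor:
    """Auxiliary loss: the increment must be predictive of the word to translate."""
    log_probs = predictor(h_prev, h_curr)            # (batch, vocab_size)
    return F.nll_loss(log_probs, target_ids)


# During training this would be added to the usual NMT objective, e.g.
#   loss = nmt_loss + lambda_trans * transition_loss(predictor, h_prev, h_curr, y_t)
# where lambda_trans is a weighting hyperparameter (its value is not given here).
```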
Notes
- 1. The corpora include LDC2002E18, LDC2003E07, LDC2003E14, the Hansards portion of LDC2004T07, LDC2004T08, and LDC2005T06.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zheng, Z., Huang, S., Dai, X.Y., Chen, J. (2019). Controlling the Transition of Hidden States for Neural Machine Translation. In: Chen, J., Zhang, J. (eds.) Machine Translation. CWMT 2018. Communications in Computer and Information Science, vol. 954. Springer, Singapore. https://doi.org/10.1007/978-981-13-3083-4_8
DOI: https://doi.org/10.1007/978-981-13-3083-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3082-7
Online ISBN: 978-981-13-3083-4