Controlling the Transition of Hidden States for Neural Machine Translation

  • Zaixiang Zheng
  • Shujian Huang
  • Xin-Yu Dai
  • Jiajun Chen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 954)


Recurrent Neural Network (RNN) based Neural Machine Translation (NMT) models under the encoder-decoder framework have recently shown significant improvements in translation quality. Given the encoded representation of the source sentence, an NMT system generates the translation word by word, conditioned on the hidden states of the decoder. The decoder's hidden state is updated at each decoding step and determines the next word to be generated. The transition of the hidden state between successive steps therefore contributes to the choice of the next token, yet this transition has drawn little attention in previous work. In this work, we propose an explicit supervised objective on the transitions of the decoder hidden states, aiming to help the model learn these transitional patterns better. We first model the increment of the transition with a subtraction operation, and then require the increment to be predictive of the word being translated. The proposed approach strengthens the relationship between the transitions of the decoder and the translation. Empirical evaluation shows considerable improvements on Chinese-English, German-English, and English-German translation tasks, demonstrating the effectiveness of our approach.
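The core idea of the abstract can be sketched as an auxiliary loss: subtract successive decoder hidden states to obtain the transition increment, project the increment onto the vocabulary, and penalize the model when the increment fails to predict the word being translated. The sketch below is a minimal illustration under assumed names (`transition_loss`, projection matrix `W`), not the paper's actual implementation.

```python
import numpy as np

def transition_loss(s_prev, s_curr, W, target_id):
    """Hypothetical auxiliary objective: the increment between successive
    decoder hidden states should be predictive of the translated word."""
    delta = s_curr - s_prev             # subtraction models the transition increment
    logits = delta @ W                  # project the increment to vocabulary scores
    logits = logits - logits.max()      # numerically stable softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[target_id])    # cross-entropy against the target word
```

In training, a loss of this form would typically be added to the standard NMT cross-entropy with an interpolation weight, so the transition signal supervises the decoder without replacing the main translation objective.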


Keywords: Neural Machine Translation · Transition Control



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Zaixiang Zheng (1)
  • Shujian Huang (1)
  • Xin-Yu Dai (1)
  • Jiajun Chen (1)
  1. Nanjing University, Nanjing, People's Republic of China
