Neural Machine Translation with Recurrent Highway Networks

  • Maulik Parmar
  • V. Susheela Devi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11308)


Recurrent Neural Networks have lately gained a lot of popularity in language modelling tasks, especially in neural machine translation (NMT). Recent NMT models are based on the encoder-decoder architecture, where a deep LSTM-based encoder projects the source sentence to a fixed-dimensional vector and another deep LSTM decodes the target sentence from that vector. However, there has been very little work exploring architectures with more than one layer in space (i.e. within each time step). This paper examines the effectiveness of simple Recurrent Highway Networks (RHN) in NMT tasks. Our model uses an RHN in both the encoder and the decoder, together with attention. We also explore a reconstructor model to improve adequacy. We demonstrate the effectiveness of all three components on the IWSLT English-Vietnamese dataset. We find that RHN-based models perform on par with LSTM-based models, and better in some cases, and that deep RHN models are easier to train than deep LSTM models because of their highway connections. The paper also investigates the effect of increasing the recurrent depth within each time step.
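The recurrence the abstract refers to stacks several highway layers *within* a single time step: the input enters only at the first layer, and each deeper layer mixes a candidate update with the previous state through a transform gate (with the carry gate coupled as 1 − t, one of the variants described by Zilly et al.). The sketch below is an illustrative NumPy implementation of one such time step, not the authors' code; all parameter names (`Wh`, `Rh`, etc.) are assumptions for exposition.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rhn_step(x, s_prev, Wh, Wt, Rh, Rt, bh, bt, depth=3):
    """One time step of a coupled-gate Recurrent Highway Network.

    x      : input vector for this time step (enters layer 0 only)
    s_prev : state from the previous time step
    Wh, Wt : input projections for candidate and transform gate
    Rh, Rt : per-layer recurrent matrices, shape (depth, d, d)
    bh, bt : per-layer biases, shape (depth, d)
    """
    s = s_prev
    for l in range(depth):
        # The external input x is fed only into the first highway layer.
        xh = Wh @ x if l == 0 else 0.0
        xt = Wt @ x if l == 0 else 0.0
        h = np.tanh(xh + Rh[l] @ s + bh[l])    # candidate update
        t = sigmoid(xt + Rt[l] @ s + bt[l])    # transform gate
        s = h * t + s * (1.0 - t)              # coupled carry gate: c = 1 - t
    return s
```

Increasing `depth` deepens the transition function per time step without adding time steps, which is the "recurrent depth" dimension the paper studies; the additive `s * (1 - t)` path is the highway connection that keeps deep variants trainable.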


Keywords: Recurrent highway networks · Reconstructor · Attention · Encoder-decoder



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Indian Institute of Science, Bengaluru, India
