Dynamic Fusion: Attentional Language Model for Neural Machine Translation

  • Michiki Kurosawa
  • Mamoru Komachi
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1215)


Neural Machine Translation (NMT) can generate fluent output, and language models have therefore been investigated as a way to further improve it. One prior line of work uses two models, a translation model and a language model, in which the translation model's predictions are weighted by the language model with a hand-crafted ratio fixed in advance. However, these approaches cannot adapt the language model's weight to the translation history. In another line of work, the language model's prediction is incorporated into the translation model by jointly considering source and target information; this approach, however, largely ignores the adequacy of the translation output.

Accordingly, this work combines the two mechanisms, the translation model and the language model, through an attentional architecture that treats the language model as an auxiliary element of the translation model. In experiments on English–Japanese machine translation, the proposed Dynamic Fusion mechanism improves BLEU and Rank-based Intuitive Bilingual Evaluation Score (RIBES) over previous work that uses a language model. Furthermore, analyses of the attention weights and of the language model's predictions show that Dynamic Fusion enables predictive language modeling that conforms to the appropriate grammatical structure.


Keywords: Language model · Neural machine translation · Attention mechanism



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Tokyo Metropolitan University, Hino, Japan
