Abstract
Neural Machine Translation (NMT) is known to generate fluent output, and language models have therefore been investigated for incorporation into NMT. In prior work, two separate models are used: a translation model and a language model, where the translation model's predictions are weighted by the language model with a hand-crafted ratio fixed in advance. However, these approaches fail to adapt the language model weighting to the translation history. In another line of work, the language model prediction is incorporated into the translation model by jointly considering source and target information; this approach is limited, however, because it largely ignores the adequacy of the translation output.
Accordingly, this work employs both a translation model and a language model, with an attention mechanism that treats the language model as an auxiliary element of the translation model. In English–Japanese machine translation experiments, the proposed Dynamic Fusion mechanism improves BLEU and RIBES (Rank-based Intuitive Bilingual Evaluation Score) scores over previous approaches that incorporate a language model. Furthermore, analyses of the attention weights and of the language model's predictions show that Dynamic Fusion performs predictive language modeling that conforms to the appropriate grammatical structure.
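The paper's released implementation is not reproduced here, but the core idea admits a compact sketch. The following hypothetical PyTorch code (all class and variable names are our own, and a simple sigmoid gate stands in for the paper's attentional weighting) illustrates how a per-step, context-dependent weight on the language model differs from a fixed, hand-crafted ratio:

```python
import torch
import torch.nn as nn

class DynamicFusionStep(nn.Module):
    """One decoding step that fuses translation-model (TM) and
    language-model (LM) predictions with a learned, state-dependent gate."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.tm_proj = nn.Linear(hidden_size, vocab_size)  # TM logits
        self.lm_proj = nn.Linear(hidden_size, vocab_size)  # LM logits
        self.gate = nn.Linear(2 * hidden_size, 1)          # LM weight

    def forward(self, tm_state: torch.Tensor, lm_state: torch.Tensor) -> torch.Tensor:
        # The gate g is recomputed at every time step from the current decoder
        # and LM states, so the LM's influence tracks the translation history
        # instead of being fixed in advance.
        g = torch.sigmoid(self.gate(torch.cat([tm_state, lm_state], dim=-1)))
        logits = self.tm_proj(tm_state) + g * self.lm_proj(lm_state)
        return torch.log_softmax(logits, dim=-1)

# Usage: batch of 8 decoder states of width 512 over a 32k vocabulary.
step = DynamicFusionStep(hidden_size=512, vocab_size=32000)
log_probs = step(torch.randn(8, 512), torch.randn(8, 512))  # shape (8, 32000)
```

The key design point this sketch isolates is that the language model serves only as an auxiliary signal: the translation model's prediction is always present, while the learned gate decides, per step, how much the language model contributes.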
Notes
- 1. A logit here refers to the output of the probability projection layer before the softmax is applied (see the short illustration following these notes).
- 2.
- 3.
- 4.
- 5. We exclude sentences with more than 60 tokens from the training data.
- 6. We did not perform an experiment with Simple Fusion because it requires the vocabularies of the language model and the translation model to be identical.
- 7. The language model cannot predict the first token correctly because the sequence starts with <BOS>.
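As a minimal illustration of the term "logit" used in note 1 (hypothetical PyTorch code, not taken from the paper):

```python
import torch
import torch.nn as nn

proj = nn.Linear(512, 32000)           # probability projection layer
hidden = torch.randn(1, 512)           # a decoder hidden state
logits = proj(hidden)                  # "logits": projection output, no softmax applied
probs = torch.softmax(logits, dim=-1)  # softmax turns logits into a distribution
```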
References
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR (2015)
Brants, T., Popat, A.C., Xu, P., Och, F.J., Dean, J.: Large language models in machine translation. In: Proceedings of EMNLP-CoNLL, pp. 858–867 (2007)
Gulcehre, C., et al.: On using monolingual corpora in neural machine translation. arXiv (2015)
Hirasawa, T., Yamagishi, H., Matsumura, Y., Komachi, M.: Multimodal machine translation with embedding prediction. In: Proceedings of NAACL, pp. 86–91 (2019)
Isozaki, H., Hirao, T., Duh, K., Sudoh, K., Tsukada, H.: Automatic evaluation of translation quality for distant language pairs. In: Proceedings of EMNLP, pp. 944–952 (2010)
Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of EMNLP, pp. 1412–1421 (2015)
Luong, T., Sutskever, I., Le, Q., Vinyals, O., Zaremba, W.: Addressing the rare word problem in neural machine translation. In: Proceedings of ACL, pp. 11–19 (2015)
Matsumura, Y., Komachi, M.: Tokyo Metropolitan University neural machine translation system for WAT 2017. In: Proceedings of WAT, pp. 160–166 (2017)
Nakazawa, T., et al.: ASPEC: Asian scientific paper excerpt corpus. In: Proceedings of LREC, pp. 2204–2208 (2016)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of ACL, pp. 311–318 (2002)
Qi, Y., Sachan, D., Felix, M., Padmanabhan, S., Neubig, G.: When and why are pre-trained word embeddings useful for neural machine translation? In: Proceedings of NAACL, pp. 529–535 (2018). https://doi.org/10.18653/v1/N18-2084
Ramachandran, P., Liu, P., Le, Q.: Unsupervised pretraining for sequence to sequence learning. In: Proceedings of EMNLP, pp. 383–391 (2017). https://doi.org/10.18653/v1/D17-1039
Sennrich, R., Haddow, B.: Linguistic input features improve neural machine translation. In: Proceedings of WMT, pp. 83–91 (2016). https://doi.org/10.18653/v1/W16-2209
Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of ACL, pp. 86–96 (2016). https://doi.org/10.18653/v1/P16-1009
Sriram, A., Jun, H., Satheesh, S., Coates, A.: Cold fusion: training Seq2Seq models together with language models. arXiv (2017)
Stahlberg, F., Cross, J., Stoyanov, V.: Simple fusion: return of the language model. In: Proceedings of WMT, pp. 204–211 (2018)
Tu, Z., Lu, Z., Liu, Y., Liu, X., Li, H.: Modeling coverage for neural machine translation. In: Proceedings of ACL, pp. 76–85 (2016). https://doi.org/10.18653/v1/P16-1008
Yamagishi, H., Kanouchi, S., Sato, T., Komachi, M.: Improving Japanese-to-English neural machine translation by voice prediction. In: Proceedings of IJCNLP, pp. 277–282 (2017)
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kurosawa, M., Komachi, M. (2020). Dynamic Fusion: Attentional Language Model for Neural Machine Translation. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_9
Print ISBN: 978-981-15-6167-2
Online ISBN: 978-981-15-6168-9