Abstract
Neural Machine Translation (NMT) is known to generate fluent output, and language models have therefore been investigated for incorporation into NMT. In prior work, two separate models are used: a translation model and a language model, where the translation model's predictions are weighted by the language model with a hand-crafted ratio fixed in advance. However, these approaches fail to adapt the language model weighting to the translation history. In another line of work, the language model prediction is incorporated into the translation model by jointly considering source and target information; this approach is limited, however, because it largely ignores the adequacy of the translation output.
Accordingly, this work employs both a translation model and a language model, with an attention mechanism that treats the language model as an auxiliary element of the translation model. In English–Japanese machine translation experiments, the proposed Dynamic Fusion mechanism improves BLEU and RIBES (Rank-based Intuitive Bilingual Evaluation Score) scores over previous approaches that incorporate a language model. Furthermore, analyses of the attention weights and of the language model's predictions show that Dynamic Fusion performs predictive language modeling that conforms to the appropriate grammatical structure.
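The paper's released implementation is not reproduced here, but the core idea admits a compact sketch. The following hypothetical PyTorch code (all class and variable names are our own, and a simple sigmoid gate stands in for the paper's attentional weighting) illustrates how a per-step, context-dependent weight on the language model differs from a fixed, hand-crafted ratio:

```python
import torch
import torch.nn as nn

class DynamicFusionStep(nn.Module):
    """One decoding step that fuses translation-model (TM) and
    language-model (LM) predictions with a learned, state-dependent gate."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.tm_proj = nn.Linear(hidden_size, vocab_size)  # TM logits
        self.lm_proj = nn.Linear(hidden_size, vocab_size)  # LM logits
        self.gate = nn.Linear(2 * hidden_size, 1)          # LM weight

    def forward(self, tm_state: torch.Tensor, lm_state: torch.Tensor) -> torch.Tensor:
        # The gate g is recomputed at every time step from the current decoder
        # and LM states, so the LM's influence tracks the translation history
        # instead of being fixed in advance.
        g = torch.sigmoid(self.gate(torch.cat([tm_state, lm_state], dim=-1)))
        logits = self.tm_proj(tm_state) + g * self.lm_proj(lm_state)
        return torch.log_softmax(logits, dim=-1)

# Usage: batch of 8 decoder states of width 512 over a 32k vocabulary.
step = DynamicFusionStep(hidden_size=512, vocab_size=32000)
log_probs = step(torch.randn(8, 512), torch.randn(8, 512))  # shape (8, 32000)
```

The key design point this sketch isolates is that the language model serves only as an auxiliary signal: the translation model's prediction is always present, while the learned gate decides, per step, how much the language model contributes.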
Notes
- 1. A logit here refers to the output of the probability projection layer before the softmax is applied (see the short illustration following these notes).
- 2.
- 3.
- 4.
- 5. We exclude sentences with more than 60 tokens from the training data.
- 6. We did not perform an experiment with Simple Fusion because it requires the vocabularies of the language model and the translation model to be identical.
- 7. The language model cannot predict the first token correctly because the sequence starts with <BOS>.
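As a minimal illustration of the term "logit" used in note 1 (hypothetical PyTorch code, not taken from the paper):

```python
import torch
import torch.nn as nn

proj = nn.Linear(512, 32000)           # probability projection layer
hidden = torch.randn(1, 512)           # a decoder hidden state
logits = proj(hidden)                  # "logits": projection output, no softmax applied
probs = torch.softmax(logits, dim=-1)  # softmax turns logits into a distribution
```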
References
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: Proceedings of ICLR (2015)
Brants, T., Popat, A.C., Xu, P., Och, F.J., Dean, J.: Large language models in machine translation. In: Proceedings of EMNLP-CoNLL, pp. 858–867 (2007)
Gulcehre, C., et al.: On using monolingual corpora in neural machine translation. arXiv (2015)
Hirasawa, T., Yamagishi, H., Matsumura, Y., Komachi, M.: Multimodal machine translation with embedding prediction. In: Proceedings of NAACL, pp. 86–91 (2019)
Isozaki, H., Hirao, T., Duh, K., Sudoh, K., Tsukada, H.: Automatic evaluation of translation quality for distant language pairs. In: Proceedings of EMNLP, pp. 944–952 (2010)
Luong, T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation. In: Proceedings of EMNLP, pp. 1412–1421 (2015)
Luong, T., Sutskever, I., Le, Q., Vinyals, O., Zaremba, W.: Addressing the rare word problem in neural machine translation. In: Proceedings of ACL, pp. 11–19 (2015)
Matsumura, Y., Komachi, M.: Tokyo Metropolitan University neural machine translation system for WAT 2017. In: Proceedings of WAT, pp. 160–166 (2017)
Nakazawa, T., et al.: ASPEC: Asian scientific paper excerpt corpus. In: Proceedings of LREC, pp. 2204–2208 (2016)
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: Proceedings of ACL, pp. 311–318 (2002)
Qi, Y., Sachan, D., Felix, M., Padmanabhan, S., Neubig, G.: When and why are pre-trained word embeddings useful for neural machine translation? In: Proceedings of NAACL, pp. 529–535 (2018). https://doi.org/10.18653/v1/N18-2084
Ramachandran, P., Liu, P., Le, Q.: Unsupervised pretraining for sequence to sequence learning. In: Proceedings of EMNLP, pp. 383–391 (2017). https://doi.org/10.18653/v1/D17-1039
Sennrich, R., Haddow, B.: Linguistic input features improve neural machine translation. In: Proceedings of WMT, pp. 83–91 (2016). https://doi.org/10.18653/v1/W16-2209
Sennrich, R., Haddow, B., Birch, A.: Improving neural machine translation models with monolingual data. In: Proceedings of ACL, pp. 86–96 (2016). https://doi.org/10.18653/v1/P16-1009
Sriram, A., Jun, H., Satheesh, S., Coates, A.: Cold fusion: training Seq2Seq models together with language models. arXiv (2017)
Stahlberg, F., Cross, J., Stoyanov, V.: Simple fusion: return of the language model. In: Proceedings of WMT, pp. 204–211 (2018)
Tu, Z., Lu, Z., Liu, Y., Liu, X., Li, H.: Modeling coverage for neural machine translation. In: Proceedings of ACL, pp. 76–85 (2016). https://doi.org/10.18653/v1/P16-1008
Yamagishi, H., Kanouchi, S., Sato, T., Komachi, M.: Improving Japanese-to-English neural machine translation by voice prediction. In: Proceedings of IJCNLP, pp. 277–282 (2017)
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kurosawa, M., Komachi, M. (2020). Dynamic Fusion: Attentional Language Model for Neural Machine Translation. In: Nguyen, LM., Phan, XH., Hasida, K., Tojo, S. (eds) Computational Linguistics. PACLING 2019. Communications in Computer and Information Science, vol 1215. Springer, Singapore. https://doi.org/10.1007/978-981-15-6168-9_9
Print ISBN: 978-981-15-6167-2
Online ISBN: 978-981-15-6168-9