Abstract
Transfer learning is a promising approach to the problem of machine translation for low-resource languages. Like any machine learning method, it requires several choices: selecting the training data, in particular the language pairs involved and the quantity and quality of data available for each. Further important choices arise during preprocessing, such as selecting the data used to learn the subword units and, consequently, the model's vocabulary. It is still unclear how to optimize this transfer. In this paper, we analyse the impact of these early choices on system performance. We show that performance depends on the quantity of available data and the proximity of the languages involved, as well as on the protocol used to determine the subword unit model and hence the vocabulary. We also propose a multilingual approach to transfer learning involving a universal encoder. This multilingual approach is comparable to a multi-source transfer learning setup in which the system learns from multiple languages before the transfer. We analyse the distribution of subword units across languages and show that, once again, preprocessing choices affect overall system performance.
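To make the "early choices" concrete, the sketch below shows a minimal byte-pair-encoding (BPE) learner over a corpus that mixes sentences from two languages. This is an illustrative toy, not the paper's pipeline: production systems use tools such as SentencePiece, and the corpus, sampling ratio, and number of merges here are hypothetical. It illustrates the key point of the abstract: the mixing ratio of the languages in the training data determines which language dominates the learned subword inventory, and therefore the shared vocabulary.

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merge operations from a list of sentences (toy sketch)."""
    # Represent each word as a tuple of characters plus an end-of-word marker.
    vocab = Counter()
    for sent in corpus:
        for word in sent.split():
            vocab[tuple(word) + ("</w>",)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges

# Hypothetical mixed corpus: oversampling the first "language" 3:1
# biases the merge table (and thus the vocabulary) toward it.
corpus = ["low lower lowest"] * 3 + ["niedrig niedriger"] * 1
print(learn_bpe(corpus, 5))
```

With this 3:1 sampling, the first merges all come from the oversampled language; rebalancing the corpus before learning the subword model is exactly the kind of preprocessing decision whose impact the paper studies.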
Acknowledgments
This work was supported by the French National Research Agency (ANR) through the CHIST-ERA M2CR project, under the contract number ANR-15-CHR2-0006-017.
© 2019 Springer Nature Switzerland AG
Cite this paper
Bardet, A., Bougares, F., Barrault, L. (2019). A Study on Multilingual Transfer Learning in Neural Machine Translation: Finding the Balance Between Languages. In: Martín-Vide, C., Purver, M., Pollak, S. (eds) Statistical Language and Speech Processing. SLSP 2019. Lecture Notes in Computer Science(), vol 11816. Springer, Cham. https://doi.org/10.1007/978-3-030-31372-2_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31371-5
Online ISBN: 978-3-030-31372-2