Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

Abstract

There are several approaches for improving neural machine translation for low-resource languages: monolingual data can be exploited via pretraining or data augmentation; parallel corpora on related language pairs can be used via parameter sharing or transfer learning in multilingual models; and subword segmentation and regularization techniques can be applied to ensure high coverage of the vocabulary. We review these approaches in the context of an asymmetric-resource one-to-many translation task, in which the two target languages are related, one being very low-resource and the other higher-resource. We test various methods on three artificially restricted translation tasks—English to Estonian (low-resource) and Finnish (high-resource), English to Slovak and Czech, English to Danish and Swedish—and one real-world task, Norwegian to North Sámi and Finnish. The experiments show positive effects especially for scheduled multi-task learning, the denoising sequence autoencoder, and subword sampling.
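
To make the subword sampling idea concrete, the sketch below uses the SentencePiece library's sampling API: during training, each occurrence of a sentence is segmented into a different randomly sampled subword decomposition, which acts as a regularizer, while at test time a single deterministic segmentation is used. This is an illustration of the technique only, not the authors' pipeline: the model path is a placeholder, and the paper's own sampling is built on Morfessor EM+Prune rather than SentencePiece.

```python
# Minimal sketch of subword sampling / regularization with SentencePiece.
# "unigram.model" is a placeholder for a trained unigram segmentation model;
# the paper itself samples segmentations from Morfessor EM+Prune instead.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="unigram.model")

sentence = "transfer learning for low-resource translation"

# Deterministic 1-best segmentation, as used at test time.
print(sp.encode(sentence, out_type=str))

# Sampled segmentations, as used during training: the same sentence is seen
# with different subword decompositions across epochs.
for _ in range(3):
    print(sp.encode(sentence, out_type=str,
                    enable_sampling=True, alpha=0.1, nbest_size=-1))
```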

Notes

  1. Software available at https://github.com/Waino/morfessor-emprune.

  2. https://victorio.uit.no/freecorpus/.

  3. http://www.statmt.org/wmt18/translation-task.html.

  4. http://hdl.handle.net/11858/00-097C-0000-0001-CCDB-0.

  5. Technical University of Košice, 2014.

  6. https://spraakbanken.gu.se/eng/resource/rd-prot.

  7. http://hdl.handle.net/11022/0000-0000-2238-B.

  8. https://www.nb.no/sprakbanken/show?serial=oai%3Anb.no%3Asbr-4&lang=en.

  9. sewiki-20191201 dump.

  10. mteval-v13a.pl

  11. Software available at https://github.com/Waino/OpenNMT-py/tree/dynamicdata. Later, the dataloader of OpenNMT-py version 2.0 was redesigned to incorporate our proposals.
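
The dynamic dataloader in note 11 mixes several tasks and corpora on the fly during training. As a hypothetical illustration of the scheduled multi-task learning idea (not the actual OpenNMT-py implementation), the sketch below draws each minibatch from either a denoising autoencoder task or a translation task, with the autoencoder's sampling probability annealed over training; the weight values and schedule shape are invented for the example.

```python
# Hypothetical sketch of scheduled multi-task sampling. Each training step
# draws a minibatch from one task; the denoising autoencoder task starts
# with high weight and is annealed down, so later training focuses on
# translation. Not the OpenNMT-py dynamicdata implementation.
import random

def autoencoder_prob(step, total_steps, start=0.5, end=0.05):
    """Linearly anneal the probability of sampling the autoencoder task."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

def sample_task(step, total_steps):
    return ("autoencoder"
            if random.random() < autoencoder_prob(step, total_steps)
            else "translation")

# Example: empirical task mix at a few points in a 100k-step run.
for step in (0, 50_000, 100_000):
    counts = {"autoencoder": 0, "translation": 0}
    for _ in range(10_000):
        counts[sample_task(step, 100_000)] += 1
    print(step, counts)
```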

Acknowledgements

This study has been supported by the MeMAD project, funded by the European Union's Horizon 2020 research and innovation programme (Grant Agreement No. 780069), and the FoTran project, funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant Agreement No. 771113). Computer resources provided by the Aalto University School of Science "Science-IT" project were used.

Author information

Corresponding author

Correspondence to Stig-Arne Grönroos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Electronic supplementary material 1 (TXT 8 kb)

About this article

Cite this article

Grönroos, SA., Virpioja, S. & Kurimo, M. Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation. Machine Translation 34, 251–286 (2020). https://doi.org/10.1007/s10590-020-09253-x

Keywords

  • Low-resource languages
  • Multilingual machine translation
  • Transfer learning
  • Multi-task learning
  • Denoising sequence autoencoder
  • Subword segmentation