Neural Machine Translation by Generating Multiple Linguistic Factors

  • Mercedes García-Martínez
  • Loïc Barrault
  • Fethi Bougares
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10583)


Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of words (factors) on the output side of the neural network. This architecture addresses two well-known problems in MT, namely the size of the target-language vocabulary and the number of unknown tokens produced in the translation. The FNMT system is designed to manage a larger vocabulary and to reduce training time (for systems with an equivalent target-language vocabulary size). Moreover, we can produce grammatically correct words that are not part of the vocabulary. The FNMT model is evaluated on the IWSLT'15 English-to-French task and compared to baseline word-based and BPE-based NMT systems. Promising qualitative and quantitative results (in terms of BLEU and METEOR) are reported.
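The core idea above can be illustrated with a minimal sketch (hypothetical code, not the paper's implementation, with an invented toy inflection table): the decoder predicts a lemma plus grammatical factors instead of a full surface word, and a deterministic step recombines them into the final word form. Because many surface forms share one lemma, the output vocabulary over lemmas is much smaller than the vocabulary over surface words.

```python
# Toy factored-output example: (lemma, factors) -> surface form.
# The table below is an invented illustration, not real paper data.
INFLECTIONS = {
    ("aller", "verb-pres-1-pl"): "allons",
    ("aller", "verb-pres-3-sg"): "va",
    ("chat",  "noun-masc-pl"):   "chats",
    ("chat",  "noun-masc-sg"):   "chat",
}

def recombine(lemma: str, factors: str) -> str:
    """Map a predicted (lemma, factors) pair back to a surface word.

    Unknown combinations fall back to the bare lemma, which is how a
    factored system can still emit a reasonable form for words whose
    inflected variant was never seen as a whole unit.
    """
    return INFLECTIONS.get((lemma, factors), lemma)

# Two lemma entries cover four surface forms: the lemma vocabulary
# is half the size of the surface-word vocabulary in this toy case.
lemmas = {lemma for lemma, _ in INFLECTIONS}
surfaces = set(INFLECTIONS.values())
print(recombine("aller", "verb-pres-1-pl"))  # allons
print(len(lemmas), len(surfaces))            # 2 4
```

In the actual FNMT architecture the two output streams (lemmas and factors) are predicted by the network, and a morphological tool performs the recombination; the dictionary lookup here only stands in for that final deterministic step.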


Keywords: Machine translation · Neural networks · Deep learning · Factored representation



This work was partially funded by the French National Research Agency (ANR) through the CHIST-ERA M2CR project, under the contract number ANR-15-CHR2-0006-01.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Mercedes García-Martínez (1)
  • Loïc Barrault (1)
  • Fethi Bougares (1)

  1. LIUM, Le Mans University, Le Mans, France
