
Pivot Machine Translation Using Chinese as Pivot Language

  • Chao-Hong Liu (email author)
  • Catarina Cruz Silva
  • Longyue Wang
  • Andy Way
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 954)

Abstract

Pivoting through a widely used language for which more parallel corpora are available (e.g. English or Chinese) is a common approach to building machine translation (MT) systems for low-resource language pairs. For example, to build a Russian-to-Spanish MT system, we could train a single system directly on a Russian–Spanish corpus. Alternatively, because the resources for the Russian–English and English–Spanish pairs are much larger than those for Russian–Spanish, we could build two systems, Russian-to-English and English-to-Spanish, and use them in cascade to translate Russian texts into Spanish by pivoting through English. There are, however, some confusing results on the Pivot MT approach in the literature. In this paper, we review the performance of Pivot MT on the United Nations Parallel Corpus v1.0 (UN6Way), using both English and Chinese as pivot languages. We also report our system's performance on the CWMT 2018 Pivot MT shared task, in which Japanese patent sentences are translated into English using Chinese as the pivot language.
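The cascade described in the abstract can be illustrated with a minimal sketch. The translators here are toy word-lookup functions standing in for trained MT systems; the names `make_toy_translator`, `pivot_translate`, `ru_en`, and `en_es`, as well as the tiny lexicons, are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict

# A translator is any function mapping a source sentence to a target sentence.
Translator = Callable[[str], str]


def make_toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Build a word-by-word lookup translator (a stand-in for a real MT system)."""
    def translate(sentence: str) -> str:
        # Unknown tokens are passed through unchanged.
        return " ".join(lexicon.get(tok, tok) for tok in sentence.split())
    return translate


def pivot_translate(sentence: str,
                    src_to_pivot: Translator,
                    pivot_to_tgt: Translator) -> str:
    """Cascade two systems: source -> pivot, then pivot -> target."""
    pivot_sentence = src_to_pivot(sentence)
    return pivot_to_tgt(pivot_sentence)


# Hypothetical toy systems: Russian -> English and English -> Spanish.
ru_en = make_toy_translator({"привет": "hello", "мир": "world"})
en_es = make_toy_translator({"hello": "hola", "world": "mundo"})

print(pivot_translate("привет мир", ru_en, en_es))
```

In this arrangement the second system only ever sees the output of the first, so any error made in the source-to-pivot step is carried into the final translation.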

Keywords

Pivot MT, Pivot language, Patent MT

Notes

Acknowledgements

The ADAPT Centre for Digital Content Technology is funded under the SFI Research Centres Programme (Grant No. 13/RC/2106) and is co-funded under the European Regional Development Fund. This work has partially received funding from the European Union’s Horizon 2020 Research and Innovation programme under the Marie Skłodowska-Curie Actions (Grant No. 734211; the EU INTERACT project).


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Chao-Hong Liu (1), email author
  • Catarina Cruz Silva (2)
  • Longyue Wang (1)
  • Andy Way (1)

  1. ADAPT Centre, Dublin City University, Dublin, Ireland
  2. Unbabel, Lisbon, Portugal
