Enhancing Pivot Translation Using Grammatical and Morphological Information

  • Conference paper
Computational Linguistics (PACLING 2017)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 781))

Abstract

Pivot translation is one way to overcome the lack of large bilingual corpora for training statistical machine translation models. However, the conventional pivot method, which connects source phrases to target phrases via shared pivot phrases, misses some potential connections because it matches only the surface form of pivot phrases. In this work, we improve the pivot translation method by using grammatical and morphological information, rather than the surface form alone, to connect pivot phrases. Experiments were conducted on several low-resource Southeast Asian language pairs: Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese. With grammatical and morphological information integrated, the proposed method achieved a significant improvement of 0.5 BLEU points, which shows the effectiveness of integrating grammatical and morphological features into pivot translation.
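The gap between surface-form and morphology-aware pivoting can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard triangulation score, which sums p(pivot | source) · p(target | pivot) over matching pivot phrases, and it uses a toy lemma lookup as a stand-in for the morphological information. The function names, the `key` parameter, the English pivot, and the phrase-table entries are all hypothetical.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt, key=lambda phrase: phrase):
    """Connect source and target phrases whose pivot phrases share a key.

    src_pivot: {source phrase: {pivot phrase: p(pivot | source)}}
    pivot_tgt: {pivot phrase: {target phrase: p(target | pivot)}}
    key: maps a pivot phrase to the form used for matching
         (identity = surface form; a lemmatizer = morphology-aware).
    """
    # Index the pivot-target table by the matching key.
    by_key = defaultdict(list)
    for pivot, targets in pivot_tgt.items():
        by_key[key(pivot)].append(targets)

    # Accumulate scores over every pivot entry that matches.
    table = defaultdict(lambda: defaultdict(float))
    for source, pivots in src_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            for targets in by_key.get(key(pivot), []):
                for target, p_tgt_given_pivot in targets.items():
                    table[source][target] += p_pivot_given_src * p_tgt_given_pivot
    return {s: dict(t) for s, t in table.items()}


# Hypothetical lemma dictionary; a real system would use a lemmatizer.
TOY_LEMMAS = {"books": "book"}


def toy_lemma(phrase):
    """Lemmatize a phrase token by token using the toy dictionary."""
    return " ".join(TOY_LEMMAS.get(tok, tok) for tok in phrase.split())


# Toy phrase tables (assumed): the two sides use different inflections
# of the same pivot word, so surface matching finds no connection.
SRC_PIVOT = {"buku": {"books": 0.5}}   # Indonesian -> pivot
PIVOT_TGT = {"book": {"sách": 0.5}}    # pivot -> Vietnamese

surface_table = triangulate(SRC_PIVOT, PIVOT_TGT)
lemma_table = triangulate(SRC_PIVOT, PIVOT_TGT, key=toy_lemma)

print("surface:", surface_table)
print("lemma:  ", lemma_table)
```

In this toy setting the surface-form table is empty, while matching on lemmas recovers the source-target pair. Once several surface forms collapse onto one key, the accumulated scores are no longer normalized probabilities, so a real system would need to renormalize or treat them as additional features.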

Notes

  1. https://github.com/nguyenlab/pivot-linguistics

  2. https://wit3.fbk.eu/

Author information

Corresponding author

Correspondence to Hai-Long Trieu.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Trieu, HL., Nguyen, LM. (2018). Enhancing Pivot Translation Using Grammatical and Morphological Information. In: Hasida, K., Pa, W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science, vol 781. Springer, Singapore. https://doi.org/10.1007/978-981-10-8438-6_12

  • DOI: https://doi.org/10.1007/978-981-10-8438-6_12

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8437-9

  • Online ISBN: 978-981-10-8438-6

  • eBook Packages: Computer Science (R0)
