Abstract
Pivot translation is one way to overcome the lack of large bilingual corpora for training statistical machine translation models. However, the conventional pivot method, which connects source phrases to target phrases via common pivot phrases, misses some potential connections because it matches pivot phrases only by their surface form. In this work, we improve the pivot translation method by integrating grammatical and morphological information to connect pivot phrases, rather than relying on the surface form alone. Experiments were conducted on several Southeast Asian low-resource language pairs: Indonesian-Vietnamese, Malay-Vietnamese, and Filipino-Vietnamese. By integrating grammatical and morphological information, the proposed method achieved a significant improvement of 0.5 BLEU points. This shows the effectiveness of integrating grammatical and morphological features into pivot translation.
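The idea of connecting source and target phrases through shared pivot phrases can be illustrated with a minimal sketch. The code below is not the implementation from this paper: it uses the standard triangulation sum p(t|s) = Σ_p p(t|p) · p(p|s), and the function name `triangulate`, the `key` parameter, the toy lemma dictionary, the choice of English as pivot, and all probabilities are assumptions made for illustration only. Matching on a lemma stands in for the grammatical and morphological information mentioned in the abstract; the paper's actual features may differ.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt, key=lambda phrase: phrase):
    """Combine two phrase tables through shared pivot phrases.

    src_pivot: {source phrase: {pivot phrase: p(pivot | source)}}
    pivot_tgt: {pivot phrase: {target phrase: p(target | pivot)}}
    key: maps a pivot phrase to the form used for matching
         (identity = surface-form matching).
    """
    # Index the pivot-target table by the matching key.
    by_key = defaultdict(list)
    for pivot, targets in pivot_tgt.items():
        by_key[key(pivot)].append(targets)

    # Sum p(target | pivot) * p(pivot | source) over all matching pivots.
    src_tgt = defaultdict(lambda: defaultdict(float))
    for source, pivots in src_pivot.items():
        for pivot, p_pivot in pivots.items():
            for targets in by_key.get(key(pivot), []):
                for target, p_target in targets.items():
                    src_tgt[source][target] += p_pivot * p_target
    return {s: dict(t) for s, t in src_tgt.items()}


# Toy data (hypothetical): Indonesian -> English pivot -> Vietnamese.
src_pivot = {"buku": {"books": 0.4, "book": 0.6}}
pivot_tgt = {"book": {"sách": 1.0}}

# Hypothetical lemma lookup standing in for morphological analysis.
toy_lemmas = {"books": "book"}

surface = triangulate(src_pivot, pivot_tgt)
lemma_based = triangulate(
    src_pivot, pivot_tgt, key=lambda p: toy_lemmas.get(p, p)
)

print(surface)
print(lemma_based)
```

With surface matching, the pivot phrase "books" finds no partner in the pivot-target table, so its probability mass is lost. Matching on the lemma recovers that connection, which is the kind of missed link the abstract describes.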
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Trieu, HL., Nguyen, LM. (2018). Enhancing Pivot Translation Using Grammatical and Morphological Information. In: Hasida, K., Pa, W. (eds) Computational Linguistics. PACLING 2017. Communications in Computer and Information Science, vol 781. Springer, Singapore. https://doi.org/10.1007/978-981-10-8438-6_12
DOI: https://doi.org/10.1007/978-981-10-8438-6_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8437-9
Online ISBN: 978-981-10-8438-6
eBook Packages: Computer Science, Computer Science (R0)