A Novel Approach by Injecting CCG Supertags into an Arabic–English Factored Translation Machine
Abstract
This study addresses the incorporation of rich additional linguistic information into phrase-based statistical machine translation (PBSMT) through factored translation, an extension of PBSMT that has proven successful when translating English into morphologically rich languages. PBSMT serves as the baseline of this work. We extend the phrase-based approach by integrating additional linguistic knowledge, namely part-of-speech (POS) tags, to create a factored model. The main contribution of this study is a new approach to Arabic–English translation that injects Combinatory Categorial Grammar (CCG) supertags into the factored model to form an integrated model (POS + CCG). The system was trained on the freely available MultiUN corpus for the Arabic–English language pair. The Moses decoder, an open-source factored SMT system, was used to integrate these factors into the target language model and the target side of the translation model. Results showed improvements in BLEU score across language models (LMs) of several high n-gram orders, and the integration of the POS and CCG factors into translation was tested successfully. Overall, BLEU evaluation with 3-, 5-, 7-, and 9-gram LMs showed that our integrated model outperformed PBSMT. With the 3-gram LM, the integrated model improved translation quality by 1.54, 1.29, and 0.21% over the PBSMT, POS, and CCG models, respectively.
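The factored representation described above can be illustrated with a short sketch. It assumes Moses-style factored text, in which the factors of each token are joined with `|` (here surface|POS|CCG), and it assumes the POS tags and CCG supertags were produced beforehand by an external tagger. The function names, the example sentence, and its Penn Treebank POS tags and CCGbank-style supertags are hypothetical illustrations, not the authors' pipeline. The sketch also counts n-grams over a single factor, which is the raw material for a target-side LM on that factor; a real LM toolkit additionally applies smoothing, which is omitted here.

```python
# Hypothetical sketch, not the authors' pipeline: build Moses-style factored
# tokens (surface|POS|CCG) and count n-grams over one factor, e.g. as input
# to a target-side language model over the CCG supertag sequence.
from collections import Counter
from typing import List, Sequence

FACTOR_SEP = "|"  # Moses joins the factors of a token with "|"


def to_factored(words: Sequence[str], pos: Sequence[str], ccg: Sequence[str]) -> str:
    """Join three parallel annotation layers into one factored sentence."""
    if not (len(words) == len(pos) == len(ccg)):
        raise ValueError("annotation layers must have the same length")
    for layer in (words, pos, ccg):
        for value in layer:
            # Empty values, whitespace, or the separator would corrupt the format.
            if not value or FACTOR_SEP in value or any(c.isspace() for c in value):
                raise ValueError(f"invalid factor value: {value!r}")
    return " ".join(FACTOR_SEP.join(token) for token in zip(words, pos, ccg))


def factor_sequence(factored: str, index: int) -> List[str]:
    """Return one factor per token (0 = surface, 1 = POS, 2 = CCG supertag)."""
    return [token.split(FACTOR_SEP)[index] for token in factored.split()]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count n-grams, padding with <s> and </s> boundary markers (no smoothing)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    return Counter(tuple(padded[i:i + n]) for i in range(len(padded) - n + 1))


if __name__ == "__main__":
    # Hypothetical English target sentence with Penn Treebank POS tags and
    # CCGbank-style supertags.
    sentence = to_factored(
        ["the", "committee", "adopted", "the", "resolution"],
        ["DT", "NN", "VBD", "DT", "NN"],
        ["NP[nb]/N", "N", r"(S[dcl]\NP)/NP", "NP[nb]/N", "N"],
    )
    print(sentence)
    supertags = factor_sequence(sentence, 2)
    bigrams = ngram_counts(supertags, 2)
    print(bigrams[("NP[nb]/N", "N")])  # determiner + noun bigram occurs twice
```

Running the script prints the factored line `the|DT|NP[nb]/N committee|NN|N adopted|VBD|(S[dcl]\NP)/NP the|DT|NP[nb]/N resolution|NN|N` followed by `2`. Raising `n` to 3, 5, 7, or 9 gives the counts for the higher-order LMs mentioned above; longer contexts over supertags are sparser than over POS tags, which is one reason the LM order matters for such factors.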
Keywords
Statistical machine translation, Phrase-based translation model, Combinatory Categorial Grammar, Part-of-speech, Factored translation model