
A Novel Approach by Injecting CCG Supertags into an Arabic–English Factored Translation Machine

  • Research Article - Computer Engineering and Computer Science
Arabian Journal for Science and Engineering

A Correction to this article was published on 17 May 2018

This article has been updated

Abstract

This study addresses the integration of rich additional linguistic information into phrase-based statistical machine translation (PBSMT) through factored translation, an extension of PBSMT that has proven successful when translating English into morphologically rich languages. PBSMT serves as the baseline of this work. We extend the phrase-based approach by integrating additional linguistic knowledge, namely part-of-speech (POS) tags, to create a factored model. The main contribution of this study is a new approach to Arabic–English translation that injects Combinatory Categorial Grammar (CCG) supertags into the factored model to form an integrated model (POS + CCG). The system was trained on the freely available MultiUN Arabic–English parallel corpus. The Moses decoder, an open-source factored SMT toolkit, was used to integrate these factors into the target language model and the target side of the translation model. Results showed improvements in BLEU scores across language models (LMs) of several n-gram orders. The integration of the two factors (POS + CCG) was tested successfully: evaluations with 3-, 5-, 7-, and 9-gram LMs showed that the integrated model achieved higher BLEU scores than PBSMT. Compared with the three other models (PBSMT, POS, and CCG), the integrated model improved translation quality by 1.54%, 1.29%, and 0.21%, respectively, with the 3-gram LM.
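A factored model attaches extra annotations to every target-side word. As a minimal sketch, assuming the common Moses convention of joining the factors of one token with a vertical bar (`word|POS|CCG`), the hypothetical helpers below build such tokens from parallel word, POS, and supertag sequences and extract a single factor stream, for example the supertag sequence on which a separate target-side CCG language model could be trained. The function names and the simplified supertags in the example (`NP/N`, `N`, `S\NP`) are illustrative assumptions, not taken from the paper.

```python
from typing import List

FACTOR_SEP = "|"  # assumed Moses-style separator between factors of one token


def to_factored(words: List[str], pos: List[str], ccg: List[str]) -> str:
    """Join parallel word/POS/CCG sequences into one factored sentence line (hypothetical helper)."""
    if not (len(words) == len(pos) == len(ccg)):
        raise ValueError("factor sequences must have equal length")
    tokens = []
    for w, p, c in zip(words, pos, ccg):
        # A literal separator inside a factor would corrupt the format, so reject it.
        for f in (w, p, c):
            if FACTOR_SEP in f:
                raise ValueError(f"factor contains separator: {f!r}")
        tokens.append(FACTOR_SEP.join((w, p, c)))
    return " ".join(tokens)


def factor_stream(line: str, index: int) -> List[str]:
    """Return the index-th factor of every token (0 = word, 1 = POS, 2 = CCG supertag)."""
    return [tok.split(FACTOR_SEP)[index] for tok in line.split()]


if __name__ == "__main__":
    line = to_factored(["the", "cat", "sleeps"],
                       ["DT", "NN", "VBZ"],
                       ["NP/N", "N", "S\\NP"])
    print(line)                    # the|DT|NP/N cat|NN|N sleeps|VBZ|S\NP
    print(factor_stream(line, 2))  # supertag stream for a CCG language model
```

Keeping each factor aligned one-to-one with its word is what lets a decoder score the surface words and the supertags with separate language models while translating a single token sequence.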


Change history

  • 17 May 2018

    The original version of this article unfortunately contained a mistake. The family name of the first author was incomplete. The complete family name is “Rajeh Ali” as given above.



Author information

Corresponding author

Correspondence to Zhiyong Li.


About this article


Cite this article

Rajeh, H.A., Li, Z. & Ayedh, A.M. A Novel Approach by Injecting CCG Supertags into an Arabic–English Factored Translation Machine. Arab J Sci Eng 41, 3071–3080 (2016). https://doi.org/10.1007/s13369-016-2075-9


  • DOI: https://doi.org/10.1007/s13369-016-2075-9
