Abstract
Morphological analysis is essential when translating from a morphologically poor language such as English into a morphologically rich language such as Persian. In this paper, we first analyze the output of a rule-based machine translation (RBMT) system and categorize its errors. We then apply a statistical approach to rich morphology prediction, using a parallel corpus, to improve the quality of the RBMT output. The error analysis shows that Persian morphology poses many challenges, especially in verb conjugation. In our approach, we define a set of linguistic features drawn from both English and Persian linguistic information in an English-Persian parallel corpus, and build our model on these features. As a baseline in our experiments, we generate the inflected verb form using the most common value of each feature. Our experiments show an absolute BLEU improvement of almost 2.6% on a test set of 16K sentences.
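The most-common-value baseline mentioned above can be illustrated with a minimal sketch. The feature names (`tense`, `person`, `number`, `negation`) and the toy annotations are hypothetical, chosen only for illustration; the abstract does not list the actual feature set, and this is not the authors' implementation.

```python
from collections import Counter


def most_common_feature_values(training_verbs):
    """Return, for each feature, the value seen most often in training."""
    counts = {}
    for features in training_verbs:
        for name, value in features.items():
            counts.setdefault(name, Counter())[value] += 1
    # most_common(1) yields a one-element list of (value, count) pairs.
    return {name: counter.most_common(1)[0][0] for name, counter in counts.items()}


# Hypothetical toy annotations of Persian verbs (assumed feature names).
training_verbs = [
    {"tense": "past", "person": "3", "number": "sg", "negation": "no"},
    {"tense": "present", "person": "3", "number": "sg", "negation": "no"},
    {"tense": "past", "person": "1", "number": "pl", "negation": "no"},
    {"tense": "past", "person": "3", "number": "sg", "negation": "yes"},
]

# The baseline assigns these same values to every verb, ignoring context.
baseline = most_common_feature_values(training_verbs)
print(baseline)
```

A baseline of this kind ignores the English source sentence entirely, which is why a model that conditions on features from both languages can be expected to improve on it.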
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mahmoudi, A., Faili, H., Arabsorkhi, M. (2013). Modeling Persian Verb Morphology to Improve English-Persian Machine Translation. In: Castro, F., Gelbukh, A., González, M. (eds) Advances in Artificial Intelligence and Its Applications. MICAI 2013. Lecture Notes in Computer Science, vol 8265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45114-0_33
DOI: https://doi.org/10.1007/978-3-642-45114-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45113-3
Online ISBN: 978-3-642-45114-0
eBook Packages: Computer Science (R0)