A Shallow Parser-based Hindi to Odia Machine Translation System
This paper describes a Hindi-to-Odia machine translation system built on Apertium, a popular open-source platform. With a population of over 1.27 billion, 18 officially recognized languages, 30 regional languages, and over 2000 dialects, India is a multilingual society that needs well-developed ICT tools so that its citizens can easily exchange information and knowledge. Although Hindi is the national language of India, many people in Odisha cannot understand information written in Hindi. A suitable Hindi-to-Odia machine translation system would help them understand and use Hindi more productively. We chose the Apertium platform for several reasons. Its shallow-parser-level transfer modules make it well suited to building machine translation systems between closely related languages such as Hindi and Odia. Its use of finite-state transducers (FSTs) in all modules makes it much faster than other shallow parser-based platforms. It is also free/open-source software released under the GPL. We also discuss the linguistic and computational challenges of building linguistic resources for Hindi and Odia. In particular, using the TAM (Tense, Aspect, and Modality) concept in the transfer module is a distinctive approach to writing transfer rules between Hindi and Odia on the Apertium platform. This work can be readily extended to develop MT systems for other Indian language pairs.
Keywords: Apertium, Hindi, Odia, TAM, Anusaaraka, Transfer rules, Bilingual dictionaries
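The idea of driving transfer through a shared TAM label can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the romanized roots, the TAM markers and suffixes, the dictionary entries, and the function names `analyse`, `transfer`, `generate`, and `translate` are hypothetical and are not taken from the paper or from any Apertium language pair. Apertium itself expresses these stages as finite-state dictionaries and XML transfer rules, not as Python code.

```python
from typing import NamedTuple


class Analysis(NamedTuple):
    root: str  # verb root in romanized form
    tam: str   # language-neutral TAM label


# Hypothetical Hindi side: surface TAM marker -> abstract TAM label.
HINDI_TAM = {
    "rahA hE": "present_progressive",
    "egA": "future",
}

# Hypothetical Odia side: abstract TAM label -> verb suffix.
ODIA_TAM = {
    "present_progressive": "uchi",
    "future": "iba",
}

# Hypothetical bilingual dictionary of verb roots (Hindi -> Odia).
BILINGUAL = {
    "kar": "kar",
    "liK": "lekh",
}


def analyse(root: str, marker: str) -> Analysis:
    """Map a Hindi root and its TAM marker to an abstract analysis."""
    return Analysis(root, HINDI_TAM[marker])


def transfer(hindi: Analysis) -> Analysis:
    """Swap the root through the bilingual dictionary; the TAM label is kept."""
    return Analysis(BILINGUAL[hindi.root], hindi.tam)


def generate(odia: Analysis) -> str:
    """Attach the Odia suffix that realizes the TAM label."""
    return odia.root + ODIA_TAM[odia.tam]


def translate(root: str, marker: str) -> str:
    """Run the three toy stages in sequence."""
    return generate(transfer(analyse(root, marker)))


if __name__ == "__main__":
    print(translate("kar", "rahA hE"))
    print(translate("liK", "egA"))
```

In this sketch the transfer step never inspects Hindi or Odia suffixes directly; it only passes the TAM label through, so supporting a new tense would mean adding one entry to each table and leaving the transfer logic unchanged.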
The authors would like to thank Prof. Vineet Chaitanya for the basic insights that guided the development of the system. We are thankful to Mr. Sriram Chaudhury (Asst. Prof., KIIT University) for his continuous support and guidance. We are also thankful to IIIT-Hyderabad, Hyderabad Central University, and the Apertium group for providing useful tools and linguistic resources for the successful development of this system.