Abstract
Telugu is the third most spoken language in India and one of the fifteen most spoken languages in the world. However, there is no standardized input method for Telugu that is in widespread use. Since the majority of users of Telugu typing tools on computers are familiar with English, we propose a transliteration-based text input method in which users type Telugu using the Roman script. We show that a simple edit-distance-based approach yields a lightweight system with good efficiency for text input. We tested the approach on three datasets: general data, countries and places, and person names. The approach performed considerably well on all three datasets and holds promise as an efficient text input method.
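To illustrate what an edit-distance-based transliteration lookup can look like, here is a minimal sketch. It assumes a hypothetical lexicon that maps romanized keys to Telugu words and ranks candidates by plain Levenshtein distance with unit costs. The abstract does not specify the paper's cost scheme, normalization rules, or lexicon format, so the names `levenshtein`, `suggest`, and `LEXICON` are illustrative only and are not the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest(roman_input: str, lexicon: dict[str, str], k: int = 3) -> list[str]:
    """Rank lexicon entries by the edit distance between the typed Roman
    string and each entry's romanized key; return the top-k Telugu words.
    Ties are broken alphabetically by key so the ranking is deterministic."""
    query = roman_input.lower()
    ranked = sorted(
        lexicon.items(),
        key=lambda kv: (levenshtein(query, kv[0]), kv[0]),
    )
    return [telugu for _, telugu in ranked[:k]]


# Hypothetical toy lexicon (assumption): romanized key -> Telugu word.
LEXICON = {
    "telugu": "తెలుగు",
    "bharatadesam": "భారతదేశం",
    "hyderabad": "హైదరాబాద్",
    "amma": "అమ్మ",
}

if __name__ == "__main__":
    # A misspelled Roman input still finds the closest romanized key.
    print(suggest("telgu", LEXICON, k=1))
```

With this toy lexicon, the input `telgu` is one insertion away from the key `telugu`, so that entry ranks first. A real input method would also need a large lexicon and some handling of the many Roman spellings users produce for the same Telugu sound, which this sketch leaves out.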
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sowmya, V.B., Varma, V. (2009). Transliteration Based Text Input Methods for Telugu. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science(), vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_12
DOI: https://doi.org/10.1007/978-3-642-00831-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science, Computer Science (R0)