Abstract
Although natural language processing (NLP) is now a popular area of research and development, less-resourced languages receive little attention from developers. One such under-resourced language is Kafi-noonoo, which is spoken in the south-western regions of Ethiopia. This paper presents the development of a part-of-speech tagger for Kafi-noonoo. To develop the tagger, we employed a hybrid of two systems: a statistical tagger and a rule-based tagger. The lexical and transitional probabilities of word classes are modeled using a hidden Markov model (HMM). However, because the corpus available for the language is limited, a set of transformation rules is applied to improve the result. The system was evaluated on a test corpus and, with 90% of the corpus used for training, the hybrid tagger yielded an accuracy of 80.47%.
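The hybrid pipeline summarized in the abstract (an HMM tagger whose output is then corrected by transformation rules) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the smoothing method, the decoder, or the rule format, so add-one smoothing, Viterbi decoding, and a single "previous tag" rule template are assumptions made here. The toy corpus is English and the tag names (DET, NOUN, VERB) are hypothetical placeholders, not Kafi-noonoo data or the paper's tagset.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start marker


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from (word, tag) sentences."""
    transitions = defaultdict(Counter)  # previous tag -> next tag counts
    emissions = defaultdict(Counter)    # tag -> word counts
    vocabulary = set()
    for sentence in tagged_sentences:
        previous = START
        for word, tag in sentence:
            transitions[previous][tag] += 1
            emissions[tag][word] += 1
            vocabulary.add(word)
            previous = tag
    return transitions, emissions, vocabulary


def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + num_outcomes))


def viterbi(words, transitions, emissions, vocabulary):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocabulary) + 1  # one extra outcome for unseen words
    # best[i][tag] = (log score of best path ending in tag, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][prev][0]
                 + log_prob(transitions[prev], tag, n_tags) + emit, prev)
                for prev in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path


def apply_rules(tags, rules):
    """Apply rules of the form (from_tag, to_tag, previous_tag) in order."""
    corrected = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(corrected)):
            if corrected[i] == from_tag and corrected[i - 1] == prev_tag:
                corrected[i] = to_tag
    return corrected


# Toy training data (illustrative only).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(corpus)
hmm_tags = viterbi(["the", "cat", "barks"], *model)
# Hypothetical rule: a VERB directly after a DET is retagged as NOUN.
final_tags = apply_rules(hmm_tags, [("VERB", "NOUN", "DET")])
print(final_tags)
```

In this sketch the rules run strictly after decoding, so they can only repair errors in the HMM output; with a small training corpus, that second pass is where the abstract reports the improvement comes from.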
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mekuria, Z., Assabie, Y. (2014). A Hybrid Approach to the Development of Part-of-Speech Tagger for Kafi-noonoo Text. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_17
DOI: https://doi.org/10.1007/978-3-642-54906-9_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science, Computer Science (R0)