Abstract
This paper presents a novel approach to recognizing named entities in an Odia corpus. Developing a named entity recognition (NER) system for Odia using a Support Vector Machine is a challenging task in intelligent computing. NER aims to classify each word in a document into one of a set of predefined named entity (NE) classes, in a linear and non-linear fashion. Starting from named-entity-annotated corpora and a set of features, a baseline NER system is developed first. Language-specific rules are then added to the system to recognize specific NE classes. Gazetteers and context patterns are also added to improve performance, since identifying rules and context patterns requires language-specific knowledge. We have used the required lexical databases to prepare the rules and identify the context patterns for Odia. Experimental results show that our approach achieves higher accuracy than previous approaches.
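The pipeline described in the abstract (per-word features, a gazetteer lookup, and context patterns feeding a classifier) can be illustrated with a minimal feature-extraction sketch. This is not the authors' implementation: the feature names, the gazetteer entries, the cue word, and the romanized sample tokens are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical gazetteer entries (romanized for readability); not from the paper.
LOCATION_GAZETTEER: Set[str] = {"Bhubaneswar", "Cuttack"}
# Hypothetical context pattern: a word followed by one of these cue words
# is flagged as a possible location.
LOCATION_CUE_NEXT: Set[str] = {"re"}


def token_features(tokens: List[str], i: int, window: int = 1) -> Dict[str, object]:
    """Build a feature dictionary for tokens[i] from the word itself, its
    affixes, its neighbours, a gazetteer lookup, and one context cue."""
    word = tokens[i]
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "<EOS>"
    feats: Dict[str, object] = {
        "word": word,
        "prefix2": word[:2],
        "suffix2": word[-2:],
        "is_first": i == 0,
        "has_digit": any(ch.isdigit() for ch in word),
        "in_location_gazetteer": word in LOCATION_GAZETTEER,
        "location_cue_follows": next_word in LOCATION_CUE_NEXT,
    }
    # Surrounding words within the window, padded at sentence boundaries.
    for offset in range(1, window + 1):
        left, right = i - offset, i + offset
        feats[f"prev{offset}"] = tokens[left] if left >= 0 else "<BOS>"
        feats[f"next{offset}"] = tokens[right] if right < len(tokens) else "<EOS>"
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    # Illustrative placeholder sentence, not taken from the paper's corpus.
    for feats in sentence_features(["Ramesh", "Bhubaneswar", "re", "rahanti"]):
        print(feats)
```

Feature dictionaries of this kind would typically be converted to numeric vectors and passed, together with the annotated NE labels, to an SVM classifier. One possible choice is scikit-learn's `DictVectorizer` followed by `LinearSVC`, though the abstract does not say which toolkit was used.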
Copyright information
© 2015 Springer India
About this paper
Cite this paper
Das, B.R., Patnaik, S., Baboo, S., Dash, N.S. (2015). A System for Recognition of Named Entities in Odia Text Corpus Using Machine Learning Algorithm. In: Jain, L., Behera, H., Mandal, J., Mohapatra, D. (eds) Computational Intelligence in Data Mining - Volume 1. Smart Innovation, Systems and Technologies, vol 31. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2205-7_30
DOI: https://doi.org/10.1007/978-81-322-2205-7_30
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2204-0
Online ISBN: 978-81-322-2205-7
eBook Packages: Engineering, Engineering (R0)