A Semi-supervised Approach for Maximum Entropy Based Hindi Named Entity Recognition

Saha, Sujan Kumar; Mitra, Pabitra; Sarkar, Sudeshna

doi:10.1007/978-3-642-11164-8_36


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNIP, volume 5909)

Abstract

Scarcity of annotated data is a challenge in building high-performance named entity recognition (NER) systems for resource-poor languages. We take a semi-supervised approach to the Hindi NER task that combines a small annotated corpus with a large raw corpus and uses a maximum entropy classifier. A novel statistical annotation confidence measure is proposed for this purpose. The confidence measure drives selective sampling in the semi-supervised NER system. In addition, a prior modulation of the maximum entropy classifier is used, in which the annotation confidence values serve as ‘prior weights’. Experiments demonstrate the superiority of the proposed technique over the baseline classifier.
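
The general idea of confidence-based selective sampling with a maximum entropy classifier can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not define the proposed confidence measure or the exact form of the prior modulation, so the sketch assumes the classifier's maximum posterior probability as a stand-in confidence score and a per-example weight on the log-likelihood as a stand-in for the 'prior weight'. All function names, the threshold, and the toy binary features are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def predict_proba(W, x):
    # One linear score per class, normalised into a distribution.
    return softmax([sum(wk * xk for wk, xk in zip(row, x)) for row in W])

def train_maxent(X, y, weights, n_classes, epochs=200, lr=0.5):
    """Weighted multinomial logistic regression (a maximum entropy model),
    fitted by plain gradient ascent on the weighted log-likelihood."""
    n_feats = len(X[0])
    W = [[0.0] * n_feats for _ in range(n_classes)]
    total = sum(weights)
    for _ in range(epochs):
        grad = [[0.0] * n_feats for _ in range(n_classes)]
        for x, label, w in zip(X, y, weights):
            p = predict_proba(W, x)
            for c in range(n_classes):
                # ASSUMPTION: the example weight w plays the role of a
                # 'prior weight'; the paper's formulation may differ.
                err = w * ((1.0 if c == label else 0.0) - p[c])
                for j, xj in enumerate(x):
                    grad[c][j] += err * xj
        for c in range(n_classes):
            for j in range(n_feats):
                W[c][j] += lr * grad[c][j] / total
    return W

def self_train(X_lab, y_lab, X_raw, n_classes, threshold=0.9, rounds=3):
    """Selective sampling: move raw examples into the training set only
    when the current model labels them with high confidence."""
    X, y, w = list(X_lab), list(y_lab), [1.0] * len(X_lab)
    pool = list(X_raw)
    W = train_maxent(X, y, w, n_classes)
    for _ in range(rounds):
        remaining, added = [], 0
        for x in pool:
            p = predict_proba(W, x)
            conf = max(p)  # ASSUMPTION: stand-in confidence measure
            if conf >= threshold:
                X.append(x)
                y.append(p.index(conf))
                w.append(conf)  # confident pseudo-labels count for more
                added += 1
            else:
                remaining.append(x)
        pool = remaining
        if added == 0:
            break
        W = train_maxent(X, y, w, n_classes)
    return W, len(X) - len(X_lab)

# Toy data: [bias, hypothetical cue A, hypothetical cue B]; 1 = entity, 0 = other.
X_lab = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
y_lab = [1, 1, 0, 0]
X_raw = [[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 1.0], [1.0, 0.0, 1.0]]

W, n_added = self_train(X_lab, y_lab, X_raw, n_classes=2)
```

Lowering `threshold` admits more pseudo-labelled examples at the cost of noisier labels; weighting each pseudo-label by its confidence is one simple way to limit that noise.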



Author information

Authors and Affiliations

Indian Institute of Technology, Kharagpur, India, 721302
Sujan Kumar Saha, Pabitra Mitra & Sudeshna Sarkar


Editor information

Editors and Affiliations

Electrical Engineering Department, Indian Institute of Technology Delhi, 110016, New Delhi, India
Santanu Chaudhury
Center for Soft Computing Research, Indian Statistical Institute, 700 108, Kolkata, India
Sushmita Mitra
Center for Soft Computing Research, Indian Statistical Institute,
C. A. Murthy
Department of Electrical Engineering, Indian Institute of Science, 560012, Bangalore, India
P. S. Sastry
Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, 700 108, Kolkata, India
Sankar K. Pal

About this paper

Cite this paper

Saha, S.K., Mitra, P., Sarkar, S. (2009). A Semi-supervised Approach for Maximum Entropy Based Hindi Named Entity Recognition. In: Chaudhury, S., Mitra, S., Murthy, C.A., Sastry, P.S., Pal, S.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2009. Lecture Notes in Computer Science, vol 5909. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11164-8_36


DOI: https://doi.org/10.1007/978-3-642-11164-8_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11163-1
Online ISBN: 978-3-642-11164-8
eBook Packages: Computer Science, Computer Science (R0)
