Kannpos-Kannada Parts of Speech Tagger Using Conditional Random Fields

  • Conference paper
Emerging Research in Computing, Information, Communication and Applications

Abstract

Part-of-speech (POS) tagging is one of the basic text processing tasks in Natural Language Processing (NLP). Developing a POS tagger for Indian languages is a considerable challenge, especially for Kannada, because of its rich morphology and highly agglutinative nature. This paper discusses a Kannada POS tagger developed using Conditional Random Fields (CRFs), a supervised machine learning technique. The results presented are based on experiments conducted on a large corpus of 80,000 words, of which 64,000 are used for training and 16,000 for testing. These words were collected from Kannada Wikipedia and annotated with POS tags. The tagset from Technology Development for Indian Languages (TDIL), containing 36 tags, is used to assign the POS tags. The n-gram CRF model gave a maximum accuracy of 92.94%. This work is an extension of “Parts of Speech (POS) Tagger for Kannada Using Conditional Random Fields (CRFs)”.
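The abstract does not list the feature templates or the CRF toolkit used, so the following is only a minimal Python sketch of the kind of pipeline it describes: n-gram context features per token, an 80/20 train/test split, and token-level accuracy. The function names, the affix features, the sentence-level split, the romanized example tokens, and the tag labels are all assumptions made for illustration, not the authors' implementation.

```python
from typing import Dict, List, Sequence, Tuple


def token_features(tokens: Sequence[str], i: int, window: int = 1) -> Dict[str, str]:
    """Build hypothetical n-gram style features for the token at position i.

    Affix features are included because agglutinative languages carry much
    grammatical information in suffixes; this choice is an assumption.
    """
    word = tokens[i]
    feats = {"w0": word, "prefix3": word[:3], "suffix3": word[-3:]}
    for k in range(1, window + 1):
        # Neighbouring words, padded at the sentence boundaries.
        feats[f"w-{k}"] = tokens[i - k] if i - k >= 0 else "<BOS>"
        feats[f"w+{k}"] = tokens[i + k] if i + k < len(tokens) else "<EOS>"
    return feats


def split_corpus(sentences: List, train_fraction: float = 0.8) -> Tuple[List, List]:
    """Split a list of sentences into training and test portions.

    The abstract reports a word-level 64,000/16,000 split; splitting by
    sentence here is a simplifying assumption.
    """
    cut = int(len(sentences) * train_fraction)
    return sentences[:cut], sentences[cut:]


def token_accuracy(gold: List[List[str]], predicted: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag matches the gold tag."""
    pairs = [(g, p) for gs, ps in zip(gold, predicted) for g, p in zip(gs, ps)]
    if not pairs:
        return 0.0
    return sum(1 for g, p in pairs if g == p) / len(pairs)


# Illustrative romanized tokens and placeholder tags (not from the paper's corpus).
sentence = ["naanu", "shaalege", "hoguttene"]
features = [token_features(sentence, i) for i in range(len(sentence))]
gold_tags = [["PRON", "NOUN", "VERB"]]
predicted_tags = [["PRON", "NOUN", "NOUN"]]
accuracy = token_accuracy(gold_tags, predicted_tags)
```

Feature dictionaries of this shape are what a typical CRF toolkit consumes, one per token, grouped by sentence. The tagger's predictions on the held-out portion would then be scored with a function like `token_accuracy`.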

Copyright information

© 2016 Springer India

About this paper

Cite this paper

Pallavi, K.P., Pillai, A.S. (2016). Kannpos-Kannada Parts of Speech Tagger Using Conditional Random Fields. In: Shetty, N., Prasad, N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2553-9_43

  • DOI: https://doi.org/10.1007/978-81-322-2553-9_43

  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2552-2

  • Online ISBN: 978-81-322-2553-9

  • eBook Packages: Engineering, Engineering (R0)