Kannada Stemmer and Its Effect on Kannada Documents Classification

Deepamala, N.; Ramakanth Kumar, P.

doi:10.1007/978-81-322-2202-6_7

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 33)

Abstract

Stemming reduces a word to its root or stem form. Kannada is a morphologically rich language in which words are inflected for person, number, gender and tense. Stemming is an important pre-processing step in Natural Language Processing applications. In this paper, Kannada words are stemmed with an unsupervised method based on suffix arrays. An accuracy of 0.58 % was achieved with this method. Combining the unsupervised method with a stem-list dictionary improves the stemmer's performance further: a list of 18,804 Kannada stem words was created manually as part of this work, and a 10 % improvement in performance was observed. The effect of the proposed stemmer on the classification of Kannada documents is compared for the Naïve Bayes and Maximum Entropy methods, and stemming is shown to improve text classification performance.
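
The combination of an unsupervised stemmer with a stem-list dictionary can be illustrated with a minimal Python sketch. It is not the chapter's algorithm: the suffix-array procedure is not described in the abstract, so a simple frequency count of word endings stands in for it here. The function names (`induce_suffixes`, `stem`), the thresholds, and the romanised toy words are all assumptions made for illustration.

```python
from collections import Counter


def induce_suffixes(words, min_stem_len=3, min_count=2):
    """Collect word endings shared by at least `min_count` distinct words.

    Stand-in for the unsupervised step (an assumption, not the paper's
    suffix-array method): every ending that leaves at least
    `min_stem_len` characters of stem is a candidate suffix.
    """
    counts = Counter()
    for word in set(words):
        for i in range(min_stem_len, len(word)):
            counts[word[i:]] += 1
    return {suffix for suffix, count in counts.items() if count >= min_count}


def stem(word, suffixes, stem_list, min_stem_len=3):
    """Prefer a stem from the dictionary, else strip the longest known suffix."""
    # Dictionary step: the longest prefix of the word found in the stem list.
    for i in range(len(word), min_stem_len - 1, -1):
        if word[:i] in stem_list:
            return word[:i]
    # Unsupervised fallback: the smallest split point gives the longest suffix.
    for i in range(min_stem_len, len(word)):
        if word[i:] in suffixes:
            return word[:i]
    return word


# Toy corpus of romanised Kannada noun forms (hypothetical data).
corpus = ["mane", "maneyalli", "manege", "maneyinda",
          "mara", "maradalli", "marakke", "maradinda"]
suffixes = induce_suffixes(corpus)

print(stem("maneyalli", suffixes, {"mane"}))          # mane  (dictionary hit)
print(stem("maradalli", suffixes, {"mane"}))          # marad (suffix "alli" stripped)
print(stem("maradalli", suffixes, {"mane", "mara"}))  # mara  (dictionary hit)
```

On this toy data the suffix-only fallback returns `marad` for `maradalli`, while adding `mara` to the stem list yields `mara`. This is the kind of correction a manually built stem list can supply, though the sketch says nothing about the size of the gain on real text.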

Author information

Authors and Affiliations

Department of Computer Science and Engineering, R.V. College of Engineering, Bangalore, India
N. Deepamala
Department of Information Science and Engineering, R.V. College of Engineering, Bangalore, 560059, India
P. Ramakanth Kumar

Corresponding author

Correspondence to N. Deepamala.

Editor information

Editors and Affiliations

School of Electrical and Information Engineering, University of South Australia, South Australia, Australia
Lakhmi C. Jain
Computer Science and Engineering, Veer Surendra Sai University of Technology, Sambalpur, Odisha, India
Himansu Sekhar Behera
Computer Science & Engineering, Kalyani University, Nadia, West Bengal, India
Jyotsna Kumar Mandal
Dept. of Computer Science and Eng., National Institute of Technology Rourkela, Rourkela, India
Durga Prasad Mohapatra

About this paper

Cite this paper

Deepamala, N., Ramakanth Kumar, P. (2015). Kannada Stemmer and Its Effect on Kannada Documents Classification. In: Jain, L., Behera, H., Mandal, J., Mohapatra, D. (eds) Computational Intelligence in Data Mining - Volume 3. Smart Innovation, Systems and Technologies, vol 33. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2202-6_7

DOI: https://doi.org/10.1007/978-81-322-2202-6_7
Published: 12 December 2014
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2201-9
Online ISBN: 978-81-322-2202-6
eBook Packages: Engineering, Engineering (R0)
