Design and Implementation of Rule-Based Hindi Stemmer for Hindi Information Retrieval

Kumar, Rakesh; Ramotra, Atul Kumar; Mahajan, Amit; Mansotra, Vibhakar

doi:10.1007/978-981-15-0077-0_13

Conference paper
First Online: 04 December 2019

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 165)

Abstract

Stemming is a process that maps morphologically related words to a common root or stem by removing their prefixes or suffixes. In Natural Language Processing, stemming plays an important role in Information Retrieval, Machine Translation, Text Summarization, and other tasks. A stemmer reduces an inflected word to its root form without performing any morphological analysis of the word, so, unlike a lemmatizer, which always returns a meaningful dictionary word, it does not necessarily produce a meaningful dictionary root. For example, the Hindi word (pakshion) is formed as ( (paksh) + ), having as its suffix; removing this suffix leaves (paksh), which is not a meaningful Hindi dictionary word. In the context of information retrieval, a stemmer conflates morphologically inflected variants of a word into a common form, thereby reducing the size of the inverted index and increasing recall. In this paper, the researchers develop a rule-based Hindi stemmer using a suffix-stripping approach for Hindi Information Retrieval. A Python-based web interface has been designed to implement the proposed algorithm. The stemmer has been tested for accuracy and efficiency in two scenarios: first as an independent stemmer, and second as a supporting module for indexing in Hindi Information Retrieval. The proposed stemmer achieved an accuracy of 71% as a standalone stemmer and reduced the index size by approximately 26% when used in indexing.
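The abstract describes two ideas that a small example can make concrete: longest-match suffix stripping, and the way conflating variants shrinks an inverted index. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: the suffix list, the minimum stem length MIN_STEM_LEN, and the helper names stem, build_index, vocabulary_reduction and accuracy are all hypothetical, and the number of distinct index terms is used as a rough proxy for index size.

```python
from collections import defaultdict

# Hypothetical suffix list for illustration only; NOT the rule set from the paper.
# Sorted longest-first so that the longest matching suffix is tried first.
SUFFIXES = sorted(
    ["ियों", "ाओं", "ाएं", "ों", "ें", "ीं", "ी", "ा", "े", "ो"],
    key=len,
    reverse=True,
)
MIN_STEM_LEN = 2  # assumed: keep at least this many code points after stripping


def stem(word: str) -> str:
    """Strip the longest listed suffix, provided a long-enough stem remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no rule applies: the word is its own stem


def build_index(docs: list[str], use_stemmer: bool) -> dict[str, list[int]]:
    """Build an inverted index mapping each term to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.split():
            term = stem(token) if use_stemmer else token
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def vocabulary_reduction(docs: list[str]) -> float:
    """Fractional drop in distinct index terms when stemming is enabled (proxy for index size)."""
    plain = build_index(docs, use_stemmer=False)
    stemmed = build_index(docs, use_stemmer=True)
    return 1 - len(stemmed) / len(plain)


def accuracy(gold: list[tuple[str, str]]) -> float:
    """Share of (word, expected_stem) pairs for which stem(word) matches the expected stem."""
    correct = sum(1 for word, expected in gold if stem(word) == expected)
    return correct / len(gold)
```

With a toy corpus of three documents, "बच्चा किताब", "बच्चे किताबें" and "बच्चों किताबों", the unstemmed index holds six distinct terms while the stemmed one holds two (बच्च and किताब), so vocabulary_reduction returns about 0.667. A word such as राजा ("king") is cut to राज, an over-stemming error of the kind that keeps a rule-based stemmer's accuracy below 100%; the MIN_STEM_LEN guard only prevents very short words from being reduced to a single character.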

Author information

Authors and Affiliations

University of Jammu, Jammu, 180006, Jammu and Kashmir, India
Rakesh Kumar, Atul Kumar Ramotra, Amit Mahajan & Vibhakar Mansotra

Corresponding author

Correspondence to Rakesh Kumar.

Editor information

Editors and Affiliations

Department of Informatics, University of Leicester, Leicester, UK
Yu-Dong Zhang
Department of Computer Science and Engineering, University of Kalyani, Kalyani, India
Jyotsna Kumar Mandal
Department of Computer Science, Khon Kaen University, Khon Kaen, Thailand
Chakchai So-In
Nagpur Institute of Technology, Nagpur, Maharashtra, India
Nileshsingh V. Thakur

About this paper

Cite this paper

Kumar, R., Ramotra, A.K., Mahajan, A., Mansotra, V. (2020). Design and Implementation of Rule-Based Hindi Stemmer for Hindi Information Retrieval. In: Zhang, YD., Mandal, J., So-In, C., Thakur, N. (eds) Smart Trends in Computing and Communications. Smart Innovation, Systems and Technologies, vol 165. Springer, Singapore. https://doi.org/10.1007/978-981-15-0077-0_13

DOI: https://doi.org/10.1007/978-981-15-0077-0_13
Published: 04 December 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0076-3
Online ISBN: 978-981-15-0077-0
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)