Abstract
Stemming maps morphologically similar words to a common root or stem by removing their prefixes or suffixes. In Natural Language Processing, stemming plays an important role in Information Retrieval, Machine Translation, Text Summarization, and related tasks. A stemmer reduces an inflected word to its root form without performing any morphological analysis, so, unlike a lemmatizer, which always returns a meaningful dictionary word, it does not always produce a meaningful dictionary root. For example, the Hindi word transliterated as pakshion is formed from paksh plus a suffix; removing the suffix leaves paksh, which is not a meaningful Hindi dictionary word. In information retrieval, the stemmer reduces morphologically inflected variants to a common form, thereby reducing the index size of the inverted file and increasing recall. This paper presents a rule-based Hindi stemmer that uses a suffix-stripping approach for Hindi information retrieval. A Python-based web interface has been designed to implement the proposed algorithm. The stemmer is tested for accuracy and efficiency in two scenarios: first as an independent stemmer, and second as a supporting module for indexing in Hindi information retrieval. The proposed stemmer achieved an accuracy of 71% as an individual stemmer and reduced the index size by approximately 26% when used in indexing.
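The suffix-stripping idea and its effect on index size can be illustrated with a minimal sketch. This is not the rule set proposed in the paper: the suffix list, the minimum stem length, and the sample vocabulary are all hypothetical, and the words are romanized for readability, whereas a real Hindi stemmer would operate on Devanagari text.

```python
# Hypothetical romanized suffix list; NOT the paper's actual rules.
SUFFIXES = ["iyon", "ion", "on", "en", "i", "e", "a"]
MIN_STEM_LEN = 2  # assumed guard against stripping a word down to nothing


def stem(word: str) -> str:
    """Strip the longest matching suffix, keeping at least MIN_STEM_LEN characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word  # no rule applies, so the word is returned unchanged


def index_reduction(terms) -> float:
    """Fraction by which the set of distinct index terms shrinks after stemming."""
    original = set(terms)
    stemmed = {stem(term) for term in original}
    return 1 - len(stemmed) / len(original)


# Hypothetical sample vocabulary (romanized).
terms = ["pakshion", "pakshi", "ladka", "ladke", "ladkon", "ghar"]

print(stem("pakshion"))        # paksh
print(index_reduction(terms))  # 0.5
```

Trying the longest suffix first avoids removing only a short tail when a longer rule also matches. The reduction figure for this toy vocabulary says nothing about the 26% reported for the actual stemmer.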
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kumar, R., Ramotra, A.K., Mahajan, A., Mansotra, V. (2020). Design and Implementation of Rule-Based Hindi Stemmer for Hindi Information Retrieval. In: Zhang, YD., Mandal, J., So-In, C., Thakur, N. (eds) Smart Trends in Computing and Communications. Smart Innovation, Systems and Technologies, vol 165. Springer, Singapore. https://doi.org/10.1007/978-981-15-0077-0_13
DOI: https://doi.org/10.1007/978-981-15-0077-0_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0076-3
Online ISBN: 978-981-15-0077-0
eBook Packages: Intelligent Technologies and Robotics (R0)