Design and Implementation of Rule-Based Hindi Stemmer for Hindi Information Retrieval

  • Conference paper

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 165)

Abstract

Stemming is a process that maps morphologically similar words to a common root or stem by removing their prefixes or suffixes. In Natural Language Processing, stemming plays an important role in Information Retrieval, Machine Translation, Text Summarization, and related tasks. Stemming reduces an inflected word to its root form without performing any morphological analysis, so, unlike a lemmatizer, which always returns a meaningful dictionary word, a stemmer does not necessarily produce a meaningful dictionary root. For example, the Hindi word transliterated as pakshion is formed from paksh plus a suffix; removing the suffix leaves paksh, which is not a meaningful Hindi dictionary word. In the context of information retrieval, the stemmer reduces varied (morphologically inflected) words to a common form, thereby reducing the index size of the inverted file and increasing recall. In this paper, the researchers develop a rule-based Hindi stemmer using a suffix-stripping approach for Hindi Information Retrieval. A Python-based web interface has been designed to implement the proposed algorithm. The developed stemmer was tested for accuracy and efficiency in two scenarios: first as an independent stemmer and second as a supporting module for indexing in Hindi Information Retrieval. The proposed stemmer achieved an accuracy of 71% as an independent stemmer and reduced the index size by approximately 26% when used in indexing.
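
The suffix-stripping idea and its effect on an inverted index can be illustrated with a minimal Python sketch. This is not the authors' algorithm or rule set: the suffix list, the longest-match-first strategy, the `min_stem_len` guard, the helper names `stem` and `build_index`, and the Devanagari spellings (inferred from the transliterations pakshion and paksh) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical, illustrative suffix list; NOT the rule set from the paper.
SUFFIXES = ["ियों", "ियाँ", "ाओं", "ों", "ें", "ी", "े", "ा"]


def stem(word, suffixes=SUFFIXES, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    # Try longer suffixes first so "ियों" wins over a shorter suffix such as "ों".
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Lengths are counted in Unicode code points, not visual syllables.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied


def build_index(docs, normalize=lambda w: w):
    """Build a toy inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[normalize(token)].add(doc_id)
    return index


# Two tiny example documents (assumed data, whitespace-tokenized).
docs = {1: "पक्षियों किताबें", 2: "पक्षी किताब"}

raw_index = build_index(docs)            # one entry per surface form
stemmed_index = build_index(docs, stem)  # inflected forms share one entry

print(len(raw_index), len(stemmed_index))
```

In this toy corpus the inflected forms collapse onto shared stems, so the stemmed index holds fewer terms and a query for one form can match documents containing the other. This is the mechanism behind the smaller index and higher recall described in the abstract; the actual 26% figure depends on the authors' rules and corpus.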



Author information

Corresponding author

Correspondence to Rakesh Kumar.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

Cite this paper

Kumar, R., Ramotra, A.K., Mahajan, A., Mansotra, V. (2020). Design and Implementation of Rule-Based Hindi Stemmer for Hindi Information Retrieval. In: Zhang, YD., Mandal, J., So-In, C., Thakur, N. (eds) Smart Trends in Computing and Communications. Smart Innovation, Systems and Technologies, vol 165. Springer, Singapore. https://doi.org/10.1007/978-981-15-0077-0_13
