Stemming

  • Reference work entry

Synonyms

Affix removal; Suffix stripping; Suffixing; Word conflation

Definition

Stemming is a process by which word endings or other affixes are removed or modified in order that word forms which differ in non-relevant ways may be merged and treated as equivalent. A computer program which performs such a transformation is referred to as a stemmer or stemming algorithm. The output of a stemming algorithm is known as a stem.
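As a rough illustration of the definition above, the following minimal sketch strips a single suffix from a word and then groups word forms that share the resulting stem. The suffix list, the minimum stem length, and the names `toy_stem` and `conflate` are assumptions made for this example; they are not taken from any published stemming algorithm.

```python
# Toy suffix list, ordered longest first so that the longest matching
# ending wins. Illustrative assumption only, not a published rule set.
SUFFIXES = ("ations", "ation", "ives", "ions", "ive", "ion", "ing", "es", "ed", "s")

# Assumed guard: never leave a stem shorter than this many characters.
MIN_STEM_LENGTH = 3


def toy_stem(word: str) -> str:
    """Return a crude stem by removing the first (longest) matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word  # no suffix applies, so the word is its own stem


def conflate(words):
    """Group word forms that share the same stem."""
    groups = {}
    for w in words:
        groups.setdefault(toy_stem(w), []).append(w)
    return groups


if __name__ == "__main__":
    print(conflate(["explosion", "explosives", "retrieval"]))
```

Here "explosion" and "explosives" both reduce to the stem "explos" and so fall into the same group, while "retrieval" is left unchanged. A list this crude will also merge some unrelated words and miss some related ones, which is why practical stemmers use more carefully designed rules.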

Historical Background

The need for stemming first arose in the field of information retrieval (IR), where queries containing search terms need to be matched against document surrogates containing index terms. With the development of computer-based systems for IR, the problem immediately arose that a small difference in form between a search term and an index term could result in a failure to retrieve some relevant documents. Thus, if a query used the term “explosion” and a document was indexed by the term “explosives,” there would be no match on this term (whether or...

Recommended Reading

  1. Adamson GW, Boreham J. The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Inf Process Manage. 1974;10(7/8):253–60.

  2. Ahmad F, Yusoff M, Sembok MT. Experiments with a stemming algorithm for Malay words. J Am Soc Inf Sci Technol. 1996;47(12):909–18.

  3. Al-Sughaiyer IA, Al-Kharashi IA. Arabic morphological analysis techniques: a comprehensive survey. J Am Soc Inf Sci Technol. 2004;55(3):189–213.

  4. Aljlayl M, Frieder O. On Arabic search: improving the retrieval effectiveness via a light stemming approach. In: Proceedings of the International Conference on Information and Knowledge Management; 2002. p. 340–7.

  5. Bacchin M, Ferro N, Melucci M. A probabilistic model for stemmer generation. Inf Process Manage. 2005;41(1):121–37.

  6. Frakes WB, Fox CJ. Strength and similarity of affix removal stemming algorithms. SIGIR Forum. 2003;37(1):26–30.

  7. Harman D. How effective is suffixing? J Am Soc Inf Sci. 1991;42(1):7–15.

  8. Hull DA. Stemming algorithms: a case study for detailed evaluation. J Am Soc Inf Sci. 1996;47(1):70–84.

  9. Krovetz R. Viewing morphology as an inference process. Artificial Intelligence. 2000;118(1/2):277–94.

  10. Lennon M, Pierce DS, Tarry BD, Willett P. An evaluation of some conflation algorithms for information retrieval. J Inf Sci. 1981;3(4):177–83.

  11. Lovins JB. Development of a stemming algorithm. Mech Transl Comput Linguist. 1968;11:22–31.

  12. Paice CD. Another stemmer. SIGIR Forum. 1990;24(3):56–61.

  13. Paice CD. A method for the evaluation of stemming algorithms based on error counting. J Am Soc Inf Sci. 1996;47(8):632–49.

  14. Porter MF. An algorithm for suffix stripping. Program. 1980;14(3):130–7.

  15. Xu J, Croft WB. Corpus-based stemming using cooccurrence of word variants. ACM Trans Inf Syst. 1998;16(1):61–81.

Author information

Correspondence to Chris D. Paice.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Paice, C.D. (2018). Stemming. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_942
