Synonyms
Affix removal; Suffix stripping; Suffixing; Word conflation
Definition
Stemming is the process of removing or modifying word endings or other affixes so that word forms which differ only in non-relevant ways can be merged and treated as equivalent. A computer program that performs this transformation is called a stemmer or stemming algorithm, and its output is known as a stem.
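As a rough illustration of this definition, the sketch below strips a suffix from a word and compares the resulting stems. It is a toy example only: the suffix list, the minimum stem length, and the function names `stem` and `conflate` are assumptions made for illustration, and it does not reproduce any published stemming algorithm.

```python
# Toy suffix-stripping stemmer: an illustrative sketch only, not a
# published algorithm. The suffix list and the minimum stem length
# are assumptions chosen for this example.
SUFFIXES = ("ives", "ions", "ive", "ion", "ing", "es", "ed", "s")
MIN_STEM_LENGTH = 3


def stem(word: str) -> str:
    """Return the word with its longest matching suffix removed."""
    word = word.lower()
    # Try longer suffixes first so "explosives" loses "ives", not just "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    # No suffix applies (or the stem would be too short): leave unchanged.
    return word


def conflate(a: str, b: str) -> bool:
    """Treat two word forms as equivalent when their stems are equal."""
    return stem(a) == stem(b)
```

With these assumed rules, `stem("explosion")` and `stem("explosives")` both give `"explos"`, so `conflate("explosion", "explosives")` is true. The stem need not be a real word; it only has to be the same for the forms that should be merged.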
Historical Background
The need for stemming first arose in the field of information retrieval (IR), where queries containing search terms need to be matched against document surrogates containing index terms. With the development of computer-based systems for IR, the problem immediately arose that a small difference in form between a search term and an index term could result in a failure to retrieve some relevant documents. Thus, if a query used the term “explosion” and a document was indexed by the term “explosives,” there would be no match on this term (whether or...
Recommended Reading
Adamson GW, Boreham J. The use of an association measure based on character structure to identify semantically related pairs of words and document titles. Inf Process Manage. 1974;10(7/8):253–60.
Ahmad F, Yusoff M, Sembok MT. Experiments with a stemming algorithm for Malay words. J Am Soc Inf Sci Technol. 1996;47(12):909–18.
Al-Sughaiyer IA, Al-Kharashi IA. Arabic morphological analysis techniques: a comprehensive survey. J Am Soc Inf Sci Technol. 2004;55(3):189–213.
Aljlayl M, Frieder O. On Arabic search: improving the retrieval effectiveness via a light stemming approach. In: Proceedings of the International Conference on Information and Knowledge Management; 2002. p. 340–7.
Bacchin M, Ferro N, Melucci M. A probabilistic model for stemmer generation. Inf Process Manage. 2005;41(1):121–37.
Frakes WB, Fox CJ. Strength and similarity of affix removal stemming algorithms. SIGIR Forum. 2003;37(1):26–30.
Harman D. How effective is suffixing? J Am Soc Inf Sci. 1991;42(1):7–15.
Hull DA. Stemming algorithms: a case study for detailed evaluation. J Am Soc Inf Sci. 1996;47(1):70–84.
Krovetz R. Viewing morphology as an inference process. Artificial Intelligence. 2000;118(1/2):277–94.
Lennon M, Pierce DS, Tarry BD, Willett P. An evaluation of some conflation algorithms for information retrieval. J Inf Sci. 1981;3(4):177–83.
Lovins JB. Development of a stemming algorithm. Mech Transl Comput Linguist. 1968;11:22–31.
Paice CD. Another stemmer. SIGIR Forum. 1990;24(3):56–61.
Paice CD. A method for the evaluation of stemming algorithms based on error counting. J Am Soc Inf Sci. 1996;47(8):632–49.
Porter MF. An algorithm for suffix stripping. Program. 1980;14(3):130–7.
Xu J, Croft WB. Corpus-based stemming using cooccurrence of word variants. ACM Trans Inf Syst. 1998;16(1):61–81.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Paice, C.D. (2018). Stemming. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_942
DOI: https://doi.org/10.1007/978-1-4614-8265-9_942
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering