Abstract
Most Information Retrieval techniques rely on identifying terms in queries and documents, both to compute statistics based on term frequencies and to compare documents against queries. Terms that share a stem, whether through morphological inflection or through derivation, can be presumed to be semantically close, so conflating them to a common form can improve retrieval. Stemming mechanisms depend directly on the language in question. This paper describes a stemmer for Spanish, reports the tests conducted by applying it to the CLEF Spanish document collection, and discusses the results.
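To make the idea of conflation concrete, the following is a minimal sketch of longest-match suffix stripping for Spanish. It is not the stemmer described in the paper: the suffix list, the minimum stem length, and the names `toy_stem` and `term_frequencies` are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list (NOT the paper's rule set),
# sorted longest first so the longest matching suffix is the one removed.
SUFFIXES = sorted(
    ["aciones", "amiento", "adores", "acion", "mente", "iendo",
     "ador", "ando", "es", "os", "as", "ar", "er", "ir",
     "o", "a", "e", "s"],
    key=len,
    reverse=True,
)

# Assumed minimum number of characters that must remain after stripping.
MIN_STEM = 3

# Map accented vowels to plain ones; "ñ" is left untouched on purpose.
ACCENTS = str.maketrans("áéíóúü", "aeiouu")


def toy_stem(word):
    """Lowercase, remove accents, and strip one suffix if the stem stays long enough."""
    w = word.lower().translate(ACCENTS)
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def term_frequencies(text):
    """Count stemmed terms in whitespace-separated text."""
    return Counter(toy_stem(token) for token in text.split())


if __name__ == "__main__":
    for form in ["recuperación", "recuperaciones", "recuperar", "recuperando"]:
        print(form, "->", toy_stem(form))
    doc = term_frequencies("recuperación de documentos")
    query = term_frequencies("recuperar documento")
    # Terms shared by query and document after conflation.
    print(sorted(set(doc) & set(query)))
```

With these toy rules, the inflected and derived forms of "recuperar" all reduce to the same string, so a query and a document that use different surface forms still share terms. A realistic Spanish stemmer would need a far larger rule set and handling of irregular forms.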
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Figuerola, C.G., Gómez, R., Rodríguez, A.F.Z., Berrocal, J.L.A. (2002). Spanish Monolingual Track: The Impact of Stemming on Retrieval. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_23
DOI: https://doi.org/10.1007/3-540-45691-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44042-0
Online ISBN: 978-3-540-45691-9
eBook Packages: Springer Book Archive