Spanish Monolingual Track: The Impact of Stemming on Retrieval

  • Conference paper
Evaluation of Cross-Language Information Retrieval Systems (CLEF 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2406)

Abstract

Most Information Retrieval techniques rely on identifying the terms in queries and documents, both to compute term frequencies and to compare documents with queries. Terms that share a stem, whether through morphological inflection or through derivation, can be presumed to be semantically close, so conflating them to a common form can improve retrieval. Stemming mechanisms depend directly on the language. This paper describes a stemmer for Spanish and the tests conducted by applying it to the CLEF Spanish document collection, and discusses the results.
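
To make the idea of conflation concrete, the following is a minimal sketch in Python of how inflected forms might be reduced to a common stem before term frequencies are counted. It is not the stemmer described in the paper: the suffix list `SUFFIXES`, the threshold `MIN_STEM`, and the functions `toy_stem` and `term_frequencies` are hypothetical and cover only a few gender and number endings.

```python
from collections import Counter

# Hypothetical, deliberately tiny suffix list, tried in this order so that
# longer endings win over shorter ones. This is NOT the paper's stemmer.
SUFFIXES = ("mente", "es", "as", "os", "a", "o", "e", "s")

# Assumed minimum stem length, so short words such as "los" stay intact.
MIN_STEM = 3


def toy_stem(word):
    """Strip the first matching suffix if enough of the word remains."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def term_frequencies(text):
    """Count whitespace-separated tokens after conflating them to stems."""
    return Counter(toy_stem(token) for token in text.split())


if __name__ == "__main__":
    print(term_frequencies("documentos documento consulta consultas libro"))
```

With this sketch, "documento" and "documentos" are counted as one term, as are "consulta" and "consultas". A realistic Spanish stemmer would also need to handle verb conjugation, derivational suffixes, and accented characters, none of which are attempted here.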





Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Figuerola, C.G., Gómez, R., Rodríguez, A.F.Z., Berrocal, J.L.A. (2002). Spanish Monolingual Track: The Impact of Stemming on Retrieval. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_23

  • DOI: https://doi.org/10.1007/3-540-45691-0_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44042-0

  • Online ISBN: 978-3-540-45691-9

  • eBook Packages: Springer Book Archive
