Experiments to Evaluate Probabilistic Models for Automatic Stemmer Generation and Query Word Translation

  • Giorgio M. Di Nunzio
  • Nicola Ferro
  • Massimo Melucci
  • Nicola Orio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3237)

Abstract

The paper describes statistical methods and experiments for stemming and for the translation of query words used in the monolingual and bilingual tracks in CLEF 2003. While there is still room for improvement in the method proposed for the bilingual track, the approach adopted for the monolingual track makes it possible to generate stemmers which learn directly how to stem the words in a document from a training word list extracted from the document collection, with no need for language-dependent knowledge. The experiments suggest that statistical approaches to stemming are as effective as classical algorithms which encapsulate predefined linguistic rules.

Keywords

Probabilistic Model; Hidden Markov Model; Target Word; Average Precision; Probabilistic Framework
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Giorgio M. Di Nunzio (1)
  • Nicola Ferro (1)
  • Massimo Melucci (1)
  • Nicola Orio (1)
  1. Department of Information Engineering, University of Padova, Padova, Italy
