Experiments to Evaluate Probabilistic Models for Automatic Stemmer Generation and Query Word Translation
The paper describes the statistical methods for stemming and for query word translation used in the monolingual and bilingual tracks of CLEF 2003, together with the experiments carried out to evaluate them. While the method proposed for the bilingual track still leaves room for improvement, the approach adopted for the monolingual track makes it possible to generate stemmers that learn how to stem the words of a document directly from a training word list extracted from the document collection, with no need for language-dependent knowledge. The experiments suggest that statistical approaches to stemming are as effective as classical algorithms that encapsulate predefined linguistic rules.
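The abstract does not spell out how the stemmers are learned. As a rough illustration of the general idea of deriving a stemmer from a word list alone, with no language-specific rules, here is a minimal sketch. It uses the classic letter successor variety heuristic, not the probabilistic models the paper evaluates. For every prefix it counts how many distinct letters (or an end-of-word marker) follow that prefix in the training list, and it cuts a word after the prefix with the highest count. The function names, the `min_stem` threshold and the `$` end marker are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical end-of-word marker; assumed never to occur inside a word.
END = "$"


def successor_table(words):
    """Map every prefix in the training word list to the set of symbols that follow it."""
    followers = defaultdict(set)
    for word in words:
        for i in range(1, len(word) + 1):
            # A prefix is followed either by the next letter or, if it is the
            # whole word, by the end-of-word marker.
            followers[word[:i]].add(word[i] if i < len(word) else END)
    return followers


def stem(word, followers, min_stem=3):
    """Cut the word after the prefix with the highest successor variety.

    Only prefixes with at least min_stem letters and shorter than the word are
    considered (min_stem is an assumed parameter). If no such prefix is followed
    by two or more distinct symbols, the word is returned unchanged.
    """
    best_len, best_variety = len(word), 1
    for i in range(min_stem, len(word)):
        # .get avoids inserting unseen prefixes into the defaultdict.
        variety = len(followers.get(word[:i], ()))
        if variety > best_variety:  # strict '>' keeps the shortest prefix on ties
            best_len, best_variety = i, variety
    return word[:best_len]


if __name__ == "__main__":
    training = ["walk", "walks", "walked", "walking", "talks", "talked",
                "talking", "jumps", "jumped", "jumping", "house"]
    table = successor_table(training)
    print([stem(w, table) for w in ["walking", "talked", "jumps", "house"]])
    # prints ['walk', 'talk', 'jump', 'house']
```

Taking the single highest-variety cut, rather than every peak, is a design choice made here for simplicity. On a large vocabulary, very short prefixes can have a high successor variety, which is why the sketch exposes `min_stem` as a lower bound on stem length.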
Keywords: Probabilistic Model; Hidden Markov Model; Target Word; Average Precision; Probabilistic Framework
References
2. Bacchin, M., Ferro, N., Melucci, M.: The effectiveness of a graph-based algorithm for stemming. In: Proceedings of the International Conference on Asian Digital Libraries, Singapore, pp. 117–128 (2002)
3. Rabiner, L., Juang, B.: Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs (1993)
4. Di Nunzio, G.: The CLEF 2003 lexer, http://www.dei.unipid.it/~dinunzio/CLEF2003Lexer.pdf
5. Di Nunzio, G., Ferro, N., Melucci, M., Orio, N.: The University of Padova at CLEF 2003: Experiments to evaluate probabilistic models for automatic stemmer generation and query word translation. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds.) CLEF 2003. LNCS, vol. 3237, pp. 211–223. Springer, Heidelberg (2004)