Abstract
This paper describes our approach to the 2006 Adhoc Mo-nolingual Information Retrieval run for French. The goal of our experiment was to compare the performance of a proposed statistical stemmer with that of a rule-based stemmer, specifically the French version of Porter’s stemmer. The statistical stemming approach is based on lexicon clustering, using a novel string distance measure. We submitted three official runs, besides a baseline run that uses no stemming. The results show that stemming significantly improves retrieval performance (as expected) by about 9-10%, and the performance of the statistical stemmer is comparable with that of the rule-based stemmer.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Buckley, C., Singhal, A., Mitra, M.: Using Query Zoning and Correlation within SMART: TREC5. In: Voorhees, E.M., Harman, D.K. (eds.) Proceedings of the Fifth Text REtrieval Conference (TREC-5). NIST Special Publication, pp. 500–238 (November 1997)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A review. ACM Comput. Surv. 31(3), 264–323 (1999)
Salton, G.: The SMART Retrieval System—Experiments in Automatic Document Retrieval. Prentice Hall Inc., Englewood Cliffs (1971)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Majumder, P., Mitra, M., Datta, K. (2007). Statistical vs. Rule-Based Stemming for Monolingual French Retrieval. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_14
Download citation
DOI: https://doi.org/10.1007/978-3-540-74999-8_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74998-1
Online ISBN: 978-3-540-74999-8
eBook Packages: Computer ScienceComputer Science (R0)