Abstract
This paper describes our team's experiments for CLEF 2001, covering both official and post-submission runs. We took part in the monolingual tasks for Dutch, German, and Italian. Our experiments focused on how morphological analysis, in particular stemming and compound splitting, affects retrieval effectiveness. Confirming earlier reports on retrieval in compounding languages such as Dutch and German, we found improvements of around 25% for German and as much as 69% for Dutch. For Italian, lexicon-based stemming yielded gains of up to 25%.
Supported by the Physical Sciences Council with financial support from the Netherlands Organization for Scientific Research (NWO), project 612-13-001.
Supported by the Spinoza project ‘Logic in Action’ and by grants from the Netherlands Organization for Scientific Research (NWO), under project numbers 612-13-001, 365-20-005, 612.069.006, 612.000.106, and 220-80-001.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Monz, C., de Rijke, M. (2002). Shallow Morphological Analysis in Monolingual Information Retrieval for Dutch, German, and Italian. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_24
DOI: https://doi.org/10.1007/3-540-45691-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44042-0
Online ISBN: 978-3-540-45691-9
eBook Packages: Springer Book Archive