Shallow Morphological Analysis in Monolingual Information Retrieval for Dutch, German, and Italian

  • Conference paper
Evaluation of Cross-Language Information Retrieval Systems (CLEF 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2406))

Abstract

This paper describes our team's experiments for CLEF 2001, covering both official and post-submission runs. We took part in the monolingual task for Dutch, German, and Italian. Our experiments focused on the effects of morphological analysis, specifically stemming and compound splitting, on retrieval effectiveness. Confirming earlier reports on retrieval in compound-forming languages such as Dutch and German, we found improvements of around 25% for German and as much as 69% for Dutch. For Italian, lexicon-based stemming resulted in gains of up to 25%.
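To make the notion of compound splitting concrete, the following is a minimal sketch of a lexicon-based splitter. It is an illustration under stated assumptions, not the method used in the paper: the function name `split_compound`, the toy lexicon, the minimum part length, and the single linking element "s" are all hypothetical choices made for this example.

```python
from typing import List, Optional, Set, Tuple


def split_compound(
    word: str,
    lexicon: Set[str],
    min_len: int = 3,                      # assumed minimum length of a compound part
    linkers: Tuple[str, ...] = ("", "s"),  # assumed linking elements (none, or "s")
) -> List[str]:
    """Split `word` into lexicon entries; return [word] if no split is found."""

    def helper(rest: str) -> Optional[List[str]]:
        # A remainder that is itself a lexicon entry needs no further splitting.
        if rest in lexicon:
            return [rest]
        # Try every split point that leaves at least min_len characters on each side.
        for i in range(min_len, len(rest) - min_len + 1):
            head = rest[:i]
            if head not in lexicon:
                continue
            for link in linkers:
                tail = rest[i:]
                if not tail.startswith(link):
                    continue
                tail = tail[len(link):]  # drop the linking element, if any
                if len(tail) < min_len:
                    continue
                parts = helper(tail)
                if parts is not None:
                    return [head] + parts
        return None

    word = word.lower()
    parts = helper(word)
    return parts if parts is not None else [word]


# Toy lexicon, invented for this example only.
LEXICON = {"fiets", "band", "arbeit", "amt"}

dutch = split_compound("fietsband", LEXICON)    # Dutch: bicycle tyre
german = split_compound("Arbeitsamt", LEXICON)  # German: employment office
unknown = split_compound("xyzzy", LEXICON)      # no split available
```

In a retrieval setting, the resulting parts would typically be indexed in addition to, or instead of, the full compound, so that a query for one part can match documents containing the compound. Which of these options works best is an empirical question that this sketch does not address.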

Supported by the Physical Sciences Council with financial support from the Netherlands Organization for Scientific Research (NWO), project 612-13-001.

Supported by the Spinoza project ‘Logic in Action’ and by grants from the Netherlands Organization for Scientific Research (NWO), under project numbers 612-13-001, 365-20-005, 612.069.006, 612.000.106, and 220-80-001.

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Monz, C., de Rijke, M. (2002). Shallow Morphological Analysis in Monolingual Information Retrieval for Dutch, German, and Italian. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_24

  • DOI: https://doi.org/10.1007/3-540-45691-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-44042-0

  • Online ISBN: 978-3-540-45691-9

  • eBook Packages: Springer Book Archive
