The Impact of Word Normalization Methods and Merging Strategies on Multilingual IR

  • Eija Airio
  • Heikki Keskustalo
  • Turid Hedlund
  • Ari Pirkola
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3237)

Abstract

This article deals with both multilingual and bilingual IR. The source language is English, and the target languages are English, German, Finnish, Swedish, Dutch, French, Italian and Spanish. The separate-indexes approach is followed, and four merging strategies are tested. Two of them are classical basic methods: the Raw Score method and the Round Robin method. The other two are simple new methods created for this study: the Dataset Size Based method and the Score Difference Based method. Two indexing methods are also compared: morphological analysis and stemming. Morphologically analyzed indexes perform slightly better than stemmed indexes, and the Dataset Size Based merging method performs best.
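The merging strategies named above can be illustrated with a small Python sketch. Raw Score and Round Robin follow their standard definitions: Raw Score pools all per-language results and sorts them by original retrieval score, while Round Robin interleaves the lists rank by rank and ignores scores. The abstract does not define the Dataset Size Based method, so `dataset_size_merge` is only a hypothetical reading of its name (an assumption, not the authors' method): output slots are allocated to each language's result list in proportion to an assumed known collection size, and the selected documents are then interleaved round robin. The Score Difference Based method is not sketched because the abstract gives no details of it. All function names and data here are illustrative.

```python
from typing import Dict, List, Tuple

# A per-language result list: (doc_id, score) pairs, best first.
ResultList = List[Tuple[str, float]]


def raw_score_merge(results: Dict[str, ResultList], k: int) -> List[str]:
    """Raw Score merging: pool every document from every result list
    and rank the pool by its original (unnormalized) retrieval score."""
    pooled = [(score, doc) for lst in results.values() for doc, score in lst]
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in pooled[:k]]


def round_robin_merge(results: Dict[str, ResultList], k: int) -> List[str]:
    """Round Robin merging: take rank 1 from each list, then rank 2 from
    each list, and so on. Scores are never compared across lists."""
    merged: List[str] = []
    depth = max((len(lst) for lst in results.values()), default=0)
    for rank in range(depth):
        for lst in results.values():
            if rank < len(lst):
                merged.append(lst[rank][0])
                if len(merged) == k:
                    return merged
    return merged


def dataset_size_merge(results: Dict[str, ResultList],
                       sizes: Dict[str, int], k: int) -> List[str]:
    """HYPOTHETICAL dataset-size-based merging (assumed interpretation):
    each list gets a share of the k slots proportional to the size of its
    collection; the chosen documents are then interleaved round robin.
    Assumes every collection size is positive."""
    total = sum(sizes[lang] for lang in results)
    exact = {lang: k * sizes[lang] / total for lang in results}
    quota = {lang: int(exact[lang]) for lang in results}
    # Largest-remainder rounding so that the quotas add up to k.
    leftover = k - sum(quota.values())
    by_remainder = sorted(results, key=lambda lang: exact[lang] - quota[lang],
                          reverse=True)
    for lang in by_remainder[:leftover]:
        quota[lang] += 1
    # A list shorter than its quota simply contributes all it has.
    truncated = {lang: results[lang][:quota[lang]] for lang in results}
    return round_robin_merge(truncated, k)
```

With toy scores such as English and German results scored around 1 to 12 and Finnish results scored below 1, `raw_score_merge` pushes every Finnish document out of the top ranks even if they are relevant. This sensitivity to score scales across separately indexed collections is the usual motivation for score-independent strategies such as Round Robin.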

Keywords

Average Precision, Dataset Size, Source Language, Separate Index, Merging Method

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Eija Airio (1)
  • Heikki Keskustalo (1)
  • Turid Hedlund (1)
  • Ari Pirkola (1)
  1. Department of Information Studies, University of Tampere, Finland
