Abstract
We investigate the effectiveness of language-dependent approaches to document retrieval, such as stemming and decompounding, and contrast them with language-independent approaches, such as character n-gramming. To reap the benefits of more than one type of approach, we also consider the effectiveness of combining both types. We focus on document retrieval in nine European languages: Dutch, English, Finnish, French, German, Italian, Russian, Spanish and Swedish. We look at four information retrieval tasks: monolingual, bilingual, multilingual, and domain-specific retrieval. The experimental evidence is obtained using the 2003 test suite of the Cross-Language Evaluation Forum (CLEF).
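To make the language-independent side concrete, the following is a minimal sketch of character n-gram tokenization. It is an illustration only, not the authors' implementation: the function name `char_ngrams`, the default length `n = 4`, the lowercasing step, and the choice to keep n-grams within word boundaries are all assumptions made for this example.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split text into overlapping character n-grams, word by word.

    Assumptions for this sketch: tokens are whitespace-delimited,
    text is lowercased, and words no longer than n are kept whole.
    """
    grams: list[str] = []
    for word in text.lower().split():
        if len(word) <= n:
            # Short words cannot be split further; index them as-is.
            grams.append(word)
        else:
            # Slide a window of width n across the word.
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


if __name__ == "__main__":
    print(char_ngrams("Retrieval"))
```

Because the procedure needs no lexicon or morphological rules, the same function can be applied unchanged to any of the nine languages, which is what distinguishes it from stemming and decompounding.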
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kamps, J., Monz, C., de Rijke, M., Sigurbjörnsson, B. (2004). Language-Dependent and Language-Independent Approaches to Cross-Lingual Text Retrieval. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_14
DOI: https://doi.org/10.1007/978-3-540-30222-3_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24017-4
Online ISBN: 978-3-540-30222-3
eBook Packages: Springer Book Archive