Abstract
This paper describes the official runs of TNO TPD for CLEF-2001. We participated in the monolingual, bilingual and multilingual tasks. The main contribution of this paper is a systematic comparison of three types of translation resources for bilingual retrieval based on query translation. We compared several techniques based on machine-readable dictionaries and on statistical dictionaries generated from parallel corpora against a baseline of the Babelfish MT service, which is available on the web. The study showed that the topic set is too small to draw reliable conclusions. All three methods have the potential to reach about 90% of the monolingual baseline performance, but their effectiveness is not consistent across language pairs and topic collections. Because each of the individual methods is quite sensitive to missing translations, we tested a combination approach, which yielded consistent improvements, reaching up to 98% of the monolingual baseline.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kraaij, W. (2002). TNO at CLEF-2001: Comparing Translation Resources. In: Peters, C., Braschler, M., Gonzalo, J., Kluck, M. (eds) Evaluation of Cross-Language Information Retrieval Systems. CLEF 2001. Lecture Notes in Computer Science, vol 2406. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45691-0_6
DOI: https://doi.org/10.1007/3-540-45691-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-44042-0
Online ISBN: 978-3-540-45691-9
eBook Packages: Springer Book Archive