Abstract
JHU/APL continued its participation in multilingual retrieval at CLEF in 2006. We again applied our hallmark technique for combating language diversity and morphological complexity: character n-gram tokenization. This year we participated in the ad hoc cross-language track and submitted both monolingual and bilingual runs. Our experimental results this year agree with our previous reports that n-grams perform especially well in linguistically complex languages, notably Bulgarian and Hungarian, where monolingual improvements of 27% and 70% respectively were observed compared to space-delimited word forms. As in CLEF 2005, our bilingual submissions made use of subword translation, statistical translation of character n-grams using aligned corpora, when parallel data were available, and web-based machine translation, when no suitable data was available to us.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Hiemstra, D.: Using Language Models for Information Retrieval. Ph. D. Thesis, Center for Telematics and Information Technology, The Netherlands (2000)
Koehn, P.: Europarl: A multilingual corpus for evaluation of machine translation. Unpublished, http://www.isi.edu/koehn/publications/europarl/
McNamee, P., Mayfield, J.: Character N-gram Tokenization for European Language Text Retrieval. Information Retrieval 7(1-2), 73–97 (2004)
McNamee, P., Mayfield, J.: Translating Pieces of Words. In: Proceedings of the 28th Annual International Conference on Research and Development in Information Retrieval (SIGIR-2005), Salvador, Brazil, pp. 643–644 (August 2005)
McNamee, P.: Exploring New Languages at CLEF 2005. In: Peters, C., Gey, F.C., Gonzalo, J., Müller, H., Jones, G.J.F., Kluck, M., Magnini, B., de Rijke, M., Giampiccolo, D. (eds.) CLEF 2005. LNCS, vol. 4022, Springer, Heidelberg (2006)
Pirkola, A., Hedlund, T., Keskusalo, H., Järvelin, K.: Dictionary-Based Cross-Language Information Retrieval: Problems, Methods, and Research Findings. Information Retrieval 4, 209–230 (2001)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
McNamee, P. (2007). JHU/APL Ad Hoc Experiments at CLEF 2006. In: Peters, C., et al. Evaluation of Multilingual and Multi-modal Information Retrieval. CLEF 2006. Lecture Notes in Computer Science, vol 4730. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74999-8_22
Download citation
DOI: https://doi.org/10.1007/978-3-540-74999-8_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74998-1
Online ISBN: 978-3-540-74999-8
eBook Packages: Computer ScienceComputer Science (R0)