Abstract
We designed, implemented and evaluated an automated method of query construction for cross-language information retrieval (CLIR) from Finnish, Swedish and German to English. The method automatically extracts topical information from request sentences written in one of the source languages and creates a target language query from the translations given by a translation dictionary. We paid particular attention to morphology, compound words and query structure, and we tested the approach in the bilingual track of CLEF. All three source languages are compound languages, i.e., languages rich in compound words. A compound word is a multi-word expression whose component words are written together. Because source language request words may appear in various inflected forms that are not included in a translation dictionary, morphological normalization was used to aid dictionary translation. The query resulting from this process may either be structured according to the translation alternatives of each source language word or remain an unstructured word list.
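The pipeline outlined in the abstract (normalize, split compounds, translate, then build a structured or unstructured query) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy lemma table and dictionary, the function names, the two-part compound splitting rule, and the InQuery-style `#syn` and `#sum` operator notation are not taken from the paper and do not reproduce the authors' implementation.

```python
from typing import Dict, List

# Toy resources invented for illustration only (not from the paper).
LEMMAS: Dict[str, str] = {"häuser": "haus", "autos": "auto"}
DICTIONARY: Dict[str, List[str]] = {
    "haus": ["house", "home"],
    "auto": ["car"],
    "bahn": ["railway", "track"],
}


def normalize(word: str) -> str:
    """Map an inflected form to its base form via a toy lookup table."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def split_compound(word: str) -> List[str]:
    """Split a compound into two dictionary words if possible (assumed rule)."""
    if word in DICTIONARY:
        return [word]
    for i in range(1, len(word)):
        head, tail = word[:i], word[i:]
        if head in DICTIONARY and tail in DICTIONARY:
            return [head, tail]
    return [word]  # no split found: keep the word whole


def translate(request: str) -> List[List[str]]:
    """Return one group of translation alternatives per source word or part."""
    groups: List[List[str]] = []
    for token in request.split():
        for part in split_compound(normalize(token)):
            # Words missing from the dictionary are passed through unchanged.
            groups.append(DICTIONARY.get(part, [part]))
    return groups


def build_query(groups: List[List[str]], structured: bool) -> str:
    """Build a query string; the operator syntax is a hypothetical notation."""
    if structured:
        # Alternatives of one source word are grouped under a synonym operator.
        terms = [g[0] if len(g) == 1 else "#syn(" + " ".join(g) + ")"
                 for g in groups]
    else:
        # Unstructured: all translations form a flat word list.
        terms = [w for g in groups for w in g]
    return "#sum(" + " ".join(terms) + ")"
```

In this sketch, the structured variant keeps the translation alternatives of each source word together, whereas the unstructured variant treats every translation as an independent query term.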
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hedlund, T., Keskustalo, H., Pirkola, A., Sepponen, M., Järvelin, K. (2001). Bilingual Tests with Swedish, Finnish, and German Queries: Dealing with Morphology, Compound Words, and Query Structure. In: Peters, C. (ed.) Cross-Language Information Retrieval and Evaluation. CLEF 2000. Lecture Notes in Computer Science, vol 2069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44645-1_20
DOI: https://doi.org/10.1007/3-540-44645-1_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42446-8
Online ISBN: 978-3-540-44645-3
eBook Packages: Springer Book Archive