A Language-Independent Approach to European Text Retrieval

McNamee, Paul; Mayfield, James; Piatko, Christine

doi:10.1007/3-540-44645-1_12

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2069))

Included in the following conference series:

Workshop of the Cross-Language Evaluation Forum for European Languages

Abstract

We present an approach to multilingual information retrieval that does not depend on the existence of specific linguistic resources such as stemmers or thesauri. Using the HAIRCUT system, we participated in the monolingual, bilingual, and multilingual tasks of the CLEF-2000 evaluation. Our approach, based on combining the benefits of words and character n-grams, was effective both for language-independent monolingual retrieval and for cross-language retrieval using translated queries. After describing our monolingual retrieval approach, we compare a translation method using aligned parallel corpora to commercial machine translation software.
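To illustrate what indexing with both words and character n-grams can look like, here is a minimal Python sketch. It is not the HAIRCUT implementation: the function names, the `w:` and `g:` term prefixes, the space padding, and the default n-gram length of 4 are all assumptions made for this example, since the abstract does not specify them.

```python
import re
from collections import Counter


def words(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def char_ngrams(text, n=4):
    """Return overlapping character n-grams over the normalized text.

    N-grams may span word boundaries. The value of n is an assumption
    for illustration, not a setting taken from the paper.
    """
    normalized = " ".join(words(text))
    # Pad with one space on each side so word-initial and word-final
    # n-grams at the edges of the text are also produced.
    padded = f" {normalized} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def index_terms(text, n=4):
    """Count word terms and n-gram terms together in one bag of terms.

    The "w:" and "g:" prefixes (hypothetical) keep the two term types
    from colliding when a word and an n-gram have the same spelling.
    """
    terms = Counter()
    terms.update("w:" + w for w in words(text))
    terms.update("g:" + g for g in char_ngrams(text, n))
    return terms


if __name__ == "__main__":
    for term, count in sorted(index_terms("Der Hund").items()):
        print(repr(term), count)
```

Neither function needs a stemmer, a stopword list, or any other language-specific resource, which is the property the abstract emphasizes. How the two term types are weighted and combined at retrieval time is not covered by this sketch.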

Author information

Authors and Affiliations

Johns Hopkins University Applied Physics Lab, 11100 Johns Hopkins Road, Laurel, MD, 20723-6099, USA
Paul McNamee, James Mayfield & Christine Piatko

Editor information

Editors and Affiliations

Istituto di Elaborazione della Informazione, Via Moruzzi, 1, 56124, Pisa, Italy
Carol Peters

About this paper

Cite this paper

McNamee, P., Mayfield, J., Piatko, C. (2001). A Language-Independent Approach to European Text Retrieval. In: Peters, C. (eds) Cross-Language Information Retrieval and Evaluation. CLEF 2000. Lecture Notes in Computer Science, vol 2069. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44645-1_12

DOI: https://doi.org/10.1007/3-540-44645-1_12
Published: 17 September 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42446-8
Online ISBN: 978-3-540-44645-3
eBook Packages: Springer Book Archive
