Multilingual Information Retrieval Using Open, Transparent Resources in CLEF 2003

  • Monica Rogati
  • Yiming Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3237)


Corpus-based approaches to cross-lingual information retrieval (CLIR) have been studied and applied for many years. However, general-purpose commercial machine translation (MT) systems have been considered easier to use and better performing for CLEF, which is to be expected given that the newspaper articles used in CLEF are not domain-specific. Corpus-based approaches are easier to adapt to new domains and languages; however, their performance may be lower on a general test collection such as CLEF. Our results show that the performance drop is not large enough to justify the loss of control, transparency and flexibility that comes with commercial MT systems. We participated in two bilingual runs and the small multilingual run using software and data that are free to obtain, transparent and modifiable.


Machine Translation, Query Expansion, Statistical Machine Translation, Parallel Corpus, Language Pair





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Monica Rogati (1)
  • Yiming Yang (1)
  1. Computer Science Department, Carnegie Mellon University, Pittsburgh, USA
