A Probabilistic Approach to Term Translation for Cross-Lingual Retrieval

  • Jinxi Xu
  • Ralph Weischedel
Chapter
Part of The Springer International Series on Information Retrieval book series (INRE, volume 13)

Abstract

This work has three aspects. The first is to describe a probabilistic approach to term translation for cross-lingual information retrieval (CLIR). We will show that such an approach, when used with a probabilistic retrieval model, can produce better retrieval results than non-probabilistic techniques such as structural query translation (Pirkola, 1998) and machine translation. We will also show that parallel corpora and manual lexicons are complementary and that their combination is essential to high-performance CLIR. The second aspect is to empirically measure CLIR performance as a function of the size of the bilingual resources available for estimating translation probabilities. Such a measurement is useful for two reasons. First, it can help predict CLIR performance for a new language pair. Second, it can serve as guidance on how much more data to acquire if existing resources cannot meet a target performance level. The third aspect is to describe a technique that can potentially reduce the cost of manually creating a parallel corpus. Such a technique will be useful for language pairs with little or no parallel text.
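
As a rough illustration of how translation probabilities can be folded into a probabilistic retrieval model, the following minimal sketch scores a document by the likelihood that it generates the query, summing over document terms weighted by their translation probabilities. It is not taken from the chapter: the function name, the mixture weight `alpha`, the background floor of `1e-6`, and the toy probabilities are all assumptions made for illustration.

```python
import math

def query_log_likelihood(query_terms, doc_terms, trans_prob, background, alpha=0.3):
    """Return log P(query | document) under a simple mixture model (a sketch).

    trans_prob[d][q] is an assumed estimate of P(q | d): the probability that
    document-language term d translates to query-language term q.
    background[q] is an assumed general-language probability of q, used for smoothing.
    """
    # Relative frequency of each term in the document, i.e. P(d | document).
    counts = {}
    for term in doc_terms:
        counts[term] = counts.get(term, 0) + 1
    doc_len = len(doc_terms)

    log_p = 0.0
    for q in query_terms:
        # Probability of producing q by picking a document term and translating it.
        translated = 0.0
        if doc_len > 0:
            translated = sum(
                (n / doc_len) * trans_prob.get(d, {}).get(q, 0.0)
                for d, n in counts.items()
            )
        # Mix with the background model so unseen terms do not zero out the score.
        p = alpha * background.get(q, 1e-6) + (1 - alpha) * translated
        log_p += math.log(p)
    return log_p

# Toy data (hypothetical values): an English query against two French documents.
trans_prob = {
    "banque": {"bank": 0.9, "bench": 0.1},
    "chien": {"dog": 1.0},
}
background = {"bank": 0.01, "dog": 0.01}
score_a = query_log_likelihood(["bank"], ["banque", "centrale"], trans_prob, background)
score_b = query_log_likelihood(["bank"], ["chien"], trans_prob, background)
```

In this toy setup the document containing "banque" scores higher for the query "bank" than the one containing only "chien", because part of its term mass translates to the query term. In practice the translation probabilities would be estimated from bilingual resources such as the parallel corpora and lexicons the abstract mentions.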

Keywords

Information Retrieval; Machine Translation; Chinese Word; Statistical Machine Translation; Parallel Corpus
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Allan, J., Callan, J., Feng, F., and Malin, D. (2000). INQUERY at TREC8. In TREC8 Proceedings. NIST.
  2. Ballesteros, L. and Croft, W. (1998). Resolving Ambiguity for Cross-language Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 64–71.
  3. Berger, A. and Lafferty, J. (1999). Information Retrieval as Statistical Translation. In Proceedings of ACM SIGIR 1999 Conference.
  4. Brown, P., Pietra, S. D., Pietra, V. D., and Mercer, R. (1993). The Mathematics of Statistical Machine Translation: Parameter Estimation. Computational Linguistics, pages 263–311.
  5. Hiemstra, D. and de Jong, F. (1999). Disambiguation Strategies for Cross-language Information Retrieval. In Proceedings of the Third European Conference on Research and Advanced Technology for Digital Libraries, pages 274–293.
  6. Hull, D. (1997). Using Structured Queries for Disambiguation in Cross-language Information Retrieval. In AAAI Symposium on Cross-Language Text and Speech Retrieval.
  7. Klavans, J. and Hovy, E. (1999). Multilingual (or Cross-lingual) Information Retrieval. In Hovy, E., editor, Multilingual Information Management, current levels and future abilities.
  8. Kwok, K. L. (1997). Comparing Representations in Chinese Information Retrieval. In Proceedings of ACM SIGIR 1997 Conference.
  9. Lafferty, J. (1999). Personal Communications.
  10. McCarley, J. (1999). Should We Translate the Documents or the Queries in Cross-language Information Retrieval. In Proceedings of ACL 1999, pages 208–214.
  11. Miller, D., Leek, T., and Schwartz, R. (1999). A Hidden Markov Model Information Retrieval System. In Proceedings of ACM SIGIR 1999 Conference, pages 214–221.
  12. Oard, D. (1998). A Comparative Study of Query and Document Translation for Cross-language Information Retrieval. In Third Conference of the Association for Machine Translation in the Americas.
  13. Pirkola, A. (1998). The Effects of Query Structure and Dictionary Setups in Dictionary-based Cross-language Information Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 55–63.
  14. Ponte, J. (1998). A Language Modeling Approach to Information Retrieval. In Proceedings of ACM SIGIR 1998 Conference, pages 275–281.
  15. Porter, M. (1980). An Algorithm for Suffix Stripping. Program, 14(3):130–137.
  16. Rabiner, L. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of IEEE 77, pages 257–286.
  17. Resnik, P. (1998). Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text. In Third Conference of the Association for Machine Translation in the Americas.
  18. Singhal, A., Buckley, C., and Mitra, M. (1996). Pivoted Document Length Normalization. In Proceedings of ACM SIGIR 1996 Conference.
  19. Sperer, R. and Oard, D. (2000). Structured Query Translation for Cross-language Information Retrieval. In Proceedings of ACM SIGIR 2000 Conference.
  20. Voorhees, E. and Harman, D., editors (1997). TREC5 Proceedings. NIST.
  21. Voorhees, E. and Harman, D., editors (1998). TREC6 Proceedings. NIST.
  22. Voorhees, E. and Harman, D., editors (2001). TREC9 Proceedings. NIST.
  23. Xu, J. and Croft, W. (1998). Corpus-based Stemming Using Co-occurrence of Word Variants. ACM TOIS, 18(1):79–112.

Copyright information

© Springer Science+Business Media Dordrecht 2003

Authors and Affiliations

  • Jinxi Xu (1)
  • Ralph Weischedel (1)
  1. BBN Technologies, Cambridge, USA
