
Applying Machine Translation to Two-Stage Cross-Language Information Retrieval

  • Conference paper
Envisioning Machine Translation in the Information Future (AMTA 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1934)


Abstract

Cross-language information retrieval (CLIR), where queries and documents are in different languages, requires translating queries, documents, or both into a common representation. Machine translation is an effective means to this end. However, machine translating a large-scale document collection is computationally prohibitive. To address this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language and retrieve a limited number of foreign-language documents. Second, we machine translate only those documents into the user's language and re-rank them based on the translation results. We demonstrate the effectiveness of our method through experiments using Japanese queries and English technical documents.
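The two-stage procedure in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system. The dictionary-based `toy_translate_query` and `toy_translate_doc` functions are hypothetical stand-ins for real query translation and machine translation. TF-IDF cosine similarity is an assumed retrieval model. Combining the two stages' scores by linear interpolation with a hypothetical weight `alpha` is also an assumption: the abstract says only that documents are re-ranked based on the translation results.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Tokens = List[str]


def tfidf_cosine(query: Tokens, docs: Dict[str, Tokens]) -> Dict[str, float]:
    """Cosine similarity between TF-IDF vectors of the query and each document.

    IDF is computed over `docs` only, so in stage 2 it reflects just the
    re-ranked subset (a simplification made for this sketch).
    """
    n = len(docs)
    df = Counter(t for toks in docs.values() for t in set(toks))

    def vec(tokens: Tokens) -> Dict[str, float]:
        tf = Counter(tokens)
        # Terms unseen in `docs` are dropped; log(1 + n/df) keeps weights positive.
        return {t: c * math.log(1 + n / df[t]) for t, c in tf.items() if t in df}

    qv = vec(query)
    q_norm = math.sqrt(sum(w * w for w in qv.values()))
    scores = {}
    for doc_id, toks in docs.items():
        dv = vec(toks)
        d_norm = math.sqrt(sum(w * w for w in dv.values()))
        dot = sum(w * dv.get(t, 0.0) for t, w in qv.items())
        scores[doc_id] = dot / (q_norm * d_norm) if q_norm and d_norm else 0.0
    return scores


def two_stage_clir(query: Tokens, docs: Dict[str, Tokens],
                   translate_query: Callable[[Tokens], Tokens],
                   translate_doc: Callable[[Tokens], Tokens],
                   k: int = 2, alpha: float = 0.5) -> List[str]:
    # Stage 1: translate the query into the document language, keep the top k.
    s1 = tfidf_cosine(translate_query(query), docs)
    top_k = sorted(s1, key=lambda d: (-s1[d], d))[:k]
    # Stage 2: machine translate only those k documents into the user language...
    translated = {d: translate_doc(docs[d]) for d in top_k}
    s2 = tfidf_cosine(query, translated)
    # ...and re-rank them (linear interpolation of scores is an assumption).
    final = {d: alpha * s1[d] + (1 - alpha) * s2[d] for d in top_k}
    return sorted(top_k, key=lambda d: (-final[d], d))


# Hypothetical toy dictionaries standing in for real translators.
JA_EN = {"機械": ["machine"], "翻訳": ["translation"], "検索": ["retrieval"]}
EN_JA = {"machine": "機械", "translation": "翻訳", "retrieval": "検索"}
doc_translations = []  # records each (costly) document-translation call


def toy_translate_query(tokens: Tokens) -> Tokens:
    return [e for t in tokens for e in JA_EN.get(t, [t])]


def toy_translate_doc(tokens: Tokens) -> Tokens:
    doc_translations.append(tokens)
    return [EN_JA.get(t, t) for t in tokens]  # unknown words pass through


docs = {
    "d1": ["machine", "translation", "retrieval"],
    "d2": ["machine", "translation", "quality"],
    "d3": ["cooking", "recipe"],
}
result = two_stage_clir(["機械", "翻訳", "検索"], docs,
                        toy_translate_query, toy_translate_doc, k=2)
print(result, len(doc_translations))
```

Only the `k` documents retrieved in the first stage reach the document translator. Here that means two translation calls rather than three, and this restriction is the source of the cost saving the abstract describes.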




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fujii, A., Ishikawa, T. (2000). Applying Machine Translation to Two-Stage Cross-Language Information Retrieval. In: White, J.S. (ed.) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_2


  • DOI: https://doi.org/10.1007/3-540-39965-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41117-8

  • Online ISBN: 978-3-540-39965-0

  • eBook Packages: Springer Book Archive
