Abstract
Cross-language information retrieval (CLIR), in which queries and documents are in different languages, requires translating queries and/or documents so that both share a common representation. Machine translation is an effective approach for this purpose. However, the computational cost of translating large-scale document collections is prohibitive. To resolve this problem, we propose a two-stage CLIR method. First, we translate a given query into the document language and retrieve a limited number of foreign documents. Second, we machine-translate only those documents into the user language and re-rank them based on the translation result. We also show the effectiveness of our method through experiments using Japanese queries and English technical documents.
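The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the retrieval model or the translation systems, so `overlap_score`, `two_stage_clir`, the toy lexicon, and the whitespace tokenization are all hypothetical stand-ins (real Japanese text would need morphological analysis rather than `split()`).

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def overlap_score(query_terms: List[str], doc_terms: List[str]) -> float:
    """Toy relevance score (an assumption): occurrences of query terms in the document."""
    counts = Counter(doc_terms)
    return float(sum(counts[t] for t in query_terms))


def two_stage_clir(
    query: str,
    docs: Dict[str, str],
    translate_query: Callable[[str], str],
    translate_doc: Callable[[str], str],
    k: int,
) -> List[Tuple[str, float]]:
    # Stage 1: translate the query into the document language and
    # keep only the k best-scoring foreign documents.
    foreign_terms = translate_query(query).split()
    candidates = sorted(
        docs,
        key=lambda doc_id: overlap_score(foreign_terms, docs[doc_id].split()),
        reverse=True,
    )[:k]

    # Stage 2: translate only those k documents into the user language
    # and re-rank them against the original query.
    user_terms = query.split()
    rescored = [
        (doc_id, overlap_score(user_terms, translate_doc(docs[doc_id]).split()))
        for doc_id in candidates
    ]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical word-for-word "translators" built from a two-entry toy lexicon.
LEXICON = {"kikai": "machine", "honyaku": "translation"}
REVERSE = {en: ja for ja, en in LEXICON.items()}


def toy_translate_query(text: str) -> str:
    return " ".join(LEXICON.get(word, word) for word in text.split())


def toy_translate_doc(text: str) -> str:
    return " ".join(REVERSE.get(word, word) for word in text.split())


DOCS = {
    "d1": "machine translation of technical documents",
    "d2": "machine learning for retrieval",
    "d3": "cooking recipes",
}

ranking = two_stage_clir("kikai honyaku", DOCS, toy_translate_query, toy_translate_doc, k=2)
print(ranking)
```

The point of the design is that `translate_doc` is called only on the `k` documents that survive the first stage, so the expensive document translation no longer scales with the size of the collection.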
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fujii, A., Ishikawa, T. (2000). Applying Machine Translation to Two-Stage Cross-Language Information Retrieval. In: White, J.S. (eds) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_2
DOI: https://doi.org/10.1007/3-540-39965-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41117-8
Online ISBN: 978-3-540-39965-0
eBook Packages: Springer Book Archive