Automatic dictionary extraction for cross-language information retrieval

Brown, Ralf D.; Carbonell, Jaime G.; Yang, Yiming

doi:10.1007/978-94-017-2535-4_14


Part of the book series: Text, Speech and Language Technology (TLTB, volume 13)


Abstract

Experiments compared a variety of methods for cross-language information retrieval that use a bilingual training corpus, including methods based on machine translation and methods based on "traditional" information-retrieval techniques. In these experiments, a fairly simple statistical technique for automatically extracting a bilingual dictionary from parallel text performed best. Surprisingly, an enhancement to the dictionary extraction method significantly increased the dictionary's accuracy, yet it slightly reduced overall retrieval performance, even though it is highly beneficial for other applications. This chapter describes the extraction method and its enhancement in detail. It also compares the performance of a retrieval system that uses the automatically generated dictionaries with that of other retrieval methods.
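The abstract does not spell out the extraction algorithm. The Python sketch below is a rough illustration only. It shows one generic way to build a bilingual dictionary from sentence-aligned parallel text. First, it counts how often each source/target word pair appears in the same aligned sentence pair. Next, it keeps the pairs whose Dice coefficient clears a threshold. Finally, it uses the result to translate a query term by term. The function names, the Dice scoring, and the `min_cooc`, `min_dice`, and `top_k` parameters are all hypothetical. They are assumptions made for this illustration and are not the method described in the chapter.

```python
from collections import Counter
from itertools import product


def extract_dictionary(pairs, min_cooc=2, min_dice=0.3):
    """Hypothetical co-occurrence extractor (Dice coefficient).

    `pairs` is a list of (source_sentence, target_sentence) strings that are
    assumed to be sentence-aligned. This is a generic illustration, not the
    chapter's exact algorithm.
    """
    src_count = Counter()  # number of sentence pairs containing each source word
    tgt_count = Counter()  # number of sentence pairs containing each target word
    cooc = Counter()       # number of sentence pairs containing both words
    for src, tgt in pairs:
        s_words = set(src.lower().split())
        t_words = set(tgt.lower().split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        cooc.update(product(s_words, t_words))

    dictionary = {}
    for (s, t), c in cooc.items():
        if c < min_cooc:  # ignore rare, unreliable pairings
            continue
        dice = 2 * c / (src_count[s] + tgt_count[t])
        if dice >= min_dice:
            dictionary.setdefault(s, []).append((t, dice))

    # Order each entry's candidate translations from most to least associated.
    for candidates in dictionary.values():
        candidates.sort(key=lambda pair: pair[1], reverse=True)
    return dictionary


def translate_query(query, dictionary, top_k=2):
    """Replace each query word with its top_k dictionary translations.

    Words with no dictionary entry are dropped (a hypothetical design choice).
    """
    terms = []
    for word in query.lower().split():
        terms.extend(t for t, _ in dictionary.get(word, [])[:top_k])
    return terms


if __name__ == "__main__":
    corpus = [
        ("the house", "la maison"),
        ("the car", "la voiture"),
        ("a house", "une maison"),
        ("a car", "une voiture"),
    ]
    d = extract_dictionary(corpus)
    print(translate_query("the house", d))
```

With the toy corpus above, each English word co-occurs twice with its French counterpart and only once with every other word. The default `min_cooc=2` therefore keeps exactly one translation per word. Raising `min_dice` or `min_cooc` keeps fewer, more reliable pairs. Lowering them keeps more candidate translations per query term.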



Author information

Authors and Affiliations

Carnegie Mellon University Language Technologies Institute, USA
Ralf D. Brown, Jaime G. Carbonell & Yiming Yang


Editor information

Editors and Affiliations

Université de Provence and CNRS, 29, Avenue Robert Schuman, 13100, Aix-en-Provence, France
Jean Véronis


About this chapter

Cite this chapter

Brown, R.D., Carbonell, J.G., Yang, Y. (2000). Automatic dictionary extraction for cross-language information retrieval. In: Véronis, J. (ed.) Parallel Text Processing. Text, Speech and Language Technology, vol 13. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2535-4_14


DOI: https://doi.org/10.1007/978-94-017-2535-4_14
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5555-2
Online ISBN: 978-94-017-2535-4
eBook Packages: Springer Book Archive
