Synonyms
Cross-language information retrieval; Cross-language text mining; Cross-language web mining; Translingual information retrieval
Definition
Cross-language mining is a text mining task that extracts entities together with their counterparts expressed in other languages. The entities of interest vary in granularity, ranging from acronyms, synonyms, cognates, and proper names to comparable or parallel corpora. Cross-Language Information Retrieval (CLIR) is a subfield of information retrieval concerned with retrieving documents across language boundaries, i.e., the language of the retrieved documents differs from the language of the queries. Cross-language mining often serves as an effective means of improving CLIR performance by complementing the translation resources that CLIR systems exploit.
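The relationship between the two tasks can be illustrated with a minimal sketch. It assumes a simple dictionary-based query translation setup, which is only one of several possible CLIR designs; the dictionaries, the documents, and the function names `translate_query` and `retrieve` are hypothetical and chosen purely for illustration.

```python
from collections import Counter

# Hypothetical toy resources; none of these entries come from a real lexicon.
BASE_DICTIONARY = {"chat": ["cat"], "noir": ["black"]}
# Hypothetical translation pairs assumed to come from cross-language mining.
MINED_PAIRS = {"ordinateur": ["computer"]}

# Hypothetical target-language (English) document collection.
DOCUMENTS = {
    "d1": "the black cat sleeps on the computer",
    "d2": "a computer needs electricity",
    "d3": "dogs chase the ball",
}


def translate_query(query, *resources):
    """Replace each source-language term by every translation found in the resources."""
    translated = []
    for term in query.lower().split():
        for resource in resources:
            translated.extend(resource.get(term, []))
    return translated


def retrieve(target_terms, documents):
    """Rank documents by the number of occurrences of translated query terms."""
    scores = {}
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in target_terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


query = "chat noir ordinateur"  # French query against English documents
baseline = retrieve(translate_query(query, BASE_DICTIONARY), DOCUMENTS)
augmented = retrieve(translate_query(query, BASE_DICTIONARY, MINED_PAIRS), DOCUMENTS)
```

In this toy setting, the base dictionary alone cannot translate "ordinateur", so `baseline` contains only `d1`. Once the mined pairs are added as a second translation resource, `augmented` also contains `d2`, which is the sense in which mining complements the translation resources of a CLIR system.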
Historical Background
CLIR addresses the growing demand to access large volumes of documents across language barriers. Unlike monolingual information...
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Gao, W., Niu, C. (2018). Cross-Language Mining and Retrieval. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_89
DOI: https://doi.org/10.1007/978-1-4614-8265-9_89
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering