Regular Sound Changes for Cross-Language Information Retrieval
The aim of this project is to automatically convert query terms in one language into their equivalents in a second, historically related language, so that documents in the second language can be retrieved. The method is to compile lists of regular sound changes that occur between related words of a language pair, and to substitute these into source-language words to generate target-language words. For example, if we know that b in Italian often corresponds to v in Spanish, that an unaccented o in Italian can correspond to ó in Spanish, and that a terminal e in Italian is replaced with a null in Spanish, we can construct the Spanish word automóvil (car) from the Italian automobile.
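The substitution step in the example can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the rule table, the tiny Spanish lexicon, and the function names `candidates` and `translate` are all hypothetical, rules are limited to single characters, and each rule is treated as optional at every position where it matches, with the resulting candidates filtered against the lexicon.

```python
from itertools import product

# Hypothetical rule table taken from the example in the text:
# (Italian character, Spanish replacement, applies only word-finally?)
RULES = [
    ("b", "v", False),  # Italian b often corresponds to Spanish v
    ("o", "ó", False),  # unaccented o may correspond to ó
    ("e", "", True),    # terminal e is replaced with a null
]

# Hypothetical stand-in for a real Spanish word list.
SPANISH_LEXICON = {"automóvil", "libro", "noche"}


def candidates(word, rules=RULES):
    """Return all forms obtained by optionally applying a rule at each position."""
    options = []
    last = len(word) - 1
    for i, ch in enumerate(word):
        choices = {ch}  # leaving the character unchanged is always allowed
        for src, dst, final_only in rules:
            if ch == src and (not final_only or i == last):
                choices.add(dst)
        options.append(sorted(choices))
    # One candidate per combination of per-position choices.
    return {"".join(parts) for parts in product(*options)}


def translate(word, lexicon=SPANISH_LEXICON):
    """Keep only the generated candidates that are attested in the lexicon."""
    return sorted(candidates(word) & lexicon)


print(translate("automobile"))
```

Treating every rule as optional means the number of candidates grows exponentially with the number of matching positions, so a practical system would probably need to rank or prune candidates rather than enumerate them all; the sketch ignores this.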
Keywords: Word Pair, Edit Distance, Query Term, Spanish Word, Language Pair