Abstract
The aim of this project is to convert query terms in one language automatically into their equivalents in a second, historically related language, so that documents in the second language can be retrieved. The method is to compile lists of the regular sound changes that occur between related words of a language pair, and to substitute these into source-language words to generate target-language words. For example, if we know that b in Italian often corresponds to v in Spanish, that an unaccented o in Italian often corresponds to ó in Spanish, and that a terminal e in Italian is often dropped in Spanish, we can construct the Spanish word automóvil (car) from the Italian automobile.
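The substitution step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule table holds only the three Italian-to-Spanish correspondences named in the abstract, and the names `RULES`, `options_for`, and `candidates` are hypothetical. Because each correspondence holds only "often", the sketch assumes that every rule is optional at every position where it matches, so it generates all combinations and leaves the choice among them to a target-language word list, which is also an assumption.

```python
from itertools import product

# Correspondences taken from the abstract's example only.
# Each rule is (source_char, target_string, terminal_only).
RULES = [
    ("b", "v", False),  # Italian b often corresponds to Spanish v
    ("o", "ó", False),  # unaccented o may correspond to ó
    ("e", "", True),    # a terminal e may be dropped
]


def options_for(char, is_last):
    """Return the possible target-language renderings of one character."""
    opts = [char]  # leaving the character unchanged is always allowed
    for src, tgt, terminal_only in RULES:
        if char == src and (is_last or not terminal_only):
            opts.append(tgt)
    return opts


def candidates(word):
    """Generate every word obtainable by optionally applying each rule."""
    last = len(word) - 1
    per_char = [options_for(c, i == last) for i, c in enumerate(word)]
    return {"".join(choice) for choice in product(*per_char)}


# Hypothetical one-word Spanish lexicon, used here to filter the candidates.
spanish_lexicon = {"automóvil"}
matches = candidates("automobile") & spanish_lexicon
print(matches)
```

Generating every combination keeps the sketch simple but grows exponentially with the number of matching positions, so a practical system would need to rank or prune candidates; how that is done is not stated in the abstract.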
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Oakes, M.P., Banerjee, S. (2004). Regular Sound Changes for Cross-Language Information Retrieval. In: Peters, C., Gonzalo, J., Braschler, M., Kluck, M. (eds) Comparative Evaluation of Multilingual Information Access Systems. CLEF 2003. Lecture Notes in Computer Science, vol 3237. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30222-3_25
DOI: https://doi.org/10.1007/978-3-540-30222-3_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24017-4
Online ISBN: 978-3-540-30222-3
eBook Packages: Springer Book Archive