Regular Sound Changes for Cross-Language Information Retrieval

  • Michael P. Oakes
  • Souvik Banerjee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3237)


The aim of this project is the automatic conversion of query terms in one language into their equivalents in a second, historically related, language, so that documents in the second language can be retrieved. The method is to compile lists of regular sound changes that occur between related words of a language pair, and to substitute these in the source-language words to generate target-language words. For example, if we know that Italian b often corresponds to Spanish v, that an unaccented Italian o often corresponds to Spanish ó, and that a word-final Italian e is dropped (replaced with a null) in Spanish, we can construct the Spanish word automóvil (car) from the Italian automobile.
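The substitution step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule list is hand-written from the example above, each rule is applied optionally at every matching site (since a correspondence holds only "often"), and the resulting ambiguity is resolved by keeping only candidates found in a target-language vocabulary. The names `SoundChange`, `generate_candidates`, `translate` and the toy lexicon are hypothetical.

```python
# Minimal sketch (assumption, not the paper's code): generate target-language
# candidates by optionally applying sound-change rules, then filter the
# candidates against a target-language vocabulary.
from typing import NamedTuple


class SoundChange(NamedTuple):
    source: str               # substring in the source language (e.g. Italian)
    target: str               # replacement in the target language; "" means null
    final_only: bool = False  # if True, apply only at the end of the word


# Hypothetical rule list built from the Italian -> Spanish example above.
ITALIAN_TO_SPANISH = [
    SoundChange("b", "v"),
    SoundChange("o", "ó"),
    SoundChange("e", "", final_only=True),
]


def generate_candidates(word: str, rules: list[SoundChange]) -> set[str]:
    """Return every string obtained by applying rules at zero or more matching sites."""
    results: set[str] = set()

    def expand(i: int, prefix: str) -> None:
        if i == len(word):
            results.add(prefix)
            return
        # Branch 1: copy the current character unchanged.
        expand(i + 1, prefix + word[i])
        # Branch 2: apply any rule whose source substring starts at position i.
        for rule in rules:
            if not rule.source or not word.startswith(rule.source, i):
                continue  # empty sources are skipped to avoid infinite recursion
            end = i + len(rule.source)
            if rule.final_only and end != len(word):
                continue
            expand(end, prefix + rule.target)

    expand(0, "")
    return results


def translate(word: str, rules: list[SoundChange], vocabulary: set[str]) -> set[str]:
    """Keep only generated candidates that are attested target-language words."""
    return generate_candidates(word, rules) & vocabulary


if __name__ == "__main__":
    spanish_vocabulary = {"automóvil", "coche", "casa"}  # hypothetical toy lexicon
    print(translate("automobile", ITALIAN_TO_SPANISH, spanish_vocabulary))
    # prints {'automóvil'}
```

For automobile there are four rule sites (two o's, one b, the final e), so 16 candidates are generated, including over-applied forms such as autómóvil; the vocabulary filter is what selects automóvil. Applying every rule everywhere would instead yield only autómóvil, which is why this sketch treats each substitution as optional.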







Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Michael P. Oakes (1)
  • Souvik Banerjee (1)
  1. School of Computing and Technology, University of Sunderland, Sunderland, United Kingdom
