SCA: Phonetic Alignment Based on Sound Classes

List, Johann-Mattis

doi:10.1007/978-3-642-31467-4_3

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7415)

Abstract

In this paper I present the most recent version of the SCA method for pairwise and multiple alignment analyses. In contrast to previously proposed alignment methods, SCA is based on a novel framework of sequence alignment which combines new approaches to sequence modeling in historical linguistics with recent developments in computational biology. Compared with earlier versions of SCA [1,2], the new version introduces several modifications that significantly improve the performance and the application range of the algorithm: a new sound class model was defined which works well on highly divergent sequences; the algorithm for pairwise alignment was modified to be sensitive to secondary sequence structures such as syllable boundaries; and an algorithm for the pre-processing of the data in multiple alignment analyses [3] was included to cope with the bias resulting from progressive alignment analyses. In order to test the method, a new gold standard for pairwise and multiple alignment analyses was created which consists of 45,947 sequences covering a total of 435 different taxa belonging to six different language families.
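
The general idea of aligning words through sound classes can be illustrated with a minimal sketch. It is not the SCA algorithm described in the paper: the class table, the vowel set, the scoring values, and the function names `to_classes` and `align` are all assumptions made for illustration, and a plain Needleman-Wunsch global alignment stands in for the paper's modified pairwise procedure. Each character is treated as one segment, which would not hold for real IPA transcriptions.

```python
# Toy sound-class table (hypothetical; NOT the sound class model of the paper).
SOUND_CLASSES = {
    "p": "P", "b": "P", "f": "P", "v": "P",
    "t": "T", "d": "T",
    "k": "K", "g": "K", "x": "K",
    "s": "S", "z": "S",
    "m": "M", "n": "N",
    "r": "R", "l": "R",
    "w": "W", "j": "J", "h": "H",
}
VOWELS = set("aeiou")


def to_classes(word):
    """Map each segment to a coarse class; unknown segments map to themselves."""
    return ["V" if seg in VOWELS else SOUND_CLASSES.get(seg, seg) for seg in word]


def align(word_a, word_b, match=1, mismatch=-1, gap=-1):
    """Globally align two words by comparing sound classes, not raw segments.

    Returns the two gapped segment lists and the alignment score.
    The scoring values are illustrative assumptions.
    """
    ca, cb = to_classes(word_a), to_classes(word_b)
    n, m = len(ca), len(cb)

    # Fill the dynamic-programming matrix (Needleman-Wunsch).
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if ca[i - 1] == cb[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # (mis)match
                score[i - 1][j] + gap,      # gap in word_b
                score[i][j - 1] + gap,      # gap in word_a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if ca[i - 1] == cb[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(word_a[i - 1])
                out_b.append(word_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(word_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(word_b[j - 1])
            j -= 1
    return out_a[::-1], out_b[::-1], score[n][m]


if __name__ == "__main__":
    a, b, s = align("vater", "father")
    print(" ".join(a))  # v a t - e r
    print(" ".join(b))  # f a t h e r
    print(s)            # 4
```

Because `v` and `f` fall into the same toy class, the two initial consonants are scored as a match even though the segments differ, which is the effect a sound-class approach aims for on divergent sequences.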

References

List, J.M.: Phonetic alignment based on sound classes. In: Slavkovik, M. (ed.) Proceedings of the 15th Student Session of the European Summer School for Logic, Language and Information, Copenhagen, pp. 192–202 (2010)
List, J.M.: Multiple sequence alignment in historical linguistics. A sound class based approach. In: Proceedings of ConSOLE XIX (2011) (forthcoming)
Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee. A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302, 205–217 (2000)
Gray, R.D., Atkinson, Q.D.: Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature 426(6965), 435–439 (2003)
Holman, E.W., Brown, C.H., Wichmann, S., Müller, A., Velupillai, V., Hammarström, H., Sauppe, S., Jung, H., Bakker, D., Brown, P., Belyaev, O., Urban, M., Mailhammer, R., List, J.M., Egorov, D.: Automated dating of the world’s language families based on lexical similarity. Current Anthropology 52(6), 841–875 (2011)
Baxter, W.H., Manaster Ramer, A.: Beyond lumping and splitting. Probabilistic issues in historical linguistics. In: Renfrew, C., McMahon, A., Trask, L. (eds.) Time Depth in Historical Linguistics, pp. 167–188. McDonald Institute for Archaeological Research, Cambridge (2000)
Kessler, B.: The significance of word lists. Statistical tests for investigating historical connections between languages. CSLI Publications, Stanford (2001)
Kondrak, G.: Algorithms for language reconstruction. Dissertation. University of Toronto, Toronto (2002)
Prokić, J., Wieling, M., Nerbonne, J.: Multiple sequence alignments in linguistics. In: Proceedings of the EACL 2009 Workshop on Language Technology and Resources for Cultural Heritage, Social Sciences, Humanities, and Education, pp. 18–25. Association for Computational Linguistics, Stroudsburg (2009)
Turchin, P., Peiros, I., Gell-Mann, M.: Analyzing genetic connections between languages by matching consonant classes. Journal of Language Relationship 3, 117–126 (2010)
Covington, M.A.: An algorithm to align words for historical comparison. Computational Linguistics 22(4), 481–496 (1996)
Ross, M., Durie, M.: Introduction. In: Durie, M. (ed.) The Comparative Method Reviewed. Regularity and Irregularity in Language Change, pp. 3–38. Oxford University Press, New York (1996)
Trask, R.L. (ed.): The dictionary of historical and comparative linguistics. Edinburgh University Press, Edinburgh (2000)
Lass, R.: Historical linguistics and language change. Cambridge University Press, Cambridge (1997)
Gusfield, D.: Algorithms on strings, trees and sequences. Cambridge University Press, Cambridge (1997)
Needleman, S.B., Wunsch, C.D.: A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48, 443–453 (1970)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. Journal of the Association for Computing Machinery 21(1), 168–173 (1974)
Eddy, S.R.: Where did the BLOSUM62 alignment score matrix come from? Nature Biotechnology 22(8), 1035–1036 (2004)
Rosenberg, M.S.: Sequence alignment. Concepts and history. In: Rosenberg, M.S. (ed.) Sequence Alignment. Methods, Models, Concepts, and Strategies, pp. 1–22. University of California Press, Berkeley and Los Angeles and London (2009)
Durbin, R., Eddy, S.R., Krogh, A., Mitchinson, G.: Biological sequence analysis. Probabilistic models of proteins and nucleic acids, 7th edn. Cambridge University Press, Cambridge (2002)
Smith, T.F., Waterman, M.S.: Identification of common molecular subsequences. Journal of Molecular Biology 147(1), 195–197 (1981)
Morgenstern, B., Dress, A., Werner, T.D.: Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proceedings of the National Academy of Sciences, USA 93, 12098–12103 (1996)
Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. PNAS 89(22), 10915–10919 (1992)
Sokal, R.R., Michener, C.D.: A statistical method for evaluating systematic relationships. University of Kansas Scientific Bulletin 28, 1409–1438 (1958)
Saitou, N., Nei, M.: The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4(4), 406–425 (1987)
Feng, D.F., Doolittle, R.F.: Progressive sequence alignment as a prerequisite to correct phylogenetic trees. Journal of Molecular Evolution 25(4), 351–360 (1987)
Dolgopolsky, A.B.: Gipoteza drevnejšego rodstva jazykovych semej Severnoj Evrazii s verojatnostej točky zrenija (A probabilistic hypothesis concerning the oldest relationships among the language families of Northern Eurasia). Voprosy Jazykoznanija 2, 53–63 (1964)
Dolgopolsky, A.B.: A probabilistic hypothesis concerning the oldest relationships among the language families of northern Eurasia. In: Shevoroshkin, V.V. (ed.) Typology, Relationship and Time, pp. 27–50. Karoma Publisher, Ann Arbor (1986)
Brown, C.H., Holman, E.W., Wichmann, S.: Sound correspondences in the world’s languages (2011), Online manuscript, PDF, http://wwwstaff.eva.mpg.de/~wichmann/wwcPaper23.pdf
Brown, C.H., Holman, E.W., Wichmann, S., Velupillai, V., Cysouw, M.: Automated classification of the world’s languages. Sprachtypologie und Universalienforschung 61(4), 285–308 (2008)
Thompson, J.D., Higgins, D.G., Gibson, T.J.: CLUSTAL W. Nucleic Acids Research 22(22), 4673–4680 (1994)
Geisler, H.: Akzent und Lautwandel in der Romania. Narr, Tübingen (1992)
Hóu, J. (ed.): Xiàndài Hànyǔ fāngyán yīnkù (Phonological database of Chinese dialects). Shànghǎi Jiàoyǔ, Shanghai (2004)
Downey, S.S., Hallmark, B., Cox, M.P., Norquest, P., Lansing, S.: Computational feature-sensitive reconstruction of language relationships: Developing the ALINE distance for comparative historical linguistic reconstruction. Journal of Quantitative Linguistics 15(4), 340–369 (2008)
Wang, F.: Comparison of languages in contact. Institute of Linguistics Academia Sinica, Taipei (2006)
Thompson, J.D., Plewniak, F., Poch, O.: A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Research 27(13), 2682–2690 (1999)
Raghava, G.P.S., Barton, G.J.: Quantification of the variation in percentage identity for protein sequence alignments. BMC Bioinformatics 7, 415 (2006)
Heggarty, P.: Sounds of the Andean languages. Online resource, http://www.quechua.org.uk/
Allen, B.: Bai Dialect Survey. SIL International (2007)
Almberg, J., Skarbø, K.: Nordavinden og sola. En norsk dialektprøvedatabase på nettet (The North Wind and the Sun. A Norwegian dialect database on the web) (2011), Online resource, http://www.ling.hf.ntnu.no/nos/
Gauchat, L., Jeanjaquet, J., Tappolet, E.: Tableaux phonétiques des patois suisses romands. Attinger, Neuchâtel (1925)
Renfrew, C., Heggarty, P.: Languages and origins in Europe. Online resource, http://www.languagesandpeoples.com/

Author information

Authors and Affiliations

Heinrich Heine University Düsseldorf, Germany
Johann-Mattis List

Editor information

Editors and Affiliations

Department of Psychology, Stanford University, 420 Jordan Hall, 450 Serra Mall, 94305, Stanford, CA, USA
Daniel Lassiter
University of Luxembourg, 6, Rue Richard Coudenhove Kalergi, 1359, Luxembourg
Marija Slavkovik

1 Electronic Supplementary Material

Electronic Supplementary Material (1,398 KB)

Electronic Supplementary Material (3,554 KB)

About this paper

Cite this paper

List, J.M. (2012). SCA: Phonetic Alignment Based on Sound Classes. In: Lassiter, D., Slavkovik, M. (eds) New Directions in Logic, Language and Computation. ESSLLI 2010, ESSLLI 2011. Lecture Notes in Computer Science, vol 7415. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31467-4_3

DOI: https://doi.org/10.1007/978-3-642-31467-4_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31466-7
Online ISBN: 978-3-642-31467-4
eBook Packages: Computer Science, Computer Science (R0)