Abstract
Spoken language identification consists of recognizing a language from a sample of speech by an unknown speaker. The traditional approach to this task mainly relies on the phonotactic information of languages. However, for marginalized languages (languages with few speakers, or oral languages without a fixed writing standard), this information is practically unavailable, and consequently the usual approach is not applicable. In this paper, we present a method that considers only the acoustic features of the speech signal and does not use any kind of linguistic information. Experimental results on a pairwise discrimination task among nine languages show that our proposal performs comparably to other similar methods. Its main advantage, however, is the straightforward characterization of the acoustic signal.
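A pairwise discrimination task over nine languages means evaluating one two-class problem per unordered language pair, i.e. C(9, 2) = 36 separate tasks. The abstract does not specify the acoustic features or the classifier, so the following is only a minimal, illustrative sketch of such an evaluation loop. It assumes (hypothetically) that each utterance has already been reduced to a fixed-length vector of acoustic measurements, and it uses a nearest-centroid classifier as a stand-in; neither the feature format nor the classifier is the authors' method.

```python
from itertools import combinations
from math import comb, dist
from statistics import fmean

# Number of two-class tasks in a pairwise setup over nine languages.
NINE_LANGUAGE_PAIRS = comb(9, 2)  # 36


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [fmean(column) for column in zip(*vectors)]


def pairwise_accuracies(train, test):
    """Evaluate one two-class classifier per unordered language pair.

    `train` and `test` map a language label to a list of fixed-length
    acoustic feature vectors, one per utterance (hypothetical format).
    The nearest-centroid rule is an illustrative stand-in classifier.
    """
    results = {}
    for lang_a, lang_b in combinations(sorted(train), 2):
        # Train: one centroid per language in this pair only.
        centroids = {lang: centroid(train[lang]) for lang in (lang_a, lang_b)}
        correct = total = 0
        for true_lang in (lang_a, lang_b):
            for vec in test[true_lang]:
                # Predict the language whose centroid is closest (Euclidean).
                predicted = min(centroids, key=lambda lang: dist(vec, centroids[lang]))
                correct += predicted == true_lang
                total += 1
        results[(lang_a, lang_b)] = correct / total
    return results
```

With three toy languages labelled `"A"`, `"B"` and `"C"` (hypothetical data), `pairwise_accuracies` returns three entries keyed by `("A", "B")`, `("A", "C")` and `("B", "C")`; with nine languages it would return 36. Keeping each pair as an independent task mirrors the pairwise setup named in the abstract, where every pair can be scored and compared on its own.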
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Reyes-Herrera, A.L., Villaseñor-Pineda, L., Montes-y-Gómez, M. (2006). A Straightforward Method for Automatic Identification of Marginalized Languages. In: Salakoski, T., Ginter, F., Pyysalo, S., Pahikkala, T. (eds) Advances in Natural Language Processing. FinTAL 2006. Lecture Notes in Computer Science, vol 4139. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11816508_9
DOI: https://doi.org/10.1007/11816508_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37334-6
Online ISBN: 978-3-540-37336-0
eBook Packages: Computer Science, Computer Science (R0)