Gaussian Segmentation and Tokenization for Low Cost Language Identification

  • Ana Montalvo
  • José Ramón Calvo de Lara
  • Gabriel Hernández-Sierra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8258)


The most common approaches to phonotactic language recognition use phone decoders as tokenizers. However, units that are not tied to phonetic definitions can be more universal, and are therefore conceptually easier to adopt. It is assumed that the overall sound characteristics of all spoken languages can be covered by a broad collection of acoustic units, which can be characterized by acoustic segments. In this paper, such acoustic units, which are highly desirable for a more general language characterization, are delimited and clustered using a Gaussian mixture model. A new method for segmenting speech into acoustic units is proposed; the resulting segments are then modeled with Gaussians, with the aim of replacing the phonetic recognizer. This tokenizer is trained on untranscribed data, and it precedes the statistical language modeling phase.
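To make the tokenizer idea concrete, the following is a minimal sketch of generic Gaussian tokenization followed by n-gram counting. It is not the segmentation method proposed in the paper, which the abstract does not detail. It assumes a mixture with diagonal covariances has already been trained on untranscribed feature vectors; the function names (`tokenize`, `collapse_runs`, `bigram_counts`) and all toy parameter values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def tokenize(frames, weights, means, variances):
    """Map each frame to the index of its highest-scoring mixture component."""
    tokens = []
    for x in frames:
        scores = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        tokens.append(max(range(len(scores)), key=scores.__getitem__))
    return tokens


def collapse_runs(tokens):
    """Merge consecutive identical tokens into a single segment-level token."""
    return [t for i, t in enumerate(tokens) if i == 0 or t != tokens[i - 1]]


def bigram_counts(tokens):
    """Count adjacent token pairs, the input to a statistical language model."""
    return Counter(zip(tokens, tokens[1:]))


# Toy 2-component mixture over 2-dimensional features (hypothetical values).
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.3, 0.1], [2.9, 3.2], [3.1, 2.8], [-0.1, 0.0]]

frame_tokens = tokenize(frames, weights, means, variances)
segment_tokens = collapse_runs(frame_tokens)
counts = bigram_counts(segment_tokens)
```

In a full system, one such table of n-gram counts would be estimated per language from its training utterances, and a test utterance would be scored against each language model. Collapsing runs of identical tokens is only a crude stand-in for an explicit segmentation step.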


Spoken language recognition, Gaussian tokenization, acoustic segment modeling



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Ana Montalvo (1)
  • José Ramón Calvo de Lara (1)
  • Gabriel Hernández-Sierra (1)
  1. Advanced Technologies Application Center (CENATAV), Playa, Cuba
