On the Use of Convolutional Neural Networks in Pairwise Language Recognition

  • Alicia Lozano-Diez
  • Javier Gonzalez-Dominguez
  • Ruben Zazo
  • Daniel Ramos
  • Joaquin Gonzalez-Rodriguez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8854)


Convolutional deep neural networks (CDNNs) have been successfully applied to different tasks within the machine learning field and, in particular, to speech, speaker and language recognition. In this work, we apply them to pair-wise language recognition tasks. The proposed systems have been evaluated on challenging pairs of languages from the NIST LRE’09 dataset. Results have been compared with two spectral reference systems, based on Factor Analysis and Total Variability (i-vector) strategies, respectively. Moreover, a simple fusion of the developed approaches and the reference systems has been performed. Some individual and fusion systems outperform the reference systems, obtaining a relative improvement of approximately 17% in terms of min C_DET (the minimum detection cost) for one of the challenging pairs.
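As a rough illustration of the metric mentioned above, the following sketch computes a minimum detection cost over all decision thresholds and the relative improvement between two cost values. It is not the authors' evaluation code. The function names are hypothetical, and the default cost parameters (equal miss and false-alarm costs, target prior of 0.5) are assumptions that should be checked against the NIST LRE'09 evaluation plan.

```python
def min_cdet(target_scores, nontarget_scores,
             p_target=0.5, c_miss=1.0, c_fa=1.0):
    """Minimum detection cost over all decision thresholds.

    A trial is accepted when its score is >= the threshold.
    The default cost parameters are assumptions, not taken from the paper.
    """
    # Candidate thresholds: every observed score, plus +inf (reject all).
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    thresholds.append(float("inf"))

    best = float("inf")
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        cost = c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
        best = min(best, cost)
    return best


def relative_improvement(reference_cost, system_cost):
    """Fraction by which system_cost is lower than reference_cost."""
    return (reference_cost - system_cost) / reference_cost
```

Under this definition, a relative improvement of about 17% means the evaluated system's cost is roughly 0.83 times that of the reference system for the same language pair.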


Keywords: Convolutional networks, CDNNs, pair-wise language recognition





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Alicia Lozano-Diez (1)
  • Javier Gonzalez-Dominguez (1)
  • Ruben Zazo (1)
  • Daniel Ramos (1)
  • Joaquin Gonzalez-Rodriguez (1)

  1. ATVS - Biometric Recognition Group, Universidad Autonoma de Madrid (UAM), Spain
