
Factors that Affect i-Vectors Based Language Identification Systems

  • Conference paper
Smart Technologies, Systems and Applications (SmartTech-IC 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1154)


Abstract

The performance of a language identification (LID) system that uses i-vectors as features depends on several factors, including algorithm parameters and data parameters. In this study we analyze the performance of such a system, focusing only on the data parameters of its “Back End”: specifically, the influence of the amount of data and of speaker variability on the training of the universal background model (UBM) and of the total variability matrix T. We also analyze the multiclass logistic regression (MLR) classifiers, balancing the classes of the database when training the classifier for each language. The experiments were carried out on the KALAKA-3 database, and performance was evaluated with the average detection cost (Cavg). The results show that, when training the UBM, speaker variability matters more than a large amount of data. When training the total variability matrix T, performance improved as more audio files were used. Finally, balancing the classes of each language to train the MLR classifiers improved performance only for certain languages. Combining all of these proposed variations, we obtained a 37% improvement in Cavg with respect to a standard language identification system.
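
The abstract reports that speaker variability mattered more than sheer amount of data for UBM training, but it does not say how the training files were chosen. One way to act on that finding, sketched here purely as an illustration, is to fill a fixed UBM training budget round-robin across speakers, so the selection covers as many distinct speakers as possible before any speaker contributes a second file. The function name `select_ubm_files`, the `(speaker_id, file_path)` input format, and a budget counted in files are all hypothetical assumptions, not the authors' procedure.

```python
def select_ubm_files(utterances, budget):
    """Pick up to `budget` utterances, cycling over speakers so that
    speaker coverage is maximised before any speaker is used twice.

    utterances: list of (speaker_id, file_path) pairs (hypothetical format).
    Returns a list of (speaker_id, file_path) pairs.
    """
    # Group files by speaker; dicts keep insertion order, so the result
    # is deterministic for a given input ordering.
    by_speaker = {}
    for speaker, path in utterances:
        by_speaker.setdefault(speaker, []).append(path)

    selected = []
    round_index = 0
    # Each round takes at most one new file from every speaker.
    while len(selected) < budget:
        added = False
        for speaker, paths in by_speaker.items():
            if round_index < len(paths):
                selected.append((speaker, paths[round_index]))
                added = True
                if len(selected) == budget:
                    return selected
        if not added:  # every speaker's files are exhausted
            break
        round_index += 1
    return selected
```

For example, with three files from `spk1` and one each from `spk2` and `spk3`, a budget of three returns one file per speaker rather than all three files from `spk1`. A real system might budget by hours of audio instead of file count; that would only change the stopping condition.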

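The abstract states that the classes were balanced for each language before training the MLR classifiers, without describing the balancing method. The sketch below assumes random undersampling in a one-vs-rest setting: for a given target language, the larger side (target or non-target i-vectors) is subsampled so both sides contribute the same number of examples. The function name `balance_one_vs_rest` and the `(language, ivector)` input format are hypothetical; weighting classes by inverse frequency would be an alternative that keeps all the data.

```python
import random


def balance_one_vs_rest(samples, target, seed=0):
    """Return a balanced training subset for one target language.

    samples: list of (language, ivector) pairs (hypothetical format).
    The larger of the target / non-target groups is randomly undersampled
    to the size of the smaller one.
    """
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    positives = [s for s in samples if s[0] == target]
    negatives = [s for s in samples if s[0] != target]
    n = min(len(positives), len(negatives))
    # random.Random.sample draws n distinct items without replacement.
    return rng.sample(positives, n) + rng.sample(negatives, n)
```

With two Basque (`"eu"`), five Spanish (`"es"`) and three Galician (`"gl"`) examples, balancing for `"eu"` keeps both Basque examples plus two randomly chosen non-Basque ones. Undersampling discards data, which may be one reason such balancing helps some languages and not others.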

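Performance is reported with the average detection cost (Cavg), whose parameters the abstract does not give. A minimal sketch, assuming the closed-set NIST-style definition with hard accept/reject decisions, C_miss = C_FA = 1 and P_target = 0.5: Cavg = (1/N_L) Σ_T [C_miss · P_target · P_miss(T) + Σ_{N≠T} C_FA · (1 - P_target)/(N_L - 1) · P_FA(T, N)], where P_miss(T) is the fraction of trials of language T rejected by the T detector and P_FA(T, N) is the fraction of trials of language N accepted by the T detector. The trial format and the function name `cavg` are assumptions, and every language is assumed to have at least one trial.

```python
def cavg(trials, languages, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Closed-set average detection cost from hard decisions.

    trials: list of (true_language, accepted) pairs, where `accepted` is the
            set of languages whose detectors said "yes" for that trial.
    languages: list of the N_L target languages (each needs >= 1 trial).
    """
    n_l = len(languages)
    # Group the accept-sets by the true language of each trial.
    by_lang = {lang: [] for lang in languages}
    for true_lang, accepted in trials:
        by_lang[true_lang].append(accepted)

    total = 0.0
    for target in languages:
        own = by_lang[target]
        # Miss rate: trials of `target` that the `target` detector rejected.
        p_miss = sum(target not in acc for acc in own) / len(own)
        cost = c_miss * p_target * p_miss
        for other in languages:
            if other == target:
                continue
            # False-alarm rate: trials of `other` accepted by the `target` detector.
            p_fa = sum(target in acc for acc in by_lang[other]) / len(by_lang[other])
            cost += c_fa * (1.0 - p_target) / (n_l - 1) * p_fa
        total += cost
    return total / n_l
```

For two languages where the `"eu"` detector misses one of two Basque trials and falsely accepts one of two Spanish trials, while the `"es"` detector makes no errors, this returns 0.25. Open-set evaluations add an out-of-set false-alarm term that this sketch omits.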

Author information

Correspondence to David Romero.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Romero, D., Salamea, C., Chica, F., Narvaez, E. (2020). Factors that Affect i-Vectors Based Language Identification Systems. In: Narváez, F., Vallejo, D., Morillo, P., Proaño, J. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2019. Communications in Computer and Information Science, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-030-46785-2_13

  • DOI: https://doi.org/10.1007/978-3-030-46785-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-46784-5

  • Online ISBN: 978-3-030-46785-2

  • eBook Packages: Computer Science, Computer Science (R0)