Factors that Affect i-Vectors Based Language Identification Systems

Romero, David; Salamea, Christian; Chica, Fernando; Narvaez, Erick

doi:10.1007/978-3-030-46785-2_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1154)

Included in the following conference series:

International Conference on Smart Technologies, Systems and Applications

Abstract

The performance of a language identification (LID) system that uses i-vectors as features depends on both algorithm parameters and data parameters. This study analyzes the performance of such a system, focusing only on the data parameters in the “Back End”: the influence of the amount of data and of speaker variability on the training of the universal background model (UBM) and the total variability matrix T. We also analyze the multiclass logistic regression (MLR) classifiers, balancing the classes of the database used to train the classifier for each language. The tests were carried out on the Kalaka-3 database, with the average detection cost function (Cavg) as the performance measure. The experiments show that, when training the UBM, speaker variability matters more than a large amount of data. When training the total variability matrix T, performance improved as more audio recordings were used. Finally, balancing the classes to train the MLR classifiers improved performance only for certain languages. Combining all of the proposed variations yielded a Cavg improvement of 37% over a standard language identification system.
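
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of how an average detection cost can be computed. It assumes the closed-set, pair-wise form used in the NIST language recognition evaluation plans, with C_miss = C_FA = 1 and P_target = 0.5; the abstract does not state which settings the chapter used, so these defaults, the function name `average_detection_cost`, and the toy error rates below are all hypothetical.

```python
def average_detection_cost(p_miss, p_fa, c_miss=1.0, c_fa=1.0, p_target=0.5):
    """Closed-set average detection cost (Cavg), assumed NIST-style definition.

    p_miss[t]    -- miss rate when language t is the target
    p_fa[t][n]   -- false-alarm rate of the detector for target t
                    on segments that actually belong to language n
    """
    languages = list(p_miss)
    n_lang = len(languages)
    if n_lang < 2:
        raise ValueError("Cavg needs at least two languages")

    # The non-target prior is shared equally among the other languages.
    p_non_target = (1.0 - p_target) / (n_lang - 1)

    total = 0.0
    for target in languages:
        miss_cost = c_miss * p_target * p_miss[target]
        fa_cost = sum(
            c_fa * p_non_target * p_fa[target][other]
            for other in languages
            if other != target
        )
        total += miss_cost + fa_cost

    # Average the per-target costs over all target languages.
    return total / n_lang


# Toy error rates for three placeholder languages (not results from the paper).
toy_languages = ["lang_a", "lang_b", "lang_c"]
toy_p_miss = {lang: 0.10 for lang in toy_languages}
toy_p_fa = {
    t: {n: 0.05 for n in toy_languages if n != t} for t in toy_languages
}

print(f"Cavg = {average_detection_cost(toy_p_miss, toy_p_fa):.4f}")
```

Lower values are better: a system with no misses and no false alarms scores 0 under this definition.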

Author information

Authors and Affiliations

Interaction, Robotics, and Automation Research Group, Universidad Politécnica Salesiana, Calle Vieja 12-30 y Elia Liut, Cuenca, Ecuador
David Romero, Christian Salamea, Fernando Chica & Erick Narvaez
Speech Technology Group, Information and Telecomunication Center, Universidad Politécnica de Madrid, Ciudad Universitaria Av. Complutense 30, 28040, Madrid, Spain
Christian Salamea

Corresponding author

Correspondence to David Romero.

Editor information

Editors and Affiliations

Universidad Politécnica Salesiana, Quito, Ecuador
Fabián R. Narváez
Universidad Politécnica Salesiana, Quito, Ecuador
Diego F. Vallejo
Universidad Politécnica Salesiana, Quito, Ecuador
Paulina A. Morillo
Universidad Politécnica Salesiana, Quito, Ecuador
Julio R. Proaño

About this paper

Cite this paper

Romero, D., Salamea, C., Chica, F., Narvaez, E. (2020). Factors that Affect i-Vectors Based Language Identification Systems. In: Narváez, F., Vallejo, D., Morillo, P., Proaño, J. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2019. Communications in Computer and Information Science, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-030-46785-2_13

DOI: https://doi.org/10.1007/978-3-030-46785-2_13
Published: 01 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46784-5
Online ISBN: 978-3-030-46785-2
eBook Packages: Computer Science, Computer Science (R0)
