Abstract
The performance of a language identification (LID) system that uses i-vectors as features depends on both algorithm parameters and data parameters. This study analyzes the performance of such a system with respect to data parameters only, in the “Back End” of the system: the influence of the amount of data and of speaker variability in the training phases of the universal background model (UBM) and the total variability matrix T. Multiclass logistic regression (MLR) classifiers were also analyzed, balancing the classes of the database used to train the classifier for each language. The tests were carried out on the Kalaka-3 database, and the average detection cost function (Cavg) was used to evaluate performance. The experiments show that, when training the UBM, speaker variability is more important than a large amount of data. When training the total variability matrix T, better performance was obtained with a larger number of audio recordings. Finally, balancing the classes for each language when training the MLR classifiers improved performance only for certain languages. With all of the proposed variations combined, Cavg improved by 37% over a standard language identification system.
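To make the evaluation metric concrete, the following is a minimal sketch of an average detection cost, assuming the closed-set definition used in the NIST language recognition evaluations with C_miss = C_FA = 1 and P_target = 0.5. The chapter may use a different variant. The function name, the input layout, and the language labels are hypothetical and chosen only for illustration.

```python
from typing import List, Set, Tuple


def average_detection_cost(
    trials: List[Tuple[str, Set[str]]],
    languages: List[str],
    p_target: float = 0.5,
    c_miss: float = 1.0,
    c_fa: float = 1.0,
) -> float:
    """Closed-set Cavg from hard decisions (assumed NIST LRE-style definition).

    Each trial is (true_language, accepted), where `accepted` is the set of
    target languages whose detector said "yes" for that test segment.
    """
    n_lang = len(languages)
    if n_lang < 2:
        raise ValueError("at least two languages are required")
    # The non-target prior is shared equally among the other languages.
    p_nontarget = (1.0 - p_target) / (n_lang - 1)

    # Group the detector decisions by the true language of each segment.
    by_lang = {lang: [acc for true, acc in trials if true == lang]
               for lang in languages}
    if any(len(segs) == 0 for segs in by_lang.values()):
        raise ValueError("every language needs at least one test segment")

    total = 0.0
    for target in languages:
        target_segs = by_lang[target]
        # Miss rate: segments of the target language rejected by its detector.
        p_miss = sum(target not in acc for acc in target_segs) / len(target_segs)
        cost = c_miss * p_target * p_miss
        for nontarget in languages:
            if nontarget == target:
                continue
            other_segs = by_lang[nontarget]
            # False alarm rate: non-target segments accepted by the target detector.
            p_fa = sum(target in acc for acc in other_segs) / len(other_segs)
            cost += c_fa * p_nontarget * p_fa
        total += cost
    return total / n_lang


# Toy example with two hypothetical language labels.
toy_trials = [
    ("es", {"es"}),  # correct
    ("es", {"en"}),  # miss for "es", false alarm for "en"
    ("en", {"en"}),  # correct
    ("en", {"en"}),  # correct
]
print(average_detection_cost(toy_trials, ["es", "en"]))  # 0.25
```

Lower values are better: a system that makes no detection errors scores 0, while one that accepts or rejects everything scores 0.5 under these priors.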
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Romero, D., Salamea, C., Chica, F., Narvaez, E. (2020). Factors that Affect i-Vectors Based Language Identification Systems. In: Narváez, F., Vallejo, D., Morillo, P., Proaño, J. (eds) Smart Technologies, Systems and Applications. SmartTech-IC 2019. Communications in Computer and Information Science, vol 1154. Springer, Cham. https://doi.org/10.1007/978-3-030-46785-2_13
DOI: https://doi.org/10.1007/978-3-030-46785-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46784-5
Online ISBN: 978-3-030-46785-2
eBook Packages: Computer Science (R0)