Abstract
Recent work on speaker recognition has trended toward increasingly computation-heavy models. In some cases, however, the gains are not commensurate with the additional computation involved. We study how the size of the universal background model (UBM) and of the total variability matrix, T, in i-vector modeling affects recognition performance. Results indicate that increasing the T size beyond 50 yields only a very small performance improvement, and that 128 is the optimal UBM mixture count. The experiments were performed with the ALIZE toolkit on the TED-LIUM database.
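To make the cost side of this trade-off concrete, the sketch below counts the entries of T under the standard i-vector model M = m + Tw, where m is the UBM mean supervector of length C·F (C mixtures, F feature dimensions) and w is the i-vector of dimension R. It assumes that the "T size" in the abstract refers to R. The feature dimension of 39 and the grid of sizes are illustrative assumptions, not values taken from the paper, and the function names are hypothetical.

```python
from typing import Tuple


def total_variability_shape(num_mixtures: int, feature_dim: int,
                            ivector_dim: int) -> Tuple[int, int]:
    """Shape of T in M = m + T w: (C * F) rows by R columns."""
    return (num_mixtures * feature_dim, ivector_dim)


def total_variability_params(num_mixtures: int, feature_dim: int,
                             ivector_dim: int) -> int:
    """Number of free entries in T, which grows linearly in both C and R."""
    rows, cols = total_variability_shape(num_mixtures, feature_dim, ivector_dim)
    return rows * cols


# Assumption: 39-dimensional features (e.g. 13 MFCCs plus first and
# second derivatives). The abstract does not state the feature dimension.
FEATURE_DIM = 39

if __name__ == "__main__":
    # Illustrative grid around the values named in the abstract
    # (UBM size 128, T size 50); the other sizes are assumptions.
    for c in (64, 128, 256):
        for r in (50, 100, 200):
            n = total_variability_params(c, FEATURE_DIM, r)
            print(f"UBM mixtures={c:4d}  T size={r:4d}  entries in T={n}")
```

Under these assumptions, doubling either the mixture count or the T size doubles the number of entries in T, which is why a configuration past which accuracy stops improving is worth identifying.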
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kumar, M., Dutta, D., Das, P.K. (2018). Issues in i-Vector Modeling: An Analysis of Total Variability Space and UBM Size. In: Agrawal, S., Devi, A., Wason, R., Bansal, P. (eds) Speech and Language Processing for Human-Machine Communications. Advances in Intelligent Systems and Computing, vol 664. Springer, Singapore. https://doi.org/10.1007/978-981-10-6626-9_18
DOI: https://doi.org/10.1007/978-981-10-6626-9_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6625-2
Online ISBN: 978-981-10-6626-9
eBook Packages: Engineering, Engineering (R0)