Issues in i-Vector Modeling: An Analysis of Total Variability Space and UBM Size

Kumar, Mohit; Dutta, Dipangshu; Das, Pradip K.

doi:10.1007/978-981-10-6626-9_18

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 664)

Abstract

Recent trends in speaker recognition favor increasingly computation-heavy approaches. However, the gains are not always commensurate with the additional computation involved. We study how the size of the universal background model (UBM) and of the total variability matrix, T, in i-vector modeling affects recognition performance. Results indicate that increasing the size of T beyond 50 yields only a very small performance improvement. For the UBM, 128 mixtures is observed to be the optimal count. The experiments were performed with the ALIZE toolkit on the TED-LIUM database.
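To give a rough sense of why the sizes of the UBM and of T drive computational cost, the sketch below counts the entries of T under the standard total variability model M = m + Tw, where T has one row per supervector dimension (number of UBM mixtures times feature dimension) and one column per i-vector dimension. It assumes that "T size" in the abstract means the number of columns of T. The feature dimension of 39 and the candidate sizes other than 50 are illustrative assumptions, not values taken from the chapter.

```python
def total_variability_shape(num_mixtures: int, feature_dim: int, rank: int) -> tuple:
    """Return (rows, cols) of T in the model M = m + T w.

    rows = supervector dimension = num_mixtures * feature_dim
    cols = i-vector dimension (the "size" of T, by assumption)
    """
    return (num_mixtures * feature_dim, rank)


def t_parameter_count(num_mixtures: int, feature_dim: int, rank: int) -> int:
    """Number of entries in T, a simple proxy for storage and training cost."""
    rows, cols = total_variability_shape(num_mixtures, feature_dim, rank)
    return rows * cols


# Assumed feature dimension (hypothetical; not stated in the abstract).
FEATURE_DIM = 39

# 128 mixtures and size 50 come from the abstract; larger sizes are illustrative.
for rank in (50, 100, 200, 400):
    count = t_parameter_count(128, FEATURE_DIM, rank)
    print(f"UBM=128, T size={rank}: {count} parameters in T")
```

Under these assumptions the parameter count grows linearly in both the mixture count and the size of T, so doubling either one doubles the number of entries to estimate. That is the kind of added cost the abstract weighs against the small accuracy gain.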

Author information

Authors and Affiliations

Department of CSE, IIT Guwahati, Assam, 781039, India
Mohit Kumar, Dipangshu Dutta & Pradip K. Das

Corresponding author

Correspondence to Mohit Kumar.

Editor information

Editors and Affiliations

KIIT, Gurgaon, Haryana, India
S. S. Agrawal
Bhai Parmanand Institute of Business Studies, New Delhi, Delhi, India
Amita Devi
MCA Department, Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi, Delhi, India
Ritika Wason
Maharaja Surajmal Institute of Technology, GGSIP University, New Delhi, Delhi, India
Poonam Bansal

About this paper

Cite this paper

Kumar, M., Dutta, D., Das, P.K. (2018). Issues in i-Vector Modeling: An Analysis of Total Variability Space and UBM Size. In: Agrawal, S., Devi, A., Wason, R., Bansal, P. (eds) Speech and Language Processing for Human-Machine Communications. Advances in Intelligent Systems and Computing, vol 664. Springer, Singapore. https://doi.org/10.1007/978-981-10-6626-9_18

DOI: https://doi.org/10.1007/978-981-10-6626-9_18
Published: 16 November 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6625-2
Online ISBN: 978-981-10-6626-9
eBook Packages: Engineering, Engineering (R0)
