Issues in i-Vector Modeling: An Analysis of Total Variability Space and UBM Size

  • Conference paper
Speech and Language Processing for Human-Machine Communications

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 664)

Abstract

Recent work on speaker recognition has relied on increasingly heavy computation. However, in some cases the gains are not commensurate with the additional computation involved. We study how the size of the universal background model (UBM) and of the total variability matrix, T, in i-vector modeling affects recognition performance. The results indicate that increasing the T size beyond 50 yields only a very small performance improvement. For the UBM, 128 mixtures is observed to be the optimal count. The experiments use the ALIZE toolkit and the TED-LIUM database.
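To make the cost trade-off concrete, the following is a minimal sketch of how the size of T grows with the UBM mixture count and the T size. It assumes the standard i-vector formulation, in which a GMM mean supervector is modeled as M = m + Tw, so that T has (mixtures × feature dimension) rows and one column per i-vector dimension. It also assumes that "T size" in the abstract refers to that column count. The feature dimension of 39 and the function names are illustrative assumptions, not values or code from the paper.

```python
from typing import Tuple


def t_matrix_shape(num_mixtures: int, feat_dim: int, t_size: int) -> Tuple[int, int]:
    """Shape of T in M = m + T w: (C * F) rows and R columns."""
    return (num_mixtures * feat_dim, t_size)


def t_matrix_params(num_mixtures: int, feat_dim: int, t_size: int) -> int:
    """Number of free parameters stored in T."""
    rows, cols = t_matrix_shape(num_mixtures, feat_dim, t_size)
    return rows * cols


# Assumption: 39-dimensional features; the abstract does not state this.
FEAT_DIM = 39

# Hypothetical grid of UBM sizes and T sizes, for illustration only.
for num_mixtures in (64, 128, 256):
    for t_size in (50, 100, 400):
        shape = t_matrix_shape(num_mixtures, FEAT_DIM, t_size)
        params = t_matrix_params(num_mixtures, FEAT_DIM, t_size)
        print(f"UBM={num_mixtures:4d}  T size={t_size:4d}  T shape={shape}  params={params}")
```

Under these assumptions the parameter count of T scales linearly in both the mixture count and the T size, which is why a plateau in accuracy beyond a small T size translates directly into avoidable training and extraction cost.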

Author information

Corresponding author

Correspondence to Mohit Kumar.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kumar, M., Dutta, D., Das, P.K. (2018). Issues in i-Vector Modeling: An Analysis of Total Variability Space and UBM Size. In: Agrawal, S., Devi, A., Wason, R., Bansal, P. (eds) Speech and Language Processing for Human-Machine Communications. Advances in Intelligent Systems and Computing, vol 664. Springer, Singapore. https://doi.org/10.1007/978-981-10-6626-9_18

  • DOI: https://doi.org/10.1007/978-981-10-6626-9_18

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6625-2

  • Online ISBN: 978-981-10-6626-9

  • eBook Packages: Engineering, Engineering (R0)
