VQ/GMM-Based Speaker Identification with Emphasis on Language Dependency

Advanced Computing and Systems for Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 883)

Abstract

The biometric recognition of a person through the speech signal is known as automatic speaker recognition (ASR) or voice biometric recognition. Many acoustic features have been used in ASR, of which Mel-frequency cepstral coefficients (MFCCs) and Gammatone frequency cepstral coefficients (GFCCs) are the most popular. To make ASR language and channel independent (that is, robust when the training and testing languages or microphones differ), i-vector features and variability compensation techniques such as linear discriminant analysis (LDA), probabilistic LDA (PLDA), and within-class covariance normalization (WCCN) are used extensively. Current modeling and classification techniques include Gaussian mixture models (GMMs), vector quantization (VQ), hidden Markov models (HMMs), artificial neural networks (ANNs), and deep neural networks (DNNs). Model-domain normalization techniques are sometimes used to compensate for unwanted variability caused by language and channel mismatch between training and testing data. In the present paper, we evaluate an ASR system based on MFCC features and VQ/GMM modeling, with maximum log-likelihood (MLL) scoring for speaker recognition, on the ELSDSR, Hyke-2011, and IITG-MV SR Phase-I & II databases. The experiments examine language dependency and environmental mismatch between training and testing data.
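To illustrate the maximum log-likelihood scoring step mentioned in the abstract, here is a minimal sketch in Python. It is not the chapter's implementation. It assumes each speaker model is a diagonal-covariance GMM whose parameters have already been trained (for example by EM), and it uses two-dimensional toy vectors in place of real MFCC frames. The function names, speaker labels, and parameter values are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of all frames under one GMM.

    gmm is a list of (weight, mean, variance) components. Frames are
    treated as independent, so per-frame log-likelihoods are summed.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp trick for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify_speaker(frames, speaker_models):
    """Return the speaker whose model gives the maximum log-likelihood."""
    scores = {
        name: gmm_log_likelihood(frames, gmm)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical pre-trained models: two components per speaker, 2-D features.
speaker_models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# Toy test utterance standing in for a sequence of MFCC frames.
test_frames = [[4.2, 3.9], [4.8, 5.1], [4.5, 4.4]]

best, scores = identify_speaker(test_frames, speaker_models)
print(best, scores)
```

In a closed-set identification setting, the test utterance is assigned to the enrolled speaker with the highest score. A VQ-based variant would replace the log-likelihood with a negated average distortion to each speaker's codebook.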

Acknowledgements

This project is partially supported by the CMATER laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the TEQIP-II, PURSE-II and UPE-II projects of the Govt. of India.

Corresponding author

Correspondence to Bidhan Barai.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Barai, B., Das, D., Das, N., Basu, S., Nasipuri, M. (2019). VQ/GMM-Based Speaker Identification with Emphasis on Language Dependency. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 883. Springer, Singapore. https://doi.org/10.1007/978-981-13-3702-4_8
