VQ/GMM-Based Speaker Identification with Emphasis on Language Dependency

Barai, Bidhan; Das, Debayan; Das, Nibaran; Basu, Subhadip; Nasipuri, Mita

doi:10.1007/978-981-13-3702-4_8


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 883)


Abstract

Biometric recognition of a person from the speech signal is known as automatic speaker recognition (ASR) or voice biometric recognition. Many acoustic features have been used in ASR, of which Mel-frequency cepstral coefficients (MFCCs) and Gammatone frequency cepstral coefficients (GFCCs) are widely used. To make ASR independent of language and channel (that is, robust when the training and testing microphones or languages differ), i-vector features and techniques that compensate for unwanted variability, such as linear discriminant analysis (LDA), probabilistic LDA (PLDA), and within-class covariance normalization (WCCN), are used extensively. The modeling and classification techniques currently in use include Gaussian mixture models (GMMs), vector quantization (VQ), hidden Markov models (HMMs), deep neural networks (DNNs), and artificial neural networks (ANNs). Model-domain normalization techniques are sometimes used to compensate for unwanted variability caused by language and channel mismatch between training and testing data. In this paper, we evaluate the performance of ASR on the ELSDSR, Hyke-2011, and IITG-MV SR Phase-I & II databases (DBs), using MFCC features and VQ/GMM models, with maximum log-likelihood (MLL) as the scoring technique for recognizing speakers. The experiment examines language dependency and environmental mismatch between training and testing data.
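
The MLL scoring step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation. It assumes each enrolled speaker is already represented by a diagonal-covariance GMM, and it uses random 2-D points in place of real MFCC frames. The function names (gmm_log_likelihood, identify_speaker), the speaker labels, and all model parameters are hypothetical and chosen only for illustration. The test utterance is assigned to the speaker whose model gives the highest total log-likelihood over all frames.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames (T, D) under a diagonal-covariance GMM.

    weights: (M,) mixture weights, means: (M, D), variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, M)
    log_weighted = np.log(weights) + log_comp                           # (T, M)
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(log_frame.sum())


def identify_speaker(frames, models):
    """Return the label with the maximum log-likelihood, plus all scores."""
    scores = {
        name: gmm_log_likelihood(frames, *params)
        for name, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-component models for two enrolled speakers (assumption:
# in practice these parameters would come from training on MFCC frames).
models = {
    "speaker_a": (np.array([0.5, 0.5]),
                  np.array([[0.0, 0.0], [1.0, 1.0]]),
                  np.ones((2, 2))),
    "speaker_b": (np.array([0.5, 0.5]),
                  np.array([[5.0, 5.0], [6.0, 6.0]]),
                  np.ones((2, 2))),
}

# Synthetic "test utterance": 200 frames drawn near speaker_b's components.
rng = np.random.default_rng(0)
test_frames = rng.normal(loc=5.5, scale=1.0, size=(200, 2))

best, scores = identify_speaker(test_frames, models)
print(best)
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification in text-independent GMM scoring. The log-sum-exp step avoids numerical underflow when individual component likelihoods are very small.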



Acknowledgements

This project is partially supported by the CMATER laboratory of the Computer Science and Engineering Department, Jadavpur University, India, and by the TEQIP-II, PURSE-II, and UPE-II projects of the Govt. of India.

Author information

Authors and Affiliations

Jadavpur University, Kolkata, 700032, India
Bidhan Barai, Debayan Das, Nibaran Das, Subhadip Basu & Mita Nasipuri


Corresponding author

Correspondence to Bidhan Barai.

Editor information

Editors and Affiliations

A.K. Choudhury School of Information Technology, University of Calcutta, Kolkata, West Bengal, India
Rituparna Chaki
Dipartimento di Scienze Ambientali, Informatica e Statistica, Università Ca’ Foscari, Mestre, Venice, Venezia, Italy
Agostino Cortesi
Faculty of Computer Science, Bialystok University of Technology, Bialystok, Poland
Khalid Saeed
Department of Computer Science and Engineering, University of Calcutta, Kolkata, West Bengal, India
Nabendu Chaki


About this chapter

Cite this chapter

Barai, B., Das, D., Das, N., Basu, S., Nasipuri, M. (2019). VQ/GMM-Based Speaker Identification with Emphasis on Language Dependency. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 883. Springer, Singapore. https://doi.org/10.1007/978-981-13-3702-4_8


DOI: https://doi.org/10.1007/978-981-13-3702-4_8
Published: 17 January 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3701-7
Online ISBN: 978-981-13-3702-4
eBook Packages: Intelligent Technologies and Robotics (R0)
