Abstract
The biometric recognition of a person from the speech signal is known as automatic speaker recognition (ASR) or voice biometric recognition. Many acoustic features have been used in ASR, of which Mel-frequency cepstral coefficients (MFCCs) and Gammatone frequency cepstral coefficients (GFCCs) are the most popular. To make ASR language and channel independent (that is, robust when the training and testing microphones or languages differ), i-vector features and techniques for compensating unwanted variability, such as linear discriminant analysis (LDA), probabilistic LDA (PLDA), and within-class covariance normalization (WCCN), are used extensively. The modeling/classification techniques currently in use include Gaussian mixture models (GMMs), vector quantization (VQ), hidden Markov models (HMMs), deep neural networks (DNNs), and artificial neural networks (ANNs). Model-domain normalization techniques are sometimes used to compensate for unwanted variability caused by language and channel mismatch between training and testing data. In this paper, we evaluate the performance of ASR on the ELSDSR, Hyke-2011, and IITG-MV SR Phase-I & II databases (DBs), using MFCC features with VQ/GMM models and maximum log-likelihood (MLL) scoring to recognize speakers. The experiments examine language dependency and environmental mismatch between training and testing data.
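As a rough illustration of maximum log-likelihood scoring with GMM speaker models, the sketch below sums per-frame log-likelihoods under each speaker's model and picks the speaker with the highest total. It is not the chapter's implementation: it assumes diagonal-covariance GMMs, uses hand-set toy parameters in place of models trained on MFCCs, and the names `gmm_log_likelihood`, `identify_speaker`, `speaker_a`, and `speaker_b` are hypothetical.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: (T, D) feature vectors (e.g. MFCC frames)
    weights: (M,) mixture weights summing to 1
    means, variances: (M, D) per-component parameters
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)  # (T, M)
    weighted = np.log(weights) + log_comp                              # (T, M)
    # Log-sum-exp over components, shifted by the row maximum for stability.
    peak = weighted.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(weighted - peak).sum(axis=1))
    return float(per_frame.sum())


def identify_speaker(frames, speaker_models):
    """Return the best-scoring speaker name and all per-speaker scores."""
    scores = {
        name: gmm_log_likelihood(frames, *params)
        for name, params in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy two-component models in a 2-D feature space (assumed values, not trained).
speaker_models = {
    "speaker_a": (np.array([0.5, 0.5]),
                  np.array([[0.0, 0.0], [1.0, 1.0]]),
                  np.ones((2, 2))),
    "speaker_b": (np.array([0.5, 0.5]),
                  np.array([[4.0, 4.0], [5.0, 5.0]]),
                  np.ones((2, 2))),
}

# Synthetic test frames drawn near speaker_b's component means.
rng = np.random.default_rng(0)
test_frames = rng.normal(loc=4.5, scale=1.0, size=(50, 2))

best, scores = identify_speaker(test_frames, speaker_models)
print(best, scores)
```

In a real system the model parameters would come from EM training on each speaker's enrollment features, and the test frames would be MFCCs extracted from the test utterance.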
Acknowledgements
This project is partially supported by the CMATER laboratory of the Computer Science and Engineering Department, Jadavpur University, India, TEQIP-II, PURSE-II and UPE-II projects of Govt. of India.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Barai, B., Das, D., Das, N., Basu, S., Nasipuri, M. (2019). VQ/GMM-Based Speaker Identification with Emphasis on Language Dependency. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 883. Springer, Singapore. https://doi.org/10.1007/978-981-13-3702-4_8
DOI: https://doi.org/10.1007/978-981-13-3702-4_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3701-7
Online ISBN: 978-981-13-3702-4
eBook Packages: Intelligent Technologies and Robotics (R0)