Closed-Set Text-Independent Automatic Speaker Recognition System Using VQ/GMM
Automatic speaker recognition (ASR) is a form of biometric recognition that identifies a person by voice. Among the many available acoustic features, Mel-Frequency Cepstral Coefficients (MFCCs) and Gammatone Frequency Cepstral Coefficients (GFCCs) are widely used in ASR. State-of-the-art techniques for modeling and classification include Vector Quantization (VQ), Gaussian Mixture Models (GMMs), Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs), and Deep Neural Networks (DNNs). In this paper, we report experimental results on three databases, namely Hyke-2011, ELSDSR, and IITG-MV SR Phase-I, using MFCC features and VQ/GMM modeling, with maximum log-likelihood (MLL) scoring used to recognize speakers. We also analyze the effect of the number of Gaussian components and of the minimum frequency of the Mel-scale filter bank. With a suitable choice of the number of Gaussian components and the minimum frequency, accuracy in noisy environments increased by 10–20%.
Keywords: ASR, Acoustic Feature, MFCC, GFCC, VQ, GMM, MLL Score
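As an illustration of the MLL scoring mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes diagonal-covariance GMMs whose parameters have already been trained (for example from VQ-initialized EM), and that each test utterance is represented as a matrix of MFCC frames. The function names `gmm_log_likelihood` and `identify_speaker` and the speaker labels are hypothetical.

```python
import numpy as np


def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: (T, D) feature vectors, e.g. MFCCs (assumed already extracted).
    weights: (M,) mixture weights; means, variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                       # (T, M, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)   # (M,)
    log_comp = log_norm - 0.5 * np.sum(diff ** 2 / variances, axis=2)   # (T, M)
    log_weighted = np.log(weights) + log_comp                           # (T, M)
    # Log-sum-exp over components for numerical stability.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.exp(log_weighted - peak).sum(axis=1))
    return float(frame_ll.sum())


def identify_speaker(frames, speaker_models):
    """Closed-set decision: pick the enrolled speaker with the highest score.

    speaker_models maps a speaker label to (weights, means, variances).
    """
    scores = {
        name: gmm_log_likelihood(frames, *params)
        for name, params in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy usage with two hypothetical single-component speaker models.
models = {
    "spk_a": (np.array([1.0]), np.zeros((1, 2)), np.ones((1, 2))),
    "spk_b": (np.array([1.0]), np.full((1, 2), 5.0), np.ones((1, 2))),
}
test_frames = np.full((10, 2), 5.0)
best, all_scores = identify_speaker(test_frames, models)
```

In a closed-set task every test utterance is assumed to come from one of the enrolled speakers, so the decision reduces to an argmax over per-speaker scores with no rejection threshold.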
This project is partially supported by the CMATER laboratory of the Computer Science and Engineering Department, Jadavpur University, India, TEQIP-II, PURSE-II, and UPE-II projects of Government of India. Subhadip Basu is partially supported by the Research Award (F.30-31/2016(SA-II)) from UGC, Government of India. Bidhan Barai is partially supported by the RGNF Research Award (F1-17.1/2014-15/RGNF-2014-15-SC-WES-67459/(SA-III)) from UGC, Government of India.