Abstract
For speaker identification, features are first extracted and then compared with those of the training set to find the closest match. Finding effective and robust features for discriminating between speakers therefore improves overall identification performance, especially in the presence of noise. In this paper, a new feature extraction method based on feature-level fusion is proposed: Gammatone Frequency Cepstral Coefficients (GFCC) and wavelet components are extracted and fused to train and test a Support Vector Machine (SVM) classifier. The performance of the proposed scheme is validated and compared with that of conventional GFCC using clean and noise-corrupted signals from the Voxforge database. The experimental results show that our algorithm achieves higher identification accuracy than the baseline GFCC.
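The abstract names the pipeline (GFCC extraction, wavelet decomposition, feature-level fusion, SVM) but none of its parameters. The following is a minimal numpy sketch of the fusion step only, under assumptions that are not taken from the paper: the gammatone filterbank is approximated in the frequency domain (rather than run as a time-domain filterbank), with 32 ERB-spaced filters, the cubic-root compression commonly used for GFCC, and 13 coefficients; the "wavelet components" are taken to be per-sub-band log energies of a 4-level Haar decomposition of each frame; and fusion is frame-wise concatenation. All function names are hypothetical.

```python
import numpy as np

def erb_space(low_hz, high_hz, n):
    """Center frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    erb_rate = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    inv_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return inv_rate(np.linspace(erb_rate(low_hz), erb_rate(high_hz), n))

def gammatone_weights(n_fft, sr, n_filters=32, low_hz=50.0):
    """Approximate 4th-order gammatone magnitude responses on the FFT bin grid."""
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    fc = erb_space(low_hz, sr / 2.0, n_filters)
    b = 1.019 * 24.7 * (4.37e-3 * fc + 1.0)  # bandwidth = 1.019 * ERB(fc)
    # |H(f)| ~ (1 + ((f - fc) / b)^2)^(-n/2) with filter order n = 4
    return (1.0 + ((freqs[None, :] - fc[:, None]) / b[:, None]) ** 2) ** -2.0

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along the last axis, keeping the first n_coeffs terms."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (assumes len(x) >= frame_len)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def gfcc(x, sr, frame_len=400, hop=160, n_filters=32, n_coeffs=13):
    """GFCC-style features: gammatone energies -> cubic root -> DCT."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2            # (frames, bins)
    energies = power @ gammatone_weights(frame_len, sr, n_filters).T  # (frames, filters)
    return dct_ii(np.cbrt(energies), n_coeffs)                        # (frames, n_coeffs)

def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT; returns [cA_L, cD_L, ..., cD_1]."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        if len(approx) % 2:
            approx = np.append(approx, approx[-1])  # pad odd lengths
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)          # approximation coefficients
    return [approx] + details[::-1]

def wavelet_features(x, frame_len=400, hop=160, levels=4):
    """Per-frame log energy of each wavelet sub-band: (frames, levels + 1)."""
    frames = frame_signal(x, frame_len, hop)
    return np.array([[np.log(np.mean(band ** 2) + 1e-10) for band in haar_dwt(f, levels)]
                     for f in frames])

def fused_features(x, sr, frame_len=400, hop=160):
    """Feature-level fusion: concatenate GFCC and wavelet features frame by frame."""
    return np.hstack([gfcc(x, sr, frame_len, hop), wavelet_features(x, frame_len, hop)])

if __name__ == "__main__":
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(sr) / sr
    x = np.sin(2 * np.pi * 220 * t) + 0.05 * rng.standard_normal(sr)
    print(fused_features(x, sr).shape)  # one second at 16 kHz: 98 frames of 13 + 5 dims
```

Each fused frame here has 13 GFCC plus 5 sub-band log energies, i.e. 18 dimensions. In a full system these vectors would typically be standardized with training-set statistics, so that neither feature set dominates the margin, and then passed to an SVM (for example scikit-learn's `SVC`); the classifier step is not shown.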
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Sekkate, S., Khalil, M., Adib, A. (2018). A Feature Level Fusion Scheme for Robust Speaker Identification. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_23
DOI: https://doi.org/10.1007/978-3-319-96292-4_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science, Computer Science (R0)