International Journal of Speech Technology

, Volume 21, Issue 2, pp 185–192 | Cite as

Robust noise MKMFCC–SVM automatic speaker identification

  • Osama S. Faragallah


This paper proposes robust noise automatic speaker identification (ASI) scheme named MKMFCC–SVM. It based on the Multiple Kernel Weighted Mel Frequency Cepstral Coefficient (MKMFCC) and support vector machine (SVM). Firstly, the MKMFCC is employed for extracting features from degraded audio and it uses multiple kernels such as the exponential and tangential and for MFCC’s weighting. Secondly, the extracted features are then categorized with the SVM classification technique. A comparative study is performed between the proposed MKMFCC–SVM and the MFCC–SVM ASI schemes using the MKMFCC and MFCCs with five schemes for extracting features from telephone-analogous and noisy-like degraded audio signals. Experimental tests prove that the proposed MKMFCC–SVM ASI scheme yields higher identification rate in noise presence or degradation.


Automatic speaker identification (ASI) MKMFCCs SVM 


  1. Boujelbene, S. Z., Mezghani, D. B. A., & Ellouze, N. (2010). Improving SVM by modifying kernel functions for speaker identification task. International Journal of Digital Content Technology and its Applications, 4(6, 100–105.Google Scholar
  2. Campbell, W. M., Campbell, J. P., Gleason, T. P., Reynolds, D. A., & Shen, W. (2007). Speaker verification using support vector machines and high-level features. IEEE Transactions on Audio, Speech and Language Processing, 15(7), 2085–2094.CrossRefGoogle Scholar
  3. Dharanipragada, S., Yapanel, U. H., & Rao, B. D. (2007) Robust feature extraction for continuous speech recognition using the MVDR spectrum estimation method. IEEE Transactions on Audio, Speech, and Language Processing, 15(1), 224–234.CrossRefGoogle Scholar
  4. Ding, I.-J., & Yen, C.-T. (2015) Enhancing GMM speaker identification by incorporating SVM speaker verification for intelligent web-based speech applications. Multimedia Tools and Applications, 74, 5131–5140.CrossRefGoogle Scholar
  5. Furui, S. (1981) Cepstral Analysis Technique for Automatic Speaker Verification. IEEE Transactions on Acoustics, Speech, and Signal Processing, 20(2), 254–272.CrossRefGoogle Scholar
  6. Galushkin, A. I. (2007). Neural networks theory. Berlin: Springer.zbMATHGoogle Scholar
  7. Gandhiraj, R., Sathidevi, P. S. (2007). Auditory-based wavelet packet filter bank for speech recognition using neural network. In Proceedings of the 15th International Conference on Advanced Computing and Communications, pp. 666–671.Google Scholar
  8. Hayati, M., shirvany, Y. (2007). Artificial neural network approach for short term load forecasting for Illam region. Proceeding of World Academy of Science, Engineering and Technology, 22. ISSN 1307–6884.Google Scholar
  9. Hossain, M., Ahmed, B., Asrafi, M. (2007). A real time speaker identification using artificial neural network. In 10th International Conference on Computer and Information Technology, pp. 1–5.Google Scholar
  10. Huang, C., Song, B., & Zhao, L. (2016). Emotional speech feature normalization and recognition based on speaker-sensitive feature clustering. International Journal of Speech Technology, 19, 805–816.CrossRefGoogle Scholar
  11. Li, Z., & Gao, Y. (2016). Acoustic feature extraction method for robust speaker identification. Multimedia Tools and Applications, 75, 7391–7406.CrossRefGoogle Scholar
  12. Mellahi, T., & Hamdi, R. (2015). LPC-based formant enhancement method in Kalman filtering for speech enhancement. International Journal of Electronics and Communications, 69(2), 545–554.CrossRefGoogle Scholar
  13. Naeeni, B. H., Amindavar, H., & Bakhshi, H. (2010). Blind per tone equalization of multilevel signals using support vector machines for OFDM in wireless communication. International Journal of Electronics and Communications, 64(2), 186–190.CrossRefGoogle Scholar
  14. Polur, P. D., & Miller, G. E. (2005). Experiments with fast Fourier transform, linear predictive and cepstral coefficients in dysarthric speech recognition algorithms using hidden Markov model. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 13(4), 558–561.CrossRefGoogle Scholar
  15. Qian, F., Hu, G., & Yao, X. (2008). Semi-supervised internet network traffic classification using a Gaussian mixture model. International Journal of Electronics and Communications, 62(7), 557–564.CrossRefGoogle Scholar
  16. Ramaiah, V. S., & Rao, R. R. (2016). Speaker diarization system using MKMFCC parameterization and WLI-fuzzy clustering. International Journal of Speech Technology, 19, 945–963.CrossRefGoogle Scholar
  17. Selva Nidhyananthan, S., Shantha Selva Kumari, R., & Senthur Selvi, T. (2016). Noise robust speaker identification using RASTA-MFCC Feature with quadrilateral filter bank structure. Wireless Personal Communications, 91, 1321–1333.CrossRefGoogle Scholar
  18. Shuling, L., & Wang C. (2009). Nonspecific speech recognition method based on composite LVQ1 and LVQ2 network. In Chinese Control and Decision Conference (CCDC), pp. 2304–2388.Google Scholar
  19. Xu, L., & Yang, Z. (2016). Speaker identification based on state space model. International Journal of Speech Technology, 19, 404–414.CrossRefGoogle Scholar
  20. You, C. H., Lee, K. A., & Li, H. (2010). GMM-SVM kernel with a Bhattacharyya-based distance for speaker recognition. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1300–1312.CrossRefGoogle Scholar
  21. Zergat, K. Y., & Amrouche, A. (2014). New scheme based on GMM-PCA-SVM modeling for automatic speaker recognition. International Journal of Speech Technology, 17, 373–381.CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Department of Computer Science and Engineering, Faculty of Electronic EngineeringMenoufia UniversityMenoufEgypt
  2. 2.Department of Information Technology, College of Computers and Information TechnologyTaif UniversityAl-HawiyaKingdom of Saudi Arabia

Personalised recommendations