Multi-feature Fusion for Closed Set Text Independent Speaker Identification

  • Gyanendra K. Verma
Part of the Communications in Computer and Information Science book series (CCIS, volume 141)


An intra-modal fusion, i.e., a fusion of different features of the same modality, is proposed for a speaker identification system. Two fusion methods for multiple features, one at the feature level and one at the decision level, are proposed in this study. We use multiple features derived from MFCC and the wavelet transform of the speech signal. Wavelet-transform-based features capture frequency variation across time, while MFCC features mainly approximate the base frequency information; both are important. A final score is calculated using a weighted sum rule over the matching results of the different features. We evaluate the proposed fusion strategies on the VoxForge speech dataset using a K-Nearest Neighbor classifier. Multiple features yielded promising results compared to any single feature. Further, the multi-feature approach also performed well at different SNRs on NOIZEUS, a noisy speech corpus.
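The decision-level fusion described above can be sketched as a weighted sum over per-feature matching scores. The feature names, score values, and weights below are illustrative assumptions, not values taken from the paper:

```python
# Hedged sketch of decision-level fusion via a weighted sum rule.
# Scores and weights are hypothetical; in the paper, per-feature
# matching results would come from a K-Nearest Neighbor classifier.

def fuse_scores(score_sets, weights):
    """Combine per-feature matching scores for each enrolled speaker.

    score_sets: dict mapping feature name -> {speaker_id: score}
    weights:    dict mapping feature name -> weight (assumed to sum to 1)
    Returns (best speaker id, dict of fused scores).
    """
    speakers = next(iter(score_sets.values())).keys()
    fused = {
        spk: sum(weights[f] * score_sets[f][spk] for f in score_sets)
        for spk in speakers
    }
    return max(fused, key=fused.get), fused

# Example: matching scores (higher = better) from two feature streams.
mfcc_scores = {"spk1": 0.80, "spk2": 0.55}
wavelet_scores = {"spk1": 0.60, "spk2": 0.75}

best, fused = fuse_scores(
    {"mfcc": mfcc_scores, "wavelet": wavelet_scores},
    {"mfcc": 0.6, "wavelet": 0.4},
)
# best is the speaker with the highest weighted combined score.
```

A feature-level variant would instead concatenate the MFCC and wavelet feature vectors before classification; the decision-level variant shown here keeps the two matchers independent and only combines their scores.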


Keywords: Multi-feature fusion · Intra-modal fusion · Speaker identification · MFCC · Wavelet transform · K-Nearest Neighbor (KNN)





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Gyanendra K. Verma
  1. Indian Institute of Information Technology, Allahabad, Jhalwa, Allahabad, India
