Multi-feature Fusion for Closed Set Text Independent Speaker Identification

  • Conference paper
Information Intelligence, Systems, Technology and Management (ICISTM 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 141)

Abstract

Intra-modal fusion, the fusion of different features extracted from the same modality, is proposed for a speaker identification system. Two fusion methods, one at the feature level and one at the decision level, are proposed in this study. We use multiple features derived from the MFCC and the wavelet transform of the speech signal: wavelet-based features capture frequency variation across time, while MFCC features mainly approximate the base frequency information, and both are important. A final score is computed with a weighted sum rule over the matching results of the individual features. We evaluate the proposed fusion strategies on the VoxForge speech dataset using a K-Nearest Neighbor classifier. The fused multiple features gave promising results compared with each feature used separately, and also performed well at different SNRs on NOIZEUS, a noisy speech corpus.
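The weighted-sum decision fusion described above can be sketched in a few lines: each feature stream (e.g. MFCC, wavelet) produces per-speaker matching scores from a nearest-neighbor search, and the final identity is the speaker with the highest weighted sum. This is a minimal illustrative sketch, not the paper's implementation; the function names, the inverse-distance score, and the example weights are assumptions for illustration.

```python
import math

def knn_score(probe, gallery, k=1):
    """Per-speaker similarity from k-nearest-neighbor matching.
    gallery maps speaker id -> list of enrolled feature vectors.
    Closer neighbors contribute larger (inverse-distance) scores."""
    dists = sorted(
        (math.dist(probe, vec), spk)
        for spk, vecs in gallery.items()
        for vec in vecs
    )
    scores = {spk: 0.0 for spk in gallery}
    for d, spk in dists[:k]:
        scores[spk] += 1.0 / (1.0 + d)
    return scores

def fuse(score_sets, weights):
    """Weighted-sum (decision-level) fusion of per-feature score dicts;
    returns the speaker with the highest fused score."""
    fused = {}
    for scores, w in zip(score_sets, weights):
        for spk, s in scores.items():
            fused[spk] = fused.get(spk, 0.0) + w * s
    return max(fused, key=fused.get)

# Toy 2-D "features" standing in for MFCC and wavelet vectors.
gallery_mfcc = {"spk1": [(0.0, 0.0)], "spk2": [(5.0, 5.0)]}
gallery_wave = {"spk1": [(1.0, 0.0)], "spk2": [(4.0, 4.0)]}

identity = fuse(
    [knn_score((0.2, 0.1), gallery_mfcc),
     knn_score((0.8, 0.2), gallery_wave)],
    weights=[0.6, 0.4],  # illustrative feature weights
)
```

Feature-level fusion, by contrast, would concatenate the MFCC and wavelet vectors before a single KNN match; the decision-level variant above lets each feature vote with its own weight.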



References

  1. Multimodal Data Fusion, http://www.multitel.be/?page=data

  2. Marcel, S., Bengio, S.: Improving face verification using skin color information. In: 16th International Conference on Pattern Recognition, pp. 378–381 (2002)

  3. Czyz, J., Kittler, J., Vandendorpe, L.: Multiple classifier combination for face-based identity verification. Pattern Recognition 37(7), 1459–1469 (2004)

  4. Wang, Y., Tan, T., Jain, A.K.: Combining face and iris biometrics for identity verification. In: Kittler, J., Nixon, M.S. (eds.) AVBPA 2003. LNCS, vol. 2688. Springer, Heidelberg (2003)

  5. Hong, L., Jain, A.K., Pankanti, S.: Can multi-biometrics improve performance? In: Technical Report MSU-CSE-99-39, Department of Computer Science, Michigan State University, East Lansing, Michigan (1999)

  6. An Introduction to Data Fusion, Royal Military Academy, http://www.sic.rma.ac.be/Research/Fusion/Intro/content.html

  7. Wang, L., Minami, K., Yamamoto, K., Nakagawa, S.: Speaker identification by combining MFCC and phase information in noisy environments. In: 35th International Conference on Acoustics, Speech, and Signal Processing, Dallas, Texas, U.S.A. (2010)

  8. Patel, I., Srinivas Rao, Y.: A Frequency Spectral Feature Modeling for Hidden Markov Model Based Automated Speech Recognition. In: Meghanathan, N., Boumerdassi, S., Chaki, N., Nagamalai, D. (eds.) NeCoM 2010. CCIS, vol. 90, pp. 134–143. Springer, Heidelberg (2010)

  9. Dutta, T.: Dynamic time warping based approach to text dependent speaker identification using spectrograms. Congress on Image and Signal Processing 2, 354–360 (2008)

  10. Tzanetakis, G., Essl, G., Cook, P.: Audio analysis using the discrete wavelet transform. In: The Proceedings of Conference in Acoustics and Music Theory Applications, Skiathos, Greece (2001)

  11. Liu, Y., Shengjun, L., Dongsheng, Z.: Feature Extraction and classification of lung sounds based on wavelet coefficients. In: Proceeding of the 6th International Progress, Wavelet Analysis and Active Media Technology, Chongqing, China, pp. 773–778. World Scientific, Singapore (2005)

  12. Toh, A.M., Togneri, R., Northolt, S.: Spectral entropy as speech features for speech recognition. In: The Proceedings of PEECS, Perth, pp. 22–25 (2005)

  13. VoxForge Speech Corpus, http://www.voxforge.org

  14. NOIZEUS: A Noisy Speech Corpus for Evaluation of Speech Enhancement Algorithms, http://www.utdallas.edu/~loizou/speech/noizeus/

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Verma, G.K. (2011). Multi-feature Fusion for Closed Set Text Independent Speaker Identification. In: Dua, S., Sahni, S., Goyal, D.P. (eds) Information Intelligence, Systems, Technology and Management. ICISTM 2011. Communications in Computer and Information Science, vol 141. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19423-8_18

  • DOI: https://doi.org/10.1007/978-3-642-19423-8_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19422-1

  • Online ISBN: 978-3-642-19423-8

  • eBook Packages: Computer Science (R0)
