Auditory-Based Feature Extraction and Robust Speaker Identification

Speaker Authentication

Part of the book series: Signals and Communication Technology (SCT)

Abstract

In the previous chapter, we introduced a robust auditory transform (AT). In this chapter, we present an auditory-based feature extraction algorithm built on the AT and apply it to robust speaker identification. The performance of acoustic models trained on clean speech usually drops significantly when they are tested on noisy speech. The presented features, however, show strong robustness in this situation. In the experiment section, we present a typical text-independent speaker identification system. Under all three mismatched testing conditions (white noise, car noise, and babble noise), the auditory features consistently perform better than the baseline mel frequency cepstral coefficient (MFCC) features. The auditory features are also compared with perceptual linear predictive (PLP) and RASTA-PLP features. The features consistently perform much better than PLP. Under white noise, the MFCC features are much better than RASTA-PLP. Under car and babble noise, the performances are similar.
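
The mismatched condition described above (models trained on clean speech, tested on noisy speech) is commonly simulated by adding noise to clean test utterances at a chosen signal-to-noise ratio. The sketch below illustrates that general idea only; it is not taken from the chapter. The helper names `mix_at_snr` and `measure_snr_db`, the 6 dB target, and the sine tone standing in for speech are all assumptions made for illustration.

```python
import math
import random


def mix_at_snr(clean, noise, target_snr_db):
    """Return clean + scaled noise so that the mixture has the requested SNR (dB)."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if clean_power == 0 or noise_power == 0:
        raise ValueError("signals must have non-zero power")
    # Choose the gain so that clean_power / (gain**2 * noise_power)
    # equals 10 ** (target_snr_db / 10).
    gain = math.sqrt(clean_power / (noise_power * 10 ** (target_snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def measure_snr_db(clean, noisy):
    """Measure the SNR of a mixture, given the clean reference signal."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical stand-in for a clean utterance: a 200 Hz tone sampled at 8 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
white = [rng.gauss(0.0, 1.0) for _ in range(8000)]  # white Gaussian noise
noisy = mix_at_snr(clean, white, target_snr_db=6.0)
```

In an evaluation of this kind, features would be extracted from `noisy` and scored against speaker models trained on clean data only. Car or babble noise would be substituted for `white` by loading recorded noise of the same length.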

Author information

Corresponding author

Correspondence to Qi (Peter) Li.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Li, Q. (2012). Auditory-Based Feature Extraction and Robust Speaker Identification. In: Speaker Authentication. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23731-7_8

  • DOI: https://doi.org/10.1007/978-3-642-23731-7_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23730-0

  • Online ISBN: 978-3-642-23731-7

  • eBook Packages: Engineering, Engineering (R0)
