Abstract
In the previous chapter, we introduced a robust auditory transform (AT). In this chapter, we present a feature extraction algorithm based on the AT and apply it to robust speaker identification. The performance of acoustic models trained on clean speech usually drops significantly when they are tested on noisy speech. The presented features, however, have shown strong robustness in this kind of mismatched situation. We present a typical text-independent speaker identification system in the experiment section. Under all three mismatched testing conditions (white noise, car noise, and babble noise), the auditory features consistently perform better than the baseline mel frequency cepstral coefficient (MFCC) features. The auditory features are also compared with perceptual linear predictive (PLP) and RASTA-PLP features. They consistently perform much better than PLP. Under white noise, the auditory features are much better than RASTA-PLP; under car and babble noise, the performance of the two is similar.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Li, Q. (2012). Auditory-Based Feature Extraction and Robust Speaker Identification. In: Speaker Authentication. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23731-7_8
DOI: https://doi.org/10.1007/978-3-642-23731-7_8
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23730-0
Online ISBN: 978-3-642-23731-7
eBook Packages: Engineering, Engineering (R0)