Speaker Invariant and Noise Robust Speech Recognition Using Enhanced Auditory and VTL Based Features

Umarani, S. D.; Wahidabanu, R. S. D.; Raviram, P.

doi:10.1007/978-3-642-35314-7_10

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 199)

Abstract

This paper focuses on the design and implementation of a noise-resilient, speaker-independent speech recognition system for isolated word recognition. Auditory transform (AT) based features called Cochlear Filter Cepstral Coefficients (CFCCs) are used for feature extraction. Their robustness against noise is enhanced by a wavelet-based denoising algorithm, and their robustness against variation in vocal tract length (VTL) is enhanced by the invariant-integration method. The resulting features are called Enhanced CFCC Invariant-Integration Features (ECFCCIIFs). A feature-finding neural network (FFNN) is used as the classifier for recognizing isolated words. Results are compared with those obtained using the standard CFCC features. Under both matched and mismatched conditions, the ECFCCIIFs maintain a high recognition rate at low signal-to-noise ratios (SNRs), and they are also more effective at high SNRs.
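To make the wavelet-based denoising step more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the wavelet family, the decomposition depth, or the threshold rule, so everything here is an assumption made for illustration: a Haar wavelet, three decomposition levels, and soft thresholding with the universal threshold, where the noise level is estimated from the finest detail coefficients. The function names (`haar_dwt`, `haar_idwt`, `soft_threshold`, `wavelet_denoise`) are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """One level of the orthonormal Haar transform (input length must be even)."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail


def haar_idwt(approx, detail):
    """Invert one level of the orthonormal Haar transform."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x


def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t; anything below t becomes zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)


def wavelet_denoise(x, levels=3):
    """Denoise a 1-D signal by soft-thresholding its Haar detail coefficients.

    Assumptions (not taken from the paper): Haar wavelet, universal threshold
    sigma * sqrt(2 * ln N), sigma estimated as median(|d1|) / 0.6745.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")

    # Analysis: repeatedly split the approximation into approx + detail.
    details = []
    approx = x
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)

    # Noise estimate from the finest-scale detail coefficients.
    sigma = np.median(np.abs(details[0])) / 0.6745
    threshold = sigma * np.sqrt(2.0 * np.log(len(x)))

    # Synthesis: rebuild from the coarsest level using thresholded details.
    for detail in reversed(details):
        approx = haar_idwt(approx, soft_threshold(detail, threshold))
    return approx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(1024) / 1024.0
    clean = np.sin(2 * np.pi * 4 * t)
    noisy = clean + 0.3 * rng.standard_normal(t.size)
    denoised = wavelet_denoise(noisy, levels=3)
    print("noisy MSE:   ", np.mean((noisy - clean) ** 2))
    print("denoised MSE:", np.mean((denoised - clean) ** 2))
```

In a front end of the kind the abstract describes, a step like this would be applied before or during feature extraction so that the cepstral features are computed from a cleaner signal. Where exactly the authors apply it is not stated in the abstract.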

Author information

Authors and Affiliations

Government College of Engineering, Salem, 636011, India
S. D. Umarani & R. S. D. Wahidabanu
Department of CSE, Mahendra Engineering College, Tiruchengode, 637503, India
P. Raviram

Corresponding author

Correspondence to S. D. Umarani.

Editor information

Editors and Affiliations

Dept of Computer Science Engineering, Anil Neerukonda Institute of Technology and Sciences, Vishakapatnam, India
Suresh Chandra Satapathy
AI Lab, University of Hyderabad, Hyderabad, India
Siba K. Udgata
Bhubaneswar Engineering College, Bhubaneswar, India
Bhabendra Narayan Biswal

About this paper

Cite this paper

Umarani, S.D., Wahidabanu, R.S.D., Raviram, P. (2013). Speaker Invariant and Noise Robust Speech Recognition Using Enhanced Auditory and VTL Based Features. In: Satapathy, S., Udgata, S., Biswal, B. (eds) Proceedings of the International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA). Advances in Intelligent Systems and Computing, vol 199. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35314-7_10

DOI: https://doi.org/10.1007/978-3-642-35314-7_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35313-0
Online ISBN: 978-3-642-35314-7
eBook Packages: Engineering, Engineering (R0)