Robust speech recognition method based on discriminative environment feature extraction

Han, Jiqing; Gao, Wen

doi:10.1007/BF02948964


Correspondence
Published: September 2001

Volume 16, pages 458–464, (2001)

Journal of Computer Science and Technology

Han Jiqing¹ & Gao Wen²


Abstract

Learning the influence of environmental parameters, such as additive noise and channel distortions, from training data is an effective approach to robust speech recognition. Most previous methods are based on the maximum likelihood estimation criterion. However, these methods do not lead to a minimum error rate. This paper proposes a novel discriminative learning method for environmental parameters based on the Minimum Classification Error (MCE) criterion. The method uses a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm to learn the environmental parameters iteratively. The clean speech features are then estimated from the noisy speech features using the learned environmental parameters, and these estimates are passed to the back-end HMM classifier. In experiments on a task of 18 isolated confusable Korean words, the method achieves a best error rate reduction of 32.1% relative to a conventional HMM system.
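The MCE/GPD idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the environment model or the "simple classifier", so the sketch assumes the environment is a single additive bias vector `b` in the feature domain (clean estimate = noisy feature minus `b`), a nearest-centroid classifier, and a sigmoid-smoothed misclassification loss. The function names, the synthetic 2-D data, and the values of `step` and `gamma` are all hypothetical.

```python
import numpy as np

def classify(features, centroids):
    """Nearest-centroid decision: index of the closest class centroid."""
    d2 = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def gpd_learn_bias(noisy, labels, centroids, epochs=5, step=0.05, gamma=0.5, seed=0):
    """Learn a bias b by per-sample gradient descent on a sigmoid MCE loss.

    Assumed environment model (hypothetical): x_hat = y - b.
    """
    rng = np.random.default_rng(seed)
    b = np.zeros(noisy.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(noisy)):
            x_hat = noisy[i] - b                      # estimated clean feature
            d2 = ((x_hat - centroids) ** 2).sum(axis=1)
            c = labels[i]                             # correct class
            rival_d2 = d2.copy()
            rival_d2[c] = np.inf
            r = rival_d2.argmin()                     # best competing class
            d = d2[c] - d2[r]                         # > 0 means misclassified
            loss = 1.0 / (1.0 + np.exp(-gamma * d))   # smoothed 0/1 error
            # Chain rule: d(loss)/db = gamma*loss*(1-loss) * 2*(mu_c - mu_r)
            grad_b = gamma * loss * (1.0 - loss) * 2.0 * (centroids[c] - centroids[r])
            b -= step * grad_b                        # descent update
    return b

# Synthetic demo: three classes, all features shifted by an unknown bias.
rng = np.random.default_rng(1)
centroids = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 100)
clean = centroids[labels] + 0.5 * rng.standard_normal((300, 2))
noisy = clean + np.array([2.0, 2.0])

b = gpd_learn_bias(noisy, labels, centroids)
acc_before = (classify(noisy, centroids) == labels).mean()
acc_after = (classify(noisy - b, centroids) == labels).mean()
print(f"accuracy before: {acc_before:.3f}, after: {acc_after:.3f}, bias: {b}")
```

Because the loss rewards correct decisions rather than likelihood, the learned `b` need not equal the true shift; it only has to move the compensated features to the correct side of the decision boundaries. In the method described above, the compensated features would then be passed to the back-end HMM classifier, which this sketch omits.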



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Harbin Institute of Technology, 150001, Harbin, P.R. China
Han Jiqing
Institute of Computing Technology, The Chinese Academy of Sciences, 100080, Beijing, P.R. China
Gao Wen


Corresponding author

Correspondence to Han Jiqing.

Additional information

This work is supported partially by the ‘863’ High-Tech Programme of China (No. 863-306ZT03-02-3) and partially by the National Natural Science Foundation of China (No. 60085001).

HAN Jiqing received the B.S. and M.S. degrees in electrical engineering from Harbin Institute of Technology (HIT), Harbin, China, in 1987 and 1990, respectively, and the Ph.D. degree in computer science and engineering from HIT in 1998. From 1996 to 1998 he worked at the Systems Engineering Research Institute, Korean Institute of Science and Technology, Korea, as a visiting scientist. He is an associate professor in the Department of Computer Science and Engineering, HIT, and deputy director of the teaching and research section. His research interests include robust speech recognition and signal processing.

GAO Wen received his first Ph.D. degree in computer science and engineering from Harbin Institute of Technology (HIT), China, in 1988, and his second Ph.D. degree in electrical engineering from the University of Tokyo, Japan, in 1990. He is a professor at the Institute of Computing Technology, CAS, and Vice President of the Graduate School, CAS. His research interests include image processing, computer vision, and human-computer interaction.


About this article

Cite this article

Han, J., Gao, W. Robust speech recognition method based on discriminative environment feature extraction. J. Comput. Sci. & Technol. 16, 458–464 (2001). https://doi.org/10.1007/BF02948964


Received: 23 December 1999
Revised: 27 October 2000
Issue Date: September 2001
DOI: https://doi.org/10.1007/BF02948964
