Abstract
Learning the influence of environmental parameters, such as additive noise and channel distortions, from training data is an effective approach to robust speech recognition. Most previous methods are based on the maximum likelihood estimation criterion. However, these methods do not lead to a minimum error rate. In this paper, a novel discriminative learning method for environmental parameters, based on the Minimum Classification Error (MCE) criterion, is proposed. In this method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are adopted to iteratively learn the environmental parameters. The clean speech features are then estimated from the noisy speech features using the estimated environmental parameters, and these estimates are used in the back-end HMM classifier. Experiments on a task of 18 confusable isolated Korean words show a best error rate reduction of 32.1% relative to a conventional HMM system.
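The abstract does not give the parameterization, so the following is only a minimal sketch of the general idea of MCE training with GPD updates, not the authors' method. It assumes, for illustration, that the environment is a single additive bias vector `h` in the feature domain, that the "simple classifier" is a nearest-prototype classifier, that only the best competing class enters the misclassification measure, and that the loss is a sigmoid. All function names, parameters, and the synthetic data are hypothetical.

```python
import numpy as np


def mce_loss_and_grad(y, label, prototypes, h, gamma=1.0):
    """Smoothed MCE loss for one noisy vector y, and its gradient w.r.t. h."""
    diffs = (y - h) - prototypes                 # (K, D): clean estimate minus each prototype
    g = -np.sum(diffs ** 2, axis=1)              # discriminants g_k = -||x_hat - m_k||^2
    g_rival = g.copy()
    g_rival[label] = -np.inf                     # exclude the correct class
    rival = int(np.argmax(g_rival))              # best competing class (assumption)
    d = g[rival] - g[label]                      # misclassification measure, d > 0 means an error
    loss = 1.0 / (1.0 + np.exp(-gamma * d))      # sigmoid-smoothed 0/1 loss
    dd_dh = 2.0 * (diffs[rival] - diffs[label])  # uses dg_k/dh = 2 * diffs[k]
    return loss, gamma * loss * (1.0 - loss) * dd_dh


def learn_bias(Y, labels, prototypes, epochs=20, eps0=0.1, gamma=1.0, seed=0):
    """GPD-style learning: one gradient step per sample with a decaying step size."""
    rng = np.random.default_rng(seed)
    h = np.zeros(Y.shape[1])
    total = epochs * len(Y)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Y)):        # visit samples in random order
            _, grad = mce_loss_and_grad(Y[i], labels[i], prototypes, h, gamma)
            h -= eps0 * (1.0 - step / total) * grad
            step += 1
    return h


def error_rate(Y, labels, prototypes, h):
    """Nearest-prototype error rate after compensating the features with h."""
    x_hat = Y - h
    d2 = ((x_hat[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return float(np.mean(np.argmin(d2, axis=1) != labels))


def run_demo(seed=0):
    """Synthetic two-class data shifted by a made-up bias (illustration only)."""
    rng = np.random.default_rng(seed)
    prototypes = np.array([[0.0, 0.0], [2.0, 0.0]])
    labels = np.repeat([0, 1], 100)
    clean = prototypes[labels] + 0.2 * rng.standard_normal((200, 2))
    noisy = clean + np.array([1.5, -0.5])        # hypothetical environmental bias
    h = learn_bias(noisy, labels, prototypes)
    before = error_rate(noisy, labels, prototypes, np.zeros(2))
    after = error_rate(noisy, labels, prototypes, h)
    return before, after, h


before, after, h = run_demo()
print("error before:", before, "error after:", after, "learned h:", h)
```

In this toy setup the gradient only has a component along the direction separating the two prototypes, so the learned `h` need not match the true bias in every dimension. That is the point of contrast with a maximum likelihood estimate: the discriminative criterion adjusts the parameters only as far as doing so reduces classification errors.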
Additional information
This work is supported partially by the ‘863’ High-Tech Programme of China (No. 863-306ZT03-02-3) and partially by the National Natural Science Foundation of China (No. 60085001).
HAN Jiqing received the B.S. and M.S. degrees in electrical engineering from Harbin Institute of Technology (HIT), Harbin, China, in 1987 and 1990, respectively, and the Ph.D. degree in computer science and engineering from HIT in 1998. From 1996 to 1998 he worked at the Systems Engineering Research Institute, Korean Institute of Science and Technology, Korea, as a visiting scientist. He is an associate professor in the Department of Computer Science and Engineering, HIT, and deputy director of the teaching and research section. His research interests include robust speech recognition and signal processing.
GAO Wen received his first Ph.D. degree in computer science and engineering from Harbin Institute of Technology (HIT), China, in 1988, and his second Ph.D. degree in electrical engineering from the University of Tokyo, Japan, in 1990. He is a professor at the Institute of Computing Technology, CAS, and Vice President of the Graduate School, CAS. His research interests include image processing, computer vision, and human-computer interaction.
Cite this article
Han, J., Gao, W. Robust speech recognition method based on discriminative environment feature extraction. J. Comput. Sci. & Technol. 16, 458–464 (2001). https://doi.org/10.1007/BF02948964
DOI: https://doi.org/10.1007/BF02948964