Robust speech recognition method based on discriminative environment feature extraction

  • Correspondence
Journal of Computer Science and Technology

Abstract

Learning the influence of environmental parameters, such as additive noise and channel distortions, from training data is an effective approach to robust speech recognition. Most previous methods are based on the maximum likelihood estimation criterion. However, these methods do not lead to a minimum error rate. In this paper, a novel discriminative method for learning environmental parameters, based on the Minimum Classification Error (MCE) criterion, is proposed. In this method, a simple classifier and the Generalized Probabilistic Descent (GPD) algorithm are used to learn the environmental parameters iteratively. The clean speech features are then estimated from the noisy speech features using the learned environmental parameters, and these estimates are passed to the back-end HMM classifier. On a task of 18 isolated, confusable Korean words, experiments show a best error rate reduction of 32.1% relative to a conventional HMM system.
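
The abstract does not specify the environment model, the classifier, or the loss function, so the following is only a minimal sketch of the general MCE/GPD idea under stated assumptions: the environment is modeled as a single additive bias vector in the feature domain, the "simple classifier" is a nearest-centroid rule, and the smoothed loss is a sigmoid of the distance difference between the correct class and its closest rival. All function names, parameters, and data values are hypothetical and are not taken from the paper.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def compensate(y, bias):
    """Estimate the clean feature by subtracting the learned bias (assumed model)."""
    return [yi - bi for yi, bi in zip(y, bias)]

def classify(x, centroids):
    """Nearest-centroid decision, standing in for the 'simple classifier'."""
    return min(centroids, key=lambda c: sq_dist(x, centroids[c]))

def error_rate(samples, centroids, bias):
    """Fraction of samples misclassified after bias compensation."""
    wrong = sum(classify(compensate(y, bias), centroids) != label
                for y, label in samples)
    return wrong / len(samples)

def learn_bias_mce_gpd(samples, centroids, dim, epochs=50, step=0.5, gamma=0.1):
    """Sample-by-sample gradient descent on a sigmoid-smoothed error count."""
    bias = [0.0] * dim
    for _ in range(epochs):
        for y, label in samples:
            x = compensate(y, bias)
            # Closest competing class for this sample.
            rival = min((c for c in centroids if c != label),
                        key=lambda c: sq_dist(x, centroids[c]))
            mu_t, mu_r = centroids[label], centroids[rival]
            # Misclassification measure: positive when the rival is closer.
            d = sq_dist(x, mu_t) - sq_dist(x, mu_r)
            # Smoothed 0/1 loss and its slope with respect to d.
            loss = 1.0 / (1.0 + math.exp(-gamma * d))
            slope = gamma * loss * (1.0 - loss)
            # For this classifier, d(d)/d(bias[k]) = 2 * (mu_t[k] - mu_r[k]).
            for k in range(dim):
                bias[k] -= step * slope * 2.0 * (mu_t[k] - mu_r[k])
    return bias

# Toy data (hypothetical): two classes, distorted by an unknown additive bias.
centroids = {"a": [0.0, 0.0], "b": [4.0, 0.0]}
true_bias = [3.0, 1.0]
clean = [([-0.5, 0.2], "a"), ([0.5, -0.2], "a"), ([0.0, 0.0], "a"),
         ([3.5, 0.1], "b"), ([4.5, -0.1], "b"), ([4.0, 0.0], "b")]
noisy = [([xi + bi for xi, bi in zip(x, true_bias)], label) for x, label in clean]

err_before = error_rate(noisy, centroids, [0.0, 0.0])
bias = learn_bias_mce_gpd(noisy, centroids, dim=2)
err_after = error_rate(noisy, centroids, bias)
print(err_before, err_after, bias)
```

In this toy setup only the first bias component is ever updated, because the two centroids differ only along that axis. This reflects the discriminative nature of the criterion: the update follows what changes the classification decision rather than what best fits the data, which is the contrast with maximum likelihood estimation drawn in the abstract. In the paper's pipeline the compensated features would then be passed to the back-end HMM classifier, which this sketch does not model.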

Author information

Corresponding author

Correspondence to Han Jiqing.

Additional information

This work is supported partially by the ‘863’ High-Tech Programme of China (No. 863-306ZT03-02-3) and partially by the National Natural Science Foundation of China (No. 60085001).

HAN Jiqing received the B.S. and M.S. degrees in electrical engineering from Harbin Institute of Technology (HIT), Harbin, China, in 1987 and 1990, respectively, and the Ph.D. degree in computer science and engineering from HIT in 1998. From 1996 to 1998 he worked at the Systems Engineering Research Institute, Korean Institute of Science and Technology, Korea, as a visiting scientist. He is an associate professor in the Department of Computer Science and Engineering, HIT, and deputy director of the teaching and research section. His research interests include robust speech recognition and signal processing.

GAO Wen received his first Ph.D. degree in computer science and engineering from Harbin Institute of Technology (HIT), China, in 1988, and his second Ph.D. degree in electrical engineering from the University of Tokyo, Japan, in 1990. He is a professor at the Institute of Computing Technology, CAS, and Vice President of the Graduate School, CAS. His research interests include image processing, computer vision, and human-computer interaction.

About this article

Cite this article

Han, J., Gao, W. Robust speech recognition method based on discriminative environment feature extraction. J. Comput. Sci. & Technol. 16, 458–464 (2001). https://doi.org/10.1007/BF02948964

  • DOI: https://doi.org/10.1007/BF02948964
