Abstract
Robustness is a central concern in automatic speech recognition (ASR) research, particularly for achieving high recognition accuracy in practical applications. The recognition rate of a traditional full-band HMM degrades even when only some of the frequency bands are corrupted by noise. To address this problem, a speech enhancement algorithm based on a hybrid parallel subband HMM and neural network model is proposed. The full-band HMM is split into several subband HMMs, new feature parameters are extracted from the outputs of all subband HMMs, and these parameters are merged by a BP neural network to yield a global recognition decision. Experimental results show that the hybrid parallel subband HMM and neural network (PSHMM/NN) model improves the robustness of a speech recognition system in noisy environments.
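The merging stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which feature parameters are extracted from the subband HMM outputs, so the per-subband softmax normalization, the single tanh hidden layer, the layer sizes, and the names `subband_features` and `BPMerger` are all assumptions made for illustration. The subband HMMs themselves are not modeled here; their per-class log-likelihoods are taken as given input.

```python
import numpy as np


def subband_features(loglik):
    """Turn subband HMM scores into one feature vector (assumed feature choice).

    loglik: array of shape (n_subbands, n_classes) holding the log-likelihood
    each subband HMM assigns to each class for one utterance.
    Returns a flat vector of per-subband softmax-normalized scores.
    """
    z = loglik - loglik.max(axis=1, keepdims=True)  # stabilize the exponent
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    return p.ravel()


class BPMerger:
    """One-hidden-layer network trained by backpropagation (hypothetical sizes)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        """Return hidden activations and class probabilities for a batch X."""
        H = np.tanh(X @ self.W1 + self.b1)
        logits = H @ self.W2 + self.b2
        logits = logits - logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        return H, P

    def train_step(self, X, y, lr=0.1):
        """One full-batch gradient step on cross-entropy; returns the loss
        computed before the weights are updated."""
        n = X.shape[0]
        H, P = self.forward(X)
        loss = -np.log(P[np.arange(n), y] + 1e-12).mean()
        # Gradient of the mean cross-entropy with respect to the logits.
        d_logits = P.copy()
        d_logits[np.arange(n), y] -= 1.0
        d_logits /= n
        # Backpropagate through the output layer and the tanh hidden layer.
        dW2 = H.T @ d_logits
        db2 = d_logits.sum(axis=0)
        d_pre = (d_logits @ self.W2.T) * (1.0 - H ** 2)
        dW1 = X.T @ d_pre
        db1 = d_pre.sum(axis=0)
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2
        return loss

    def predict(self, X):
        """Global recognition decision: index of the most probable class."""
        return self.forward(X)[1].argmax(axis=1)
```

In this sketch, an utterance scored by two subband HMMs over three classes yields a six-dimensional input to `BPMerger(6, 4, 3)`. Because the network sees every subband's scores at once, it can in principle learn to discount a subband whose scores are unreliable under noise, which is the motivation the abstract gives for merging rather than relying on a single full-band model.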
Acknowledgments
The research work described in this paper was supported by Anhui University Academic and Technical Leaders Introduce Engineering Foundation (02303203), National Nature Science Foundation (61271352), and Training Program on Anhui University College Students Innovation Experiment (KYXL2012058).
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lv, Z., Ni, L., Chen, S., Wu, X. (2014). A Research on Speech Enhancement Based on Hybrid Parallel Subbands HMM and Neural Network Model. In: Farag, A., Yang, J., Jiao, F. (eds) Proceedings of the 3rd International Conference on Multimedia Technology (ICMT 2013). Lecture Notes in Electrical Engineering, vol 278. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41407-7_11
DOI: https://doi.org/10.1007/978-3-642-41407-7_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41406-0
Online ISBN: 978-3-642-41407-7
eBook Packages: Engineering, Engineering (R0)