Abstract
A feature compensation method is evaluated for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, previously collected by RSPG (now CRSS) (http://www.utdallas.edu/research/utdrive; Hansen et al., DSP for In-Vehicle and Mobile Systems, 2004), contains a range of speech and noise signals recorded from speakers across the United States under actual driving conditions. The PCGMM (parallel combined Gaussian mixture model)-based feature compensation considered in this chapter (Kim et al., Eurospeech, 2003; Kim et al., ICASSP, 2004) uses parallel model combination to generate noise-corrupted speech models by combining clean speech and noise models. To address unknown, time-varying background noise, multiple environmental models are interpolated. To reduce the computational cost of maintaining multiple models, a noise transition model is proposed, motivated by the noise language modeling concept developed in Environmental Sniffing (Akbacak and Hansen, IEEE Trans Audio Speech Lang Process, 15(2): 465–477, 2007). The PCGMM method and the proposed scheme are evaluated on the connected single-digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that the feature compensation method improves speech recognition in real-life in-vehicle conditions. Employing the noise transition model reduced computation by 26.78% with only a slight change in overall recognition performance. The resulting system therefore demonstrates an effective strategy for robust speech recognition in noisy in-vehicle environments.
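The two ideas named in the abstract, combining clean speech and noise models and interpolating across several environments, can be illustrated with a minimal sketch. It is not the authors' formulation. It assumes log-spectral features, diagonal-covariance Gaussians, the log-add approximation to parallel model combination (means combined, variances left unchanged), and environment weights taken from per-frame likelihoods. All function and variable names are hypothetical, and the noise transition model is not covered.

```python
import numpy as np

def log_add_combine(clean_means, noise_mean):
    """Log-add approximation: noisy mean = log(exp(clean) + exp(noise))."""
    return np.logaddexp(clean_means, noise_mean)

def gmm_posteriors(y, means, variances, weights):
    """Component posteriors and total log-likelihood of frame y
    under a diagonal-covariance GMM."""
    log_lik = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (y - means) ** 2 / variances, axis=1
    )
    log_joint = np.log(weights) + log_lik
    log_total = np.logaddexp.reduce(log_joint)
    return np.exp(log_joint - log_total), log_total

def compensate(y, clean_means, variances, weights, noise_means):
    """Estimate the clean feature from noisy frame y.

    noise_means holds one noise mean vector per environment (assumed
    to be estimated elsewhere). Each environment yields its own
    estimate; the estimates are interpolated with weights proportional
    to the likelihood of y under that environment's noisy model.
    """
    estimates, log_liks = [], []
    for noise_mean in noise_means:
        noisy_means = log_add_combine(clean_means, noise_mean)
        post, log_total = gmm_posteriors(y, noisy_means, variances, weights)
        # Expected shift that this noise adds to the clean means.
        bias = post @ (noisy_means - clean_means)
        estimates.append(y - bias)
        log_liks.append(log_total)
    log_liks = np.array(log_liks)
    env_weights = np.exp(log_liks - np.logaddexp.reduce(log_liks))
    return env_weights @ np.array(estimates), env_weights
```

In this sketch every environment model is scored on every frame, which is the cost that a mechanism restricting the set of candidate environments would reduce.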
References
J.H.L. Hansen, X. Zhang, M. Akbacak, U. Yapanel, B. Pellom, W. Ward, and P. Angkititrakul, “CU-Move: Advanced In-Vehicle Speech Systems for Route Navigation,” DSP for In-Vehicle and Mobile Systems, Springer, 2004.
X. Zhang and J.H.L. Hansen, “CSA-BF: A Constrained Switched Adaptive Beamformer for Speech Enhancement and Recognition in Real Car Environments,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 733–745, 2003.
W. Kim, S. Ahn, and H. Ko, “Feature Compensation Scheme Based on Parallel Combined Mixture Model,” Eurospeech2003, pp. 667–680, 2003.
W. Kim, O. Kwon, and H. Ko, “PCMM-Based Feature Compensation Scheme Using Model Interpolation and Mixture Sharing,” ICASSP2004, pp. 989–992, 2004.
M. Akbacak and J.H.L. Hansen, “Environmental Sniffing: Noise Knowledge Estimation for Robust Speech Systems,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 2, pp. 465–477, 2007.
A.P. Varga and R.K. Moore, “Hidden Markov Model Decomposition of Speech and Noise,” ICASSP90, pp. 845–848, 1990.
M.J.F. Gales and S.J. Young, “Robust Continuous Speech Recognition Using Parallel Model Combination,” IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352–359, 1996.
P.J. Moreno, B. Raj, and R.M. Stern, “Data-Driven Environmental Compensation for Speech Recognition: A Unified Approach,” Speech Communication, vol. 24, pp. 267–285, 1998.
H.G. Hirsch and D. Pearce, “The AURORA Experimental Framework for the Performance Evaluations of Speech Recognition Systems Under Noisy Conditions,” ISCA ITRW ASR2000, Sept. 2000.
ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-End Feature Extraction Algorithm; Compression Algorithms,” ETSI ES 201 108 v1.1.2 (2000–04), 2000.
NIST SPeech Quality Assurance (SPQA) package version 2.3, http://www.nist.gov/speech.
ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithms,” ETSI ES 202 050 v1.1.1 (2002–10), 2002.
Copyright information
© 2009 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Kim, W., Hansen, J.H.L. (2009). Feature Compensation Employing Model Combination for Robust In-Vehicle Speech Recognition. In: Takeda, K., Erdogan, H., Hansen, J.H.L., Abut, H. (eds) In-Vehicle Corpus and Signal Processing for Driver Behavior. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79582-9_19
DOI: https://doi.org/10.1007/978-0-387-79582-9_19
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79581-2
Online ISBN: 978-0-387-79582-9
eBook Packages: Engineering, Engineering (R0)