Feature Compensation Employing Model Combination for Robust In-Vehicle Speech Recognition

Kim, Wooil; Hansen, John H. L.

doi:10.1007/978-0-387-79582-9_19

Chapter
First Online: 01 January 2008

Abstract

This chapter evaluates a feature compensation method for reliable speech recognition in real-life in-vehicle environments. The CU-Move corpus, previously collected by RSPG (now CRSS) (http://www.utdallas.edu/research/utdrive; Hansen et al., DSP for In-Vehicle and Mobile Systems, 2004), contains a range of speech and noise signals recorded from speakers across the United States under actual driving conditions. The PCGMM (parallel combined Gaussian mixture model)-based feature compensation considered in this chapter (Kim et al., Eurospeech 2003; Kim et al., ICASSP 2004) uses parallel model combination to generate noise-corrupted speech models by combining clean speech and noise models. To address unknown, time-varying background noise, multiple environmental models are interpolated. To reduce the computational cost of using multiple models, a noise transition model is proposed, motivated by the noise language modeling concept developed in Environmental Sniffing (Akbacak and Hansen, IEEE Trans Audio Speech Lang Process, 15(2): 465–477, 2007). The PCGMM method and the proposed scheme are evaluated on the connected single digits portion of the CU-Move database using the Aurora2 evaluation toolkit. Experimental results indicate that the feature compensation method improves speech recognition in real-life in-vehicle conditions. Employing the noise transition model reduced computation by 26.78% with only a slight change in overall recognition performance. The resulting system therefore demonstrates an effective strategy for robust speech recognition in noisy in-vehicle environments.
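
The abstract describes combining clean speech and noise models and interpolating over several environmental models. The Python sketch below illustrates that general idea only; it is not the chapter's implementation. It assumes log-spectral features, the standard log-add approximation for combining means, diagonal covariances reused unchanged from the clean model, and one Gaussian noise mean per environment. All function and variable names (log_add_combine, compensate_frame, and so on) are hypothetical.

```python
import numpy as np

def log_add_combine(mu_speech, mu_noise):
    """Log-add approximation: noisy mean = log(exp(speech) + exp(noise))."""
    return np.logaddexp(mu_speech, mu_noise)

def diag_gauss_loglik(y, means, variances):
    """Log-likelihood of frame y (D,) under K diagonal Gaussians (K, D)."""
    diff = y - means
    return -0.5 * np.sum(np.log(2.0 * np.pi * variances) + diff**2 / variances, axis=1)

def compensate_frame(y, clean_means, clean_weights, variances, noise_means):
    """Estimate a clean frame from noisy frame y.

    clean_means, variances: (K, D); clean_weights: (K,); noise_means: (E, D).
    Assumption: variances are not re-estimated for the noisy model.
    Returns the interpolated estimate (D,) and environment posteriors (E,).
    """
    num_env = noise_means.shape[0]
    env_loglik = np.empty(num_env)
    env_estimates = np.empty((num_env, y.shape[0]))
    for e in range(num_env):
        # Build the noise-corrupted model for environment e.
        noisy_means = log_add_combine(clean_means, noise_means[e])
        comp_ll = np.log(clean_weights) + diag_gauss_loglik(y, noisy_means, variances)
        env_loglik[e] = np.logaddexp.reduce(comp_ll)
        # Posterior of each mixture component given the noisy frame.
        post = np.exp(comp_ll - env_loglik[e])
        # Subtract the posterior-weighted shift between noisy and clean means.
        env_estimates[e] = y - post @ (noisy_means - clean_means)
    # Interpolate environment-specific estimates by environment posterior.
    env_post = np.exp(env_loglik - np.logaddexp.reduce(env_loglik))
    return env_post @ env_estimates, env_post

# Toy usage with made-up numbers (2 mixtures, 2 dimensions, 2 environments).
clean_means = np.array([[1.0, 2.0], [3.0, 1.0]])
clean_weights = np.array([0.5, 0.5])
variances = np.ones((2, 2))
noise_means = np.array([[-50.0, -50.0], [0.0, 0.0]])
y = np.array([1.2, 1.9])
x_hat, env_post = compensate_frame(y, clean_means, clean_weights, variances, noise_means)
```

In this sketch every environment model is evaluated for every frame, which is the cost that a scheme such as the noise transition model aims to reduce by limiting which environments are considered.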

References

http://www.utdallas.edu/research/utdrive.
J.H.L. Hansen, X. Zhang, M. Akbacak, U. Yapanel, B. Pellom, W. Ward, and P. Angkititrakul, “CU-Move: Advanced In-Vehicle Speech Systems for Route Navigation,” DSP for In-Vehicle and Mobile Systems, Springer, 2004.
X. Zhang and J.H.L. Hansen, “CSA-BF: A Constrained Switched Adaptive Beamformer for Speech Enhancement and Recognition in Real Car Environments,” IEEE Transactions on Speech and Audio Processing, vol. 11, no. 6, pp. 733–745, 2003.
W. Kim, S. Ahn, and H. Ko, “Feature Compensation Scheme Based on Parallel Combined Mixture Model,” Eurospeech2003, pp. 667–680, 2003.
W. Kim, O. Kwon, and H. Ko, “PCMM-Based Feature Compensation Scheme Using Model Interpolation and Mixture Sharing,” ICASSP2004, pp. 989–992, 2004.
M. Akbacak and J.H.L. Hansen, “Environmental Sniffing: Noise Knowledge Estimation for Robust Speech Systems,” IEEE Transactions on Audio, Speech and Language Processing, vol. 15, no. 2, pp. 465–477, 2007.
A.P. Varga and R.K. Moore, “Hidden Markov Model Decomposition of Speech and Noise,” ICASSP90, pp. 845–848, 1990.
M.J.F. Gales and S.J. Young, “Robust Continuous Speech Recognition Using Parallel Model Combination,” IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352–359, 1996.
P.J. Moreno, B. Raj, and R.M. Stern, “Data-Driven Environmental Compensation for Speech Recognition: A Unified Approach,” Speech Communication, vol. 24, pp. 267–85, 1998.
H.G. Hirsch and D. Pearce, “The AURORA Experimental Framework for the Performance Evaluations of Speech Recognition Systems Under Noisy Conditions,” ISCA ITRW ASR2000, Sept. 2000.
ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Front-End Feature Extraction Algorithm; Compression Algorithms,” ETSI ES 201 108 v1.1.2 (2000–04), 2000.
NIST SPeech Quality Assurance (SPQA) package version 2.3, http://www.nist.gov/speech.
ETSI Standard Doc., “Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithms,” ETSI ES 202 050 v1.1.1 (2002–10), 2002.

Author information

Authors and Affiliations

Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, Texas, USA
John H. L. Hansen

Corresponding author

Correspondence to John H. L. Hansen.

Cite this chapter

Kim, W., Hansen, J.H.L. (2009). Feature Compensation Employing Model Combination for Robust In-Vehicle Speech Recognition. In: Takeda, K., Erdogan, H., Hansen, J.H.L., Abut, H. (eds) In-Vehicle Corpus and Signal Processing for Driver Behavior. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-79582-9_19

DOI: https://doi.org/10.1007/978-0-387-79582-9_19
Published: 06 October 2008
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-79581-2
Online ISBN: 978-0-387-79582-9
eBook Packages: Engineering (R0)