Uncertainty Decoding and Conditional Bayesian Estimation

Haeb-Umbach, Reinhold

doi:10.1007/978-3-642-21317-5_2

Reinhold Haeb-Umbach

Abstract

In this contribution, classification rules for HMM-based speech recognition in the presence of a mismatch between training and test data are presented. The observed feature vectors are regarded as corrupted versions of underlying, unobservable clean feature vectors, which have the same statistics as the training data. Optimal classification then consists of two steps. First, the posterior density of the clean feature vector, given the observed feature vectors, has to be determined. Second, this posterior is employed in a modified classification rule that accounts for imperfect estimates. We discuss different variants of the classification rule and further elaborate on the estimation of the clean speech feature posterior, using conditional Bayesian estimation. It is shown that this concept is fairly general and can be applied to different scenarios, such as noisy or reverberant speech recognition.
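
The two-step idea can be illustrated with a minimal sketch. It is not taken from the chapter: it assumes diagonal Gaussian state emission densities, a Gaussian clean-feature posterior that is already available (step one is not implemented here), and a flat prior over the clean feature, in which case the modified rule reduces to the commonly used variance-compensation form, where the posterior variance is added to the state variance. All function and variable names are hypothetical.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    x, mean, var = (np.asarray(a, dtype=float) for a in (x, mean, var))
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_loglik(post_mean, post_var, state_mean, state_var):
    """State log-likelihood with variance compensation.

    Assumptions (not from the chapter): the clean-feature posterior is
    N(post_mean, post_var) and the clean-feature prior is treated as flat.
    Integrating the state density against the posterior then gives a
    Gaussian evaluated at post_mean with the two variances added.
    """
    total_var = np.asarray(state_var, dtype=float) + np.asarray(post_var, dtype=float)
    return log_gauss_diag(post_mean, state_mean, total_var)


def classify(post_mean, post_var, state_means, state_vars):
    """Return the index of the best-scoring state and all state scores."""
    scores = [
        uncertainty_decoding_loglik(post_mean, post_var, m, v)
        for m, v in zip(state_means, state_vars)
    ]
    return int(np.argmax(scores)), scores


if __name__ == "__main__":
    # Toy one-dimensional example with two states.
    state_means = [[0.0], [3.0]]
    state_vars = [[1.0], [0.1]]

    # Point estimate treated as exact (zero posterior variance).
    print(classify([2.0], [0.0], state_means, state_vars)[0])
    # Same estimate, but with a large posterior variance.
    print(classify([2.0], [4.0], state_means, state_vars)[0])
```

In this toy setup, a clean-feature estimate of 2.0 that is treated as exact favors state 0, whereas the same estimate with a posterior variance of 4.0 shifts the decision to state 1, because the narrow state density is no longer penalized as heavily for the distance to its mean. With zero posterior variance the rule falls back to the conventional likelihood evaluation.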

Author information

Authors and Affiliations

Department of Communications Engineering, University of Paderborn, Warburger Straße 100, 33098, Paderborn, Germany
Reinhold Haeb-Umbach

Authors

Reinhold Haeb-Umbach

Corresponding author

Correspondence to Reinhold Haeb-Umbach.

Editor information

Editors and Affiliations

Institute of Communication Acoustics, Ruhr-Universität Bochum, Universitätsstrasse 150, Bochum, 44801, Germany
Dorothea Kolossa

Dept. of Communications Engineering, University of Paderborn, Warburger Strasse 100, Paderborn, 33098, Germany
Reinhold Häb-Umbach

About this chapter

Cite this chapter

Haeb-Umbach, R. (2011). Uncertainty Decoding and Conditional Bayesian Estimation. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_2

DOI: https://doi.org/10.1007/978-3-642-21317-5_2
Published: 23 June 2011
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)
