Uncertainty Decoding and Conditional Bayesian Estimation

Chapter in: Robust Speech Recognition of Uncertain or Missing Data

Abstract

This contribution presents classification rules for HMM-based speech recognition in the presence of a mismatch between training and test data. The observed feature vectors are regarded as corrupted versions of underlying, unobservable clean feature vectors, which have the same statistics as the training data. Optimal classification then consists of two steps. First, the posterior density of the clean feature vector given the observed feature vectors is determined. Second, this posterior is employed in a modified classification rule that accounts for the imperfection of the estimate. We discuss different variants of the classification rule and elaborate further on the estimation of the clean speech feature posterior using conditional Bayesian estimation. The concept is shown to be fairly general: it applies to different scenarios, such as noisy or reverberant speech recognition.
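The two-step procedure described above can be illustrated with a minimal NumPy sketch. It rests on simplifying assumptions that are not taken from the chapter:

* a linear additive distortion model y = x + n, with Gaussian clean speech x and Gaussian noise n;
* diagonal covariances;
* a single Gaussian per HMM state.

Under these assumptions the clean-feature posterior p(x | y) is Gaussian. Integrating a state's Gaussian against this posterior then yields the widely used uncertainty-decoding form, in which the estimation variance is added to the acoustic-model variance. The function names and example values are hypothetical. Real log-mel observation models are nonlinear, and the rule variants discussed in the chapter may differ from this form.

```python
import numpy as np


def clean_speech_posterior(y, prior_mean, prior_var, noise_mean, noise_var):
    """Gaussian posterior p(x | y) under the ASSUMED linear model y = x + n.

    x ~ N(prior_mean, prior_var) is the clean feature, n ~ N(noise_mean, noise_var)
    the distortion; all covariances are diagonal (per-dimension variances).
    """
    gain = prior_var / (prior_var + noise_var)  # per-dimension Wiener/Kalman gain
    post_mean = prior_mean + gain * (y - prior_mean - noise_mean)
    post_var = (1.0 - gain) * prior_var  # residual uncertainty of the estimate
    return post_mean, post_var


def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def uncertainty_decoding_score(post_mean, post_var, state_mean, state_var):
    """Log of the integral of N(x; state_mean, state_var) * N(x; post_mean, post_var) dx.

    For Gaussians this equals log N(post_mean; state_mean, state_var + post_var),
    i.e. the estimation variance is added to the acoustic-model variance.
    """
    return log_gauss_diag(post_mean, state_mean, state_var + post_var)


# Step 1 (hypothetical values): estimate the clean-feature posterior from a noisy observation.
y = np.array([1.0, 2.0])
post_mean, post_var = clean_speech_posterior(
    y, prior_mean=np.zeros(2), prior_var=np.ones(2),
    noise_mean=np.zeros(2), noise_var=np.ones(2))

# Step 2: score two hypothetical HMM states with and without the uncertainty term.
states = [(np.array([0.5, 1.0]), np.ones(2)), (np.array([3.0, 3.0]), np.ones(2))]
ud_scores = [uncertainty_decoding_score(post_mean, post_var, m, v) for m, v in states]
plain_scores = [log_gauss_diag(post_mean, m, v) for m, v in states]
best_state = int(np.argmax(ud_scores))
```

Passing a zero posterior variance recovers conventional plug-in decoding with the point estimate. A larger posterior variance flattens the differences between state scores, so unreliable estimates have less influence on the decision.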

Author information

Correspondence to Reinhold Haeb-Umbach.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Haeb-Umbach, R. (2011). Uncertainty Decoding and Conditional Bayesian Estimation. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_2

  • DOI: https://doi.org/10.1007/978-3-642-21317-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21316-8

  • Online ISBN: 978-3-642-21317-5

  • eBook Packages: Engineering (R0)
