Abstract
This contribution presents classification rules for HMM-based speech recognition in the presence of a mismatch between training and test data. The observed feature vectors are regarded as corrupted versions of underlying, unobservable clean feature vectors, which have the same statistics as the training data. Optimal classification then consists of two steps: first, the posterior density of the clean feature vector, given the observed feature vectors, is determined; second, this posterior is employed in a modified classification rule that accounts for imperfect estimates. We discuss different variants of the classification rule and elaborate on the estimation of the clean speech feature posterior using conditional Bayesian estimation. The concept is shown to be fairly general and applicable to different scenarios, such as noisy or reverberant speech recognition.
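The two-step idea can be illustrated with a minimal sketch. It assumes that the clean-feature posterior from the first step is a diagonal-covariance Gaussian, that each state has a single diagonal Gaussian output density, and that the clean-speech prior is neglected in the rule. Under those assumptions the modified rule reduces to the common variance-compensation form, in which the posterior variance is added to the state variance. This is one simplified variant for illustration, not necessarily the rule derived in the chapter; all function names and numbers below are hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def uncertainty_decoding_loglik(post_mean, post_var, state_mean, state_var):
    """Score a state using the clean-feature posterior instead of a point estimate.

    Assumption: Gaussian posterior, Gaussian state density, prior neglected,
    so the integral over the clean feature gives a Gaussian whose variance
    is the sum of the state variance and the posterior variance.
    """
    inflated = [sv + pv for sv, pv in zip(state_var, post_var)]
    return log_gauss_diag(post_mean, state_mean, inflated)


def classify(post_mean, post_var, states):
    """Return the name of the state with the highest compensated score."""
    return max(
        states,
        key=lambda name: uncertainty_decoding_loglik(
            post_mean, post_var, *states[name]
        ),
    )


# Hypothetical one-dimensional example: (mean, variance) per state.
states = {"A": ([0.0], [1.0]), "B": ([3.0], [0.1])}

# Step 1 (assumed done elsewhere) yields the posterior mean and variance.
certain = classify([2.0], [0.0], states)    # point estimate, no uncertainty
uncertain = classify([2.0], [2.0], states)  # same estimate, large uncertainty
```

With zero posterior variance the rule is the ordinary likelihood of the point estimate. As the posterior variance grows, the narrow state density is penalised less for its mismatch, so the decision in this toy example changes from state A to state B.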
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Haeb-Umbach, R. (2011). Uncertainty Decoding and Conditional Bayesian Estimation. In: Kolossa, D., Häb-Umbach, R. (eds) Robust Speech Recognition of Uncertain or Missing Data. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21317-5_2
DOI: https://doi.org/10.1007/978-3-642-21317-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21316-8
Online ISBN: 978-3-642-21317-5
eBook Packages: Engineering, Engineering (R0)