Gaussian Models in Automatic Speech Recognition

A chapter in the Handbook of Signal Processing in Acoustics

Abstract

Most automatic speech recognition (ASR) systems express probability densities over sequences of acoustic feature vectors using Gaussian or Gaussian-mixture hidden Markov models. In this chapter, we explore how graphical models can help describe a variety of tied (i.e., parameter-shared) and regularized Gaussian mixture systems. Unlike many previous tied systems, however, here we allow sub-portions of the Gaussians to be tied in arbitrary ways. The space of such models includes regularized, tied, and adaptive versions of mixture conditional Gaussian models, as well as a regularized version of maximum-likelihood linear regression (MLLR). We derive expectation-maximization (EM) update equations and explore the consequences for the training algorithm under relevant variants of those equations. In particular, we find that for certain combinations of regularization and/or tying, the EM update equations no longer admit a closed-form analytic solution. We describe, however, a generalized EM (GEM) procedure that still increases the likelihood and has the same fixed points as the standard EM algorithm.
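
To make the "closed-form analytic solution" case concrete, below is a minimal sketch (not taken from the chapter) of one EM iteration for a diagonal-covariance Gaussian mixture, where the M-step has the familiar analytic weighted-average form; the function name em_step_gmm and the simple variance-floor argument reg are illustrative assumptions, standing in for the chapter's more general tied and regularized updates.

```python
import numpy as np

def em_step_gmm(X, weights, means, vars_, reg=0.0):
    """One EM iteration for a diagonal-covariance Gaussian mixture.

    X: (n, d) data; weights: (k,); means: (k, d); vars_: (k, d) diagonals.
    reg: illustrative variance floor standing in for a regularizer.
    """
    n, d = X.shape
    k = weights.shape[0]

    # E-step: responsibilities resp[i, j] = p(component j | x_i),
    # computed in the log domain for numerical stability.
    log_resp = np.empty((n, k))
    for j in range(k):
        diff = X - means[j]
        log_resp[:, j] = (np.log(weights[j])
                          - 0.5 * np.sum(np.log(2.0 * np.pi * vars_[j]))
                          - 0.5 * np.sum(diff ** 2 / vars_[j], axis=1))
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: the closed-form ("analytic") case -- responsibility-weighted
    # averages. Tying or regularizing parameters across Gaussians can
    # destroy this closed form, which is where a GEM step comes in.
    Nj = resp.sum(axis=0)                      # effective counts per component
    new_weights = Nj / n
    new_means = (resp.T @ X) / Nj[:, None]
    new_vars = np.empty_like(vars_)
    for j in range(k):
        diff = X - new_means[j]
        new_vars[j] = (resp[:, j] @ (diff ** 2)) / Nj[j] + reg
    return new_weights, new_means, new_vars
```

When tying or regularization couples parameters across Gaussians so that the M-step maximization above has no closed form, a GEM variant replaces the exact M-step with any update that merely increases the expected complete-data log-likelihood (for example, a single gradient-ascent step on the means and variances); as the abstract notes, such a procedure still increases the likelihood and shares the fixed points of standard EM.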

Copyright information

© 2008 Springer Science+Business Media, LLC

Cite this chapter

Bilmes, J. (2008). Gaussian Models in Automatic Speech Recognition. In: Havelock, D., Kuwano, S., Vorländer, M. (eds) Handbook of Signal Processing in Acoustics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30441-0_29
