Abstract
Most automatic speech recognition (ASR) systems express probability densities over sequences of acoustic feature vectors using Gaussian or Gaussian-mixture hidden Markov models. In this chapter, we explore how graphical models can help describe a variety of tied (i.e., parameter-shared) and regularized Gaussian mixture systems. Unlike many previous tied systems, however, here we allow sub-portions of the Gaussians to be tied in arbitrary ways. The space of such models includes regularized, tied, and adaptive versions of mixture conditional Gaussian models, as well as a regularized version of maximum-likelihood linear regression (MLLR). We derive expectation-maximization (EM) update equations and explore the consequences for the training algorithm under relevant variants of these equations. In particular, we find that for certain combinations of regularization and/or tying, a closed-form analytic solution to the EM update equations is no longer achievable. We describe, however, a generalized EM (GEM) procedure that still increases the likelihood and has the same fixed points as the standard EM algorithm.
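To make the EM updates referred to above concrete, the following is a minimal sketch of EM for an untied two-component one-dimensional Gaussian mixture, where each M-step has the usual closed-form solution. This is an illustration only: the chapter's subject is tied and regularized multivariate mixtures, where such closed-form M-steps can become unavailable and a GEM step is used instead. All function and variable names here are hypothetical.

```python
import math
import random

def em_gmm_1d(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture by EM (untied, unregularized)."""
    # Crude initialization: place the means at the lower and upper quartiles.
    xs = sorted(data)
    mu = [xs[len(xs) // 4], xs[3 * len(xs) // 4]]
    var = [1.0, 1.0]
    pi = [0.5, 0.5]

    def pdf(x, m, v):
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            w = [pi[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(w)
            resp.append([wk / s for wk in w])
        # M-step: closed-form maximizers of the expected complete-data
        # log-likelihood (responsibility-weighted mean, variance, and weight).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, data)) / nk
            var[k] = max(var[k], 1e-6)  # guard against variance collapse
            pi[k] = nk / len(data)
    return mu, var, pi

random.seed(0)
data = ([random.gauss(-2.0, 0.5) for _ in range(200)] +
        [random.gauss(3.0, 1.0) for _ in range(200)])
mu, var, pi = em_gmm_1d(data)
```

Tying a parameter across components (e.g., forcing a shared variance) couples the per-component M-step equations; for some tying patterns the coupled system still solves in closed form, while for others only an iterative GEM-style partial maximization is possible.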
© 2008 Springer Science+Business Media, LLC
Cite this chapter
Bilmes, J. (2008). Gaussian Models in Automatic Speech Recognition. In: Havelock, D., Kuwano, S., Vorländer, M. (eds) Handbook of Signal Processing in Acoustics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30441-0_29
Print ISBN: 978-0-387-77698-9
Online ISBN: 978-0-387-30441-0
eBook Packages: Physics and Astronomy