Gaussian Models in Automatic Speech Recognition

A chapter in the Handbook of Signal Processing in Acoustics

Abstract

Most automatic speech recognition (ASR) systems express probability densities over sequences of acoustic feature vectors using Gaussian or Gaussian-mixture hidden Markov models. In this chapter, we explore how graphical models can help describe a variety of tied (i.e., parameter-shared) and regularized Gaussian mixture systems. Unlike many previous tied systems, however, here we allow sub-portions of the Gaussians to be tied in arbitrary ways. The space of such models includes regularized, tied, and adaptive versions of mixture conditional Gaussian models, as well as a regularized version of maximum-likelihood linear regression (MLLR). We derive expectation-maximization (EM) update equations and explore the consequences for the training algorithm under relevant variants of those equations. In particular, we find that for certain combinations of regularization and/or tying, the EM update equations no longer admit a closed-form analytic solution. We describe, however, a generalized EM (GEM) procedure that still increases the likelihood and has the same fixed points as the standard EM algorithm.
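
To make the "closed-form analytic solution" case concrete, below is a minimal sketch (not taken from the chapter) of one EM iteration for a diagonal-covariance Gaussian mixture, where the M-step has the familiar analytic weighted-average form; the function name em_step_gmm and the simple variance-floor argument reg are illustrative assumptions, standing in for the chapter's more general tied and regularized updates.

```python
import numpy as np

def em_step_gmm(X, weights, means, vars_, reg=0.0):
    """One EM iteration for a diagonal-covariance Gaussian mixture.

    X: (n, d) data; weights: (k,); means: (k, d); vars_: (k, d) diagonals.
    reg: illustrative variance floor standing in for a regularizer.
    """
    n, d = X.shape
    k = weights.shape[0]

    # E-step: responsibilities resp[i, j] = p(component j | x_i),
    # computed in the log domain for numerical stability.
    log_resp = np.empty((n, k))
    for j in range(k):
        diff = X - means[j]
        log_resp[:, j] = (np.log(weights[j])
                          - 0.5 * np.sum(np.log(2.0 * np.pi * vars_[j]))
                          - 0.5 * np.sum(diff ** 2 / vars_[j], axis=1))
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: the closed-form ("analytic") case -- responsibility-weighted
    # averages. Tying or regularizing parameters across Gaussians can
    # destroy this closed form, which is where a GEM step comes in.
    Nj = resp.sum(axis=0)                      # effective counts per component
    new_weights = Nj / n
    new_means = (resp.T @ X) / Nj[:, None]
    new_vars = np.empty_like(vars_)
    for j in range(k):
        diff = X - new_means[j]
        new_vars[j] = (resp[:, j] @ (diff ** 2)) / Nj[j] + reg
    return new_weights, new_means, new_vars
```

When tying or regularization couples parameters across Gaussians so that the M-step maximization above has no closed form, a GEM variant replaces the exact M-step with any update that merely increases the expected complete-data log-likelihood (for example, a single gradient-ascent step on the means and variances); as the abstract notes, such a procedure still increases the likelihood and shares the fixed points of standard EM.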

Copyright information

© 2008 Springer Science+Business Media, LLC

Cite this chapter

Bilmes, J. (2008). Gaussian Models in Automatic Speech Recognition. In: Havelock, D., Kuwano, S., Vorländer, M. (eds) Handbook of Signal Processing in Acoustics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-30441-0_29
