Abstract
We present a mathematical framework for Bayesian adaptive learning of the parameters of stochastic models. Maximum a posteriori (MAP) estimation algorithms are developed for hidden Markov models and for a number of useful models commonly used in automatic speech recognition and natural language processing. The MAP formulation offers a way to combine existing prior knowledge and a small set of newly acquired task-specific data in an optimal manner. It is therefore ideal for adaptive learning applications such as speaker and task adaptation.
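The way MAP estimation blends prior knowledge with a small adaptation set can be sketched for the simplest case: a Gaussian mean with a conjugate normal prior. This is a minimal illustration, not the paper's full HMM algorithm; the names `map_mean`, `mu0`, and `kappa` are illustrative choices, not from the text.

```python
def map_mean(data, mu0, kappa):
    """MAP estimate of a Gaussian mean under a conjugate normal prior.

    data  -- newly acquired task-specific observations
    mu0   -- prior mean (e.g. a speaker-independent model parameter)
    kappa -- prior weight, in units of equivalent observation count
    """
    n = len(data)
    xbar = sum(data) / n
    # The posterior mean interpolates between the prior mean and the
    # sample mean; with little adaptation data the prior dominates,
    # and as n grows the estimate approaches the sample mean.
    return (kappa * mu0 + n * xbar) / (kappa + n)
```

For example, with a prior mean of 0.0, prior weight 1.0, and two observations of 2.0, the estimate is (1.0 × 0.0 + 2 × 2.0) / (1 + 2) = 4/3, partway between the prior and the data.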
© 1995 Springer-Verlag Berlin Heidelberg
Lee, CH., Gauvain, JL. (1995). Adaptive Learning in Acoustic and Language Modeling. In: Ayuso, A.J.R., Soler, J.M.L. (eds) Speech Recognition and Coding. NATO ASI Series, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-57745-1_2
DOI: https://doi.org/10.1007/978-3-642-57745-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-63344-7
Online ISBN: 978-3-642-57745-1