Bayesian Adaptive Learning and MAP Estimation of HMM

  • Chapter
Automatic Speech and Speaker Recognition

Abstract

A mathematical framework for Bayesian adaptive learning of the parameters of stochastic models is presented. Maximum a posteriori (MAP) estimation algorithms are then developed for hidden Markov models and for a number of useful parametric densities commonly used in automatic speech recognition and natural language processing. The MAP formulation offers a way to combine existing prior knowledge and a small set of newly acquired task-specific data in an optimal manner. Other techniques can also be combined with Bayesian learning to improve adaptation efficiency and effectiveness.
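
As a minimal illustration of how a MAP estimate combines prior knowledge with a small amount of new data, consider the textbook case of a Gaussian mean with known variance and a conjugate normal prior. This sketch is not taken from the chapter: the function name `map_gaussian_mean` and the parameter `tau` (a prior weight, read as the number of "pseudo-observations" the prior is worth) are assumptions made for this example. Under those assumptions the MAP estimate is a weighted average of the prior mean and the sample mean.

```python
from typing import Sequence


def map_gaussian_mean(samples: Sequence[float], prior_mean: float, tau: float) -> float:
    """MAP estimate of a Gaussian mean under a conjugate normal prior.

    Assumed model (illustrative, not the chapter's notation):
      x_i ~ N(mu, sigma^2) with sigma^2 known,
      mu  ~ N(prior_mean, sigma^2 / tau), with tau > 0.
    The posterior mode is (tau * prior_mean + sum(x)) / (tau + n).
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    n = len(samples)
    # With no adaptation data the estimate falls back to the prior mean.
    if n == 0:
        return prior_mean
    return (tau * prior_mean + sum(samples)) / (tau + n)


# Two new observations with sample mean 3.0, prior mean 0.0, prior weight 2.0:
# the estimate lies halfway between the prior mean and the sample mean.
estimate = map_gaussian_mean([2.0, 4.0], prior_mean=0.0, tau=2.0)
```

With few samples the estimate stays close to the prior mean; as the number of samples grows, the prior's influence shrinks and the estimate approaches the sample mean, which is the maximum likelihood estimate.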

Copyright information

© 1996 Kluwer Academic Publishers

About this chapter

Cite this chapter

Lee, CH., Gauvain, JL. (1996). Bayesian Adaptive Learning and MAP Estimation of HMM. In: Lee, CH., Soong, F.K., Paliwal, K.K. (eds) Automatic Speech and Speaker Recognition. The Kluwer International Series in Engineering and Computer Science, vol 355. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-1367-0_4

  • DOI: https://doi.org/10.1007/978-1-4613-1367-0_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4612-8590-8

  • Online ISBN: 978-1-4613-1367-0

  • eBook Packages: Springer Book Archive
