Hidden Markov Models for Speech Recognition — Strengths and Limitations

  • Conference paper
Speech Recognition and Understanding

Part of the book series: NATO ASI Series ((NATO ASI F,volume 75))

Abstract

The use of hidden Markov models for speech recognition has become predominant over the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons why this method has become so popular are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data; the flexibility of the resulting recognition system, in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, etc.; and the ease of implementation of the overall recognition system. However, although hidden Markov model technology has brought speech recognition system performance to new high levels for a variety of applications, there remain some fundamental areas where aspects of the theory are either inadequate for speech or rest on assumptions that do not apply. Examples of such areas range from the fundamental modeling assumption, i.e., that a maximum likelihood estimate of the model parameters provides the best system performance, to issues of inadequate training data, which lead to the concepts of parameter tying across states, deleted interpolation, and other smoothing methods. Other aspects of the basic hidden Markov modeling methodology that are still not well understood include: ways of integrating new features (e.g., prosodic versus spectral features) into the framework in a consistent and meaningful way; the way to properly model sound durations (both within a state and across states of a model); the way to properly use the information in state transitions; and finally the way in which models can be split or clustered as warranted by the training data. It is the purpose of this paper to examine each of these strengths and limitations and to discuss how they affect the overall performance of a typical speech recognition system.
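
As one illustration of the duration issue raised in the abstract: in a standard HMM with a fixed self-transition probability a_ii, the number of consecutive frames spent in state i follows a geometric distribution, p(d) = a_ii^(d-1) (1 - a_ii). This is a general property of first-order Markov chains. The Python sketch below is illustrative only and is not taken from the paper; the function names and the value 0.8 are assumptions chosen for the example.

```python
def state_duration_pmf(self_loop_prob, max_duration):
    """Implicit duration distribution of one HMM state.

    Returns [p(1), p(2), ..., p(max_duration)], where
    p(d) = a^(d-1) * (1 - a) and a is the self-transition probability.
    Both the function name and its arguments are hypothetical.
    """
    if not 0.0 <= self_loop_prob < 1.0:
        raise ValueError("self_loop_prob must lie in [0, 1)")
    return [
        self_loop_prob ** (d - 1) * (1.0 - self_loop_prob)
        for d in range(1, max_duration + 1)
    ]


def expected_duration(self_loop_prob):
    """Mean of the geometric duration distribution, 1 / (1 - a)."""
    return 1.0 / (1.0 - self_loop_prob)


# Example with an assumed self-transition probability of 0.8.
pmf = state_duration_pmf(0.8, 10)
mean_frames = expected_duration(0.8)
```

For any self-transition probability strictly between 0 and 1, these probabilities decrease with d, so the single most likely duration is always one frame. That is a poor match for sounds that have a typical, longer duration, which is one reason duration modeling is listed among the open issues.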


Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rabiner, L.R., Juang, B.H. (1992). Hidden Markov Models for Speech Recognition — Strengths and Limitations. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_1

  • DOI: https://doi.org/10.1007/978-3-642-76626-8_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-76628-2

  • Online ISBN: 978-3-642-76626-8

  • eBook Packages: Springer Book Archive
