Abstract
The use of hidden Markov models for speech recognition has become predominant over the last several years, as evidenced by the number of papers published and talks given at major speech conferences. The method owes its popularity to four main factors: its inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the model parameters from finite training sets of speech data; the flexibility of the resulting recognition system, in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, etc.; and the ease of implementing the overall recognition system. However, although hidden Markov model technology has brought speech recognition performance to new high levels for a variety of applications, there remain some fundamental areas where aspects of the theory are either inadequate for speech or rest on assumptions that do not apply. Such areas range from the fundamental modeling assumption, i.e., that a maximum likelihood estimate of the model parameters provides the best system performance, to issues arising from inadequate training data, which lead to concepts such as parameter tying across states, deleted interpolation, and other smoothing methods. Other aspects of the basic hidden Markov modeling methodology that are still not well understood include: ways of integrating new features (e.g., prosodic versus spectral features) into the framework in a consistent and meaningful way; how to properly model sound durations (both within a state and across the states of a model); how to properly use the information in state transitions; and, finally, how models can be split or clustered as warranted by the training data. The purpose of this paper is to examine each of these strengths and limitations and to discuss how they affect the overall performance of a typical speech recognition system.
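To make the duration-modeling limitation concrete: in a standard HMM, the number of consecutive frames d spent in a state i with self-transition probability a_ii follows a geometric distribution, p_i(d) = a_ii^(d-1) (1 - a_ii), whose mean is 1 / (1 - a_ii) and whose most likely value is always d = 1. The minimal sketch below computes this implied distribution. The function names and the self-loop value of 0.8 are hypothetical, chosen only for illustration; they are not taken from the paper.

```python
def state_duration_pmf(a_ii: float, max_frames: int) -> list[float]:
    """Probability of occupying a state for exactly d frames, d = 1..max_frames.

    Implements the geometric law p(d) = a_ii**(d - 1) * (1 - a_ii) that an HMM
    state with self-transition probability a_ii implicitly imposes.
    """
    if not 0.0 <= a_ii < 1.0:
        raise ValueError("self-transition probability must be in [0, 1)")
    return [a_ii ** (d - 1) * (1.0 - a_ii) for d in range(1, max_frames + 1)]


def expected_duration(a_ii: float) -> float:
    """Mean of the geometric duration distribution: 1 / (1 - a_ii)."""
    return 1.0 / (1.0 - a_ii)


if __name__ == "__main__":
    a = 0.8  # hypothetical self-loop probability, for illustration only
    for d, p in enumerate(state_duration_pmf(a, max_frames=10), start=1):
        print(f"d={d:2d}  p={p:.4f}")
    # The probabilities decrease monotonically, so a one-frame stay is always
    # the most likely outcome regardless of a_ii.
    print("mean duration (frames):", round(expected_duration(a), 6))
```

Because the geometric law is monotonically decreasing, no choice of a_ii can give a duration distribution that peaks at a typical sound length; this is the motivation for explicit state-duration models.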
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rabiner, L.R., Juang, B.H. (1992). Hidden Markov Models for Speech Recognition — Strengths and Limitations. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_1
DOI: https://doi.org/10.1007/978-3-642-76626-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive