Hidden Markov Models for Speech Recognition — Strengths and Limitations

Rabiner, L. R.; Juang, B. H.

doi:10.1007/978-3-642-76626-8_1

Part of the book series: NATO ASI Series (NATO ASI F, volume 75)

Abstract

The use of hidden Markov models for speech recognition has become predominant over the last several years, as evidenced by the number of published papers and talks at major speech conferences. The reasons why this method has become so popular are the inherent statistical (mathematically precise) framework; the ease and availability of training algorithms for estimating the parameters of the models from finite training sets of speech data; the flexibility of the resulting recognition system, in which one can easily change the size, type, or architecture of the models to suit particular words, sounds, etc.; and the ease of implementation of the overall recognition system. However, although hidden Markov model technology has brought speech recognition system performance to new high levels for a variety of applications, there remain some fundamental areas where aspects of the theory are either inadequate for speech or rest on assumptions that do not apply. Examples of such areas range from the fundamental modeling assumption, i.e., that a maximum likelihood estimate of the model parameters provides the best system performance, to issues arising from inadequate training data, which lead to the concepts of parameter tying across states, deleted interpolation, and other smoothing methods. Other aspects of the basic hidden Markov modeling methodology that are still not well understood include: ways of integrating new features (e.g., prosodic versus spectral features) into the framework in a consistent and meaningful way; the proper way to model sound durations (both within a state and across states of a model); the proper way to use the information in state transitions; and, finally, the way in which models can be split or clustered as warranted by the training data. It is the purpose of this paper to examine each of these strengths and limitations and discuss how they affect the overall performance of a typical speech recognition system.
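As an illustration of the duration-modeling limitation mentioned in the abstract, the sketch below uses a standard property of first-order HMMs that the abstract does not spell out: a state with self-transition probability a implies a geometric duration distribution, P(d) = a^(d-1) (1 - a), whose most likely duration is always a single frame. The function names and the example value of a are assumptions chosen for illustration, not taken from the paper.

```python
from typing import List


def state_duration_pmf(self_loop_prob: float, max_duration: int) -> List[float]:
    """Duration distribution implied by a single HMM state (illustrative helper).

    With self-transition probability a, staying exactly d frames means taking
    the self-loop d - 1 times and then leaving once: P(d) = a**(d-1) * (1 - a).
    """
    if not 0.0 <= self_loop_prob < 1.0:
        raise ValueError("self_loop_prob must lie in [0, 1)")
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def expected_duration(self_loop_prob: float) -> float:
    """Mean of the geometric duration distribution: 1 / (1 - a)."""
    return 1.0 / (1.0 - self_loop_prob)


if __name__ == "__main__":
    a = 0.9  # hypothetical self-transition probability
    pmf = state_duration_pmf(a, max_duration=5)
    print("P(d) for d = 1..5:", [round(p, 4) for p in pmf])
    print("expected duration in frames:", expected_duration(a))
```

The probabilities decrease monotonically with d for any value of a, so the model always favors the shortest possible stay in a state even when its mean duration is long. This is one way to see why the plain self-transition is a poor fit for speech sounds, whose typical durations span several frames.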

Author information

Authors and Affiliations

Speech Research Department, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ, 07974, USA
L. R. Rabiner & B. H. Juang

Editor information

Editors and Affiliations

Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Torino, Italy
Pietro Laface
School of Computer Science, 3480 University St., Montreal, Quebec, H3A 2A7, Canada
Renato De Mori

Cite this paper

Rabiner, L.R., Juang, B.H. (1992). Hidden Markov Models for Speech Recognition — Strengths and Limitations. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_1

DOI: https://doi.org/10.1007/978-3-642-76626-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive
