Speaker Independent Continuous Speech Recognition Using Continuous Density Hidden Markov Models

Lee, Chin-Hui; Rabiner, Lawrence R.; Pieraccini, Roberto

doi:10.1007/978-3-642-76626-8_16

Part of the book series: NATO ASI Series (NATO ASI F, volume 75)

Abstract

The field of large vocabulary continuous speech recognition has advanced to the point where several systems achieve better than 95% word accuracy in speaker-independent recognition of fluently spoken speech over a 1000-word vocabulary, on a task with a perplexity of about 60. Several factors account for the high performance of these systems: effective feature analysis, hidden Markov model (HMM) methodology, context-dependent sub-word units that capture intra-word and inter-word phonemic variations, and corrective training techniques that emphasize differences between acoustically similar words in the vocabulary. In this paper we describe a large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories and discuss the methods used to achieve high word recognition accuracy. In particular, we focus on the techniques adopted to select the set of fundamental speech units and to build acoustic models of these sub-word units within a continuous density HMM (CDHMM) framework. Other modeling approaches, such as the discrete HMM and the tied-mixture HMM, are also discussed and compared with the CDHMM approach.
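
The abstract contrasts three ways of modeling an HMM state's observation probability b_j(x): continuous density (CDHMM), tied-mixture, and discrete. The Python sketch below is only an illustration under stated assumptions and does not reproduce the authors' system. It assumes diagonal-covariance Gaussians, a nearest-neighbour vector quantizer for the discrete case, and toy 2-D feature vectors. All function names and values are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gauss_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log density of a diagonal-covariance Gaussian N(x; mean, diag(var))."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def cdhmm_log_b(x: Vector, weights: Vector, means: Sequence[Vector],
                variances: Sequence[Vector]) -> float:
    """Continuous density HMM: the state owns its own Gaussian mixture.

    log b_j(x) = log sum_k c_jk N(x; mu_jk, Sigma_jk)
    """
    terms = [
        math.log(w) + log_gauss_diag(x, mu, var)
        for w, mu, var in zip(weights, means, variances)
        if w > 0.0  # zero-weight components contribute nothing to the sum
    ]
    return log_sum_exp(terms)


def tied_mixture_log_b(x: Vector, weights: Vector, shared_means: Sequence[Vector],
                       shared_vars: Sequence[Vector]) -> float:
    """Tied-mixture HMM: every state draws on the same pool of Gaussians;
    only the mixture weights c_jk are state specific."""
    return cdhmm_log_b(x, weights, shared_means, shared_vars)


def discrete_log_b(x: Vector, codebook: Sequence[Vector], symbol_probs: Vector) -> float:
    """Discrete HMM: vector-quantize x to its nearest codeword (squared
    Euclidean distance), then look up the state's probability for that index."""
    nearest = min(
        range(len(codebook)),
        key=lambda k: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[k])),
    )
    return math.log(symbol_probs[nearest])


# Hypothetical toy setup: two shared Gaussians / codewords at (0,0) and (3,3).
shared_means = [[0.0, 0.0], [3.0, 3.0]]
shared_vars = [[1.0, 1.0], [1.0, 1.0]]
frame = [2.9, 3.2]

# Two states sharing the Gaussians but weighting them differently.
state_a = tied_mixture_log_b(frame, [0.9, 0.1], shared_means, shared_vars)
state_b = tied_mixture_log_b(frame, [0.1, 0.9], shared_means, shared_vars)
print(f"tied mixture: state A {state_a:.3f}, state B {state_b:.3f}")
print(f"discrete: {discrete_log_b(frame, shared_means, [0.25, 0.75]):.3f}")
```

In this sketch the continuous-density and tied-mixture functions evaluate the same mixture formula. They differ only in whether the Gaussians belong to a single state or are pooled across all states. The discrete model instead replaces the density with a table lookup after quantization, which discards the within-cell detail that mixture densities retain.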

Author information

Authors and Affiliations

Speech Research Department, AT&T Bell Laboratories, Murray Hill, NJ, 07974, USA
Chin-Hui Lee, Lawrence R. Rabiner & Roberto Pieraccini

Editor information

Editors and Affiliations

Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Torino, Italy
Pietro Laface
School of Computer Science, 3480 University St., Montreal, Quebec, H3A 2A7, Canada
Renato De Mori

About this paper

Cite this paper

Lee, CH., Rabiner, L.R., Pieraccini, R. (1992). Speaker Independent Continuous Speech Recognition Using Continuous Density Hidden Markov Models. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_16

DOI: https://doi.org/10.1007/978-3-642-76626-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive