Abstract
The field of large vocabulary continuous speech recognition has advanced to the point where several systems can provide greater than 95% word accuracy for speaker-independent recognition of a 1000-word vocabulary, spoken fluently, for a task with a perplexity of about 60. Several factors account for the high performance achieved by these systems: effective feature analysis, hidden Markov model (HMM) methodology, context-dependent sub-word units that capture intra-word and inter-word phonemic variations, and corrective training techniques that emphasize differences between acoustically similar words in the vocabulary. In this paper we describe a large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories and discuss the methods used to provide high word recognition accuracy. In particular, we focus on the techniques adopted to select the set of fundamental speech units and to provide the acoustic models of these sub-word units within a continuous density HMM (CDHMM) framework. Other modeling approaches, such as a discrete HMM and a tied-mixture HMM, are also discussed and compared with the CDHMM approach.
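To illustrate what "continuous density" means here, the following is a minimal sketch of how a single HMM state might score one feature vector with a mixture of Gaussians, in contrast to a discrete HMM, which would first quantize the vector to a codebook index and look up a probability. The abstract does not specify the number of mixture components or the covariance structure, so the diagonal covariances and the function names `log_gaussian_diag` and `log_mixture_density` are assumptions made for illustration only.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture_density(x, weights, means, variances):
    """Log of sum_k c_k * N(x; mu_k, diag(var_k)) for one HMM state.

    Uses the log-sum-exp trick so that very small component
    likelihoods do not underflow.
    """
    terms = [
        math.log(c) + log_gaussian_diag(x, mu, var)
        for c, mu, var in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Toy state with two mixture components over 2-dimensional features
# (all values are made up for the example).
state_weights = [0.6, 0.4]
state_means = [[0.0, 0.0], [1.0, -1.0]]
state_vars = [[1.0, 1.0], [0.5, 2.0]]
score = log_mixture_density([0.2, -0.1], state_weights, state_means, state_vars)
```

A tied-mixture (semi-continuous) model would, under the same assumptions, share one set of `means` and `variances` across all states and keep only the `weights` state-specific.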
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, CH., Rabiner, L.R., Pieraccini, R. (1992). Speaker Independent Continuous Speech Recognition Using Continuous Density Hidden Markov Models. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_16
DOI: https://doi.org/10.1007/978-3-642-76626-8_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-76628-2
Online ISBN: 978-3-642-76626-8
eBook Packages: Springer Book Archive