Speaker Independent Continuous Speech Recognition Using Continuous Density Hidden Markov Models

  • Conference paper
Speech Recognition and Understanding

Part of the book series: NATO ASI Series (NATO ASI F, volume 75)

Abstract

The field of large vocabulary continuous speech recognition has advanced to the point where several systems achieve greater than 95% word accuracy on speaker independent recognition of fluent speech drawn from a 1000-word vocabulary, for a task with a perplexity of about 60. Several factors account for the high performance of these systems: effective feature analysis, hidden Markov model (HMM) methodology, context-dependent sub-word units that capture intra-word and inter-word phonemic variations, and corrective training techniques that emphasize differences between acoustically similar words in the vocabulary. In this paper we describe a large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories and discuss the methods used to provide high word recognition accuracy. In particular, we focus on the techniques adopted to select the set of fundamental speech units and to build acoustic models of these sub-word units within a continuous density HMM (CDHMM) framework. Other modeling approaches, such as the discrete HMM and the tied-mixture HMM, are also discussed and compared with the CDHMM approach.
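
To make the term "continuous density" concrete: in a CDHMM, each state scores an acoustic feature vector with a continuous probability density instead of a discrete codebook probability. The abstract does not specify the form of the density, so the following is only a minimal sketch, assuming a mixture of diagonal-covariance Gaussians. The function names and the toy parameters are hypothetical and are not taken from the paper.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def log_mixture_density(x, weights, means, variances):
    """Log of sum_k c_k * N(x; mu_k, diag(var_k)) for one HMM state.

    Assumed model form (not stated in the abstract): mixture weights c_k
    are positive and sum to one. Log-sum-exp keeps the sum numerically stable.
    """
    terms = [
        math.log(c) + log_gaussian_diag(x, mu, var)
        for c, mu, var in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


if __name__ == "__main__":
    # Toy two-component state density over 2-dimensional features.
    frame = [0.3, -0.1]
    score = log_mixture_density(
        frame,
        weights=[0.6, 0.4],
        means=[[0.0, 0.0], [1.0, -1.0]],
        variances=[[1.0, 1.0], [0.5, 0.5]],
    )
    print(f"log b(frame) = {score:.4f}")
```

In a discrete HMM the same state would instead look up the probability of a vector-quantized codeword, while a tied-mixture HMM would share one pool of Gaussians across all states and keep only the mixture weights state-specific.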

Copyright information

© 1992 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, CH., Rabiner, L.R., Pieraccini, R. (1992). Speaker Independent Continuous Speech Recognition Using Continuous Density Hidden Markov Models. In: Laface, P., De Mori, R. (eds) Speech Recognition and Understanding. NATO ASI Series, vol 75. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76626-8_16

  • DOI: https://doi.org/10.1007/978-3-642-76626-8_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-76628-2

  • Online ISBN: 978-3-642-76626-8

  • eBook Packages: Springer Book Archive