Hidden Markov Models and the Variants

Automatic Speech Recognition

Part of the book series: Signals and Communication Technology (SCT)

Abstract

This chapter builds upon the review in the previous chapter of probability theory and statistics, including random variables and Gaussian mixture models, and extends that review to the Markov chain and the hidden Markov sequence or model (HMM). Central to the HMM is the concept of state, which is itself a random variable, typically taking discrete values. Extending a Markov chain to an HMM involves attaching uncertainty, in the form of a statistical distribution, to each state of the Markov chain. Hence, an HMM is a doubly stochastic process, or a probabilistic function of a Markov chain. When the state of the Markov sequence or HMM is confined to be discrete and the distributions associated with the HMM states do not overlap, the HMM reduces to a Markov chain. This chapter covers several key aspects of the HMM, including its parametric characterization, its simulation by random number generators, its likelihood evaluation, its parameter estimation via the EM algorithm, and its state decoding via the Viterbi algorithm, a dynamic programming procedure. We then examine the use of the HMM as a generative model for speech feature sequences and as the basis for speech recognition. Finally, we discuss the limitations of the HMM, which lead to its various extended versions, in which each state is associated with a dynamic system or a hidden time-varying trajectory instead of with a temporally independent stationary distribution such as a Gaussian mixture. These variants of the HMM, with state-conditioned dynamic systems expressed in the state-space formulation, are introduced as generative counterparts of the recurrent neural networks described in detail in Chap. 13.
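Three of the computations listed above (simulating an HMM with a random number generator, evaluating its likelihood, and decoding its state sequence with the Viterbi algorithm) can be sketched as follows. This is a minimal illustration, not the chapter's own code: it assumes a hypothetical two-state HMM with one-dimensional Gaussian outputs (a single Gaussian per state rather than a Gaussian mixture), and all names and numbers (`PI`, `A`, `MEANS`, `STDS`, `sample`, `forward_log_likelihood`, `viterbi`) are introduced here for illustration only. Both recursions are carried out in the log domain to avoid numerical underflow on long sequences.

```python
import math
import random

# Hypothetical toy HMM (illustrative values, not from the chapter): 2 states, 1-D Gaussian outputs.
PI = [0.6, 0.4]            # initial state probabilities
A = [[0.7, 0.3],           # A[i][j] = P(s_t = j | s_{t-1} = i)
     [0.2, 0.8]]
MEANS = [0.0, 3.0]         # mean of each state's output Gaussian
STDS = [1.0, 1.0]          # standard deviation of each state's output Gaussian
N = len(PI)


def sample(T, rng):
    """Simulate T frames: walk the Markov chain, drawing one observation from each visited state."""
    states, obs = [], []
    s = rng.choices(range(N), weights=PI)[0]
    for t in range(T):
        if t > 0:
            s = rng.choices(range(N), weights=A[s])[0]
        states.append(s)
        obs.append(rng.gauss(MEANS[s], STDS[s]))
    return states, obs


def log_gauss(x, mu, sigma):
    """Log density of a univariate Gaussian N(mu, sigma^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs):
    """log p(o_1, ..., o_T) by the forward recursion: sum over predecessor states at each frame."""
    alpha = [math.log(PI[j]) + log_gauss(obs[0], MEANS[j], STDS[j]) for j in range(N)]
    for x in obs[1:]:
        alpha = [log_sum_exp([alpha[i] + math.log(A[i][j]) for i in range(N)])
                 + log_gauss(x, MEANS[j], STDS[j]) for j in range(N)]
    return log_sum_exp(alpha)


def viterbi(obs):
    """Best state path and its joint log probability: the forward recursion with max instead of sum."""
    delta = [math.log(PI[j]) + log_gauss(obs[0], MEANS[j], STDS[j]) for j in range(N)]
    backptrs = []
    for x in obs[1:]:
        ptr, new_delta = [], []
        for j in range(N):
            scores = [delta[i] + math.log(A[i][j]) for i in range(N)]
            best_i = max(range(N), key=scores.__getitem__)   # best predecessor of state j
            ptr.append(best_i)
            new_delta.append(scores[best_i] + log_gauss(x, MEANS[j], STDS[j]))
        backptrs.append(ptr)
        delta = new_delta
    # Trace back from the best final state through the stored back-pointers.
    path = [max(range(N), key=delta.__getitem__)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, max(delta)


if __name__ == "__main__":
    rng = random.Random(0)
    states, obs = sample(50, rng)
    print("log-likelihood:", forward_log_likelihood(obs))
    path, best = viterbi(obs)
    print("Viterbi joint log-prob:", best)
    print("frames decoded correctly:", sum(p == s for p, s in zip(path, states)), "of", len(states))
```

The forward and Viterbi recursions share the same structure; the only change is that the sum over predecessor states becomes a max, with back-pointers kept so the best path can be recovered. EM (Baum-Welch) re-estimation, which additionally needs a backward pass, is omitted from this sketch for brevity.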

Author information

Correspondence to Dong Yu.

Copyright information

© 2015 Springer-Verlag London

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Hidden Markov Models and the Variants. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_3

  • DOI: https://doi.org/10.1007/978-1-4471-5779-3_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5778-6

  • Online ISBN: 978-1-4471-5779-3

  • eBook Packages: Engineering, Engineering (R0)
