Hidden Markov Models and the Variants

Yu, Dong; Deng, Li

doi:10.1007/978-1-4471-5779-3_3

Part of the book series: Signals and Communication Technology (SCT)

Abstract

This chapter builds on the previous chapter's review of probability theory and statistics, including random variables and Gaussian mixture models, and extends that review to the Markov chain and the hidden Markov sequence, or hidden Markov model (HMM). Central to the HMM is the concept of state, which is itself a random variable that typically takes discrete values. Extending a Markov chain to an HMM involves adding uncertainty, in the form of a statistical distribution, to each state of the Markov chain. Hence, an HMM is a doubly stochastic process, or a probabilistic function of a Markov chain. When the state of the Markov sequence or HMM is confined to be discrete and the distributions associated with the HMM states do not overlap, the HMM reduces to a Markov chain. This chapter covers several key aspects of the HMM: its parametric characterization, its simulation by random number generators, its likelihood evaluation, its parameter estimation via the EM algorithm, and its state decoding via the Viterbi algorithm, a dynamic programming procedure. We then discuss the use of the HMM as a generative model for speech feature sequences and as the basis for speech recognition. Finally, we discuss the limitations of the HMM, which lead to its various extended versions, in which each state is associated with a dynamic system or a hidden time-varying trajectory instead of a temporally independent stationary distribution such as a Gaussian mixture. These variants of the HMM, with state-conditioned dynamic systems expressed in the state-space formulation, are introduced as a generative counterpart of the recurrent neural networks to be described in detail in Chap. 13.
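As a rough illustration of three operations the abstract names (simulation by random number generators, likelihood evaluation, and Viterbi decoding), the following minimal Python sketch uses a toy HMM with discrete observation symbols. Everything in it is an assumption made for illustration: the parameter values, the names `PI`, `A`, `B`, `sample`, `likelihood`, and `viterbi`, and the choice of discrete rather than Gaussian-mixture emissions. It is not taken from the chapter.

```python
import random

# Hypothetical two-state HMM with three discrete observation symbols.
# All numbers are illustrative assumptions, not values from the chapter.
PI = [0.6, 0.4]                      # initial state probabilities
A = [[0.7, 0.3],                     # A[i][j] = P(next state j | state i)
     [0.4, 0.6]]
B = [[0.5, 0.4, 0.1],                # B[i][k] = P(symbol k | state i)
     [0.1, 0.3, 0.6]]


def sample(length, rng):
    """Simulate the doubly stochastic process: a Markov chain over states,
    plus one random draw from the current state's distribution per step."""
    states, obs = [], []
    state = rng.choices(range(len(PI)), weights=PI)[0]
    for _ in range(length):
        states.append(state)
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A[state])), weights=A[state])[0]
    return states, obs


def likelihood(obs):
    """Forward recursion: P(obs), summed over all hidden state paths."""
    n = len(PI)
    alpha = [PI[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)


def viterbi(obs):
    """Dynamic programming search for the single most probable state path.
    Returns the path and its joint probability with the observations."""
    n = len(PI)
    delta = [PI[i] * B[i][obs[0]] for i in range(n)]
    backptrs = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] * A[i][j])
            ptr.append(best)
            new_delta.append(delta[best] * A[best][j] * B[j][o])
        backptrs.append(ptr)
        delta = new_delta
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backptrs):   # trace the best path backwards
        path.append(ptr[path[-1]])
    path.reverse()
    return path, max(delta)


if __name__ == "__main__":
    hidden, observed = sample(8, random.Random(0))
    decoded, best_prob = viterbi(observed)
    print("hidden  :", hidden)
    print("observed:", observed)
    print("decoded :", decoded)
    print("P(obs)  :", likelihood(observed))
```

The forward recursion sums over state paths while Viterbi maximizes over them, so the probability returned by `viterbi` can never exceed the value returned by `likelihood` for the same observations. For long sequences a practical implementation would work with log probabilities or scaling to avoid underflow; this sketch omits that for brevity.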

Author information

Authors and Affiliations

Dong Yu: Microsoft Research, Bothell, USA
Li Deng: Microsoft Research, Redmond, WA, USA

Corresponding author

Correspondence to Dong Yu.

About this chapter

Cite this chapter

Yu, D., Deng, L. (2015). Hidden Markov Models and the Variants. In: Automatic Speech Recognition. Signals and Communication Technology. Springer, London. https://doi.org/10.1007/978-1-4471-5779-3_3

DOI: https://doi.org/10.1007/978-1-4471-5779-3_3
Published: 12 November 2014
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5778-6
Online ISBN: 978-1-4471-5779-3
eBook Packages: Engineering, Engineering (R0)