Abstract
In the field of pattern recognition, signals are frequently regarded as the product of a statistical generation process, and the primary goal of analyzing them is to model their statistical properties as accurately as possible. However, the model to be determined should not only replicate the generation of certain data but also deliver useful information for segmenting the signals into meaningful units.
Hidden Markov models are able to account for both these modeling aspects. First, they define a generative statistical model that is able to generate data sequences according to rather complex probability distributions and that can be used for classifying sequential patterns. Second, information about the segmentation of the data considered can be derived from the two-stage stochastic process that is described by a hidden Markov model. Consequently, hidden Markov models possess the quite remarkable ability to treat segmentation and classification of patterns in an integrated framework.
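The two-stage stochastic process described above can be illustrated with a small, purely hypothetical sketch (all states, symbols, and probabilities below are invented for illustration and do not come from the chapter):

```python
import random

random.seed(0)  # reproducible sampling

# Hypothetical two-state model: hidden states emit observable symbols
# according to state-specific output distributions.
start = {"Rainy": 0.6, "Sunny": 0.4}
trans = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
         "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit  = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
         "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def sample(dist):
    """Draw one outcome from a discrete distribution {outcome: prob}."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point rounding

def generate(T):
    """Stage 1: random walk over hidden states; stage 2: emissions."""
    s = sample(start)
    state_seq, obs_seq = [], []
    for _ in range(T):
        state_seq.append(s)
        obs_seq.append(sample(emit[s]))
        s = sample(trans[s])
    return state_seq, obs_seq

state_seq, obs_seq = generate(5)
```

Only the emission sequence would be observed in practice; the state sequence stays hidden, which is exactly what makes the segmentation information recoverable only by inference.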
Notes
- 1. In the tradition of research at IBM, HMMs are described in a slightly different way: there, outputs are generated during state transitions, i.e., on the edges of the model ([11]; cf. e.g. [136, p. 17]). With respect to its expressive power, however, this formulation is completely equivalent to the one used in this book and throughout the majority of the literature, as shown in [136, pp. 35–36].
- 2.
- 3. The empirical distribution of a continuous random variable would assign non-zero density values only to the known data points and would, therefore, be useless in practice. An approximation of the true density function by a Parzen estimate (cf. e.g. [65, Sect. 4.3]) would be possible in principle but would require storing the complete data set.
- 4.
- 5. Original quote from Ralph Waldo Emerson (American philosopher, 1803–1882) from the essay “Self-Reliance” (1841).
- 6. As the name already indicates, there exists a matching counterpart of the forward algorithm, referred to as the backward algorithm. Taken together they constitute the so-called forward–backward algorithm, which will be presented in its entirety during the description of HMM parameter training in Sect. 5.7.
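The forward pass itself can be sketched on a toy discrete HMM (all model values are invented for illustration; the backward counterpart traverses the same trellis in reverse):

```python
def forward(pi, A, B, obs):
    """Forward pass: alpha[t][j] = P(o_1 ... o_t, S_t = j | lambda) for a
    discrete HMM with start probs pi[j], transition probs A[i][j], and
    output probs B[j][o]."""
    N = len(pi)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(N)]]
    for t in range(1, len(obs)):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    return alpha

# toy model (values invented for illustration)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
alpha = forward(pi, A, B, [0, 1, 1])
p_obs = sum(alpha[-1])  # total output probability P(O | lambda)
```

Summing the final trellis column yields the total output probability, which is what makes the forward algorithm usable for classification on its own.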
- 7.
- 8. In general, the decision for the optimal predecessor state is not unique: multiple sequences \(\boldsymbol{s}^{*}_{k}\) with identical scores maximizing Eq. (5.6) might exist. In practical applications, a rule for resolving such ambiguities is therefore necessary, e.g., a preference for states with lower indices.
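The lower-index tie-breaking rule can be made concrete in a small Viterbi sketch (the toy model and all values are hypothetical, not from the text):

```python
def viterbi(pi, A, B, obs):
    """Viterbi decoding for a toy discrete HMM (illustrative sketch).
    Ties among equally scoring predecessors are resolved by preferring
    the lower state index: max() over range(N) keeps the first, i.e.
    lowest-index, maximum it encounters."""
    N = len(pi)
    delta = [pi[j] * B[j][obs[0]] for j in range(N)]
    back = []
    for o in obs[1:]:
        ptr = [max(range(N), key=lambda i: delta[i] * A[i][j])
               for j in range(N)]
        delta = [delta[ptr[j]] * A[ptr[j]][j] * B[j][o] for j in range(N)]
        back.append(ptr)
    # backtracking from the best final state
    j = max(range(N), key=lambda j: delta[j])
    score = delta[j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path, score

# toy model (values invented for illustration)
path, score = viterbi([0.6, 0.4],
                      [[0.7, 0.3], [0.4, 0.6]],
                      [[0.9, 0.1], [0.2, 0.8]],
                      [0, 1, 1])
```

Any deterministic rule would do; preferring the lowest index simply makes the decoded state sequence reproducible.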
- 9. In practice this means that the chosen quality measure no longer changes within the available computational accuracy.
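Such a stopping rule can be sketched generically; here a simple fixed-point iteration stands in for the actual training update (the function names and the cos example are illustrative assumptions, not the book's procedure):

```python
import math

def iterate_until_converged(step, x0, max_iter=100000):
    """Repeat an update step until the monitored quality measure no
    longer changes within floating-point accuracy -- the stopping rule
    described in the note."""
    x = x0
    for _ in range(max_iter):
        x_new = step(x)
        if math.isclose(x_new, x, rel_tol=1e-15, abs_tol=1e-300):
            return x_new
        x = x_new
    return x

# stand-in "training update": the fixed point of cos, reached once
# successive values agree to machine accuracy
fp = iterate_until_converged(math.cos, 1.0)
```

In HMM training, `step` would be one re-estimation pass and `x` the data's log-likelihood under the current model.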
- 10. Of course the optimization of general continuous mixture densities is possible with all training methods. However, it always represents the most challenging part of the procedure.
- 11. The updated start probabilities \(\hat{\pi}_{i}\) can be considered a special case of the transition probabilities \(\hat{a}_{ij}\) and can, therefore, be computed analogously.
- 12. The fact that the total output probability P(O∣λ) may be computed via both the forward and the backward algorithm can be exploited in practice to check the computations for consistency and accuracy.
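This consistency check might look as follows for a toy discrete HMM (all model values are invented for illustration; both routes must yield the same P(O∣λ)):

```python
def forward_prob(pi, A, B, obs):
    """P(O | lambda) accumulated with the forward recursion."""
    N = len(pi)
    alpha = [pi[j] * B[j][obs[0]] for j in range(N)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    return sum(alpha)

def backward_prob(pi, A, B, obs):
    """P(O | lambda) accumulated with the backward recursion."""
    N = len(pi)
    beta = [1.0] * N
    for o in reversed(obs[1:]):
        beta = [sum(A[i][j] * B[j][o] * beta[j] for j in range(N))
                for i in range(N)]
    return sum(pi[i] * B[i][obs[0]] * beta[i] for i in range(N))

# toy model (illustrative values only)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
f = forward_prob(pi, A, B, [0, 1, 1])
b = backward_prob(pi, A, B, [0, 1, 1])
```

A mismatch beyond rounding error between `f` and `b` signals a bug in one of the two recursions.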
- 13. Due to the close connection in their meaning, and in order not to complicate the notation unnecessarily, the state probability is denoted with a single argument as \(\gamma_{t}(i)\) and the state-transition probability with two arguments as \(\gamma_{t}(i,j)\).
- 14. By common convention, HMMs do not perform a state transition into a specially marked end state when reaching the end of the observation sequence at time T. Therefore, the restriction to all prior points in time is necessary here.
- 15. For any given time t the symbol \(o_{k}\) was either present in the observation sequence or not. Therefore, the probability \(P(S_{t}=j, O_{t}=o_{k} \mid O, \lambda)\) either takes on the value zero or is equal to \(P(S_{t}=j \mid O, \lambda)\), which is simply \(\gamma_{t}(j)\).
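Combined with the forward–backward posteriors, this 0-or-γ rule yields the re-estimate of the discrete output probabilities; a self-contained sketch on a toy model (all values are illustrative):

```python
def gammas(pi, A, B, obs):
    """State posteriors gamma_t(j) = P(S_t = j | O, lambda), computed
    from the forward and backward trellises of a toy discrete HMM."""
    N, T = len(pi), len(obs)
    alpha = [[pi[j] * B[j][obs[0]] for j in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    beta = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(N))
                   for i in range(N)]
    p_obs = sum(alpha[T - 1])
    return [[alpha[t][j] * beta[t][j] / p_obs for j in range(N)] for t in range(T)]

def reestimate_output_probs(gamma, obs, M):
    """Discrete output re-estimate: because P(S_t=j, O_t=o_k | O, lambda)
    is either 0 or gamma_t(j), the numerator sums gamma_t(j) over exactly
    those times t at which o_t = o_k."""
    T, N = len(gamma), len(gamma[0])
    B_new = []
    for j in range(N):
        total = sum(gamma[t][j] for t in range(T))
        B_new.append([sum(gamma[t][j] for t in range(T) if obs[t] == k) / total
                      for k in range(M)])
    return B_new

# toy model (illustrative values only)
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
g = gammas(pi, A, B, [0, 1, 1])
B_hat = reestimate_output_probs(g, [0, 1, 1], 2)
```

Each row of `B_hat` is a properly normalized output distribution, since the per-symbol numerators partition the per-state total.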
- 16. For time t=1 in Eq. (5.20) the term \(\sum_{i=1}^{N} \alpha_{t-1}(i)\, a_{ij}\) needs to be replaced by the start probability \(\pi_{j}\) of the associated state.
- 17. Please note that in [209] and in subsequent works using this method, covariance modeling and state-transition probabilities hardly play any role. The update equation for covariances is, therefore, given in analogy to Eq. (5.24). Parameter updates for transition probabilities can be obtained in the same way as for discrete models (see Eq. (5.27)).
- 18. For reasons of efficiency, the k-means algorithm is an obvious choice as the vector quantization method, as it achieves competitive results with only a single pass through the data. In principle, however, any algorithm for vector quantizer design or for the unsupervised estimation of mixture densities could be applied.
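For orientation, a plain batch k-means can serve as a codebook-design sketch (the 1-D data, the deterministic spread initialization, and the iterative batch formulation are simplifications for illustration; the single-pass variant alluded to above works differently):

```python
def kmeans(data, k, iters=20):
    """Plain batch k-means over scalar data as a vector-quantizer design
    sketch: assign each point to its nearest center, then move each
    center to the mean of its cluster."""
    # deterministic spread initialization over the sorted data
    centers = sorted(data)[::max(1, len(data) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for x in data:
            j = min(range(k), key=lambda c: (x - centers[c]) ** 2)
            clusters[j].append(x)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sorted(centers)

# toy 1-D data with three well-separated clusters
data = [0.1, 0.2, 0.15, 5.0, 5.1, 4.9, 10.0, 10.2, 9.8]
codebook = kmeans(data, 3)
```

The resulting `codebook` would then seed the discrete or semi-continuous output distributions of the model.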
- 19. The estimation of transition probabilities and discrete output probabilities is, for example, explained in [125, pp. 157–158].
References
Bahl, L.R., Brown, P.F., de Souza, P.V., Mercer, R.L.: Estimating hidden Markov model parameters so as to maximize speech recognition accuracy. IEEE Trans. on Speech and Audio Processing 1(1), 77–83 (1993)
Bahl, L.R., Jelinek, F.: Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition. IEEE Trans. on Information Theory 21(4), 404–411 (1975)
Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41, 164–171 (1970)
Burshtein, D.: Robust parametric modelling of durations in hidden Markov models. IEEE Trans. Speech Audio Process. 4(3), 240–242 (1996)
Chow, Y.-L.: Maximum Mutual Information estimation of HMM parameters for continuous speech recognition using the N-best algorithm. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 701–704 (1990)
Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–22 (1977)
Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)
Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)
Ferguson, J.D.: Hidden Markov analysis: an introduction. In: Ferguson, J.D. (ed.) Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 8–15. Institute for Defense Analyses, Communications Research Division, Princeton (1980)
Ferguson, T.S.: Bayesian density estimation by mixtures of normal distributions. In: Rizvi, M., Rustagi, J., Siegmund, D. (eds.) Recent Advances in Statistics, pp. 287–302. Academic Press, New York (1983)
Forney, G.D.: The Viterbi algorithm. Proc. IEEE 61(3), 268–278 (1973)
Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)
Huang, X.D., Ariki, Y., Jack, M.A.: Hidden Markov Models for Speech Recognition. Information Technology Series, vol. 7. Edinburgh University Press, Edinburgh (1990)
Huang, X.D., Jack, M.A.: Semi-continuous Hidden Markov Models for speech signals. Comput. Speech Lang. 3(3), 239–251 (1989)
Jelinek, F.: A fast sequential decoding algorithm using a stack. IBM J. Res. Dev. 13(6), 675–685 (1969)
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)
Juang, B.-H., Rabiner, L.R.: The segmental k-means algorithm for estimating parameters of Hidden Markov Models. IEEE Trans. Acoust. Speech Signal Process. 38(9), 1639–1641 (1990)
Lee, C.H., Rabiner, L.R., Pieraccini, R., Wilpon, J.G.: Acoustic modeling for large vocabulary speech recognition. Comput. Speech Lang. 4, 127–165 (1990)
Levinson, S.E.: Continuously variable duration Hidden Markov Models for automatic speech recognition. Comput. Speech Lang. 1(1), 29–45 (1986)
Markov, A.A.: Примѣръ статистическаго изслѣдованiя надъ текстомъ “Евгенiя Онѣгина” иллюстрирующiй связь испытанiй в цѣпь (Example of statistical investigations of the text of “Eugene Onegin”, which demonstrates the connection of events in a chain). In: Извѣстiя Императорской Академiй Наукъ (Bulletin de l’Académie Impériale des Sciences de St.-Pétersbourg), Sankt-Petersburg, pp. 153–162 (1913) (in Russian)
Merhav, N., Ephraim, Y.: Hidden Markov Modeling using a dominant state sequence with application to speech recognition. Comput. Speech Lang. 5(4), 327–339 (1991)
Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Process. Mag. 12(3), 24–42 (1995)
Ney, H., Steinbiss, V., Haeb-Umbach, R., Tran, B.-H., Essen, U.: An overview of the Philips research system for large vocabulary continuous speech recognition. Int. J. Pattern Recognit. Artif. Intell. 8(1), 33–70 (1994)
Nilsson, N.J.: Artificial Intelligence: A New Synthesis. Morgan Kaufmann, San Francisco (1998)
Paul, D.B.: An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, pp. 25–28 (1992)
Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)
Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)
Rigoll, G.: Maximum Mutual Information Neural Networks for hybrid connectionist-HMM speech recognition systems. IEEE Trans. Audio Speech Lang. Process. 2(1), 175–184 (1994)
Rottland, J., Rigoll, G.: Tied posteriors: an approach for effective introduction of context dependency in hybrid NN/HMM LVCSR. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Istanbul (2000)
Viterbi, A.J.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)
Wellekens, C.J.: Mixture density estimators in Viterbi training. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, pp. 361–364 (1992)
Copyright information
© 2014 Springer-Verlag London
Cite this chapter
Fink, G.A. (2014). Hidden Markov Models. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_5
DOI: https://doi.org/10.1007/978-1-4471-6308-4_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6307-7
Online ISBN: 978-1-4471-6308-4
eBook Packages: Computer Science; Computer Science (R0)