Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

In the field of pattern recognition, signals are frequently thought of as the product of a statistical generation process. The primary goal of analyzing these signals is to model their statistical properties as accurately as possible. However, the model to be determined should not only replicate the generation of certain data but also deliver useful information for segmenting the signals into meaningful units.

Hidden Markov models are able to account for both these modeling aspects. First, they define a generative statistical model that is able to generate data sequences according to rather complex probability distributions and that can be used for classifying sequential patterns. Second, information about the segmentation of the data considered can be derived from the two-stage stochastic process that is described by a hidden Markov model. Consequently, hidden Markov models possess the quite remarkable ability to treat segmentation and classification of patterns in an integrated framework.
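
The two-stage stochastic process mentioned above can be illustrated with a short sketch. The following Python snippet is not part of the original chapter; all parameter values for \(\lambda = (\pi, A, B)\) are purely illustrative. A hidden state sequence is drawn first, and each state then emits an observation symbol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state HMM over three output symbols (assumed values).
pi = np.array([0.6, 0.4])                # start probabilities
A = np.array([[0.7, 0.3],                # state-transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],           # output probabilities b_j(o_k)
              [0.1, 0.3, 0.6]])

def sample(T):
    """Generate a state sequence and an observation sequence of length T."""
    states, outputs = [], []
    s = rng.choice(len(pi), p=pi)        # first stage: hidden state process
    for _ in range(T):
        states.append(s)
        outputs.append(rng.choice(B.shape[1], p=B[s]))  # second stage: emission
        s = rng.choice(len(A), p=A[s])
    return states, outputs

states, observations = sample(10)
```

Only the observation sequence is visible to an observer of the generated data; the state sequence, which carries the segmentation information, remains hidden and must be recovered by inference.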

Notes

  1.

    In the tradition of research at IBM, HMMs are described in a slightly different way: outputs are generated during state transitions, i.e., on the edges of the model ([11], cf. e.g. [136, p. 17]). With respect to its expressive power, however, this formulation is completely equivalent to the one used in this book and throughout most of the literature, as is also shown in [136, pp. 35–36].

  2.

    In [125, p. 150], termination probabilities are defined analogously to the start probabilities. The HMM architectures used for analyzing biological sequences usually contain special non-emitting start and end states (see also Sect. 8.4).

  3.

    The empirical distribution of a continuous random variable would assign non-zero density values only to known data points and would, therefore, be useless in practice. An approximation of the true density function by a Parzen estimate (cf. e.g. [65, Sect. 4.3]) would be possible in principle but would require storing the complete data set.

  4.

    According to Rabiner ([251], [252, p. 322]) the idea of characterizing the possible use cases of HMMs in the form of three fundamental problems paired with three corresponding algorithms for their solution goes back to Jack Ferguson, Institute for Defense Analyses [79].

  5.

    Original quote from Ralph Waldo Emerson (American philosopher, 1803–1882), from the essay “Self-Reliance” (1841).

  6.

    As the name already indicates, there exists a matching counterpart of the forward algorithm, referred to as the backward algorithm. Taken together they constitute the so-called forward–backward algorithm, which will be presented in its entirety in the description of HMM parameter training in Sect. 5.7 (a minimal sketch of both recursions follows these notes).

  7.

    If specially marked end states are used (cf. e.g. [67, p. 51]), the maximization needs to be restricted to the appropriate set of states. Alternatively, additional termination probabilities can also be taken into account when computing the final path probabilities (cf. e.g. [125, p. 150]).

  8.

    In general, the decision for the optimal predecessor state is not unique: multiple sequences \(\boldsymbol{s}^{*}_{k}\) with identical scores maximizing Eq. (5.6) might exist. In practical applications a rule for resolving such ambiguities is therefore necessary, e.g., a preference for states with lower indices (cf. the sketch following these notes).

  9.

    In practice this means that the chosen quality measure no longer changes within the limits of the available numerical accuracy.

  10.

    Of course the optimization of general continuous mixture densities is possible with all training methods. However, it always represents the most challenging part of the procedure.

  11.

    The updated start probabilities \(\hat{\pi}_{i}\) can be considered as a special case of the transition probabilities \(\hat{a}_{ij}\) and can, therefore, be computed analogously.

  12.

    The fact that the total output probability \(P(\boldsymbol{O} \mid \lambda)\) may be computed both via the forward and the backward algorithm can be exploited in practice to check the computations for consistency and accuracy.

  13.

    Due to the close connection in their meaning, and in order not to make the notation unnecessarily complex, the state probability is denoted with a single argument as \(\gamma_t(i)\) and the state-transition probability with two arguments as \(\gamma_t(i,j)\).

  14.

    By the usual convention, HMMs do not perform a state transition into a specially marked end state when reaching the end of the observation sequence at time T. Therefore, the restriction to all earlier points in time is necessary here.

  15.

    For any given time t the symbol \(o_k\) was either present in the observation sequence or not. Therefore, the probability \(P(S_t = j, O_t = o_k \mid \boldsymbol{O}, \lambda)\) either takes on the value zero or is equal to \(P(S_t = j \mid \boldsymbol{O}, \lambda)\), which is simply \(\gamma_t(j)\) (cf. the sketch following these notes).

  16.

    For time t=1 in Eq. (5.20), the term \(\sum_{i=1}^{N} \alpha_{t-1}(i) a_{ij}\) needs to be replaced by the respective start probability \(\pi_j\) of the associated state.

  17.

    Please note that in [209] and in subsequent works using this method, covariance modeling and state-transition probabilities hardly play any role. The update equation for covariances is, therefore, given in analogy to Eq. (5.24). Parameter updates for transition probabilities can be obtained in the same way as for discrete models (see Eq. (5.27)).

  18.

    For reasons of efficiency it is natural to use the k-means algorithm as the vector quantization method, as it achieves competitive results with only a single pass through the data. In principle, however, any algorithm for vector quantizer design or for the unsupervised estimation of mixture densities could be applied.

  19.

    The estimation of transition probabilities and discrete output probabilities is, for example, explained in [125, pp. 157–158].
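
As a complement to Notes 6, 12, and 16, the following is a minimal sketch, not taken from the book, of the forward and backward recursions for a discrete HMM, together with the consistency check mentioned in Note 12 that both directions yield the same total output probability \(P(\boldsymbol{O} \mid \lambda)\). All names and parameter values are illustrative assumptions.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward and backward probabilities for a discrete HMM (sketch).

    pi  : (N,)   start probabilities
    A   : (N, N) state-transition probabilities a_ij
    B   : (N, M) output probabilities b_j(o_k)
    obs : (T,)   observation sequence as symbol indices
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    # Forward recursion; at t = 1 the sum over predecessor states is
    # replaced by the start probabilities (cf. Note 16).
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward recursion, initialized with ones at the final time step.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    return alpha, beta

# Consistency check (Note 12): both directions give P(O | lambda).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = np.array([0, 1, 2, 1])
alpha, beta = forward_backward(pi, A, B, obs)
assert np.isclose(alpha[-1].sum(), (pi * B[:, obs[0]] * beta[0]).sum())
```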
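
With regard to Notes 11, 13, and 15, the next sketch (again only illustrative; it takes the forward and backward probabilities alpha and beta as inputs) computes the state probabilities \(\gamma_t(i)\) and derives from them updated start probabilities and discrete output probabilities. The probability discussed in Note 15 reduces to summing \(\gamma_t(j)\) over exactly those times at which the symbol \(o_k\) was observed.

```python
import numpy as np

def reestimate(obs, alpha, beta, M):
    """One Baum-Welch style update of the start and discrete output
    probabilities from the state probabilities gamma_t(i) (sketch only).

    obs   : (T,)   observation sequence as symbol indices
    alpha : (T, N) forward probabilities
    beta  : (T, N) backward probabilities
    M     : number of output symbols
    """
    obs = np.asarray(obs)

    # gamma_t(i) = P(S_t = i | O, lambda), cf. Note 13.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Updated start probabilities as a special case (Note 11).
    pi_new = gamma[0]

    # Updated output probabilities (Note 15): sum gamma_t(j) over those
    # times t at which symbol o_k was actually observed, then normalize.
    N = gamma.shape[1]
    B_new = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0, keepdims=True).T
    return pi_new, B_new
```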
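
Concerning Note 8, the following sketch of the Viterbi recursion (names are illustrative; Eq. (5.6) itself is not reproduced here) resolves ties in favor of states with lower indices simply because np.argmax returns the first maximal entry; any other deterministic rule would serve equally well.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable state sequence for a discrete HMM (sketch).

    Ties among equally good predecessors are resolved in favor of
    lower state indices, because np.argmax returns the first maximum.
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)

    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j] for predecessor i
        psi[t] = scores.argmax(axis=0)          # best (lowest-index) predecessor of j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    # Backtracking from the best final state (again first maximum on ties).
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states
```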

References

  1. Bahl, L.R., Brown, P.F., de Souza, P.V., Mercer, R.L.: Estimating hidden Markov model parameters so as to maximize speech recognition accuracy. IEEE Trans. Speech Audio Process. 1(1), 77–83 (1993)

  2. Bahl, L.R., Jelinek, F.: Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition. IEEE Trans. Inf. Theory 21(4), 404–411 (1975)

  3. Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41, 164–171 (1970)

  4. Burshtein, D.: Robust parametric modelling of durations in hidden Markov models. IEEE Trans. Acoust. Speech Signal Process. 3(4), 240–242 (1996)

  5. Chow, Y.-L.: Maximum Mutual Information estimation of HMM parameters for continuous speech recognition using the N-best algorithm. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 701–704 (1990)

  6. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–22 (1977)

  7. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)

  8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)

  9. Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)

  10. Ferguson, J.D.: Hidden Markov analysis: an introduction. In: Ferguson, J.D. (ed.) Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 8–15. Institute for Defense Analyses, Communications Research Division, Princeton (1980)

  11. Ferguson, T.S.: Bayesian density estimation by mixtures of normal distributions. In: Rizvi, M., Rustagi, J., Siegmund, D. (eds.) Recent Advances in Statistics, pp. 287–302. Academic Press, New York (1983)

  12. Forney, G.D.: The Viterbi algorithm. Proc. IEEE 61(3), 268–278 (1973)

  13. Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)

  14. Huang, X.D., Ariki, Y., Jack, M.A.: Hidden Markov Models for Speech Recognition. Information Technology Series, vol. 7. Edinburgh University Press, Edinburgh (1990)

  15. Huang, X.D., Jack, M.A.: Semi-continuous Hidden Markov Models for speech signals. Comput. Speech Lang. 3(3), 239–251 (1989)

  16. Jelinek, F.: A fast sequential decoding algorithm using a stack. IBM J. Res. Dev. 13(6), 675–685 (1969)

  17. Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)

  18. Juang, B.-H., Rabiner, L.R.: The segmental k-means algorithm for estimating parameters of Hidden Markov Models. IEEE Trans. Acoust. Speech Signal Process. 38(9), 1639–1641 (1990)

  19. Lee, C.H., Rabiner, L.R., Pieraccini, R., Wilpon, J.G.: Acoustic modeling for large vocabulary speech recognition. Comput. Speech Lang. 4, 127–165 (1990)

  20. Levinson, S.E.: Continuously variable duration Hidden Markov Models for automatic speech recognition. Comput. Speech Lang. 1(1), 29–45 (1986)

  21. Markov, A.A.: Примѣръ статистическаго изслѣдованiя надъ текстомъ “Евгенiя Онѣгина” иллюстрирующiй связь испытанiй в цѣпь (Example of statistical investigations of the text of “Eugene Onegin”, which demonstrates the connection of events in a chain). In: Извѣстiя Императорской Академiй Наукъ (Bulletin de l’Académie Impériale des Sciences de St.-Pétersbourg), Sankt-Petersburg, pp. 153–162 (1913) (in Russian)

  22. Merhav, N., Ephraim, Y.: Hidden Markov Modeling using a dominant state sequence with application to speech recognition. Comput. Speech Lang. 5(4), 327–339 (1991)

  23. Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Process. Mag. 12(3), 24–42 (1995)

  24. Ney, H., Steinbiss, V., Haeb-Umbach, R., Tran, B.-H., Essen, U.: An overview of the Philips research system for large vocabulary continuous speech recognition. Int. J. Pattern Recognit. Artif. Intell. 8(1), 33–70 (1994)

  25. Nilsson, N.J.: Artificial Intelligence: A New Synthesis. Morgan Kaufmann, San Francisco (1998)

  26. Paul, D.B.: An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, pp. 25–28 (1992)

  27. Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)

  28. Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)

  29. Rigoll, G.: Maximum Mutual Information Neural Networks for hybrid connectionist-HMM speech recognition systems. IEEE Trans. Audio Speech Lang. Process. 2(1), 175–184 (1994)

  30. Rottland, J., Rigoll, G.: Tied posteriors: an approach for effective introduction of context dependency in hybrid NN/HMM LVCSR. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Istanbul (2000)

  31. Viterbi, A.J.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)

  32. Wellekens, C.J.: Mixture density estimators in Viterbi training. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, pp. 361–364 (1992)

Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Fink, G.A. (2014). Hidden Markov Models. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_5

  • DOI: https://doi.org/10.1007/978-1-4471-6308-4_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6307-7

  • Online ISBN: 978-1-4471-6308-4
