Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

In the field of pattern recognition, signals are frequently thought of as the product of a statistical generation process. The primary goal of analyzing these signals is to model their statistical properties as accurately as possible. However, the model to be determined should not only replicate the generation of certain data but also deliver useful information for segmenting the signals into meaningful units.

Hidden Markov models are able to account for both these modeling aspects. First, they define a generative statistical model that is able to generate data sequences according to rather complex probability distributions and that can be used for classifying sequential patterns. Second, information about the segmentation of the data considered can be derived from the two-stage stochastic process that is described by a hidden Markov model. Consequently, hidden Markov models possess the quite remarkable ability to treat segmentation and classification of patterns in an integrated framework.
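
The two-stage stochastic process mentioned above can be illustrated with a short sketch. The following Python snippet is not part of the original chapter; all parameter values for \(\lambda = (\pi, A, B)\) are purely illustrative. A hidden state sequence is drawn first, and each state then emits an observation symbol.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-state HMM over three output symbols (assumed values).
pi = np.array([0.6, 0.4])                # start probabilities
A = np.array([[0.7, 0.3],                # state-transition probabilities a_ij
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],           # output probabilities b_j(o_k)
              [0.1, 0.3, 0.6]])

def sample(T):
    """Generate a state sequence and an observation sequence of length T."""
    states, outputs = [], []
    s = rng.choice(len(pi), p=pi)        # first stage: hidden state process
    for _ in range(T):
        states.append(s)
        outputs.append(rng.choice(B.shape[1], p=B[s]))  # second stage: emission
        s = rng.choice(len(A), p=A[s])
    return states, outputs

states, observations = sample(10)
```

Only the observation sequence is visible to an observer of the generated data; the state sequence, which carries the segmentation information, remains hidden and must be recovered by inference.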

Notes

  1.

    In the tradition of research at IBM, HMMs are described in a slightly different way: outputs are generated during state transitions, i.e., on the edges of the model ([11], cf. e.g. [136, p. 17]). With respect to its expressive power, however, this formulation is completely equivalent to the one used in this book and throughout most of the literature, as is also shown in [136, pp. 35–36].

  2.

    In [125, p. 150], termination probabilities are defined analogously to the start probabilities. The HMM architectures used for analyzing biological sequences usually contain special non-emitting start and end states (see also Sect. 8.4).

  3.

    The empirical distribution of a continuous random variable would assign non-zero density values only to known data points and would, therefore, be useless in practice. An approximation of the true density function by a Parzen estimate (cf. e.g. [65, Sect. 4.3]) would be possible in principle but would require storing the complete data set.

  4.

    According to Rabiner ([251], [252, p. 322]) the idea of characterizing the possible use cases of HMMs in the form of three fundamental problems paired with three corresponding algorithms for their solution goes back to Jack Ferguson, Institute for Defense Analyses [79].

  5.

    Original quote from Ralph Waldo Emerson (American philosopher, 1803–1882), from the essay “Self-Reliance” (1841).

  6.

    As the name already indicates, there exists a matching counterpart of the forward algorithm, referred to as the backward algorithm. Taken together they constitute the so-called forward–backward algorithm, which will be presented in its entirety in the description of HMM parameter training in Sect. 5.7 (a minimal sketch of both recursions follows these notes).

  7.

    If specially marked end states are used (cf. e.g. [67, p. 51]), the maximization needs to be restricted to the appropriate set of states. Alternatively, additional termination probabilities can also be taken into account when computing the final path probabilities (cf. e.g. [125, p. 150]).

  8.

    In general, the decision for the optimal predecessor state is not unique: multiple sequences \(\boldsymbol{s}^{*}_{k}\) with identical scores maximizing Eq. (5.6) might exist. In practical applications a rule for resolving such ambiguities is therefore necessary, e.g., a preference for states with lower indices (cf. the sketch following these notes).

  9.

    In practice this means that the chosen quality measure no longer changes within the limits of the available numerical accuracy.

  10.

    Of course the optimization of general continuous mixture densities is possible with all training methods. However, it always represents the most challenging part of the procedure.

  11.

    The updated start probabilities \(\hat{\pi}_{i}\) can be considered as a special case of the transition probabilities \(\hat{a}_{ij}\) and can, therefore, be computed analogously.

  12.

    The fact that the total output probability \(P(\boldsymbol{O} \mid \lambda)\) may be computed both via the forward and the backward algorithm can be exploited in practice to check the computations for consistency and accuracy.

  13.

    Due to the close connection in their meaning, and in order not to make the notation unnecessarily complex, the state probability is denoted with a single argument as \(\gamma_t(i)\) and the state-transition probability with two arguments as \(\gamma_t(i,j)\).

  14.

    By the usual convention, HMMs do not perform a state transition into a specially marked end state when reaching the end of the observation sequence at time T. Therefore, the restriction to all earlier points in time is necessary here.

  15.

    For any given time t the symbol \(o_k\) was either present in the observation sequence or not. Therefore, the probability \(P(S_t = j, O_t = o_k \mid \boldsymbol{O}, \lambda)\) either takes on the value zero or is equal to \(P(S_t = j \mid \boldsymbol{O}, \lambda)\), which is simply \(\gamma_t(j)\) (cf. the sketch following these notes).

  16.

    For time t=1 in Eq. (5.20), the term \(\sum_{i=1}^{N} \alpha_{t-1}(i) a_{ij}\) needs to be replaced by the respective start probability \(\pi_j\) of the associated state.

  17.

    Please note that in [209] and in subsequent works using this method, covariance modeling and state-transition probabilities hardly play any role. The update equation for covariances is, therefore, given in analogy to Eq. (5.24). Parameter updates for transition probabilities can be obtained in the same way as for discrete models (see Eq. (5.27)).

  18.

    For reasons of efficiency it is natural to use the k-means algorithm as the vector quantization method, as it achieves competitive results with only a single pass through the data. In principle, however, any algorithm for vector quantizer design or for the unsupervised estimation of mixture densities could be applied.

  19.

    The estimation of transition probabilities and discrete output probabilities is, for example, explained in [125, pp. 157–158].
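
As a complement to Notes 6, 12, and 16, the following is a minimal sketch, not taken from the book, of the forward and backward recursions for a discrete HMM, together with the consistency check mentioned in Note 12 that both directions yield the same total output probability \(P(\boldsymbol{O} \mid \lambda)\). All names and parameter values are illustrative assumptions.

```python
import numpy as np

def forward_backward(pi, A, B, obs):
    """Forward and backward probabilities for a discrete HMM (sketch).

    pi  : (N,)   start probabilities
    A   : (N, N) state-transition probabilities a_ij
    B   : (N, M) output probabilities b_j(o_k)
    obs : (T,)   observation sequence as symbol indices
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    # Forward recursion; at t = 1 the sum over predecessor states is
    # replaced by the start probabilities (cf. Note 16).
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

    # Backward recursion, initialized with ones at the final time step.
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])

    return alpha, beta

# Consistency check (Note 12): both directions give P(O | lambda).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = np.array([0, 1, 2, 1])
alpha, beta = forward_backward(pi, A, B, obs)
assert np.isclose(alpha[-1].sum(), (pi * B[:, obs[0]] * beta[0]).sum())
```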
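
With regard to Notes 11, 13, and 15, the next sketch (again only illustrative; it takes the forward and backward probabilities alpha and beta as inputs) computes the state probabilities \(\gamma_t(i)\) and derives from them updated start probabilities and discrete output probabilities. The probability discussed in Note 15 reduces to summing \(\gamma_t(j)\) over exactly those times at which the symbol \(o_k\) was observed.

```python
import numpy as np

def reestimate(obs, alpha, beta, M):
    """One Baum-Welch style update of the start and discrete output
    probabilities from the state probabilities gamma_t(i) (sketch only).

    obs   : (T,)   observation sequence as symbol indices
    alpha : (T, N) forward probabilities
    beta  : (T, N) backward probabilities
    M     : number of output symbols
    """
    obs = np.asarray(obs)

    # gamma_t(i) = P(S_t = i | O, lambda), cf. Note 13.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Updated start probabilities as a special case (Note 11).
    pi_new = gamma[0]

    # Updated output probabilities (Note 15): sum gamma_t(j) over those
    # times t at which symbol o_k was actually observed, then normalize.
    N = gamma.shape[1]
    B_new = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[obs == k].sum(axis=0)
    B_new /= gamma.sum(axis=0, keepdims=True).T
    return pi_new, B_new
```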
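
Concerning Note 8, the following sketch of the Viterbi recursion (names are illustrative; Eq. (5.6) itself is not reproduced here) resolves ties in favor of states with lower indices simply because np.argmax returns the first maximal entry; any other deterministic rule would serve equally well.

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable state sequence for a discrete HMM (sketch).

    Ties among equally good predecessors are resolved in favor of
    lower state indices, because np.argmax returns the first maximum.
    """
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)

    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A      # scores[i, j] for predecessor i
        psi[t] = scores.argmax(axis=0)          # best (lowest-index) predecessor of j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    # Backtracking from the best final state (again first maximum on ties).
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states
```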

References

  1. Bahl, L.R., Brown, P.F., de Souza, P.V., Mercer, R.L.: Estimating hidden Markov model parameters so as to maximize speech recognition accuracy. IEEE Trans. Speech Audio Process. 1(1), 77–83 (1993)

  2. Bahl, L.R., Jelinek, F.: Decoding for channels with insertions, deletions, and substitutions with applications to speech recognition. IEEE Trans. Inf. Theory 21(4), 404–411 (1975)

  3. Baum, L.E., Petrie, T., Soules, G., Weiss, N.: A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics 41, 164–171 (1970)

  4. Burshtein, D.: Robust parametric modelling of durations in hidden Markov models. IEEE Trans. Acoust. Speech Signal Process. 3(4), 240–242 (1996)

  5. Chow, Y.-L.: Maximum Mutual Information estimation of HMM parameters for continuous speech recognition using the N-best algorithm. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 701–704 (1990)

  6. Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. B 39(1), 1–22 (1977)

  7. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, New York (1973)

  8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley, New York (2001)

  9. Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)

  10. Ferguson, J.D.: Hidden Markov analysis: an introduction. In: Ferguson, J.D. (ed.) Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 8–15. Institute for Defense Analyses, Communications Research Division, Princeton (1980)

  11. Ferguson, T.S.: Bayesian density estimation by mixtures of normal distributions. In: Rizvi, M., Rustagi, J., Siegmund, D. (eds.) Recent Advances in Statistics, pp. 287–302. Academic Press, New York (1983)

  12. Forney, G.D.: The Viterbi algorithm. Proc. IEEE 61(3), 268–278 (1973)

  13. Huang, X., Acero, A., Hon, H.-W.: Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Prentice Hall, Englewood Cliffs (2001)

  14. Huang, X.D., Ariki, Y., Jack, M.A.: Hidden Markov Models for Speech Recognition. Information Technology Series, vol. 7. Edinburgh University Press, Edinburgh (1990)

  15. Huang, X.D., Jack, M.A.: Semi-continuous Hidden Markov Models for speech signals. Comput. Speech Lang. 3(3), 239–251 (1989)

  16. Jelinek, F.: A fast sequential decoding algorithm using a stack. IBM J. Res. Dev. 13(6), 675–685 (1969)

  17. Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1997)

  18. Juang, B.-H., Rabiner, L.R.: The segmental k-means algorithm for estimating parameters of Hidden Markov Models. IEEE Trans. Acoust. Speech Signal Process. 38(9), 1639–1641 (1990)

  19. Lee, C.H., Rabiner, L.R., Pieraccini, R., Wilpon, J.G.: Acoustic modeling for large vocabulary speech recognition. Comput. Speech Lang. 4, 127–165 (1990)

  20. Levinson, S.E.: Continuously variable duration Hidden Markov Models for automatic speech recognition. Comput. Speech Lang. 1(1), 29–45 (1986)

  21. Markov, A.A.: Примѣръ статистическаго изслѣдованiя надъ текстомъ “Евгенiя Онѣгина” иллюстрирующiй связь испытанiй в цѣпь (Example of statistical investigations of the text of “Eugene Onegin”, which demonstrates the connection of events in a chain). In: Извѣстiя Императорской Академiй Наукъ (Bulletin de l’Académie Impériale des Sciences de St.-Pétersbourg), Sankt-Petersburg, pp. 153–162 (1913) (in Russian)

  22. Merhav, N., Ephraim, Y.: Hidden Markov Modeling using a dominant state sequence with application to speech recognition. Comput. Speech Lang. 5(4), 327–339 (1991)

  23. Morgan, N., Bourlard, H.: Continuous speech recognition. IEEE Signal Process. Mag. 12(3), 24–42 (1995)

  24. Ney, H., Steinbiss, V., Haeb-Umbach, R., Tran, B.-H., Essen, U.: An overview of the Philips research system for large vocabulary continuous speech recognition. Int. J. Pattern Recognit. Artif. Intell. 8(1), 33–70 (1994)

  25. Nilsson, N.J.: Artificial Intelligence: A New Synthesis. Morgan Kaufmann, San Francisco (1998)

  26. Paul, D.B.: An efficient A* stack decoder algorithm for continuous speech recognition with a stochastic language model. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, San Francisco, pp. 25–28 (1992)

  27. Rabiner, L.R.: A tutorial on Hidden Markov Models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989)

  28. Rabiner, L.R., Juang, B.-H.: Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cliffs (1993)

  29. Rigoll, G.: Maximum Mutual Information Neural Networks for hybrid connectionist-HMM speech recognition systems. IEEE Trans. Audio Speech Lang. Process. 2(1), 175–184 (1994)

  30. Rottland, J., Rigoll, G.: Tied posteriors: an approach for effective introduction of context dependency in hybrid NN/HMM LVCSR. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, Istanbul (2000)

  31. Viterbi, A.J.: Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13(2), 260–269 (1967)

  32. Wellekens, C.J.: Mixture density estimators in Viterbi training. In: Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, vol. 1, pp. 361–364 (1992)

Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Fink, G.A. (2014). Hidden Markov Models. In: Markov Models for Pattern Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-6308-4_5

  • DOI: https://doi.org/10.1007/978-1-4471-6308-4_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6307-7

  • Online ISBN: 978-1-4471-6308-4
