Markovian Models for Sequential Data

  • Francesco Camastra
  • Alessandro Vinciarelli
Part of the Advanced Information and Knowledge Processing book series (AI&KP)


What the reader should know to understand this chapter:

  • Bayes decision theory (Chap. 5).
  • Lagrange multipliers and constrained optimization problems (Chap. 9).
  • Probability and statistics (Appendix A).


  1. P. Baldi and S. Brunak. Bioinformatics: The Machine Learning Approach. MIT Press, 2001.
  2. L.E. Baum. An inequality and associated maximization technique in statistical estimation for probabilistic functions of a Markov process. Inequalities, 3:1–8, 1972.
  3. L.E. Baum and J. Eagon. An inequality with applications to statistical prediction for functions of Markov processes and to a model of ecology. Bulletin of the American Mathematical Society, 73:360–363, 1967.
  4. L.E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics, 41:164–171, 1970.
  5. R. Bellman and S. Dreyfus. Applied Dynamic Programming. Princeton University Press, 1962.
  6. Y. Bengio. Markovian models for sequential data. Neural Computing Surveys, 2:129–162, 1999.
  7. S. Bengio. An asynchronous hidden Markov model for audio-visual speech recognition. In Advances in Neural Information Processing Systems, pages 1237–1244, 2003.
  8. Y. Bengio and P. Frasconi. An input/output HMM architecture. In Advances in Neural Information Processing Systems, pages 427–434, 1995.
  9. J.A. Bilmes. A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report 510, International Computer Science Institute, 1998.
  10. R. Bhar and S. Hamori. Hidden Markov Models: Applications to Financial Economics. Springer-Verlag, 2004.
  11. H. Bourlard and S. Bengio. Hidden Markov models and other finite state automata for sequence processing. In M.A. Arbib, editor, The Handbook of Brain Theory and Neural Networks. MIT Press, 2002.
  12. H. Bourlard and N. Morgan. Connectionist Speech Recognition: A Hybrid Approach. Kluwer Academic Publishers, 1994.
  13. H. Bourlard, Y. Konig, and N. Morgan. A training algorithm for statistical sequence recognition with applications to transition-based speech recognition. IEEE Signal Processing Letters, 3(7):203–205, 1996.
  14. S. Chen and R. Rosenfeld. A survey of smoothing techniques for ME models. IEEE Transactions on Speech and Audio Processing, 8(1):37–50, 2000.
  15. P. Clarkson and R. Rosenfeld. Statistical language modeling using the CMU-Cambridge Toolkit. In Proceedings of Eurospeech, pages 2707–2710, 1997.
  16. T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT Press, 1990.
  17. R.J. Elliott, L. Aggoun, and J.B. Moore. Hidden Markov Models: Estimation and Control. Springer-Verlag, 1997.
  18. S. Goldfeld and R. Quandt. A Markov model for switching regressions. Journal of Econometrics, 1:3–16, 1973.
  19. I.J. Good. The population frequencies of species and the estimation of population parameters. Biometrika, 40(3–4):237–264, 1953.
  20. J. Hamilton. A new approach to the economic analysis of nonstationary time series and the business cycle. Econometrica, 57:357–384, 1989.
  21. H.S. Heaps. Information Retrieval: Computational and Theoretical Aspects. Academic Press, 1978.
  22. J. Hennebert, C. Ris, H. Bourlard, S. Renals, and N. Morgan. Estimation of global posteriors and forward-backward training of hybrid HMM-ANN systems. In Proceedings of Eurospeech, pages 1951–1954, 1997.
  23. J. Hopcroft, R. Motwani, and J. Ullman. Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2000.
  24. F. Jelinek. Statistical Aspects of Speech Recognition. MIT Press, 1997.
  25. R. Kalman and R. Bucy. New results in linear filtering and prediction. Journal of Basic Engineering, 83D:95–108, 1961.
  26. S. Katz. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing, 35(3):400–401, 1987.
  27. D. Klakow and J. Peters. Testing the correlation of word error rate and perplexity. Speech Communication, 38(1):19–28, 2002.
  28. T. Koski. Hidden Markov Models for Bioinformatics. Springer-Verlag, 2002.
  29. J.D. Lafferty, A. McCallum, and F.C.N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the International Conference on Machine Learning, pages 282–289, 2001.
  30. A. Markov. An example of statistical investigation of the text "Eugene Onegin" concerning the connection of samples in chains. Proceedings of the Academy of Sciences of St. Petersburg, 1913.
  31. G.J. McLachlan and T. Krishnan. The EM Algorithm and Extensions. John Wiley, 1997.
  32. H. Ney, S. Martin, and F. Wessel. Statistical language modeling. In S. Young and G. Bloothooft, editors, Corpus-Based Methods in Language and Speech Processing, pages 174–207. Kluwer Academic Publishers, 1997.
  33. L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In A. Waibel and K.F. Lee, editors, Readings in Speech Recognition, pages 267–296. 1989.
  34. D. Ron, Y. Singer, and N. Tishby. The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25(2–3):117–149, 1996.
  35. D. Ron, Y. Singer, and N. Tishby. Learning with probabilistic automata with variable memory length. In Proceedings of the ACM Conference on Computational Learning Theory, pages 35–46, 1997.
  36. R. Rosenfeld. Two decades of statistical language modeling: Where do we go from here? Proceedings of the IEEE, 88(8):1270–1278, 2000.
  37. R. Shumway and D. Stoffer. An approach to time series smoothing and forecasting using the EM algorithm. Journal of Time Series Analysis, 4(3):253–264, 1982.
  38. C. Sutton and A. McCallum. An introduction to conditional random fields. Foundations and Trends in Machine Learning, 4(4):267–373, 2012.
  39. E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R.C. Carrasco. Probabilistic finite state machines — Part I. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7):1013–1025, 2005.
  40. E. Vidal, F. Thollard, C. de la Higuera, F. Casacuberta, and R.C. Carrasco. Probabilistic finite state machines — Part II. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(7):1026–1039, 2005.
  41. A.J. Viterbi. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory, 13:260–269, 1967.
  42. L. Xu and M.I. Jordan. On convergence properties of the EM algorithm for Gaussian mixtures. Neural Computation, 8:129–151, 1996.
  43. G. Zipf. Human Behaviour and the Principle of Least Effort. Addison-Wesley, 1949.

Copyright information

© Springer-Verlag London 2015

Authors and Affiliations

  1. Department of Science and Technology, Parthenope University of Naples, Naples, Italy
  2. School of Computing Science and the Institute of Neuroscience and Psychology, University of Glasgow, Glasgow, UK
