Part of the book series: Advanced Information and Knowledge Processing (AI&KP)

Most of the techniques presented in this book are aimed at making decisions about data. By data we mean, in general, vectors that represent real-world objects which computers cannot handle directly. The components of these vectors, the so-called features, are expected to carry enough information to support a correct decision and to distinguish between different objects (see Chapter 5). After a training procedure, the algorithms are typically capable of associating input vectors with output decisions. In some cases, however, the real-world objects of interest cannot be represented by a single vector because they are sequential in nature. This is the case for speech and handwriting, which can be thought of as sequences of phonemes (see Chapter 2) and letters, respectively, as well as for time series, biological sequences (e.g., DNA and protein sequences), natural language sentences, music, and so on. The goal of this chapter is to show how some of the techniques presented so far for single vectors can be extended to sequential data.
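To make the shift from single vectors to sequences concrete, here is a minimal illustrative sketch (not taken from the chapter) of the simplest Markovian model the chapter builds on: a first-order Markov chain, where the probability of a sequence factorizes into an initial probability times one transition probability per step. The alphabet, probabilities, and function name below are all hypothetical.

```python
# Minimal first-order Markov chain over a toy two-symbol alphabet.
# All probabilities are made up for illustration only.

initial = {"a": 0.6, "b": 0.4}   # P(first symbol)
transition = {                   # P(next symbol | current symbol)
    "a": {"a": 0.7, "b": 0.3},
    "b": {"a": 0.4, "b": 0.6},
}

def sequence_probability(seq):
    """Probability of a symbol sequence under the first-order Markov assumption:
    P(s_1, ..., s_T) = P(s_1) * prod_t P(s_t | s_{t-1})."""
    if not seq:
        return 1.0
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[prev][cur]
    return p

print(sequence_probability("aab"))  # 0.6 * 0.7 * 0.3 ≈ 0.126
```

The key point is that the model scores a whole sequence, not a single feature vector; hidden Markov models extend this by making the state sequence unobserved and attaching an emission distribution to each state.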


Copyright information

© 2008 Springer

About this chapter

Cite this chapter

(2008). Markovian Models for Sequential Data. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84800-007-0_10


  • Print ISBN: 978-1-84800-006-3

  • Online ISBN: 978-1-84800-007-0
