Most of the techniques presented in this book aim at making decisions about data. By data we mean, in general, vectors that represent real-world objects which computers cannot handle directly. The components of these vectors, the so-called features, are supposed to carry enough information to distinguish between different objects and to allow a correct decision (see Chapter 5). After a training procedure, the algorithms are typically capable of associating input vectors with output decisions. In some cases, however, the real-world objects of interest cannot be represented by a single vector because they are sequential in nature. This is the case for speech and handwriting, which can be thought of as sequences of phonemes (see Chapter 2) and letters, respectively, as well as for time series, biological sequences (e.g., DNA and protein chains), natural language sentences, music, and so on. The goal of this chapter is to show how some of the techniques presented so far for single vectors can be extended to sequential data.
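As a flavor of what "extending decisions to sequential data" involves, the sketch below decodes a hidden-state sequence with the Viterbi algorithm, one of the dynamic-programming techniques used with hidden Markov models. The states, observations, and probabilities are invented for illustration; they are not taken from this chapter.

```python
# Minimal Viterbi decoding for a toy hidden Markov model.
# All model parameters below are made up for the example.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely state sequence."""
    # V[t][s]: probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Backtrack from the most probable final state.
    prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return prob, path[::-1]

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

prob, path = viterbi(["walk", "shop", "clean"],
                     states, start_p, trans_p, emit_p)
print(path, prob)  # most likely hidden-state sequence and its probability
```

The key point is that the decision is made over the whole sequence at once: the dynamic program keeps, for each state and time step, only the best path so far, so the cost is linear in the sequence length rather than exponential.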
© 2008 Springer
(2008). Markovian Models for Sequential Data. In: Machine Learning for Audio, Image and Video Analysis. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-84800-007-0_10
Print ISBN: 978-1-84800-006-3
Online ISBN: 978-1-84800-007-0