Abstract
We propose in this paper a novel approach to the classification of discrete sequences. This approach builds a model fitting some dynamical features deduced from the learning sample. These features are discrete phase-type (PH) distributions. They model the first passage times (FPT) between occurrences of pairs of substrings. The PHit algorithm, an adapted version of the Expectation-Maximization algorithm, is proposed to estimate PH distributions. The most informative pairs of substrings are selected according to the Jensen-Shannon divergence between their class conditional empirical FPT distributions. The selected features are then used in two classification schemes: a maximum a posteriori (MAP) classifier and support vector machines (SVM) with marginalized kernels. Experiments on DNA splicing region detection and on protein sublocalization illustrate that the proposed techniques offer competitive results with smoothed Markov chains or SVM with a spectrum string kernel.
Chapter PDF
Similar content being viewed by others
Keywords
References
Asmussen, S., Nerman, O., Olsson, M.: Fitting phase-type distributions via the em algorithm. Scandinavian Journal of Statistics 23(4), 419–441 (1996)
Bobbio, A., Horváth, A., Scarpa, M., Telek, M.: Acyclic discrete phase type distributions: properties and a parameter estimation algorithm. Perform. Eval. 54(1), 1–32 (2003)
Callut, J., Dupont, P.: Inducing hidden markov models to model long-term dependencies. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS, vol. 3720, pp. 513–521. Springer, Heidelberg (2005)
Dempster, A., Laird, N., Rubin, D.: Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statistical Society Ser. B (methodological) 39, 1–38 (1977)
Dupont, P.: Noisy sequence classification with smoothed markov chains. In: Conférence francophone sur l’apprentissage automatique 2006 (CAP 2006), Trégastel, France, pp. 187–201 (2006)
Kemeny, J.G., Snell, J.L.: Finite Markov Chains. Springer, Heidelberg (1983)
Latouche, G., Ramaswami, V.: Introduction to Matrix Analytic Methods in Stochastic Modeling. Society for Industrial & Applied Mathematics, U.S. (1999)
Lin, J.: Divergence measures based on the shannon entropy. IEEE Trans. Information Theory 37, 145–151 (1991)
Rabiner, L.: A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), 257–286 (1989)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Callut, J., Dupont, P. (2006). Sequence Discrimination Using Phase-Type Distributions. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. ECML 2006. Lecture Notes in Computer Science(), vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_12
Download citation
DOI: https://doi.org/10.1007/11871842_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer ScienceComputer Science (R0)