Abstract
This paper proposes an integrated framework for clustering sequential data, based on an iterative process that alternates between the EM procedure and a modified b-coloring clustering algorithm. The framework has two important features. First, it assigns sequences to clusters in such a way that the b-coloring properties are maintained throughout the clustering process. Second, it gives each cluster a twofold representation: a generative model (a Markov chain) and a set of dominant members, which ensure the global stability of the returned partition. The framework is evaluated on benchmark datasets from the UCI repository, and the results confirm its effectiveness.
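To make the b-coloring properties concrete, the following is a minimal sketch rather than the authors' implementation. It relies on the standard graph-theoretic definition: a b-coloring is a proper vertex coloring in which every color class contains at least one dominating vertex, that is, a vertex adjacent to vertices of all other colors. The sketch assumes that sequences are vertices, colors are cluster labels, and edges join sequences considered too dissimilar to share a cluster; the abstract does not specify how the graph is built, so that reading is an assumption. The names `is_proper`, `dominating_vertices`, `is_b_coloring`, and the toy graph are hypothetical.

```python
from typing import Dict, Hashable, Set

# Adjacency-set representation of an undirected graph (assumed structure).
Graph = Dict[Hashable, Set[Hashable]]
Coloring = Dict[Hashable, int]


def is_proper(graph: Graph, color: Coloring) -> bool:
    """True if no edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u in graph for v in graph[u])


def dominating_vertices(graph: Graph, color: Coloring) -> Dict[int, Set[Hashable]]:
    """Map each color to the vertices of that color that see every other color."""
    all_colors = set(color.values())
    dominant: Dict[int, Set[Hashable]] = {c: set() for c in all_colors}
    for v, neighbours in graph.items():
        seen = {color[u] for u in neighbours}
        # v is dominating if its neighbourhood covers all colors except its own.
        if all_colors - {color[v]} <= seen:
            dominant[color[v]].add(v)
    return dominant


def is_b_coloring(graph: Graph, color: Coloring) -> bool:
    """Proper coloring in which every color class has a dominating vertex."""
    return is_proper(graph, color) and all(dominating_vertices(graph, color).values())


# Toy example: the path a - b - c - d.
path: Graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
two_colors: Coloring = {"a": 0, "b": 1, "c": 0, "d": 1}
three_colors: Coloring = {"a": 0, "b": 1, "c": 2, "d": 0}

print(is_b_coloring(path, two_colors))    # every class has a dominating vertex
print(is_b_coloring(path, three_colors))  # color 0 has no dominating vertex
```

In this reading, the dominating vertices of a color class play the role of the dominant members mentioned in the abstract: each one is linked to every other cluster, so a coloring that passes the check cannot simply merge one of its clusters into another.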
© 2008 Springer Berlin Heidelberg
About this paper
Cite this paper
Elghazel, H., Yoshida, T., Hacid, MS. (2008). An Integrated Graph and Probability Based Clustering Framework for Sequential Data. In: Boulicaut, JF., Berthold, M.R., Horváth, T. (eds) Discovery Science. DS 2008. Lecture Notes in Computer Science, vol 5255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88411-8_24
DOI: https://doi.org/10.1007/978-3-540-88411-8_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88410-1
Online ISBN: 978-3-540-88411-8
eBook Packages: Computer Science (R0)