Abstract
Mining data streams is a challenging research area in data mining with many applications. In stream models, the data are massive and evolve continuously, and they can be read only once or a small number of times. Because memory is limited, the entire data set cannot be loaded into memory. Traditional data mining techniques are therefore unsuitable for this kind of model and its applications, and new approaches that fit this paradigm are required. In this paper, we address the clustering of data streams over sliding windows. We investigate an efficient clustering algorithm based on DCA (Difference of Convex functions Algorithm). Comparative experiments against the standard K-means algorithm on several real data sets are presented.
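To make the sliding-window setting concrete, the following is a minimal sketch, not the DCA-based algorithm studied in the chapter. It assumes a fixed-size window of the most recent points and re-clusters that window with the standard K-means (Lloyd) baseline named in the abstract. The names `kmeans`, `cluster_stream`, `window_size`, and `step` are hypothetical and chosen only for illustration.

```python
from collections import deque
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on a list of equal-length tuples (baseline only)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the window
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest center (squared Euclidean distance)
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # recompute each center as the mean of its cluster;
        # an empty cluster keeps its previous center
        centers = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers


def cluster_stream(stream, window_size, k, step):
    """Yield (t, centers) every `step` arrivals once the window is full."""
    window = deque(maxlen=window_size)  # the oldest point is evicted automatically
    for t, point in enumerate(stream, start=1):
        window.append(tuple(point))  # each point is read exactly once
        if len(window) == window_size and t % step == 0:
            yield t, kmeans(list(window), k)
```

Only `window_size` points are held in memory at any time, which reflects the memory constraint described above. Points that leave the window no longer influence the centers, so the clustering follows the evolution of the stream.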
This research has been supported by “Fonds Européens de Développement Régional” (FEDER) Lorraine via the project InnoMaD (Innovations techniques d’optimisation pour le traitement Massif de Données).
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Thuy, T.M., An, L.T.H., Boudjeloud-Assala, L. (2013). Clustering Data Streams over Sliding Windows by DCA. In: Nguyen, N., van Do, T., le Thi, H. (eds) Advanced Computational Methods for Knowledge Engineering. Studies in Computational Intelligence, vol 479. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00293-4_6
DOI: https://doi.org/10.1007/978-3-319-00293-4_6
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00292-7
Online ISBN: 978-3-319-00293-4
eBook Packages: Engineering, Engineering (R0)