Clustering Data Streams over Sliding Windows by DCA

  • Conference paper
Advanced Computational Methods for Knowledge Engineering

Part of the book series: Studies in Computational Intelligence (SCI, volume 479)

Abstract

Mining data streams is a challenging research area in data mining that concerns many applications. In stream models, the data is massive and evolves continuously; it can be read only once or a small number of times. Because memory is limited, it is impossible to load the entire data set into memory. Traditional data mining techniques are not suitable for this kind of model and its applications, so new approaches that meet these constraints are required. In this paper, we are interested in clustering data streams over sliding windows. We investigate an efficient clustering algorithm based on DCA (Difference of Convex functions Algorithm). Comparative experiments with the standard K-means algorithm on several real data sets are presented.
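To make the setting concrete, the following is a minimal sketch of clustering a stream over sliding windows with plain K-means, the baseline named in the abstract. It is not the DCA-based algorithm the paper investigates, whose details are not given here. The function names (`kmeans`, `cluster_sliding_windows`) and the parameters `window_size`, `step`, and `k` are hypothetical choices made for illustration; the paper's actual window scheme may differ.

```python
import random
from collections import deque


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd-style K-means (illustrative baseline, not the paper's DCA)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                mean = tuple(sum(m[d] for m in members) / len(members)
                             for d in range(dim))
                new_centers.append(mean)
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def cluster_sliding_windows(stream, window_size, step, k):
    """Yield cluster centers for each full window; every point is read once.

    Assumption for this sketch: 1 <= step <= window_size.
    """
    if not 1 <= step <= window_size:
        raise ValueError("step must satisfy 1 <= step <= window_size")
    window = deque(maxlen=window_size)  # oldest points fall out automatically
    since_last = 0  # points that arrived since the last clustered window
    for point in stream:
        window.append(point)
        since_last += 1
        if len(window) == window_size and since_last >= step:
            since_last = 0
            yield kmeans(list(window), k)
```

Only the current window is held in memory, which reflects the constraint that the full data set cannot be loaded at once. A call such as `cluster_sliding_windows(points, window_size=1000, step=100, k=5)` would re-cluster the most recent 1000 points after every 100 arrivals.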

This research has been supported by "Fonds Européens de Développement Régional" (FEDER) Lorraine via the project InnoMaD (Innovations techniques d'optimisation pour le traitement Massif de Données).

Author information

Corresponding author

Correspondence to Ta Minh Thuy.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Thuy, T.M., An, L.T.H., Boudjeloud-Assala, L. (2013). Clustering Data Streams over Sliding Windows by DCA. In: Nguyen, N., van Do, T., le Thi, H. (eds) Advanced Computational Methods for Knowledge Engineering. Studies in Computational Intelligence, vol 479. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00293-4_6

  • DOI: https://doi.org/10.1007/978-3-319-00293-4_6

  • Publisher Name: Springer, Heidelberg

  • Print ISBN: 978-3-319-00292-7

  • Online ISBN: 978-3-319-00293-4

  • eBook Packages: Engineering, Engineering (R0)
