On Social Network-Based Algorithms for Data Stream Clustering

Learning from Data Streams in Evolving Environments

Part of the book series: Studies in Big Data ((SBD,volume 41))

Abstract

Extracting useful patterns from data is a challenging task that machine learning researchers and practitioners have investigated extensively for decades. The task becomes even harder when data arrives as a potentially unbounded sequence, a so-called data stream. Although most research on data stream mining focuses on supervised learning, the assumption that labels are available for learning cannot be verified in most streaming scenarios. For this reason, several data stream clustering algorithms have been proposed in recent decades to extract meaningful patterns from streams. In this study, we present three recent data stream clustering algorithms, based on insights from social network theory, that achieve results competitive with the state of the art. Two characteristics distinguish these algorithms: (1) they do not rely on a hyper-parameter to define the number of clusters to be found; and (2) they do not require batch processing during the offline steps. The algorithms are detailed and compared against existing work in the area, showing their efficiency in terms of clustering quality, processing time, and memory usage.

Notes

  1. http://kdd.ics.uci.edu/databases/kddcup98/kddcup98.html


Acknowledgements

This research was financially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) through the Programa de Suporte à Pós-Graduação de Instituições de Ensino Particulares (PROSUP) and by Fundação Araucária.

Author information

Corresponding author

Correspondence to Jean Paul Barddal.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Barddal, J.P., Gomes, H.M., Enembreck, F. (2019). On Social Network-Based Algorithms for Data Stream Clustering. In: Sayed-Mouchaweh, M. (ed.) Learning from Data Streams in Evolving Environments. Studies in Big Data, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-319-89803-2_13

  • DOI: https://doi.org/10.1007/978-3-319-89803-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-89802-5

  • Online ISBN: 978-3-319-89803-2

  • eBook Packages: Engineering, Engineering (R0)
