Abstract
Extracting useful patterns from data is a challenging task that machine learning researchers and practitioners have investigated extensively for decades. The task becomes even harder when data arrive as a potentially unbounded sequence, a so-called data stream. Although most research on data stream mining focuses on supervised learning, the assumption that labels are available for learning cannot be verified in most streaming scenarios. Consequently, several data stream clustering algorithms have been proposed in recent decades to extract meaningful patterns from streams. In this study, we present three recent data stream clustering algorithms based on insights from social network theory that achieve results competitive with the state of the art. Their main distinctive characteristics are the following: (1) they do not rely on a hyper-parameter to define the number of clusters to be found; and (2) they do not require batch processing during the offline steps. These algorithms are detailed and compared against existing work in the area, showing their efficiency in terms of clustering quality, processing time, and memory usage.
Acknowledgements
This research was financially supported by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), through the Programa de Suporte à Pós-Graduação de Instituições de Ensino Particulares (PROSUP), and by Fundação Araucária.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Barddal, J.P., Gomes, H.M., Enembreck, F. (2019). On Social Network-Based Algorithms for Data Stream Clustering. In: Sayed-Mouchaweh, M. (eds) Learning from Data Streams in Evolving Environments. Studies in Big Data, vol 41. Springer, Cham. https://doi.org/10.1007/978-3-319-89803-2_13
DOI: https://doi.org/10.1007/978-3-319-89803-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89802-5
Online ISBN: 978-3-319-89803-2
eBook Packages: Engineering, Engineering (R0)