Data Stream Clustering in Conditions of an Unknown Amount of Classes

  • Polina Zhernova
  • Anastasiya Deyneko
  • Zhanna Deyneko
  • Irina Pliss
  • Volodymyr Ahafonov
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 754)


An on-line modified X-means method is proposed for data stream clustering when the number of clusters is unknown a priori. The approach is based on an ensemble of clustering neural networks built from the self-organizing maps of T. Kohonen. Each network in the ensemble has a different number of neurons, and the number of clusters is tied to the quality of the clustering. All ensemble members process the sequentially fed data in parallel. Clustering quality is measured with the Caliński-Harabasz index. The self-learning algorithm uses a similarity measure of a special type. The main feature of the proposed method is the absence of a competition step, i.e. no winning neuron is determined. A number of experiments were carried out to investigate the properties of the proposed system. The results confirm that the system can be used to solve a wide range of Data Mining tasks in which data sets are processed on-line. The proposed ensemble system is computationally simple, and it processes data sets faster because its members can be tuned in parallel.
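The ensemble idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the similarity measure or how the index is evaluated on-line, so the Gaussian similarity, the initialization from the first samples, the scoring over the whole buffered stream, and all function names (`soft_update`, `run_ensemble`, and so on) are assumptions made only for illustration. Hard labels are derived here solely to compute the Caliński-Harabasz index; the update step itself selects no winner.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        return 0.0  # index is undefined; treat as worst score
    dim = len(points[0])
    overall = [sum(p[j] for p in points) / n for j in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        center = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    if within == 0.0:
        return math.inf
    return (between / (k - 1)) / (within / (n - k))

def soft_update(prototypes, x, lr, width):
    """One winner-less step: every prototype moves toward x, weighted by its
    normalized similarity to x. The Gaussian similarity is an assumption."""
    sims = [math.exp(-sq_dist(w, x) / (2.0 * width ** 2)) for w in prototypes]
    total = sum(sims) + 1e-12  # guard against division by zero
    return [[wj + lr * (s / total) * (xj - wj) for wj, xj in zip(w, x)]
            for w, s in zip(prototypes, sims)]

def run_ensemble(stream, ks, lr=0.1, width=1.0):
    """Train one prototype set per candidate cluster count and score each one."""
    points = [list(map(float, p)) for p in stream]
    # Assumption: each member starts from the first k samples of the stream.
    members = {k: [p[:] for p in points[:k]] for k in ks}
    for x in points:          # samples arrive one at a time
        for k in ks:          # members are independent, so this loop could run in parallel
            members[k] = soft_update(members[k], x, lr, width)
    scores = {}
    for k, protos in members.items():
        # Nearest-prototype labels are used only for scoring.
        labels = [min(range(k), key=lambda i: sq_dist(protos[i], p)) for p in points]
        scores[k] = calinski_harabasz(points, labels)
    return max(scores, key=scores.get), scores
```

A call such as `best_k, scores = run_ensemble(data, ks=[2, 3, 4, 5])` returns the candidate cluster count with the highest index together with all scores. Which count wins depends on the data, the sample order, and the chosen `lr` and `width`.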


Clustering; X-means method; Ensemble of neural networks; Self-organization map; Self-learning; Kohonen neural network; Similarity measure



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Polina Zhernova (1)
  • Anastasiya Deyneko (1)
  • Zhanna Deyneko (1)
  • Irina Pliss (1)
  • Volodymyr Ahafonov (1)

  1. Kharkiv National University of Radioelectronics, Kharkiv, Ukraine
