Abstract
An on-line modified X-means method is proposed for clustering data streams when the number of clusters is unknown a priori. The approach is based on an ensemble of clustering neural networks built from T. Kohonen's self-organizing maps. Each network in the ensemble has a different number of neurons, and the number of clusters is tied to the quality of the clustering process. All ensemble members process, in parallel, the observations that are fed sequentially to the system. The effectiveness of the clustering is assessed using the Caliński-Harabasz index. The self-learning algorithm uses a similarity measure of a special type. The main feature of the proposed method is the absence of a competition step, i.e. no winning neuron is determined. A number of experiments were conducted to investigate the properties of the proposed system. The experimental results confirm that the system can be used to solve a wide range of Data Mining tasks in which data sets are processed in an on-line mode. The proposed ensemble system is computationally simple, and data sets are processed faster because the ensemble members can be tuned in parallel.
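The scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the learning rule, so the Gaussian-shaped `similarity`, the constant learning rate, and the names `SoftMap` and `run_ensemble` are all assumptions made for illustration. Each ensemble member holds a different number of neurons, every neuron is updated for every observation with a step weighted by its similarity (no winner is selected), and the members are compared using the Caliński-Harabasz index. The members are updated in a plain loop here, although they are independent and could be tuned in parallel.

```python
import math
import random


def similarity(x, w, width=0.5):
    """Hypothetical bounded similarity in (0, 1]; the paper's actual measure is not given here."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, w))
    return math.exp(-d2 / (2.0 * width ** 2))


class SoftMap:
    """One ensemble member: a set of neurons trained without a competition step."""

    def __init__(self, n_neurons, dim, rng, rate=0.1):
        self.n_neurons = n_neurons
        self.rate = rate  # assumed constant learning rate
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(n_neurons)]

    def update(self, x):
        # Every neuron moves toward x, scaled by its similarity; no winner is chosen.
        for w in self.weights:
            s = similarity(x, w)
            for j, xj in enumerate(x):
                w[j] += self.rate * s * (xj - w[j])

    def label(self, x):
        # Labels are needed only for scoring, not for learning.
        return max(range(self.n_neurons), key=lambda i: similarity(x, self.weights[i]))


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom."""
    n, dim = len(points), len(points[0])
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    k = len(groups)  # only non-empty clusters are counted
    if k < 2 or n <= k:
        return 0.0
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    between = within = 0.0
    for members in groups.values():
        c = [sum(p[j] for p in members) / len(members) for j in range(dim)]
        between += len(members) * sum((a - b) ** 2 for a, b in zip(c, mean))
        within += sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in members)
    if within == 0.0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


def run_ensemble(stream, sizes=(2, 3, 4, 5), seed=0, window=200):
    """Feed the stream to all members, then score each one on the most recent window."""
    rng = random.Random(seed)
    stream = list(stream)
    members = [SoftMap(m, len(stream[0]), rng) for m in sizes]
    for x in stream:            # observations arrive one at a time
        for member in members:  # independent members; a loop stands in for parallel tuning
            member.update(x)
    recent = stream[-window:]   # simplification: a buffered window is used for scoring
    scores = {m.n_neurons: calinski_harabasz(recent, [m.label(x) for x in recent])
              for m in members}
    return max(scores, key=scores.get), scores
```

Calling `run_ensemble(data)` on a list of equal-length numeric vectors returns the neuron count of the best-scoring member together with the score of every member. Because no winner is selected, several neurons may settle on the same cluster, which is why the index counts only non-empty clusters.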
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Zhernova, P., Deyneko, A., Deyneko, Z., Pliss, I., Ahafonov, V. (2019). Data Stream Clustering in Conditions of an Unknown Amount of Classes. In: Hu, Z., Petoukhov, S., Dychka, I., He, M. (eds) Advances in Computer Science for Engineering and Education. ICCSEEA 2018. Advances in Intelligent Systems and Computing, vol 754. Springer, Cham. https://doi.org/10.1007/978-3-319-91008-6_41
DOI: https://doi.org/10.1007/978-3-319-91008-6_41
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91007-9
Online ISBN: 978-3-319-91008-6
eBook Packages: Intelligent Technologies and Robotics (R0)