Abstract
In the k-means clustering algorithm, the choice of initial cluster centers affects clustering efficiency. The widely used k-means++ seeding improves both the speed and the accuracy of k-means. However, k-means does not scale well to massive datasets, because it must traverse the dataset multiple times. This paper proposes a fast and efficient grid-based k-means++ clustering algorithm, built on k-means++ and grid clustering, that can process large-scale data efficiently. First, the N-dimensional space is partitioned into disjoint rectangular grid cells. Then, dense grid cells are marked using statistical information about each grid cell. Finally, a modified k-means++ clustering algorithm is applied to the gridded dataset. Experimental results on a simulated dataset show that, compared with the original k-means++ clustering algorithm, the proposed algorithm obtains the cluster centers quickly and handles the clustering of large-scale datasets effectively.
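The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the cell size, the density threshold, or how k-means++ is modified, so the sketch assumes square cells of side `cell_size`, a simple count threshold `min_count`, and a k-means++ variant that treats each dense cell as one point (its centroid) weighted by its point count. All function and parameter names (`grid_summaries`, `seed_centers`, `grid_kmeans`, `cell_size`, `min_count`) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grid_summaries(points, cell_size, min_count):
    """Steps 1 and 2: bin points into disjoint axis-aligned cells and keep
    (centroid, count) for cells holding at least min_count points.
    The count threshold is an assumed density criterion."""
    sums, counts = {}, {}
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        if key in sums:
            sums[key] = [s + x for s, x in zip(sums[key], p)]
            counts[key] += 1
        else:
            sums[key] = list(p)
            counts[key] = 1
    return [
        (tuple(s / counts[key] for s in sums[key]), counts[key])
        for key in sums
        if counts[key] >= min_count
    ]


def seed_centers(cells, k, rng):
    """k-means++ seeding over cell centroids; the D^2 sampling weight of
    each cell is scaled by its point count (an assumed modification)."""
    if k > len(cells):
        raise ValueError("k exceeds the number of dense cells")
    centroids = [c for c, _ in cells]
    counts = [n for _, n in cells]
    centers = [rng.choices(centroids, weights=counts, k=1)[0]]
    d2 = [sq_dist(c, centers[0]) for c in centroids]
    while len(centers) < k:
        weights = [n * d for n, d in zip(counts, d2)]
        nxt = rng.choices(centroids, weights=weights, k=1)[0]
        centers.append(nxt)
        d2 = [min(d, sq_dist(c, nxt)) for c, d in zip(centroids, d2)]
    return centers


def grid_kmeans(points, k, cell_size, min_count, iters=20, seed=0):
    """Step 3: count-weighted Lloyd iterations over dense cells only."""
    cells = grid_summaries(points, cell_size, min_count)
    centers = seed_centers(cells, k, random.Random(seed))
    dim = len(centers[0])
    for _ in range(iters):
        acc = [[0.0] * dim for _ in range(k)]
        mass = [0] * k
        for c, n in cells:
            # Assign the whole cell to its nearest current center.
            j = min(range(k), key=lambda i: sq_dist(c, centers[i]))
            mass[j] += n
            for d in range(dim):
                acc[j][d] += n * c[d]
        # Keep the previous center if a cluster received no cells.
        centers = [
            tuple(a / mass[j] for a in acc[j]) if mass[j] else centers[j]
            for j in range(k)
        ]
    return centers
```

A call such as `grid_kmeans(points, k=2, cell_size=0.5, min_count=5)` scans the raw points once to build the cell summaries; every later iteration touches only the dense cells, which is where a grid-based approach would be expected to save time on large datasets. Points that fall in sparse cells are ignored in this sketch, which may or may not match the paper's treatment.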
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Yang, Y., Zhu, Z. (2019). A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets. In: Krömer, P., Zhang, H., Liang, Y., Pan, JS. (eds) Proceedings of the Fifth Euro-China Conference on Intelligent Data Analysis and Applications. ECC 2018. Advances in Intelligent Systems and Computing, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-03766-6_57
DOI: https://doi.org/10.1007/978-3-030-03766-6_57
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03765-9
Online ISBN: 978-3-030-03766-6
eBook Packages: Intelligent Technologies and Robotics (R0)