A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets

Yang, Yang; Zhu, Zhixiang

doi:10.1007/978-3-030-03766-6_57


Conference paper
First Online: 25 December 2018


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 891)

Abstract

In k-means clustering, the choice of initial cluster centers affects clustering efficiency. The widely used k-means++ algorithm improves both the speed and the accuracy of k-means. However, k-means does not scale well to massive datasets, because it must traverse the dataset multiple times. This paper proposes a fast and efficient grid-based k-means++ clustering algorithm, built on k-means++ and grid clustering, that can process large-scale data efficiently. First, the N-dimensional space is partitioned into disjoint rectangular grid cells. Then, dense grid cells are marked using per-cell statistics. Finally, a modified k-means++ clustering algorithm is applied to the gridded dataset. Experimental results on a simulated dataset show that, compared with the original k-means++ clustering algorithm, the proposed algorithm obtains the cluster centers quickly and handles the clustering of large-scale datasets effectively.
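The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the details of the paper's modified k-means++, so everything specific here is an assumption: uniform cells of side `cell_size`, a density threshold `min_count`, representing each dense cell by its centroid and point count, and weighting both the k-means++ seeding and the Lloyd updates by those counts. The function names are hypothetical and this is not the authors' implementation.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dense_cells(points, cell_size, min_count):
    """Bin points into disjoint axis-aligned cells (assumed uniform side
    length) and return (centroid, count) for cells with >= min_count points."""
    sums, counts = {}, {}
    for p in points:
        key = tuple(math.floor(x / cell_size) for x in p)
        acc = sums.setdefault(key, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[key] = counts.get(key, 0) + 1
    return [
        (tuple(s / counts[key] for s in acc), counts[key])
        for key, acc in sums.items()
        if counts[key] >= min_count
    ]


def seed_centers(cells, k, rng):
    """k-means++ style seeding over cell centroids. Assumption: each cell's
    sampling probability is its count times its squared distance to the
    nearest seed chosen so far."""
    if k > len(cells):
        raise ValueError("k exceeds the number of dense cells")
    reps = [c for c, _ in cells]
    weights = [w for _, w in cells]
    centers = [rng.choices(reps, weights=weights, k=1)[0]]
    while len(centers) < k:
        scores = [w * min(sq_dist(c, s) for s in centers) for c, w in cells]
        centers.append(rng.choices(reps, weights=scores, k=1)[0])
    return centers


def grid_kmeans(points, k, cell_size, min_count=2, iters=20, seed=0):
    """Run count-weighted Lloyd iterations on dense-cell summaries
    instead of on the raw points."""
    rng = random.Random(seed)
    cells = dense_cells(points, cell_size, min_count)
    centers = seed_centers(cells, k, rng)
    dim = len(centers[0])
    for _ in range(iters):
        acc = [[0.0] * dim for _ in range(k)]
        mass = [0] * k
        for c, w in cells:
            # Assign the whole cell to its nearest current center.
            j = min(range(k), key=lambda i: sq_dist(c, centers[i]))
            mass[j] += w
            for d in range(dim):
                acc[j][d] += w * c[d]
        # Keep the previous center if no cell was assigned to it.
        centers = [
            tuple(a / mass[j] for a in acc[j]) if mass[j] else centers[j]
            for j in range(k)
        ]
    return centers
```

After the single binning pass, every iteration touches only the dense cells, not the raw points, which is where a grid-based approach would be expected to save time on large inputs. Points in cells below `min_count` are ignored in this sketch.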



Author information

Authors and Affiliations

School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an, 710121, China
Yang Yang
Institute of IOT & IT-Based Industrialization, Xi’an University of Posts and Telecommunications, Xi’an, 710121, China
Zhixiang Zhu


Corresponding author

Correspondence to Yang Yang.

Editor information

Editors and Affiliations

Department of Computer Science, VSB-Technical University of Ostrava, Ostrava, Czech Republic
Pavel Krömer
School of Automation, Xi’an University of Posts and Telecommunications, Xi’an, China
Hong Zhang
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, China
Yongquan Liang
College of Information Science and Engineering, Fujian University of Technology, Fuzhou, Fujian, China
Jeng-Shyang Pan

About this paper

Cite this paper

Yang, Y., Zhu, Z. (2019). A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets. In: Krömer, P., Zhang, H., Liang, Y., Pan, JS. (eds) Proceedings of the Fifth Euro-China Conference on Intelligent Data Analysis and Applications. ECC 2018. Advances in Intelligent Systems and Computing, vol 891. Springer, Cham. https://doi.org/10.1007/978-3-030-03766-6_57

DOI: https://doi.org/10.1007/978-3-030-03766-6_57
Published: 25 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03765-9
Online ISBN: 978-3-030-03766-6
eBook Packages: Intelligent Technologies and Robotics (R0)
