
A Fast and Efficient Grid-Based K-means++ Clustering Algorithm for Large-Scale Datasets

  • Yang Yang
  • Zhixiang Zhu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 891)

Abstract

In the k-means clustering algorithm, the choice of initial cluster centers affects clustering efficiency. The widely used k-means++ seeding method can effectively improve both the speed and the accuracy of k-means. However, the k-means algorithm does not scale well to massive datasets, because it must traverse the dataset multiple times. In this paper, a fast and efficient grid-based k-means++ clustering algorithm is proposed that combines the k-means++ clustering algorithm with grid-based clustering and can process large-scale data efficiently. First, the N-dimensional space is partitioned into disjoint rectangular grid cells. Then, dense grid cells are marked using statistical information collected for each cell. Finally, a modified k-means++ clustering algorithm is applied to the gridded dataset. Experimental results on the simulated dataset show that, compared with the original k-means++ clustering algorithm, the proposed algorithm obtains the cluster centers quickly and can effectively handle the clustering of large-scale datasets.
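The three steps in the abstract (grid partitioning, dense-cell marking, k-means++ on the gridded data) can be illustrated with a minimal sketch. The abstract does not give the cell size, the density rule, or how k-means++ is modified, so the following are assumptions, not the authors' method: each cell is summarized by its centroid and point count, cells with fewer than `min_count` points are discarded as sparse, and standard k-means++ seeding plus Lloyd iterations are run on the cell centroids weighted by their counts. All function names and parameters are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def grid_summarize(points, cell_size, min_count):
    """Partition space into disjoint axis-aligned cells of side `cell_size` and
    return (centroid, point_count) for each cell holding >= min_count points."""
    cells = {}  # cell index tuple -> [per-dimension coordinate sums, count]
    for p in points:
        key = tuple(int(c // cell_size) for c in p)  # cell index per dimension
        entry = cells.setdefault(key, [[0.0] * len(p), 0])
        entry[0] = [s + c for s, c in zip(entry[0], p)]
        entry[1] += 1
    return [
        (tuple(s / n for s in sums), n)
        for sums, n in cells.values()
        if n >= min_count  # assumption: sparse cells are simply dropped
    ]


def weighted_kmeanspp_seeds(reps, k, rng):
    """k-means++ seeding over (centroid, weight) pairs: each new seed is drawn
    with probability proportional to weight * D(x)^2."""
    pts = [c for c, _ in reps]
    wts = [w for _, w in reps]
    centers = [rng.choices(pts, weights=wts, k=1)[0]]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in pts]
        scores = [w * d for w, d in zip(wts, d2)]
        if sum(scores) == 0:  # every representative already coincides with a seed
            break
        centers.append(rng.choices(pts, weights=scores, k=1)[0])
    return centers


def weighted_lloyd(reps, centers, max_iter=50):
    """Lloyd iterations in which each cell centroid counts `weight` times."""
    for _ in range(max_iter):
        sums = [[0.0] * len(centers[0]) for _ in centers]
        totals = [0] * len(centers)
        for p, w in reps:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            totals[j] += w
            sums[j] = [s + w * x for s, x in zip(sums[j], p)]
        new_centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] else centers[j]
            for j in range(len(centers))
        ]
        if new_centers == centers:  # centers no longer move
            break
        centers = new_centers
    return centers


def grid_kmeanspp(points, k, cell_size, min_count=1, seed=0):
    """Hypothetical pipeline: grid summary -> weighted seeding -> refinement."""
    reps = grid_summarize(points, cell_size, min_count)
    centers = weighted_kmeanspp_seeds(reps, k, random.Random(seed))
    return weighted_lloyd(reps, centers)
```

In this sketch the raw points are read only once, inside `grid_summarize`; every later seeding and refinement step touches only the dense-cell summaries. The cost per iteration therefore depends on the number of occupied dense cells rather than on the number of points, which is the kind of saving the abstract describes. Weighting each cell centroid by its count makes the updated center equal to the mean of the original points in the assigned cells. The approximation lies in assigning whole cells rather than individual points, and in discarding sparse cells.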

Keywords

K-means · K-means++ · Grid-based clustering algorithm · Large-scale datasets


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Computer Science and Technology, Xi’an University of Posts and Telecommunications, Xi’an, China
  2. Institute of IOT & IT-Based Industrialization, Xi’an University of Posts and Telecommunications, Xi’an, China
