A Comparative Study on k-means Clustering Method and Analysis

  • Rajdeep Baruri
  • Anannya Ghosh
  • Saikat Chanda
  • Ranjan Banerjee
  • Anindya Das
  • Arindam Mandal
  • Tapas Halder
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 985)

Abstract

This paper presents a study of three clustering methods evaluated with four different cluster validity metrics. We discuss and analyse the clustering methods and give the mathematical formulation of the four cluster validity measures. From the experimental outcomes, we present indications of the optimal validation method as well as the optimal clustering method. The preferred clustering technique is chosen on the basis of results obtained on real-world data sets.

Keywords

Data analytics, Machine learning, Algorithm analysis, Clustering validity

Notes

Acknowledgment

This research is funded by Jadavpur University (UGC-UPE, Phase-II, grant no. P-1/RS/115/13).

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  • Rajdeep Baruri (1)
  • Anannya Ghosh (2)
  • Saikat Chanda (2)
  • Ranjan Banerjee (1)
  • Anindya Das (1)
  • Arindam Mandal (1)
  • Tapas Halder (3)

  1. Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
  2. Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, India
  3. Cyber Patrol Cell, Kolkata Police, Kolkata, India
