A Comparative Study on k-means Clustering Method and Analysis

Baruri, Rajdeep; Ghosh, Anannya; Chanda, Saikat; Banerjee, Ranjan; Das, Anindya; Mandal, Arindam; Halder, Tapas

doi:10.1007/978-981-13-8300-7_10

Conference paper
First Online: 18 May 2019

Part of the book series: Communications in Computer and Information Science (CCIS, volume 985)

Abstract

This paper presents a study of three clustering methods, evaluated with four different cluster validity metrics. We discuss and analyze the clustering methods and give the mathematical formulation of the four cluster validity measures. The experimental outcomes give indications of both the optimal validation method and the optimal clustering method. The preferable clustering technique is chosen based on results obtained on real-world data sets.
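To make the kind of comparison described above concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the three clustering methods or the four validity metrics, so the sketch assumes plain Lloyd-style k-means and the Davies-Bouldin index purely as illustrative choices. The function names (`kmeans`, `davies_bouldin`), the explicit `init_centroids` parameter, and the toy data are all hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, init_centroids, max_iters=100):
    """Lloyd-style k-means starting from caller-supplied centroids (assumed API)."""
    centroids = list(init_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids


def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index; lower values suggest compact, well-separated clusters.

    Assumes at least two clusters and no two identical centroids.
    """
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        # Mean distance of a cluster's members to its centroid (0 if empty).
        s = sum(dist(p, centroids[j]) for p in members) / len(members) if members else 0.0
        scatter.append(s)
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
        for i in range(k)
    ]
    return sum(worst_ratios) / k


# Toy data (hypothetical): two well-separated square blobs.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(points, init_centroids=[(0, 0), (10, 10)])
score = davies_bouldin(points, labels, centroids)
```

Passing the initial centroids explicitly keeps the sketch deterministic. k-means is sensitive to initialization, so a comparative study would normally repeat runs from several starting points and compare the resulting validity scores.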

Acknowledgment

This research is funded by Jadavpur University (UGC-UPE, Phase-II, grant no. P-1/RS/115/13).

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Rajdeep Baruri, Ranjan Banerjee, Anindya Das & Arindam Mandal
Department of Computer Science and Engineering, Institute of Engineering and Management, Kolkata, India
Anannya Ghosh & Saikat Chanda
Cyber Patrol Cell, Kolkata Police, Kolkata, India
Tapas Halder

Corresponding authors

Correspondence to Rajdeep Baruri or Ranjan Banerjee.

Editor information

Editors and Affiliations

College of Engineering, Iowa State University, Ames, IA, USA
Arun K. Somani
National University of Singapore, Singapore, Singapore
Seeram Ramakrishna
Swami Keshvanand Institute of Technology Management and Gramothan, Jaipur, India
Anil Chaudhary
Swami Keshvanand Institute of Technology Management and Gramothan, Jaipur, India
Chothmal Choudhary
Swami Keshvanand Institute of Technology Management and Gramothan, Jaipur, India
Basant Agarwal

About this paper

Cite this paper

Baruri, R. et al. (2019). A Comparative Study on k-means Clustering Method and Analysis. In: Somani, A., Ramakrishna, S., Chaudhary, A., Choudhary, C., Agarwal, B. (eds) Emerging Technologies in Computer Engineering: Microservices in Big Data Analytics. ICETCE 2019. Communications in Computer and Information Science, vol 985. Springer, Singapore. https://doi.org/10.1007/978-981-13-8300-7_10

DOI: https://doi.org/10.1007/978-981-13-8300-7_10
Published: 18 May 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8299-4
Online ISBN: 978-981-13-8300-7
eBook Packages: Computer Science, Computer Science (R0)
