An Efficient Approach for Selection of Initial Cluster Centroids for k-means

Gupta, Manoj Kr.; Chandra, Pravin

doi:10.1007/978-981-15-5827-6_1

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1229)

Included in the following conference series:

International Conference on Recent Developments in Science, Engineering and Technology

Abstract

The choice of initial centroids has a major impact on the performance and accuracy of the k-means algorithm when grouping data objects into clusters. In basic k-means, a purely arbitrary choice of initial centroids leads to different clusters in every run and consequently affects the algorithm's performance and accuracy. To date, researchers have made several attempts to improve both. However, scope for improvement still exists in this area. Therefore, this paper proposes a new approach to initializing centroids for k-means, based on choosing well-separated data objects as initial cluster centroids instead of selecting them purely arbitrarily. As a consequence, the chosen centroids have a higher probability of lying close to the final cluster centroids. The proposed algorithm is empirically assessed on six different well-known datasets. The results confirm that the proposed approach is considerably better than purely arbitrary selection of centroids.
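The abstract does not spell out the selection procedure, so the following is only a minimal sketch of the general idea of seeding k-means with well-separated data objects. It substitutes a farthest-first (maximin) traversal, a known heuristic, for the paper's own method. The function names `farthest_first_centroids` and `random_centroids`, the `seed` parameter, and the toy data are all assumptions made for illustration.

```python
import math
import random


def farthest_first_centroids(points, k, seed=0):
    """Pick k well-separated data objects by farthest-first (maximin) traversal.

    Assumption: this is a stand-in heuristic, not the algorithm from the paper.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Start from one arbitrarily chosen data object.
    centroids = [points[rng.randrange(len(points))]]
    # Distance from every point to its nearest centroid chosen so far.
    nearest = [math.dist(p, centroids[0]) for p in points]
    while len(centroids) < k:
        # The next centroid is the point farthest from all chosen centroids.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centroids.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return centroids


def random_centroids(points, k, seed=0):
    """Baseline: purely arbitrary selection, as in basic k-means."""
    return random.Random(seed).sample(points, k)


if __name__ == "__main__":
    # Hypothetical toy data: three tight groups in the plane.
    data = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0),
            (5.1, 4.9), (10.0, 0.0), (9.9, 0.3)]
    print("well separated:", farthest_first_centroids(data, 3))
    print("arbitrary:     ", random_centroids(data, 3))
```

On data with clearly separated groups, the farthest-first seeds tend to land in different groups, whereas an arbitrary sample can place two seeds in the same group. Either list of seeds can then be passed to a standard k-means loop as its starting centroids.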

Author information

Authors and Affiliations

USIC&T, Guru Gobind Singh Indraprastha University, Dwarka, India
Manoj Kr. Gupta & Pravin Chandra

Authors

Manoj Kr. Gupta
Pravin Chandra

Corresponding author

Correspondence to Manoj Kr. Gupta.

Editor information

Editors and Affiliations

GD Goenka University, Gurugram, India
Usha Batra
GD Goenka University, Sohna, Haryana, India
Nihar Ranjan Roy
University of Arkansas, Fayetteville, AR, USA
Brajendra Panda

About this paper

Cite this paper

Gupta, M.K., Chandra, P. (2020). An Efficient Approach for Selection of Initial Cluster Centroids for k-means. In: Batra, U., Roy, N., Panda, B. (eds) Data Science and Analytics. REDSET 2019. Communications in Computer and Information Science, vol 1229. Springer, Singapore. https://doi.org/10.1007/978-981-15-5827-6_1

DOI: https://doi.org/10.1007/978-981-15-5827-6_1
Published: 28 May 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-5826-9
Online ISBN: 978-981-15-5827-6
eBook Packages: Computer Science, Computer Science (R0)
