A Study on Initial Centroids Selection for Partitional Clustering Algorithms

Motwani, Mahesh; Arora, Neeti; Gupta, Amit

doi:10.1007/978-981-10-8848-3_21


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 731)


Abstract

Data mining tools and techniques allow an organization to make creative decisions and subsequently plan properly. Clustering identifies objects with similar characteristics and groups them together. The K-means clustering method chooses random initial cluster centres (initial centroids), one for each cluster, and this is its major weakness: the performance and quality of K-means depend strongly on the initial guess of the centres. Several modifications suggested in clustering research augment K-means with a technique for selecting the initial centroids. The first two authors of this paper have also developed three algorithms that, unlike K-means, do not generate the initial centres randomly and always produce the same set of initial centroids for the same input data. These algorithms are sum of distance clustering (SODC), the distance-based clustering algorithm (DBCA) and farthest distributed centroid clustering (FDCC). In this paper, we present a brief survey of the algorithms proposed for modifying the initial centroids of the K-means clustering algorithm and then describe the farthest distributed centroid clustering algorithm. Experimental results show that the farthest distributed centroid clustering algorithm produces better-quality clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm.
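The contrast drawn in the abstract, between random seeding and a seeding rule that returns the same initial centroids for the same input, can be illustrated with a minimal Python sketch. The abstract does not specify the steps of FDCC, so the seeding function below uses a generic farthest-point heuristic as a stand-in; it is not the authors' algorithm. The names `farthest_first_centroids` and `kmeans`, and the sample data, are hypothetical and chosen only for this illustration.

```python
import math
import random


def farthest_first_centroids(points, k):
    """Deterministic seeding by a generic farthest-point heuristic.

    Assumption: this is an illustrative stand-in, NOT the paper's FDCC.
    """
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # First centre: the point farthest from the data mean
    # (max() keeps the first maximum, so ties resolve by input order).
    centroids = [max(points, key=lambda p: math.dist(p, mean))]
    while len(centroids) < k:
        # Next centre: the point whose nearest chosen centre is farthest away.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, centroids, max_iter=100):
    """Standard assign-and-update K-means iterations from given initial centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

seeds = farthest_first_centroids(data, 2)        # identical on every run
final_centroids, clusters = kmeans(data, seeds)

random_seeds = random.sample(data, 2)            # may differ between runs
random_centroids, random_clusters = kmeans(data, random_seeds)
```

With the deterministic seeds, repeated runs on the same data start from the same centres and therefore yield the same clustering, whereas the result from `random.sample` can change from run to run. This is the property the abstract attributes to SODC, DBCA and FDCC.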


Author information

Authors and Affiliations

Department of Computer Science and Engineering, RGPV, Bhopal, India
Mahesh Motwani
RGPV, Bhopal, India
Neeti Arora & Amit Gupta

Corresponding author

Correspondence to Mahesh Motwani.

Editor information

Editors and Affiliations

Bharati Vidyapeeth’s Institute of Computer Applications and Management (BVICAM), New Delhi, Delhi, India
M. N. Hoda
Department of Computer Engineering, YMCAUST, Faridabad, Haryana, India
Naresh Chauhan
Department of Computer Science, University of Kashmir, Srinagar, Jammu and Kashmir, India
S. M. K. Quadri
Department of Information Technology and Systems, Indian Institute of Management Rohtak, Rohtak, Haryana, India
Praveen Ranjan Srivastava

About this paper

Cite this paper

Motwani, M., Arora, N., Gupta, A. (2019). A Study on Initial Centroids Selection for Partitional Clustering Algorithms. In: Hoda, M., Chauhan, N., Quadri, S., Srivastava, P. (eds) Software Engineering. Advances in Intelligent Systems and Computing, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-10-8848-3_21

DOI: https://doi.org/10.1007/978-981-10-8848-3_21
Published: 13 June 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8847-6
Online ISBN: 978-981-10-8848-3
eBook Packages: Engineering, Engineering (R0)
