Abstract
Data mining tools and techniques allow an organization to make creative decisions and subsequently to plan properly. Clustering groups together objects that have similar characteristics. The K-means clustering method chooses its initial cluster centres (centroids) at random, one for each cluster, and this is the major weakness of K-means: its performance and the quality of its clusters depend strongly on the initial guess of the centroids. Research on clustering has therefore suggested several modifications that augment K-means with a technique for selecting the initial centroids. The first two authors of this paper have also developed three algorithms that, unlike K-means, do not generate the initial centres randomly and therefore always produce the same set of initial centroids for the same input data. These algorithms are sum of distance clustering (SODC), distance-based clustering algorithm (DBCA) and farthest distributed centroid clustering (FDCC). In this paper, we present a brief survey of the algorithms proposed for modifying the initial centroids of the K-means clustering algorithm and then describe the farthest distributed centroid clustering algorithm. Experimental results show that farthest distributed centroid clustering produces better-quality clusters than the partitional clustering algorithm, the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm.
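The abstract contrasts random seeding with deterministic seeding but does not spell out how FDCC chooses its centroids. As a minimal sketch, assuming Euclidean distance and plain Python, the code below runs Lloyd-style K-means from two kinds of seeds: the classic random choice of k data points, and a generic deterministic farthest-first (maximin) rule. The farthest-first rule is a swapped-in illustration, not the authors' FDCC, SODC or DBCA, and the function names `random_init`, `farthest_first_init` and `kmeans` as well as the toy data are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def random_init(data: List[Point], k: int, seed: int) -> List[Point]:
    """Classic K-means seeding: k distinct data points drawn at random.
    Different seeds can yield different starting centroids."""
    return random.Random(seed).sample(data, k)


def farthest_first_init(data: List[Point], k: int) -> List[Point]:
    """Deterministic maximin seeding (a generic heuristic, NOT the paper's FDCC).
    Start from the point farthest from the data mean, then repeatedly add the
    point whose distance to its nearest chosen centroid is largest. Ties go to
    the earliest point in `data`, so the same input always gives the same seeds."""
    dim = len(data[0])
    mean = tuple(sum(p[d] for p in data) / len(data) for d in range(dim))
    centroids = [max(data, key=lambda p: dist2(p, mean))]
    while len(centroids) < k:
        centroids.append(
            max(data, key=lambda p: min(dist2(p, c) for c in centroids))
        )
    return centroids


def assign(data: List[Point], centroids: List[Point]) -> List[int]:
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
        for p in data
    ]


def kmeans(data: List[Point], centroids: List[Point], max_iter: int = 100):
    """Lloyd iterations from the given seeds; returns (centroids, labels, SSE)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(data, centroids)
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(data, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            )
        if updated == centroids:  # centroids stopped moving, so we have converged
            break
        centroids = updated
    labels = assign(data, centroids)
    sse = sum(dist2(p, centroids[lab]) for p, lab in zip(data, labels))
    return centroids, labels, sse


if __name__ == "__main__":
    # Three well-separated 2-D groups of four points each (toy data).
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (20, 0), (20, 1), (21, 0), (21, 1)]
    seeds = farthest_first_init(data, 3)
    print(seeds)                    # [(0, 0), (21, 1), (10, 11)]
    print(kmeans(data, seeds)[2])   # 6.0
    for s in range(5):
        # The SSE reached from random seeding may change with the seed.
        print(s, kmeans(data, random_init(data, 3, seed=s))[2])
```

With random seeding, the final sum of squared errors (SSE) depends on which points happen to be drawn, because two seeds landing in the same group can leave K-means stuck in a poorer local optimum. The deterministic rule returns the same seeds on every run, which is the property the abstract attributes to SODC, DBCA and FDCC. Farthest-first seeding is, however, sensitive to outliers, so it should not be read as a stand-in for the paper's algorithm.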
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Motwani, M., Arora, N., Gupta, A. (2019). A Study on Initial Centroids Selection for Partitional Clustering Algorithms. In: Hoda, M., Chauhan, N., Quadri, S., Srivastava, P. (eds) Software Engineering. Advances in Intelligent Systems and Computing, vol 731. Springer, Singapore. https://doi.org/10.1007/978-981-10-8848-3_21
DOI: https://doi.org/10.1007/978-981-10-8848-3_21
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8847-6
Online ISBN: 978-981-10-8848-3
eBook Packages: Engineering, Engineering (R0)