Favoring the k-Means Algorithm with Initialization Methods
Clustering algorithms are unsupervised learning algorithms and, among the many available, k-Means is one of the most popular and successful. Its performance, however, depends heavily on a ‘good’ initialization of the k group centers (centroids), as well as on the value chosen for the number of groups (k) the final clustering should have. This chapter reports experiments with five initialization algorithms from the literature, namely Method1, k-Means++, CCIA, Maedeh&Suresh and SPSS, to empirically evaluate their contribution to improving k-Means performance.
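To illustrate what an initialization algorithm does, here is a minimal sketch of k-Means++-style seeding, in which each new centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen. It is not the implementation used in the chapter's experiments; the function names `squared_distance` and `kmeans_pp_seeds`, the tuple-based point representation, and the fallback for coincident points are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids using k-Means++-style D^2 sampling.

    Hypothetical helper for illustration; `points` is a list of tuples.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()
    # First centroid: chosen uniformly at random from the data.
    centroids = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest centroid so far.
    d2 = [squared_distance(p, centroids[0]) for p in points]
    while len(centroids) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            new = rng.choice(points)
        else:
            # Sample one point with probability proportional to d2.
            new = rng.choices(points, weights=d2, k=1)[0]
        centroids.append(new)
        # Update nearest-centroid distances with the newly added centroid.
        d2 = [min(d, squared_distance(p, new)) for p, d in zip(points, d2)]
    return centroids
```

Calling `kmeans_pp_seeds(points, k, random.Random(0))` returns k data points to use as starting centroids for the usual k-Means iterations. Passing a seeded `random.Random` makes a run repeatable, which helps when comparing initialization methods.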
Keywords: Unsupervised learning, k-Means, Initialization algorithms
The authors thank UNIFACCAMP and CNPq for their support. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior − Brasil (CAPES) − Finance Code 001.