Favoring the k-Means Algorithm with Initialization Methods

  • Anderson Francisco de Oliveira
  • Maria do Carmo Nicoletti
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 940)

Abstract

Clustering algorithms are unsupervised algorithms and, among the many available, k-Means can be considered one of the most popular and successful. The performance of k-Means, however, is highly dependent on a ‘good’ initialization of the k group centers (centroids), as well as on the value assigned to k, the number of groups the final clustering should have. This chapter describes experiments with five initialization algorithms available in the literature, namely Method1, k-Means++, CCIA, Maedeh&Suresh and SPSS, to empirically evaluate their contribution to improving k-Means performance.
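To make the idea of centroid initialization concrete, the following is a minimal Python sketch of D²-weighted seeding in the style of k-Means++, one of the five algorithms named above. It is an illustration under stated assumptions, not the code used in the chapter's experiments: the function names (squared_distance, kmeans_pp_seeds), the tuple-based point representation and the optional rng parameter are all hypothetical choices made here.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids by D^2-weighted sampling (k-Means++ style).

    Hypothetical helper for illustration; `points` is a list of tuples.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # First centroid: drawn uniformly at random from the data.
    seeds = [rng.choice(points)]
    # d2[i] holds the squared distance from points[i] to its nearest seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]
    while len(seeds) < k:
        total = sum(d2)
        if total == 0:
            # Every point coincides with a seed; fall back to a uniform draw.
            seeds.append(rng.choice(points))
        else:
            # Draw index i with probability d2[i] / total.
            i = rng.choices(range(len(points)), weights=d2, k=1)[0]
            seeds.append(points[i])
        # Refresh nearest-seed distances with the newly added seed.
        d2 = [min(old, squared_distance(p, seeds[-1]))
              for p, old in zip(points, d2)]
    return seeds
```

The returned seeds would then be passed to a standard k-Means run as its starting centroids. Because points far from every chosen seed are more likely to be picked next, the seeds tend to spread across the data, which is the property that makes a 'good' initialization less dependent on luck.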

Keywords

Unsupervised learning · k-Means · Initialization algorithms

Notes

Acknowledgments

The authors thank UNIFACCAMP and CNPq for their support. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Anderson Francisco de Oliveira (1)
  • Maria do Carmo Nicoletti (1, 2), email author
  1. Centro Universitário C. Limpo Paulista (UNIFACCAMP), Campo Limpo Paulista, Brazil
  2. Universidade Federal de S. Carlos (UFSCar), São Carlos, Brazil
