Using Genetic Algorithm for Selection of Initial Cluster Centers for the K-Means Method

  • Wojciech Kwedlo
  • Piotr Iwanowicz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6114)


The K-means algorithm is one of the most widely used clustering methods. However, the solutions it produces depend strongly on the initialization of the cluster centers. In this paper a novel genetic algorithm, called GAKMI (Genetic Algorithm for the K-Means Initialization), is proposed for the selection of initial cluster centers. Contrary to most approaches described in the literature, which encode the coordinates of cluster centers directly in a chromosome, our method uses a binary encoding in which bits set to one select elements of the learning set as initial cluster centers. Since in this encoding not every binary chromosome represents a feasible solution, we propose two repair algorithms that convert infeasible chromosomes into feasible ones. GAKMI was tested on three datasets with varying numbers of clusters. The experimental results are encouraging.
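The binary encoding can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the two GAKMI repair algorithms, so the `repair` function below simply enforces a fixed number `k` of selected centers by randomly flipping bits.

```python
import random

def decode(chromosome, data):
    """Map a binary chromosome to initial centers: each bit set to 1
    selects the corresponding element of the learning set."""
    return [x for bit, x in zip(chromosome, data) if bit == 1]

def repair(chromosome, k, rng=random):
    """Make a chromosome feasible by forcing exactly k set bits.
    (Hypothetical repair: the paper's two repair operators are not
    described in the abstract.)"""
    chromosome = list(chromosome)
    ones = [i for i, b in enumerate(chromosome) if b == 1]
    zeros = [i for i, b in enumerate(chromosome) if b == 0]
    while len(ones) > k:            # too many centers: clear random set bits
        i = ones.pop(rng.randrange(len(ones)))
        chromosome[i] = 0
        zeros.append(i)
    while len(ones) < k:            # too few centers: set random clear bits
        i = zeros.pop(rng.randrange(len(zeros)))
        chromosome[i] = 1
        ones.append(i)
    return chromosome
```

A repaired chromosome decoded against the learning set yields exactly `k` data points, which can then seed a standard K-means run.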


Keywords: Genetic Algorithm, Feature Vector, Cluster Center, Binary String, Initial Center





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Wojciech Kwedlo¹
  • Piotr Iwanowicz¹
  1. Faculty of Computer Science, Białystok University of Technology, Białystok, Poland
