A Genetic Programming Approach to Data Clustering
This paper presents a genetic programming (GP) approach to data clustering. The aim is to accurately classify a set of input data into their genuine clusters. The idea is to discover a mathematical function that captures clustering regularities and then to use that rule to decide correctly which entities belong to each cluster. To this end, GP is incorporated into the clustering procedure. Each individual is represented by a parse tree over the program set. The fitness function evaluates the quality of a clustering with regard to similarity criteria. Crossover exchanges sub-trees between parental candidates in a positionally independent fashion. Mutation introduces (in part) a new sub-tree with a low probability. The variation operators (i.e., crossover and mutation) offer an effective search capability, improving solution quality and speeding up convergence. Experimental results demonstrate that the proposed approach outperforms a well-known reference.
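The ingredients named in the abstract (parse-tree individuals, sub-tree crossover, low-probability sub-tree mutation, and a similarity-based fitness) can be illustrated with a minimal sketch. It is not the paper's implementation: the function and terminal sets, the two-cluster assignment by the sign of the tree's output, the within-cluster squared-distance fitness, and all names (`random_tree`, `crossover`, `mutate`, `fitness`) are assumptions made here for illustration only.

```python
import random

# Hypothetical primitive sets; the paper's actual sets are not given in the abstract.
FUNCTIONS = {"add": lambda a, b: a + b,
             "sub": lambda a, b: a - b,
             "mul": lambda a, b: a * b}
TERMINALS = ["x0", "x1", 1.0]


def random_tree(rng, depth=3):
    """Grow a random parse tree: a terminal, or a tuple (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(sorted(FUNCTIONS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))


def evaluate(tree, point):
    """Evaluate the tree on one 2-D data point (x0, x1)."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, point), evaluate(right, point))
    if tree == "x0":
        return point[0]
    if tree == "x1":
        return point[1]
    return tree  # numeric constant


def paths(tree, prefix=()):
    """Return the path (tuple of child indices) to every node in the tree."""
    found = [prefix]
    if isinstance(tree, tuple):
        for i in (1, 2):
            found += paths(tree[i], prefix + (i,))
    return found


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    children = list(tree)
    children[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(children)


def crossover(rng, a, b):
    """Swap sub-trees at independently chosen positions in the two parents."""
    pa, pb = rng.choice(paths(a)), rng.choice(paths(b))
    sa, sb = get_subtree(a, pa), get_subtree(b, pb)
    return replace_subtree(a, pa, sb), replace_subtree(b, pb, sa)


def mutate(rng, tree, rate=0.05):
    """With probability rate, replace a random sub-tree with a fresh one."""
    if rng.random() >= rate:
        return tree
    return replace_subtree(tree, rng.choice(paths(tree)), random_tree(rng, 2))


def fitness(tree, data):
    """Assumed criterion: negative within-cluster sum of squared distances.

    A point joins one of two clusters depending on whether the tree's
    output is positive; higher (less negative) fitness is better.
    """
    clusters = {True: [], False: []}
    for p in data:
        clusters[evaluate(tree, p) > 0].append(p)
    total = 0.0
    for members in clusters.values():
        if not members:
            continue
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in members)
    return -total
```

Because the crossover points are drawn independently in each parent, a sub-tree can move to a different position in the offspring, which is one reading of "positionally independent". A full GP run would wrap these operators in a selection loop over a population; that loop is omitted here because the abstract does not specify the selection scheme.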