Clustering is an important abstraction process that plays a vital role in both pattern recognition and data mining. Partitional algorithms are frequently used for clustering large data sets. The K-means algorithm is the most popular partitional clustering algorithm; its fuzzy, rough, probabilistic, and neural network variants are also popular. However, a major problem with the K-means algorithm and its variants is that they may not reach the globally optimal solution of the associated clustering problem. Several stochastic search techniques have been suggested in the past to address this problem. Genetic algorithms (GAs) are an attractive way to solve the partitional clustering problem. However, conventional GA-based solutions may not scale well. A recent proposal in the literature is to use a Quad-tree based algorithm to scale up the clustering algorithm. Unfortunately, this solution does not scale up to high-dimensional data sets. In this chapter, we explain the GA-based clustering approaches and propose an efficient scheme for clustering high-dimensional, large-scale data sets using GAs based on the well-known CF-Tree data structure. We also discuss the notion of multi-objective clustering.
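To make the CF-Tree reference concrete, the following is a minimal Python sketch of the clustering feature (CF) summary that a CF-Tree stores at its nodes, as introduced with the BIRCH method. It illustrates only the summary and its additivity property; how the chapter combines these summaries with a GA is not described in the abstract, so nothing here should be read as the authors' scheme. The names `ClusteringFeature`, `from_point`, `merge`, `centroid`, `radius`, and `summarize` are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Compact summary of a set of points (illustrative names, not from the chapter)."""
    n: int            # number of points summarized
    ls: List[float]   # per-dimension linear sum of the points
    ss: float         # sum of squared Euclidean norms of the points

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusteringFeature":
        # A single point is summarized by itself.
        return cls(1, list(x), sum(v * v for v in x))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # Additivity: summaries of disjoint point sets add componentwise,
        # so two groups can be combined without revisiting the raw points.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance of the summarized points from the centroid.
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


def summarize(points: Sequence[Sequence[float]]) -> ClusteringFeature:
    """Fold a non-empty sequence of points into one clustering feature."""
    cf = ClusteringFeature.from_point(points[0])
    for p in points[1:]:
        cf = cf.merge(ClusteringFeature.from_point(p))
    return cf
```

For example, `summarize([(0.0, 0.0), (2.0, 0.0)])` yields a summary with `n == 2` and centroid `[1.0, 0.0]`. Because the summary has a fixed size regardless of how many points it covers, a search procedure can work on summaries instead of individual points, which is what makes this kind of structure appealing for large data sets.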
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Murty, M.N., Rashmin, B., Bhattacharyya, C. (2008). Clustering Based on Genetic Algorithms. In: Ghosh, A., Dehuri, S., Ghosh, S. (eds) Multi-Objective Evolutionary Algorithms for Knowledge Discovery from Databases. Studies in Computational Intelligence, vol 98. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77467-9_7
DOI: https://doi.org/10.1007/978-3-540-77467-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77466-2
Online ISBN: 978-3-540-77467-9
eBook Packages: Engineering, Engineering (R0)