Part of the book series: Studies in Computational Intelligence ((SCI,volume 98))

Clustering is an important abstraction process that plays a vital role in both pattern recognition and data mining. Partitional algorithms are frequently used for clustering large data sets. The K-means algorithm is the most popular partitional clustering algorithm; its fuzzy, rough, probabilistic and neural network variants are also popular. However, a major problem with the K-means algorithm and its variants is that they may not reach the globally optimal solution of the associated clustering problem. Several stochastic search techniques have been suggested to address this problem. Genetic algorithms (GAs) are an attractive way to solve the partitional clustering problem, but conventional GA-based solutions may not scale well. A recent proposal in the literature uses a quad-tree based algorithm to scale up the clustering algorithm. Unfortunately, this solution does not scale to high-dimensional data sets. In this chapter, we explain GA-based clustering approaches and propose an efficient scheme for clustering high-dimensional, large-scale data sets using GAs based on the well-known CF-tree data structure. We also discuss the notion of multi-objective clustering.
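To make the idea of GA-based partitional clustering concrete, the following is a minimal sketch and not the scheme proposed in the chapter. It assumes a chromosome is simply a list of k centroids, fitness is the within-cluster sum of squared errors, and each child is refined by one K-means step. The function names (`sse`, `kmeans_step`, `ga_cluster`) and all parameter defaults are hypothetical choices made for illustration.

```python
import random


def sse(centroids, points):
    """Sum of squared distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total


def kmeans_step(centroids, points):
    """One K-means update: assign points to nearest centroid, then recompute means."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
        )
        groups[nearest].append(p)
    updated = []
    for c, g in zip(centroids, groups):
        if g:
            updated.append(tuple(sum(col) / len(g) for col in zip(*g)))
        else:
            updated.append(c)  # an empty cluster keeps its old centroid
    return updated


def ga_cluster(points, k, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Illustrative GA over centroid sets; returns (best centroids, best SSE per generation)."""
    rng = random.Random(seed)
    # Assumption: initial chromosomes are k distinct data points chosen at random.
    population = [rng.sample(points, k) for _ in range(pop_size)]
    best = min(population, key=lambda ch: sse(ch, points))
    history = [sse(best, points)]
    for _ in range(generations):
        children = [best]  # elitism: the best chromosome survives unchanged
        while len(children) < pop_size:
            # Binary tournament selection for each parent.
            p1 = min(rng.sample(population, 2), key=lambda ch: sse(ch, points))
            p2 = min(rng.sample(population, 2), key=lambda ch: sse(ch, points))
            # One-point crossover on the list of centroids.
            cut = rng.randint(1, k - 1) if k > 1 else 0
            child = list(p1[:cut]) + list(p2[cut:])
            # Mutation: replace one centroid with a random data point.
            if rng.random() < mutation_rate:
                child[rng.randrange(k)] = rng.choice(points)
            # Local refinement with a single K-means step.
            children.append(kmeans_step(child, points))
        population = children
        best = min(population, key=lambda ch: sse(ch, points))
        history.append(sse(best, points))
    return best, history
```

Because the best chromosome is always copied into the next generation, the best SSE recorded in `history` never increases. Every fitness evaluation in this sketch scans all points, which is one reason a plain GA of this kind may not scale to large data sets.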

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Murty, M.N., Rashmin, B., Bhattacharyya, C. (2008). Clustering Based on Genetic Algorithms. In: Ghosh, A., Dehuri, S., Ghosh, S. (eds) Multi-Objective Evolutionary Algorithms for Knowledge Discovery from Databases. Studies in Computational Intelligence, vol 98. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77467-9_7

  • DOI: https://doi.org/10.1007/978-3-540-77467-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77466-2

  • Online ISBN: 978-3-540-77467-9

  • eBook Packages: Engineering, Engineering (R0)
