A Hybrid Evolutionary Approach to Cluster Detection

  • Junping Sun
  • William Sverdlik
  • Samir Tout
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 6)

The modern world has witnessed a surge in technological advancements across many industries. In some sectors, such as search engines, bioinformatics, and pattern recognition, software applications must typically interpret sheer volumes of data in an attempt to discover patterns that may provide great value for business analysis, development, and planning. This has underscored the importance of fields of study such as clustering, a subfield of data mining that has gained momentum in recent decades. Clustering addresses this very problem: it analyzes large datasets and attempts to uncover data distributions and patterns by means of largely unsupervised data classification [9]. Example clustering applications include multimedia analysis and retrieval [10], pattern recognition [15], and bioinformatics [5].
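To make "unsupervised data classification" concrete, the following is a minimal sketch of k-means (Lloyd's algorithm), a standard partitioning technique covered in surveys such as [11]. It is not the PYRAMID algorithm; the function name `kmeans`, its parameters, and the toy dataset are illustrative assumptions. The points carry no labels, yet the procedure recovers the two groups on its own.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group unlabeled 2-D points into k clusters (illustrative sketch, not PYRAMID)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Two well-separated groups of unlabeled points (hypothetical toy data).
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Unlike k-means, which requires the user to supply k, the chapter's concern is reducing dependence on such user-supplied parameters.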

This chapter starts with an overview of existing clustering approaches. It then defines the key concepts used by the PYRAMID algorithm. It also presents the experiments conducted in Tout et al. [23], as well as additional experiments on datasets employed in Sheikholeslami et al. [21], each featuring different challenges. Finally, it explores the independence of PYRAMID from user-supplied parameters and outlines future research directions.


Keywords: Genetic Programming, Cluster Detection, Data Parallelism, Slave Processor, Master Processor




References

  1. Berkhin, P. (2002). Survey of clustering data mining techniques. Accrue Software. Retrieved February 28, 2005, from http://www.ee.ucr.edu/~barth/EE242/clustering_survey.pdf.
  2. Berry, M.J. and Linoff, G. (1997). Data Mining Techniques: For Marketing, Sales, and Customer Support. New York: John Wiley and Sons.
  3. Davis, L. (1991). Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold.
  4. Deb, K. (2001). Multi-Objective Optimization Using Evolutionary Algorithms. New York: John Wiley and Sons.
  5. Dettling, M. and Bühlmann, P. (2002). Supervised clustering of genes. Genome Biology, 3(12), 39–50.
  6. Dorai, C. and Jain, A.K. (1995). Shape spectra based view grouping for free-form objects. Proceedings of the International Conference on Image Processing, Washington, DC, 3, 340–343.
  7. Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, 226–231.
  8. Guha, S., Rastogi, R., and Shim, K. (1998). CURE: An efficient clustering algorithm for large databases. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, 73–84.
  9. Han, J. and Kamber, M. (2001). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann.
  10. Hinneburg, A. and Keim, D.A. (1998). An efficient approach to clustering in large multimedia databases with noise. Proceedings of the Fourth International Conference on Knowledge Discovery in Databases, New York, 58–65.
  11. Jain, A.K., Murty, M., and Flynn, P. (1999). Data clustering: A review. ACM Computing Surveys, 31(3), 264–323.
  12. Karypis, G., Han, S., and Kumar, V. (1999). Chameleon: A hierarchical clustering using dynamic modeling. IEEE Computer: Data Analysis and Mining (Special Issue), 32(8), 68–75.
  13. Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley and Sons, Inc.
  14. Kolatch, E. (2001). Clustering Algorithms for Spatial Databases: A Survey (Technical Report No. CMSC 725). Department of Computer Science, University of Maryland, College Park, 1–22.
  15. Koza, J.R. (1991). Evolving a computer program to generate random numbers using the genetic programming paradigm. Proceedings of the Fourth International Conference on Genetic Algorithms, La Jolla, CA, 37–44.
  16. Ohsawa, Y. and Nagashima, A. (2001). A spatio-temporal geographic information system based on implicit topology description: STIMS. Proceedings of the Third International Society for Photogrammetry and Remote Sensing (ISPRS) Workshop on Dynamic and Multi-Dimensional Geographic Information System, Thailand, 218–223.
  17. Rasmussen, E. (1992). Clustering algorithms. In Information Retrieval: Data Structures and Algorithms, 419–442. Upper Saddle River, NJ: Prentice-Hall.
  18. Ripley, B.D. (1996). Pattern Recognition and Neural Networks. Cambridge, UK: Cambridge University Press.
  19. Sarafis, I., Zalzala, A., and Trinder, P. (2002). A genetic rule-based data clustering toolkit. Proceedings of the 2002 World Congress on Evolutionary Computation, Honolulu, 1238–1243.
  20. Sarafis, I., Zalzala, A., and Trinder, P. (2003). Mining comprehensive clustering rules with an evolutionary algorithm. Proceedings of the Genetic and Evolutionary Computation Conference, Chicago, 1–12.
  21. Sheikholeslami, G., Chatterjee, S., and Zhang, A. (1998). WaveCluster: A multi-resolution clustering approach for very large spatial databases. Proceedings of the 24th International Conference on Very Large Data Bases, New York, 428–439.
  22. Solberg, A., Taxt, T., and Jain, A. (1996). A Markov random field model for classification of multisource satellite imagery. IEEE Transactions on Geoscience and Remote Sensing, 34(1), 100–113.
  23. Tout, S., Sverdlik, W., and Sun, J. (2006). Parallel hybrid clustering using genetic programming and multi-objective fitness with density (PYRAMID). Proceedings of the 2006 International Conference on Data Mining (DMIN’06), Las Vegas, NV, 197–203.
  24. Wang, W., Yang, J., and Muntz, R. (1997). STING: A statistical information grid approach to spatial data mining. Proceedings of the 1997 International Conference on Very Large Data Bases, Athens, 186–195.
  25. Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An efficient data clustering method for very large databases. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, 103–114.

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Junping Sun (1)
  • William Sverdlik (2)
  • Samir Tout (2)
  1. Graduate School of Computer and Information Sciences, Nova Southeastern University, Fort Lauderdale, USA
  2. Department of Computer Science, Eastern Michigan University, Ypsilanti, USA
