A Hybrid Evolutionary Approach to Cluster Detection
The modern world has witnessed a surge in technological advancements that span various industries. In some sectors, such as search engines, bioinformatics, and pattern recognition, software applications must interpret sheer volumes of data to discover patterns that may provide great value for business analysis, development, and planning. This has emphasized the importance of fields of study such as clustering, a descendant discipline of data mining, which has gained momentum in recent decades. Clustering addresses this very problem of analyzing large datasets and attempting to unravel data distributions and patterns by means of mostly unsupervised data classification. Example clustering applications include multimedia analysis and retrieval, pattern recognition, and bioinformatics.
This chapter starts by providing an overview of existing clustering approaches. It then defines key concepts used by the PYRAMID algorithm. It also presents the experiments conducted in Tout et al., as well as further experiments on the datasets employed in Sheikholeslami et al., each of which poses different challenges. Finally, it explores the independence of PYRAMID from user-supplied parameters and outlines future research directions.
Keywords: Genetic Programming, Cluster Detection, Data Parallelism, Slave Processor, Master Processor
- 1. Berkhin, P. (2002). Survey of clustering data mining techniques. Accrue Software. Retrieved February 28, 2005, from http://www.ee.ucr.edu/~barth/EE242/clustering_survey.pdf.
- 2. Berry, M.J. and Linoff, G. (1997). Data Mining Techniques: For Marketing, Sales, and Customer Support. New York: John Wiley and Sons.
- 3. Davis, L. (1991). Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold.
- 7. Ester, M., Kriegel, H.P., Sander, J., and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, 226–231.
- 8. Guha, S., Rastogi, R., and Shim, K. (1998). CURE: An efficient clustering algorithm for large databases. Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, 73–84.
- 9. Han, J., and Kamber, M. (2001). Data Mining, Concepts and Techniques. San Francisco: Morgan Kaufmann.
- 10. Hinneburg, A., and Keim, D.A. (1998). An efficient approach to clustering in large multimedia databases with noise. Proceedings of the Fourth International Conference on Knowledge Discovery in Databases, New York, 58–65.
- 12. Karypis, G., Han, S., and Kumar, V. (1999). Chameleon: A hierarchical clustering using dynamic modeling. IEEE Computer: Data Analysis and Mining (Special Issue), 32(8), 68–75.
- 13. Kaufman, L. and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster Analysis. New York: John Wiley and Sons, Inc.
- 14. Kolatch, E. (2001). Clustering Algorithms for Spatial Databases: A Survey (Technical Report No. CMSC 725). Department of Computer Science, University of Maryland, College Park, 1–22.
- 15. Koza, J.R. (1991). Evolving a computer program to generate random numbers using the genetic programming paradigm. Proceedings of the Fourth International Conference on Genetic Algorithms, La Jolla, CA, 37–44.
- 16. Ohsawa, Y. and Nagashima, A. (2001). A spatio-temporal geographic information system based on implicit topology description: STIMS. Proceedings of the Third International Society for Photogrammetry and Remote Sensing (ISPRS) Workshop on Dynamic and Multi-Dimensional Geographic Information System, Thailand, 218–223.
- 17. Rasmussen, E. (1992). Clustering algorithms. Information Retrieval: Data Structures and Algorithms, 419–442. Upper Saddle River, NJ: Prentice-Hall.
- 19. Sarafis, I., Zalzala, A., and Trinder, P. (2002). A genetic rule-based data clustering toolkit. Proceedings of the 2002 World Congress on Evolutionary Computation, Honolulu, 1238–1243.
- 20. Sarafis, I., Zalzala, A., and Trinder, P. (2003). Mining comprehensive clustering rules with an evolutionary algorithm. Proceedings of the Genetic and Evolutionary Computation Conference, Chicago, 1–12.
- 21. Sheikholeslami, G., Chatterjee, S., and Zhang, A. (1998). WaveCluster: A multi-resolution clustering approach for very large spatial databases. Proceedings of the 24th International Conference on Very Large Data Bases, New York, 428–439.
- 23. Tout, S., Sverdlik, W., and Sun, J. (2006). Parallel hybrid clustering using genetic programming and multi-objective fitness with density (PYRAMID). Proceedings of the 2006 International Conference on Data Mining (DMIN’06), Las Vegas, NV, 197–203.
- 24. Wang, W., Yang, J., and Muntz, R. (1997). STING: A statistical information grid approach to spatial data mining. Proceedings of the 1997 International Conference on Very Large Data Bases, Athens, 186–195.
- 25. Zhang, T., Ramakrishnan, R., and Livny, M. (1996). BIRCH: An efficient data clustering method for very large databases. Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data, Montreal, 103–114.