A Review of Evolutionary Algorithms for Data Mining

  • Alex A. Freitas

Evolutionary Algorithms (EAs) are stochastic search algorithms inspired by the process of neo-Darwinian evolution. The motivation for applying EAs to data mining is that they are robust, adaptive search techniques that perform a global search in the solution space. This chapter first presents a brief overview of EAs, focusing mainly on two kinds of EAs, viz. Genetic Algorithms (GAs) and Genetic Programming (GP). Then the chapter reviews the main concepts and principles used by EAs designed for solving several data mining tasks, namely: discovery of classification rules, clustering, attribute selection and attribute construction. Finally, it discusses Multi-Objective EAs, based on the concept of Pareto dominance, and their use in several data mining tasks.

Key words: genetic algorithm, genetic programming, classification, clustering, attribute selection, attribute construction, multi-objective optimization







Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Alex A. Freitas (1)
  1. Computing Laboratory, University of Kent, UK
