A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery

  • Chapter
Advances in Evolutionary Computing

Part of the book series: Natural Computing Series (NCS)

Abstract

This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.
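
As a rough illustration of the three design choices named in the abstract (individual representation, genetic operators and fitness function), the following is a minimal sketch of a genetic algorithm for attribute selection. It is not taken from the chapter. Every name in it (`make_toy_data`, `loo_accuracy`, `fitness`, `evolve`), the toy data set, the 1-nearest-neighbour evaluator, the size penalty of 0.01 per attribute and all parameter values are assumptions chosen only to keep the example small.

```python
import random


def make_toy_data(n_rows=40, seed=1):
    """Hypothetical data set: attribute 0 tracks the class, attributes 1-3 are noise."""
    rng = random.Random(seed)
    rows, labels = [], []
    for k in range(n_rows):
        label = k % 2
        rows.append([label + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(3)])
        labels.append(label)
    return rows, labels


def loo_accuracy(rows, labels, mask):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier
    that only sees the attributes switched on in `mask`."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an individual selecting no attribute cannot classify
    correct = 0
    for i, row in enumerate(rows):
        best_j, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[c] - other[c]) ** 2 for c in cols)
            if d < best_d:
                best_j, best_d = j, d
        correct += labels[best_j] == labels[i]
    return correct / len(rows)


def fitness(rows, labels, mask, penalty=0.01):
    """Assumed fitness: accuracy minus a small cost per selected attribute,
    so that smaller (more comprehensible) attribute subsets are preferred."""
    return loo_accuracy(rows, labels, mask) - penalty * sum(mask)


def evolve(rows, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Bit-string GA: one bit per attribute, tournament selection,
    one-point crossover, bit-flip mutation and one elite individual."""
    rng = random.Random(seed)
    n = len(rows[0])  # requires at least two attributes for one-point crossover
    cache = {}

    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(rows, labels, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best individual over unchanged
        while len(new_pop) < pop_size:
            p1 = max(rng.sample(pop, 2), key=score)  # tournament of size 2
            p2 = max(rng.sample(pop, 2), key=score)
            cut = rng.randint(1, n - 1)              # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = max(pop, key=score)
    return best, score(best)


rows, labels = make_toy_data()
best_mask, best_fitness = evolve(rows, labels)
print("selected attributes:", [i for i, bit in enumerate(best_mask) if bit])
print("fitness:", best_fitness)
```

The size penalty is one simple way to let a data mining requirement (preferring smaller, easier-to-read models) shape the fitness function; other formulations are equally plausible.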

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Freitas, A.A. (2003). A Survey of Evolutionary Algorithms for Data Mining and Knowledge Discovery. In: Ghosh, A., Tsutsui, S. (eds) Advances in Evolutionary Computing. Natural Computing Series. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18965-4_33

  • DOI: https://doi.org/10.1007/978-3-642-18965-4_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-62386-8

  • Online ISBN: 978-3-642-18965-4

  • eBook Packages: Springer Book Archive
