Discovering New Rule Induction Algorithms with Grammar-based Genetic Programming

  • Gisele L. Pappa
  • Alex A. Freitas

Rule induction is a data mining technique used to extract classification rules of the form IF (conditions) THEN (predicted class) from data. Most rule induction algorithms found in the literature follow the sequential covering strategy, which induces one rule at a time until all (or almost all) of the training data is covered by the induced rule set. This strategy describes a basic algorithm composed of several key elements, which can be modified and/or extended to generate new and better rule induction algorithms. With this in mind, this work proposes the use of a grammar-based genetic programming (GGP) algorithm to automatically discover new sequential covering algorithms. The proposed system is evaluated on 20 data sets, and the automatically discovered rule induction algorithms are compared with four well-known human-designed rule induction algorithms. The results show that the GGP system is a promising approach for discovering new sequential covering algorithms.
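To make the sequential covering strategy concrete, here is a minimal Python sketch of the generic loop: learn one rule, remove the examples it covers, and repeat until few enough examples remain. It is an illustration only, not the GGP system or any of the algorithms it discovered. All names (`covers`, `precision`, `learn_one_rule`, `sequential_covering`, `classify`, `max_uncovered`) are hypothetical, the greedy precision-based rule search is an assumed heuristic, and attributes are assumed to be nominal strings with the class stored under the key `"class"`.

```python
from collections import Counter


def covers(conditions, example):
    """A rule antecedent is a dict {attribute: required value}; all must match."""
    return all(example.get(attr) == value for attr, value in conditions.items())


def precision(conditions, examples, target_class):
    """Fraction of covered examples in target_class (0.0 if nothing is covered)."""
    covered = [e for e in examples if covers(conditions, e)]
    if not covered:
        return 0.0
    return sum(e["class"] == target_class for e in covered) / len(covered)


def learn_one_rule(examples, attributes):
    """Assumed heuristic: greedily add the condition that most improves precision."""
    target_class = Counter(e["class"] for e in examples).most_common(1)[0][0]
    conditions = {}
    best = precision(conditions, examples, target_class)
    while best < 1.0:
        # Candidate specialisations: one extra attribute-value test each.
        candidates = [
            {**conditions, attr: value}
            for attr in attributes if attr not in conditions
            for value in sorted({e[attr] for e in examples if covers(conditions, e)})
        ]
        scored = [(precision(c, examples, target_class), c) for c in candidates]
        improved = [(p, c) for p, c in scored if p > best]
        if not improved:
            break  # no condition makes the rule more precise
        best, conditions = max(improved, key=lambda pc: pc[0])
    return conditions, target_class


def sequential_covering(examples, attributes, max_uncovered=0):
    """Induce one rule at a time until at most max_uncovered examples remain."""
    rules, remaining = [], list(examples)
    while len(remaining) > max_uncovered:
        conditions, predicted = learn_one_rule(remaining, attributes)
        rules.append((conditions, predicted))
        # Remove every example the new rule covers before learning the next one.
        remaining = [e for e in remaining if not covers(conditions, e)]
    return rules


def classify(rules, example, default=None):
    """Treat the rules as an ordered list: the first matching rule decides."""
    for conditions, predicted in rules:
        if covers(conditions, example):
            return predicted
    return default


# Toy data, invented purely for illustration.
examples = [
    {"outlook": "sunny", "windy": "no", "class": "play"},
    {"outlook": "sunny", "windy": "yes", "class": "play"},
    {"outlook": "rainy", "windy": "no", "class": "play"},
    {"outlook": "rainy", "windy": "yes", "class": "stay"},
]
rules = sequential_covering(examples, ["outlook", "windy"])
for conditions, predicted in rules:
    print("IF", conditions, "THEN", predicted)
```

Each part of this sketch is one of the "key elements" that can be varied: how a rule is searched for and evaluated (`learn_one_rule`, `precision`), when the outer loop stops (`max_uncovered`), and whether the result is used as an ordered list (`classify`). Swapping any of them yields a different sequential covering algorithm.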


Sequential Covering, Production Rule, Numerical Attribute, Rule Induction, Target Problem
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Gisele L. Pappa (1)
  • Alex A. Freitas (1)
  1. Computing Laboratory, University of Kent, UK
