Comparing a Variety of Evolutionary Algorithm Techniques on a Collection of Rule Induction Tasks
The induction of useful rules from databases has been studied by several researchers. There remains a need for systematic comparison of the alternative methods, especially given the available variety of rule representation strategies, genetic operators, evolutionary algorithm designs, and so forth. Here, the performance of five commonly employed evolutionary algorithms is examined on a collection of 100 separate rule induction tasks drawn from five freely available datasets. All tasks require the generation of rules in disjunctive normal form, with either a fixed or a free consequent, that maximise an accuracy/applicability tradeoff measure; tasks differ in the dataset used, the identity of the fixed consequent (or the absence of one), and the maximum number of disjuncts allowed in the antecedent. Results generally indicate that single-member methods (hill climbing, simulated annealing, tabu search) fare at least as well as population-based techniques when rules are restricted to fairly low complexity, but that this situation is reversed as rules are allowed to become more complex. These results are of import to data mining application developers and researchers seeking the appropriate search strategy for rule induction with respect to their particular needs.
Keywords: Tabu Search, Hill Climbing, Rule Induction, Disjunctive Normal Form, Rule Complexity
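The kind of task described in the abstract can be illustrated with a minimal sketch of hill climbing over rules in disjunctive normal form with a fixed consequent. Everything specific here is an assumption made for illustration and is not taken from the paper: the weighted-sum `fitness` (the abstract does not define its accuracy/applicability measure), the weight `w`, the neighbourhood moves, and the toy records.

```python
def matches(antecedent, record):
    """True if any disjunct (a dict of attribute == value tests) holds for the record."""
    return any(all(record[a] == v for a, v in conj.items()) for conj in antecedent)

def fitness(antecedent, consequent, records, w=0.7):
    """ASSUMED tradeoff: w * accuracy + (1 - w) * applicability (not the paper's measure)."""
    covered = [r for r in records if matches(antecedent, r)]
    if not covered:
        return 0.0
    attr, value = consequent
    accuracy = sum(r[attr] == value for r in covered) / len(covered)
    applicability = len(covered) / len(records)
    return w * accuracy + (1 - w) * applicability

def neighbours(antecedent, domains, max_disjuncts):
    """Rules one edit away: set or remove a test, drop a disjunct, or add a one-test disjunct."""
    out = []
    for i, conj in enumerate(antecedent):
        rest_before, rest_after = antecedent[:i], antecedent[i + 1:]
        for attr, values in domains.items():
            for v in values:
                if conj.get(attr) != v:
                    out.append(rest_before + [{**conj, attr: v}] + rest_after)
            if attr in conj and len(conj) > 1:
                smaller = {a: v for a, v in conj.items() if a != attr}
                out.append(rest_before + [smaller] + rest_after)
        if len(antecedent) > 1:
            out.append(rest_before + rest_after)
    if len(antecedent) < max_disjuncts:
        for attr, values in domains.items():
            for v in values:
                out.append(antecedent + [{attr: v}])
    return out

def hill_climb(records, consequent, domains, max_disjuncts, start):
    """Steepest-ascent hill climbing; stops when no neighbour strictly improves."""
    current, best = start, fitness(start, consequent, records)
    while True:
        scored = [(fitness(c, consequent, records), c)
                  for c in neighbours(current, domains, max_disjuncts)]
        top_score, top = max(scored, key=lambda s: s[0])
        if top_score <= best:
            return current, best
        current, best = top, top_score

# Hypothetical toy data: predict play == "yes" from two attributes.
records = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no", "play": "no"},
]
domains = {"outlook": ["sunny", "rain", "overcast"], "windy": ["yes", "no"]}
consequent = ("play", "yes")
start = [{"windy": "yes"}]
rule, score = hill_climb(records, consequent, domains, max_disjuncts=2, start=start)
print(rule, round(score, 3))
```

On this toy data the climb can stop at a rule that is good but not the best available two-disjunct rule, which is the local-optimum behaviour that motivates comparing hill climbing against simulated annealing, tabu search, and population-based search.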