HEAD-DT: Automatic Design of Decision-Tree Algorithms

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

As presented in Chap. 2, for the past 40 years researchers have attempted to improve decision-tree induction algorithms by proposing new splitting criteria for internal nodes, by investigating pruning strategies for avoiding overfitting, by testing new approaches for dealing with missing values, or even by searching for alternatives to top-down greedy induction. Each new decision-tree induction algorithm presents some (or many) of these strategies, which are chosen in order to maximize performance in empirical analyses. Nevertheless, after these 40 years of research, the number of different strategies for the several components of a decision-tree algorithm is so vast that it would be impracticable for a human being to test all possibilities with the purpose of achieving the best performance in a given data set (or in a set of data sets). Hence, we pose two questions for researchers in the area: “is it possible to automate the design of decision-tree induction algorithms?”, and, if so, “how can we automate the design of a decision-tree induction algorithm?”

The answer to these questions arose with the pioneering work of Pappa and Freitas [30], which proposed the automatic design of rule induction algorithms through an evolutionary algorithm (EA). The authors proposed the use of a grammar-based genetic programming (GP) algorithm for building and evolving individuals which are, in fact, rule induction algorithms. That approach successfully employs EAs to evolve a generic rule induction algorithm, which can then be applied to solve many different classification problems, instead of evolving a specific set of rules tailored to a particular data set. As presented in Chap. 3, in the area of optimisation this type of approach is named hyper-heuristics (HHs) [5, 6]. HHs are search methods for automatically selecting and combining simpler heuristics, resulting in a generic heuristic that is used to solve any instance of a given optimisation problem. For instance, an HH can generate a generic heuristic for solving any instance of the timetabling problem (i.e., allocation of any number of resources subject to any set of constraints in any schedule configuration), whilst a conventional EA would just evolve a solution to one particular instance of the timetabling problem (i.e., a predefined set of resources and constraints in a given schedule configuration).

In this chapter, we present a hyper-heuristic strategy for automatically designing decision-tree induction algorithms, namely HEAD-DT (Hyper-Heuristic Evolutionary Algorithm for Automatically Designing Decision-Tree Algorithms). Section 4.1 introduces HEAD-DT and its evolutionary scheme. Section 4.2 presents the individual representation adopted by HEAD-DT to evolve decision-tree algorithms, as well as information regarding each individual’s gene. Section 4.3 shows the evolutionary cycle of HEAD-DT, detailing its genetic operators. Section 4.4 depicts the fitness evaluation process in HEAD-DT, and introduces two possible frameworks for executing HEAD-DT. Section 4.5 computes the total size of the search space that HEAD-DT is capable of traversing, whereas Sect. 4.6 discusses related work.
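
The idea of evolving an algorithm design, rather than a solution to one problem instance, can be illustrated with a minimal Python sketch. It is not HEAD-DT: the component names and options in `DESIGN_SPACE`, the genetic operators, and `placeholder_fitness` are all assumptions made for illustration. In a real hyper-heuristic the fitness function would assemble the decision-tree algorithm encoded by an individual, run it on data sets, and score its predictive performance.

```python
import random

# Hypothetical design components. The option names are illustrative only and
# are NOT the gene set defined by HEAD-DT.
DESIGN_SPACE = {
    "split_criterion": ["information_gain", "gini_index", "gain_ratio"],
    "stopping_rule": ["class_homogeneity", "max_depth", "min_instances"],
    "pruning": ["none", "reduced_error", "cost_complexity"],
    "missing_values": ["ignore", "impute_mode", "weight_by_proportion"],
}
COMPONENTS = list(DESIGN_SPACE)


def random_individual(rng):
    """An individual is one algorithm design: one option per component."""
    return {c: rng.choice(DESIGN_SPACE[c]) for c in COMPONENTS}


def mutate(individual, rng):
    """Re-draw the option of one randomly chosen component."""
    child = dict(individual)
    gene = rng.choice(COMPONENTS)
    child[gene] = rng.choice(DESIGN_SPACE[gene])
    return child


def crossover(a, b, rng):
    """Uniform crossover: each component is inherited from either parent."""
    return {c: rng.choice((a[c], b[c])) for c in COMPONENTS}


def evolve(fitness, generations=30, pop_size=20, seed=0):
    """Evolve designs using binary tournament selection and elitism."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = [max(population, key=fitness)]  # elitism: keep the best
        while len(offspring) < pop_size:
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            offspring.append(mutate(crossover(p1, p2, rng), rng))
        population = offspring
    return max(population, key=fitness)


def placeholder_fitness(design):
    """Stand-in score (an assumption): counts matches with an arbitrary
    target design instead of training and evaluating a real tree inducer."""
    target = {
        "split_criterion": "gain_ratio",
        "stopping_rule": "min_instances",
        "pruning": "reduced_error",
        "missing_values": "impute_mode",
    }
    return sum(design[c] == target[c] for c in COMPONENTS)


best_design = evolve(placeholder_fitness)

# Size of this toy search space: the product of the options per component.
search_space_size = 1
for options in DESIGN_SPACE.values():
    search_space_size *= len(options)
```

The result of the search is a design (a choice of components), not a decision tree; the same design could then be applied to many data sets, which is the property that separates a hyper-heuristic from a conventional EA that evolves one solution for one instance.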

References

  1. R.C. Barros, D.D. Ruiz, M.P. Basgalupp, Evolutionary model trees for handling continuous classes in machine learning. Inf. Sci. 181, 954–971 (2011)

  2. R.C. Barros et al., Towards the automatic design of decision tree induction algorithm, in 13th Annual Conference Companion on Genetic and Evolutionary Computation (GECCO 2011). pp. 567–574 (2011)

  3. R.C. Barros et al., A survey of evolutionary algorithms for decision-tree induction. IEEE Trans. Syst., Man, Cybern., Part C: Appl. Rev. 42(3), 291–312 (2012)

  4. L. Breiman et al., Classification and Regression Trees (Wadsworth, Belmont, 1984)

  5. E. Burke, S. Petrovic, Recent research directions in automated timetabling. Eur. J. Oper. Res. 140(2), 266–280 (2002)

  6. E.K. Burke, G. Kendall, E. Soubeiga, A tabu-search hyperheuristic for timetabling and rostering. J. Heuristics 9(6), 451–470 (2003)

  7. E.K. Burke et al., A Classification of Hyper-heuristics Approaches, in Handbook of Metaheuristics, 2nd edn., International Series in Operations Research & Management Science, ed. by M. Gendreau, J.-Y. Potvin (Springer, Berlin, 2010), pp. 449–468

  8. B. Cestnik, I. Bratko, On Estimating Probabilities in Tree Pruning, Machine learning-EWSL-91. Vol. 482. Lecture Notes in Computer Science (Springer, Berlin, 1991)

  9. B. Chandra, P.P. Varghese, Moving towards efficient decision tree construction. Inf. Sci. 179(8), 1059–1069 (2009)

  10. B. Chandra, R. Kothari, P. Paul, A new node splitting measure for decision tree construction. Pattern Recognit. 43(8), 2725–2731 (2010)

  11. J. Ching, A. Wong, K. Chan, Class-dependent discretization for inductive learning from continuous and mixed-mode data. IEEE Trans. Pattern Anal. Mach. Intell. 17(7), 641–651 (1995)

  12. P. Clark, T. Niblett, The CN2 induction algorithm. Mach. Learn. 3(4), 261–283 (1989)

  13. B. Delibasic et al., Component-based decision trees for classification. Intell. Data Anal. 15(5), 1–38 (2011)

  14. F. Esposito, D. Malerba, G. Semeraro, A comparative analysis of methods for pruning decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19(5), 476–491 (1997)

  15. U. Fayyad, K. Irani, The attribute selection problem in decision tree generation, in National Conference on Artificial Intelligence. pp. 104–110 (1992)

  16. A. Frank, A. Asuncion, UCI Machine Learning Repository (2010)

  17. J.H. Friedman, A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Comput. 100(4), 404–408 (1977)

  18. M. Gleser, M. Collen, Towards automated medical decisions. Comput. Biomed. Res. 5(2), 180–189 (1972)

  19. T. Ho, M. Basu, Complexity measures of supervised classification problems. IEEE Trans. Pattern Anal. Mach. Intell. 24(3), 289–300 (2002)

  20. T. Ho, M. Basu, M. Law, Measures of Geometrical Complexity in Classification Problems, Data Complexity in Pattern Recognition (Springer, London, 2006)

  21. B. Jun et al., A new criterion in selection and discretization of attributes for the generation of decision trees. IEEE Trans. Pattern Anal. Mach. Intell. 19(12), 1371–1375 (1997)

  22. I. Kononenko, I. Bratko, E. Roskar, Experiments in automatic learning of medical diagnostic rules. Tech. rep. Ljubljana, Yugoslavia: Jozef Stefan Institute (1984)

  23. W. Loh, Y. Shih, Split selection methods for classification trees. Stat. Sinica 7, 815–840 (1997)

  24. R.L. De Mántaras, A distance-based attribute selection measure for decision tree induction. Mach. Learn. 6(1) (Kluwer, The Netherlands, 1991). ISSN: 0885-6125

  25. J. Martin, An exact probability metric for decision tree splitting and stopping. Mach. Learn. 28(2), 257–291 (1997)

  26. J. Mingers, Expert systems—rule induction with statistical data. J. Oper. Res. Soc. 38, 39–47 (1987)

  27. J. Mingers, An empirical comparison of selection measures for decision-tree induction. Mach. Learn. 3(4), 319–342 (1989)

  28. T. Niblett, I. Bratko, Learning decision rules in noisy domains, in 6th Annual Technical Conference on Research and Development in Expert Systems III. pp. 25–34 (1986)

  29. G.L. Pappa, Automatically Evolving Rule Induction Algorithms with Grammar-Based Genetic Programming. PhD thesis. University of Kent at Canterbury (2007)

  30. G.L. Pappa, A.A. Freitas, Automating the Design of Data Mining Algorithms: An Evolutionary Computation Approach (Springer Publishing Company, Incorporated, 2009)

  31. J.R. Quinlan, Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)

  32. J.R. Quinlan, Decision trees as probabilistic classifiers, in 4th International Workshop on Machine Learning (1987)

  33. J.R. Quinlan, Simplifying decision trees. Int. J. Man-Mach. Stud. 27, 221–234 (1987)

  34. J.R. Quinlan, Unknown attribute values in induction, in 6th International Workshop on Machine Learning. pp. 164–168 (1989)

  35. J.R. Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann, San Francisco, 1993). ISBN: 1-55860-238-0

  36. C.E. Shannon, A mathematical theory of communication. Bell Syst. Tech. J. 27(1), 379–423, 625–656 (1948)

  37. P.C. Taylor, B.W. Silverman, Block diagrams and splitting criteria for classification trees. Stat. Comput. 3, 147–161 (1993)

  38. A. Vella, D. Corne, C. Murphy, Hyper-heuristic decision tree induction, in World Congress on Nature and Biologically Inspired Computing, pp. 409–414 (2010)

  39. I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (Morgan Kaufmann, 1999). ISBN: 1558605525

Author information

Corresponding author

Correspondence to Rodrigo C. Barros.

Copyright information

© 2015 The Author(s)

About this chapter

Cite this chapter

Barros, R.C., de Carvalho, A.C.P.L.F., Freitas, A.A. (2015). HEAD-DT: Automatic Design of Decision-Tree Algorithms. In: Automatic Design of Decision-Tree Induction Algorithms. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-14231-9_4

  • DOI: https://doi.org/10.1007/978-3-319-14231-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14230-2

  • Online ISBN: 978-3-319-14231-9

  • eBook Packages: Computer Science, Computer Science (R0)
