# HEAD-DT: Fitness Function Analysis

## Abstract

In Chap. 4, more specifically in Sect. 4.4, we saw that defining a fitness function for the scenario in which HEAD-DT evolves a decision-tree algorithm from multiple data sets is an interesting and relevant problem. In the experiments presented in Chap. 5, Sect. 5.2, we employed a simple average of the F-Measure values obtained on the data sets that belong to the meta-training set. As previously observed, when evolving an algorithm from multiple data sets, each individual of HEAD-DT has to be executed on every data set in the meta-training set. Hence, instead of a single value of predictive performance, each individual obtains a set of values that must eventually be combined into a single measure. In this chapter, we analyse in more detail the impact of different strategies used as the fitness function during the evolutionary cycle of HEAD-DT. We divide the experimental scheme into two distinct scenarios: (i) evolving a decision-tree induction algorithm from multiple balanced data sets; and (ii) evolving a decision-tree induction algorithm from multiple imbalanced data sets. In each scenario, we compare fitness functions based on well-known performance measures such as accuracy, F-Measure, AUC, and recall, as well as on a lesser-known criterion, namely the relative accuracy improvement. In addition, we analyse different aggregation schemes, such as the simple average, the median, and the harmonic mean.
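To make the aggregation step concrete, the following is a minimal sketch, not HEAD-DT's actual implementation, of how one individual's per-data-set scores (for example, its F-Measure on each meta-training data set) could be collapsed into a single fitness value with the three schemes mentioned above. The helper name `aggregate_fitness`, the scheme labels, and the sample scores are hypothetical and chosen only for illustration; the aggregators themselves come from Python's standard `statistics` module.

```python
from statistics import harmonic_mean, mean, median
from typing import Sequence

# Hypothetical mapping (not taken from HEAD-DT) from an aggregation scheme
# name to the standard-library function that implements it.
AGGREGATORS = {
    "average": mean,
    "median": median,
    "harmonic": harmonic_mean,
}


def aggregate_fitness(scores: Sequence[float], scheme: str = "average") -> float:
    """Combine one individual's per-data-set performance values into one fitness value.

    `scores` holds one value per meta-training data set (e.g. F-Measure),
    assumed to lie in [0, 1]; `scheme` selects the aggregation strategy.
    """
    if not scores:
        raise ValueError("an individual must be evaluated on at least one data set")
    try:
        aggregator = AGGREGATORS[scheme]
    except KeyError:
        raise ValueError(f"unknown aggregation scheme: {scheme!r}") from None
    return float(aggregator(scores))


if __name__ == "__main__":
    # Made-up F-Measure values of one individual on three data sets,
    # one of which it handles poorly.
    f_measures = [0.90, 0.85, 0.40]
    for scheme in AGGREGATORS:
        print(f"{scheme:>8}: {aggregate_fitness(f_measures, scheme):.4f}")
```

With scores like these, the harmonic mean is pulled towards the weakest data set, so it penalises individuals that fail badly on any single data set, whereas the median ignores that outlier entirely and the simple average sits in between.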

## Keywords

Fitness functions · Performance measures · Evaluation schemes