Tree-Based Methods

  • Chris Aldrich
  • Lidia Auret
Part of the Advances in Computer Vision and Pattern Recognition book series (ACVPR)


In this chapter, tree-based methods are discussed as another of the three major machine learning paradigms considered in the book. The chapter covers the basic information-theoretic approach used to construct classification and regression trees, with a few simple examples to illustrate the characteristics of decision tree models. This is followed by a short introduction to ensemble theory and ensembles of decision trees, leading to random forest models, which are discussed in detail. Unsupervised learning with random forests in particular is reviewed, as its characteristics are potentially important in unsupervised fault diagnostic systems. The interpretation of random forest models includes a discussion of how to assess the importance of variables in the model, as well as partial dependence analysis to examine the relationship between predictor variables and the response variable. A brief review of boosted trees follows that of random forests, including concepts such as gradient boosting and the AdaBoost algorithm. The use of tree-based ensemble models is illustrated by examples on rotogravure printing and the identification of defects in hot-rolled steel plate.


Keywords: Random Forest, Ensemble Member, Random Forest Model, Decision Tree Algorithm, AdaBoost Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag London 2013

Authors and Affiliations

  • Chris Aldrich (1, 2)
  • Lidia Auret (2)
  1. Western Australian School of Mines, Curtin University, Perth, Australia
  2. Department of Process Engineering, University of Stellenbosch, Stellenbosch, South Africa
