Unsupervised Process Monitoring and Fault Diagnosis with Machine Learning Methods, pp 183-220

# Tree-Based Methods

## Abstract

In this chapter, tree-based methods are discussed as another of the three major machine learning paradigms considered in the book. The discussion covers the basic information-theoretic approach used to construct classification and regression trees, with a few simple examples to illustrate the characteristics of decision tree models. A short introduction to ensemble theory and ensembles of decision trees follows, leading to random forest models, which are discussed in detail. Unsupervised learning with random forests in particular is reviewed, as its characteristics are potentially important in unsupervised fault diagnostic systems. The interpretation of random forest models includes the assessment of the importance of variables in the model, as well as partial dependence analysis to examine the relationship between predictor variables and the response variable. A brief review of boosted trees follows that of random forests, including concepts such as gradient boosting and the AdaBoost algorithm. The use of tree-based ensemble models is illustrated with rotogravure printing and the identification of defects in hot-rolled steel plate.
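The information-theoretic criterion mentioned above can be illustrated with a minimal sketch. It assumes Shannon entropy as the impurity measure and a single numeric predictor; the function names (`entropy`, `information_gain`, `best_threshold`) and the toy data are hypothetical and are not taken from the chapter.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gains nothing
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Candidate midpoint with the largest information gain.

    Assumes `values` contains at least two distinct numbers.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Hypothetical toy data: one process variable and a normal/fault label.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["normal", "normal", "fault", "fault"]
split = best_threshold(values, labels)
gain = information_gain(values, labels, split)
```

A tree-growing procedure of this kind would apply the same search to every predictor at each node and recurse on the two resulting subsets; a random forest would additionally restrict each search to a random subset of predictors and fit each tree to a bootstrap sample.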

### Keywords

Entropy Manifold Expense Sammon Auret