Abstract
In this chapter, tree-based methods are discussed as another of the three major machine learning paradigms considered in the book. This includes the basic information-theoretic approach used to construct classification and regression trees, with a few simple examples to illustrate the characteristics of decision tree models. This is followed by a short introduction to ensemble theory and ensembles of decision trees, leading to random forest models, which are discussed in detail. Unsupervised learning with random forests is reviewed in particular, as these characteristics are potentially important in unsupervised fault diagnostic systems. The interpretation of random forest models includes a discussion of the assessment of variable importance in the model, as well as partial dependence analysis to examine the relationship between predictor variables and the response variable. A brief review of boosted trees follows that of random forests, including a discussion of concepts such as gradient boosting and the AdaBoost algorithm. The use of tree-based ensemble models is illustrated by examples on rotogravure printing and on the identification of defects in hot-rolled steel plate.
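As an illustrative sketch of the workflow summarised above (not the authors' code or data), the following assumes scikit-learn; the synthetic data set and all parameter values are stand-ins for the chapter's rotogravure and steel plate examples:

```python
# A minimal sketch of the tree-ensemble workflow the chapter describes:
# random forest fitting, variable importance, partial dependence, and boosting.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import partial_dependence, permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for process data (e.g. plate-defect features).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: bagged trees, with max_features="sqrt" playing the role of
# the m randomly chosen split candidates per node in the chapter's notation.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X_train, y_train)
print("OOB accuracy:", rf.oob_score_)

# Variable importance: impurity-based and permutation-based measures.
print("Gini importance:", rf.feature_importances_)
perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=0)
print("Permutation importance:", perm.importances_mean)

# Partial dependence of the prediction on a single predictor variable.
pd_result = partial_dependence(rf, X_test, features=[0])
print("Partial dependence (variable 0):", pd_result["average"])

# Boosted trees: stagewise additive modelling via gradient boosting.
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                random_state=0)
gb.fit(X_train, y_train)
print("Boosted-tree test accuracy:", gb.score(X_test, y_test))
```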
Notes
- 1. Binary splitting is considered here; the extension to multiple splits is trivial.
- 2. The C4.5 algorithm (Quinlan 1993) scales the decrease in impurity for categorical input variables, since the cross-entropy impurity function is biased in favour of multilevel variables. This corrected impurity decrease is known as the gain ratio (both quantities are written out in the equations after these notes).
- 3. See “The Elements of Statistical Learning” (Hastie et al. 2009) for details.
- 4. Shi and Horvath (2006) focused on the clustering utility of random forest proximities, a subtle difference from general feature extraction applications. Here, clustering refers to the ability of a feature extraction method to generate projections in which known clusters are separated, without using cluster information in training.
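For reference, the quantities behind Notes 1 and 2 can be written out as follows. This is a sketch using the symbols defined in the chapter Nomenclature; it is standard CART/C4.5 material (Breiman et al. 1984; Quinlan 1993) rather than a verbatim excerpt, and the split-information denominator is C4.5 terminology rather than the book's notation:

```latex
% Decrease in impurity for a candidate binary split \varsigma at node \eta
% (Note 1), with p_L and p_R the proportions of samples sent to the left and
% right descendant nodes:
\Delta i(\varsigma, \eta) = i(\eta) - p_L\, i(\eta_L) - p_R\, i(\eta_R),
\qquad
\varsigma^{*} = \arg\max_{\varsigma}\, \Delta i(\varsigma, \eta)

% Cross-entropy impurity over the C classes:
i(\eta) = -\sum_{k=1}^{C} p(k \mid \eta)\, \log p(k \mid \eta)

% C4.5 gain ratio (Note 2): the impurity decrease scaled by the entropy of
% the split itself, penalising splits on multilevel categorical variables:
\mathrm{GainRatio}(\varsigma, \eta)
  = \frac{\Delta i(\varsigma, \eta)}{-\,p_L \log p_L \;-\; p_R \log p_R}
```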
References
Amit, Y., & Geman, D. (1997). Shape quantization and recognition with randomized trees. Neural Computation, 9(7), 1545–1588.
Archer, K. J., & Kimes, R. V. (2008). Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis, 52(4), 2249–2260.
Auret, L., & Aldrich, C. (2012). Interpretation of nonlinear relationships between process variables by use of random forests. Minerals Engineering, 35, 27–42.
Belson, W. A. (1959). Matching and prediction on the principle of biological classification. Journal of the Royal Statistical Society Series C (Applied Statistics), 8(2), 65–75.
Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Breiman, L., & Cutler, A. (2003). Manual on setting up, using, and understanding random forests v4.0. Available at: ftp://ftp.stat.berkeley.edu/pub/users/breiman/Using_random_forests_v4.0.pdf. Accessed 30 May 2008.
Breiman, L., Friedman, J. H., Olshen, R., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.
Cox, T. F., & Cox, M. A. A. (2001). Multidimensional scaling. Boca Raton: Chapman & Hall.
Cutler, A. (2009). Random forests. In useR! The R User Conference 2009. Available at: http://www.agrocampus-ouest.fr/math/useR-2009/
Cutler, A., & Stevens, J. R. (2006). Random forests for microarrays. In Methods in enzymology; DNA microarrays, Part B: Databases and statistics (pp. 422–432). San Diego: Academic Press.
Dietterich, T. G. (2000a). An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40(2), 139–157.
Dietterich, T. G. (2000b). Ensemble methods in machine learning. In Multiple classifier systems (Lecture notes in computer science, pp. 1–15). Berlin/Heidelberg: Springer. Available at: http://dx.doi.org/10.1007/3-540-45014-9_1.
Evans, B., & Fisher, D. (1994). Overcoming process delays with decision tree induction. IEEE Expert, 9(1), 60–66.
Frank, A., & Asuncion, A. (2010). UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. Available at: http://archive.ics.uci.edu/ml
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Machine learning: Proceedings of the Thirteenth International Conference (ICML’96) (pp. 148–156).
Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119–139.
Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. The Annals of Statistics, 29(5), 1189–1232.
Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, 38(4), 367–378.
Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting. The Annals of Statistics, 28(2), 337–374.
Gillo, M. W., & Shelly, M. W. (1974). Predictive modeling of multivariable and multivariate data. Journal of the American Statistical Association, 69(347), 646–653.
Hansen, L., & Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993–1001.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning – Data mining, inference and prediction. New York: Springer.
Ho, T. K. (1995). Random decision forests. In Proceedings of the Third International Conference on Document Analysis and Recognition (ICDAR 1995) (pp. 278–282). Montreal: IEEE Computer Society.
Izenman, A. (2008). Modern multivariate statistical techniques: Regression, classification, and manifold learning. New York/London: Springer.
Kass, G. V. (1980). An exploratory technique for investigating large quantities of categorical data. Journal of the Royal Statistical Society Series C (Applied Statistics), 29(2), 119–127.
Messenger, R., & Mandell, L. (1972). A modal search technique for predictive nominal scale multivariate analysis. Journal of the American Statistical Association, 67(340), 768–772.
Morgan, J. N., & Sonquist, J. A. (1963). Problems in the analysis of survey data, and a proposal. Journal of the American Statistical Association, 58(302), 415–434.
Nicodemus, K. K., & Malley, J. D. (2009). Predictor correlation impacts machine learning algorithms: Implications for genomic studies. Bioinformatics, 25(15), 1884–1890.
Polikar, R. (2006). Ensemble based systems in decision making. Circuits and Systems Magazine, IEEE, 6(3), 21–45.
Quinlan, J. (1986). Induction of decision trees. Machine Learning, 1(1), 81–106.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo: Morgan Kaufmann.
Rätsch, G., Onoda, T., & Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning, 42(3), 287–320.
RuleQuest Research. (2011). Data mining tools See5 and C5.0. Information on See5/C5.0. Available at: http://www.rulequest.com/see5-info.html. Accessed 10 Feb 2011.
Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5), 401–409.
Schapire, R. E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.
Schapire, R., Freund, Y., Bartlett, P., & Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5), 1651–1686.
Shi, T., & Horvath, S. (2006). Unsupervised learning with random forest predictors. Journal of Computational and Graphical Statistics, 15(1), 118–138.
Strobl, C., Boulesteix, A., Kneib, T., Augustin, T., & Zeileis, A. (2008). Conditional variable importance for random forests. BMC Bioinformatics, 9(1), 307–317.
Strobl, C., Malley, J., & Tutz, G. (2009). An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14(4), 323–348.
Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.
Nomenclature
| Symbol | Description |
| --- | --- |
| \( i(\eta) \) | Impurity measure of node η in a classification tree |
| \( p(k\mid\eta) \) | Proportion of samples in class k at node η in a classification tree |
| \( C \) | Number of classes in a classification problem |
| \( \Delta i(\varsigma, \eta) \) | Decrease in impurity for a candidate split position ς at node η |
| \( \varsigma \) | Split point index in a decision tree |
| \( \varsigma^{*} \) | Optimal split point index in a decision tree |
| \( \eta \) | Node index variable |
|  | Input space |
| \( N \) | Sample size |
| \( p_R \) | Proportion of samples reporting to the right descendant node after splitting in a classification tree |
| \( p_L \) | Proportion of samples reporting to the left descendant node after splitting in a classification tree |
| \( c_R \) | Prediction in the right descendant node of a regression tree after splitting |
| \( c_L \) | Prediction in the left descendant node of a regression tree after splitting |
| \( \eta_R \) | Index of the right descendant node |
| \( \eta_L \) | Index of the left descendant node |
| \( \mathbf{X}^k \) | kth bootstrap sample of learning data set \( \mathbf{X} \) |
| \( T \) | Set of ensemble trees |
| \( t_k \) | kth tree in an ensemble of trees |
| \( K \) | Number of trees in an ensemble of classification or regression trees |
| \( t_k(\cdot) \) | Prediction of the kth tree in an ensemble of classification or regression trees |
| \( m \) | Number of variables considered at each split point in a random forest tree |
| \( M \) | Total number of input variables |
| \( \mathbf{X}_{\mathrm{OOB}(j)}^k \) | Out-of-bag (OOB) input learning data for the kth tree in an ensemble of trees, with variable j permuted |
| \( \mathbf{y}_{\mathrm{OOB}}^k \) | Out-of-bag (OOB) output learning data for the kth tree in an ensemble of trees |
| \( \omega_j(t_k) \) | Importance measure for the jth variable in the kth tree of an ensemble of trees (random forest) |
| \( \omega_j \) | Importance measure for the jth variable in an ensemble of trees (random forest) |
| \( \mathbf{X}_S \) | Subset of variables in \( \mathbf{X} \) |
| \( \mathbf{X}_C \) | Subset of variables in \( \mathbf{X} \) complementary to \( \mathbf{X}_S \) |
| \( X_{i,C} \) | Values of samples in \( \mathbf{X}_C \) |
| \( \bar{f}(\mathbf{X}_S) \) | Partial dependence of a predicted response on the subset of variables in \( \mathbf{X}_S \) |
| \( b_j \) | jth scalar calculation point |
| \( \mathbf{S} \) | Proximity matrix |
| \( \mathbf{D} \) | Dissimilarity matrix |
| \( g \) | Unknown data density |
| \( g_0 \) | Reference distribution |
| \( \mathbf{X}_0 \) | Synthetic data set obtained by random sampling from the product of marginal distributions in \( \mathbf{X} \) |
| \( \mathbf{Z} \) | Concatenated matrix |
| \( \mathbf{T} \) | Scaling coordinate features |
| \( \boldsymbol{\beta} \) | K × 1 weighting vector of trees in a boosted tree ensemble |
| \( \mathbf{w} \) | N × 1 weighting vector of samples in a boosted tree ensemble |
| \( \epsilon_k \) | Ensemble error |
| \( \beta_k \) | Weight of the kth tree in a boosted tree ensemble |
| \( W_k \) | Normalizing constant |
| \( F(\mathbf{x}) \) | Output of an ensemble of boosted classification or regression trees |
| \( L(y, f(\mathbf{x})) \) | Loss function of a classifier or regressor |
| \( g_k(\mathbf{x}) \) | Gradient at \( \mathbf{x} \) at the kth iteration |
| \( \rho_k \) | Optimization search step size at the kth iteration |
| \( \theta_k \) | Parameters of the kth model in an ensemble |
| \( i_w(\eta) \) | Weighted cross-entropy of node η |
| \( Q(k\mid\eta) \) | Sum of weights of samples in node η labelled as class k |
| \( W(\eta) \) | Sum of all sample weights present in node η |
| \( \mathbf{X}'_t \) | Matrix of time series column vectors with mean-centred columns |
| \( \widehat{\mathbf{X}}_i \) | ith of d |
| \( \tilde{\mathbf{X}}_i \) | ith trajectory matrix |
| \( \rho_{p,q}^{(w)} \) | Weighted or w-correlation between time series p and q |
| \( \rho_{\max}^{(L,K)} \) | Maximum of the absolute value of the correlations between the rows and between the columns of a pair of trajectory matrices \( \tilde{\mathbf{X}}_i \) and \( \tilde{\mathbf{X}}_j \) |
| \( \mathcal{N}(a,b) \) | Normal distribution with mean a and standard deviation b |
| \( \mathbf{u}(t) \) | Input vector at time t |
| \( \mathbf{y}(t) \) | Vector of measured variables at time t |
| \( \mathbf{v}(t) \) | Gaussian noise with variance 0.01 |
| \( \mathbf{w}(t) \) | Gaussian noise with variance 0.1 |
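As a worked illustration of how several of these symbols combine, the following equations sketch the standard definitions behind the random forest and boosting entries above. They use the chapter's notation but are reconstructions of textbook forms (Breiman 2001; Freund & Schapire 1997; Friedman 2001), not verbatim excerpts; \( \mathbf{X}_{\mathrm{OOB}}^{k} \), the unpermuted OOB inputs, is an assumed auxiliary symbol:

```latex
% Permutation importance of variable j in tree t_k: increase in OOB loss when
% variable j is permuted, averaged over the K trees of the forest.
\omega_j(t_k) =
  L\bigl(\mathbf{y}_{\mathrm{OOB}}^{k},\, t_k(\mathbf{X}_{\mathrm{OOB}(j)}^{k})\bigr)
  - L\bigl(\mathbf{y}_{\mathrm{OOB}}^{k},\, t_k(\mathbf{X}_{\mathrm{OOB}}^{k})\bigr),
\qquad
\omega_j = \frac{1}{K} \sum_{k=1}^{K} \omega_j(t_k)

% Partial dependence of the prediction f on the variable subset X_S,
% averaging over the N observed values X_{i,C} of the complement set:
\bar{f}(\mathbf{X}_S) = \frac{1}{N} \sum_{i=1}^{N} f(\mathbf{X}_S, X_{i,C})

% Discrete AdaBoost: tree weight from the weighted ensemble error \epsilon_k,
% sample-weight update with normalizing constant W_k, and the ensemble output:
\beta_k = \tfrac{1}{2} \ln\frac{1 - \epsilon_k}{\epsilon_k},
\qquad
w_i \leftarrow \frac{w_i \exp\bigl(-\beta_k\, y_i\, t_k(\mathbf{x}_i)\bigr)}{W_k},
\qquad
F(\mathbf{x}) = \operatorname{sign}\!\Bigl(\sum_{k=1}^{K} \beta_k\, t_k(\mathbf{x})\Bigr)
```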
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Aldrich, C., Auret, L. (2013). Tree-Based Methods. In: Unsupervised Process Monitoring and Fault Diagnosis with Machine Learning Methods. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5185-2_5
DOI: https://doi.org/10.1007/978-1-4471-5185-2_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5184-5
Online ISBN: 978-1-4471-5185-2
eBook Packages: Computer Science, Computer Science (R0)