Classification and Regression Trees
Classification and regression trees (CARTs) are an approach to discovering relationships between a large number of independent (predictor) variables and a categorical or continuous trait. Classification trees are applied to categorical outcomes, while regression trees are applied to continuous traits. Both use a recursive algorithm that partitions individuals into groups so as to minimize within-group heterogeneity. CART was originally described by Breiman et al. (1993) and has gained popularity in recent years as a method for identifying structure in high-dimensional data. In the following sections, we begin by describing methods for constructing a tree. This involves defining a measure of heterogeneity, commonly referred to as node impurity, and determining how predictor variables enter the model. Both of these components affect the resulting tree and must be defined carefully to reflect the scientific questions at hand. We then describe methods for refining this tree to arrive at a final, reproducible model. Further discussion of CART methods can be found in Breiman et al. (1993) and Zhang and Singer (1999). In Chapter 7, we describe extensions of the CART model, including random forests and logic regression trees, that offer some additional advantages.
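To make the recursive partitioning idea concrete, the following is a minimal sketch of growing a classification tree for a binary trait, using the Gini index as the node impurity measure. It is an illustration under stated assumptions, not the chapter's implementation: the function names (`gini`, `best_split`, `grow_tree`, `predict`), the stopping rules `min_size` and `max_depth`, and the toy genotype data are all hypothetical choices made here. Each split is chosen by searching every predictor and threshold for the binary split that minimizes the size-weighted impurity of the two child nodes, and the search is then repeated within each child.

```python
from collections import Counter

def gini(labels):
    """Gini index of node impurity: 1 - sum_k p_k^2 (0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Search every predictor j and threshold t for the split x_j <= t vs x_j > t
    that minimizes the size-weighted impurity of the two child nodes."""
    n = len(y)
    best = None  # (weighted child impurity, predictor index, threshold)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            if not left or not right:
                continue  # not a real split; skip it
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best

def grow_tree(X, y, min_size=2, max_depth=3, depth=0):
    """Recursively partition the data. Stop (hypothetical rules) when the node is
    pure, too small, too deep, or when no split lowers the node's impurity."""
    majority = Counter(y).most_common(1)[0][0]
    if gini(y) == 0.0 or len(y) < min_size or depth >= max_depth:
        return {"leaf": majority, "n": len(y)}
    split = best_split(X, y)
    if split is None or split[0] >= gini(y):
        return {"leaf": majority, "n": len(y)}
    _, j, t = split
    left_idx = [i for i in range(len(y)) if X[i][j] <= t]
    right_idx = [i for i in range(len(y)) if X[i][j] > t]
    return {
        "var": j,
        "thr": t,
        "left": grow_tree([X[i] for i in left_idx], [y[i] for i in left_idx],
                          min_size, max_depth, depth + 1),
        "right": grow_tree([X[i] for i in right_idx], [y[i] for i in right_idx],
                           min_size, max_depth, depth + 1),
    }

def predict(tree, x):
    """Drop an observation down the tree until it reaches a terminal node."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["var"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]

# Hypothetical toy data: two SNPs coded 0/1/2 (minor-allele counts), binary trait.
X = [[0, 0], [0, 1], [1, 0], [1, 2], [2, 1], [2, 2], [0, 2], [2, 0]]
y = [0, 0, 0, 1, 1, 1, 0, 1]

tree = grow_tree(X, y)
print(tree["var"], tree["thr"])  # first split: predictor 0 at threshold 0
print([predict(tree, x) for x in X])
```

For a regression tree on a continuous trait, the same recursion applies with a different impurity measure (for example, the within-node sum of squared deviations from the node mean) and with the node mean, rather than the majority class, as the terminal-node prediction. The sketch grows the tree only; the refinement step mentioned above (pruning back an overgrown tree to obtain a reproducible model) is not shown.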
Keywords: Regression Tree; Terminal Node; Gini Index; Multifactor Dimensionality Reduction; Binary Trait