Abstract
Classification and regression trees (CARTs) are an approach to discovering relationships between a large number of independent (predictor) variables and a categorical or continuous trait. Classification trees apply to categorical outcomes, while regression trees apply to continuous traits. Both use a recursive algorithm that partitions individuals into groups so as to minimize within-group heterogeneity. CART was originally described by Breiman et al. (1993) and has gained popularity in recent years as a method for identifying structure in high-dimensional data settings. In the following sections, we begin by describing methods for constructing a tree. This involves defining a measure of heterogeneity, commonly referred to as node impurity, and determining how predictor variables are input into the model. Both components affect the resulting tree and need to be defined carefully to reflect the scientific questions at hand. We then describe methods for refining this tree to arrive at a final reproducible model. Further discussion of CART methods can be found in Breiman et al. (1993) and Zhang and Singer (1999). In Chapter 7, we describe extensions of the CART model, including random forests and logic regression trees, that offer some additional advantages.
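As a rough illustration of the partitioning idea, the following minimal sketch searches for the single binary split of one predictor that most reduces node impurity. It is not the chapter's implementation: the choice of Gini impurity, the function names `gini` and `best_split`, and the toy genotype data are all assumptions made here for illustration. A regression tree would replace the impurity measure with something like the within-node variance.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(x, y):
    """Return (threshold, weighted_impurity) for the best split x <= threshold.

    If no split improves on the unsplit node, threshold is None.
    """
    n = len(y)
    best = (None, gini(y))
    # The largest observed value is skipped: it would leave the right node empty.
    for t in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: genotype coded as minor-allele count, binary trait.
genotype = [0, 0, 0, 1, 1, 1, 2, 2]
trait = ["control", "control", "control", "control",
         "case", "case", "case", "case"]

threshold, impurity = best_split(genotype, trait)
print(threshold, round(impurity, 3))
```

A full tree would apply `best_split` recursively to each resulting group, across all predictors, until a stopping rule is met. The refinement step mentioned above (pruning) is not shown.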
Copyright information
© 2009 Springer-Verlag New York
About this chapter
Cite this chapter
Foulkes, A. (2009). Classification and Regression Trees. In: Applied Statistical Genetics with R. Use R. Springer, New York, NY. https://doi.org/10.1007/978-0-387-89554-3_6
DOI: https://doi.org/10.1007/978-0-387-89554-3_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-89553-6
Online ISBN: 978-0-387-89554-3
eBook Packages: Biomedical and Life Sciences (R0)