
Classification Trees and Rule-Based Models


Abstract

Classification trees fall within the family of tree-based models and, similar to regression trees (Chapter 8), consist of nested if-then statements. Classification trees and rules are basic partitioning models and are covered in Sections 14.1 and 14.2, respectively. Ensemble methods combine many trees (or rules) into one model and tend to have much better predictive performance than a single tree- or rule-based model. Popular ensemble techniques are bagging (Section 14.3), random forests (Section 14.4), boosting (Section 14.5), and C5.0 (Section 14.6). In Section 14.7 we compare the model results from two different encodings for the categorical predictors. Then in Section 14.8, we demonstrate how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
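As a concrete illustration of the nested if-then structure (a minimal sketch of our own, using R's rpart package and the built-in iris data rather than the chapter's examples), a single classification tree can be fit and inspected in a few lines:

    # Fit one classification tree; each printed node is an if-then condition.
    library(rpart)
    fit <- rpart(Species ~ ., data = iris, method = "class")
    print(fit)                                 # the nested if-then splits
    predict(fit, head(iris), type = "class")   # class predictions for new rows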


Notes

  1. See Breiman (1996c) for a discussion of the technical nuances of splitting algorithms.

  2. An alternate way to think of this is in terms of entropy, a measure of uncertainty. When the classes are balanced 50/50, we have no real ability to guess the outcome: it is as uncertain as possible. However, if ten samples were in class 1, we would have less uncertainty since it is more likely that a random data point would be in class 1 (see the entropy sketch after these notes).

  3. Also known as the mutual information statistic. This statistic is discussed again in Chap. 18.

  4. By default, C4.5 uses a simple binary split of continuous predictors. However, Quinlan (1993b) also describes a technique called soft thresholding that treats values near the split point differently. For brevity, this is not discussed further here.

  5. Because a weak classifier is used, the stage values are often close to zero.

  6. An example of this type of argument is shown in Sect. 16.9, where rpart is fit with differential costs for different types of errors (a hedged sketch follows these notes).
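The following small R sketch (our illustration, not the authors') makes note 2 concrete by computing Shannon entropy for two class balances. The ten-of-eleven split is an assumed denominator, since the note does not state one.

    # Shannon entropy H(p) = -sum(p * log2(p)), in bits.
    entropy <- function(p) -sum(p[p > 0] * log2(p[p > 0]))
    entropy(c(0.5, 0.5))    # balanced 50/50: 1 bit, maximal uncertainty
    entropy(c(10, 1) / 11)  # ten of eleven samples in class 1: about 0.44 bits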
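Note 6 refers to fitting rpart with differential costs. As a hedged sketch of that mechanism (the data and the fivefold cost are our invention, not necessarily what Sect. 16.9 uses), rpart accepts a loss matrix through its parms argument; the kyphosis data frame ships with the package.

    # Unequal misclassification costs via rpart's loss matrix. The matrix has
    # a zero diagonal, and entry [i, j] is the cost of predicting class j when
    # the true class is i; here missing a true second-level ("present") case
    # costs five times more than the reverse error. The 5x cost is invented.
    library(rpart)
    lmat <- matrix(c(0, 1,
                     5, 0), nrow = 2, byrow = TRUE)
    fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                 method = "class", parms = list(loss = lmat))
    print(fit)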

References

  • Bauer E, Kohavi R (1999). “An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants.” Machine Learning, 36, 105–142.

  • Breiman L (1996c). “Technical Note: Some Properties of Splitting Criteria.” Machine Learning, 24(1), 41–47.

  • Breiman L (1998). “Arcing Classifiers.” The Annals of Statistics, 26, 123–140.

  • Breiman L (2000). “Randomizing Outputs to Increase Prediction Accuracy.” Machine Learning, 40, 229–242.

  • Breiman L (2001). “Random Forests.” Machine Learning, 45, 5–32.

  • Breiman L, Friedman J, Olshen R, Stone C (1984). Classification and Regression Trees. Chapman and Hall, New York.

  • Chan K, Loh W (2004). “LOTUS: An Algorithm for Building Accurate and Comprehensible Logistic Regression Trees.” Journal of Computational and Graphical Statistics, 13(4), 826–852.

  • Cover TM, Thomas JA (2006). Elements of Information Theory. Wiley–Interscience.

  • Frank E, Wang Y, Inglis S, Holmes G (1998). “Using Model Trees for Classification.” Machine Learning.

  • Frank E, Witten I (1998). “Generating Accurate Rule Sets Without Global Optimization.” Proceedings of the Fifteenth International Conference on Machine Learning, pp. 144–151.

  • Freund Y (1995). “Boosting a Weak Learning Algorithm by Majority.” Information and Computation, 121, 256–285.

  • Freund Y, Schapire R (1996). “Experiments with a New Boosting Algorithm.” Machine Learning: Proceedings of the Thirteenth International Conference, pp. 148–156.

  • Friedman J, Hastie T, Tibshirani R (2000). “Additive Logistic Regression: A Statistical View of Boosting.” Annals of Statistics, 38, 337–374.

  • Hastie T, Tibshirani R, Friedman J (2008). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, 2nd edition.

  • Hothorn T, Hornik K, Zeileis A (2006). “Unbiased Recursive Partitioning: A Conditional Inference Framework.” Journal of Computational and Graphical Statistics, 15(3), 651–674.

  • Johnson K, Rayens W (2007). “Modern Classification Methods for Drug Discovery.” In A Dmitrienko, C Chuang-Stein, R D’Agostino (eds.), Pharmaceutical Statistics Using SAS: A Practical Guide, pp. 7–43. SAS Institute Inc., Cary, NC.

  • Kearns M, Valiant L (1989). “Cryptographic Limitations on Learning Boolean Formulae and Finite Automata.” In Proceedings of the Twenty-First Annual ACM Symposium on Theory of Computing.

  • Loh WY (2002). “Regression Trees With Unbiased Variable Selection and Interaction Detection.” Statistica Sinica, 12, 361–386.

  • Menze B, Kelm B, Splitthoff D, Koethe U, Hamprecht F (2011). “On Oblique Random Forests.” Machine Learning and Knowledge Discovery in Databases, pp. 453–469.

  • Molinaro A, Lostritto K, Van Der Laan M (2010). “partDSA: Deletion/Substitution/Addition Algorithm for Partitioning the Covariate Space in Prediction.” Bioinformatics, 26(10), 1357–1363.

  • Ozuysal M, Calonder M, Lepetit V, Fua P (2010). “Fast Keypoint Recognition Using Random Ferns.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3), 448–461.

  • Quinlan R (1987). “Simplifying Decision Trees.” International Journal of Man–Machine Studies, 27(3), 221–234.

  • Quinlan R (1993b). C4.5: Programs for Machine Learning. Morgan Kaufmann Publishers.

  • Quinlan R (1996a). “Bagging, Boosting, and C4.5.” In Proceedings of the Thirteenth National Conference on Artificial Intelligence.

  • Quinlan R (1996b). “Improved Use of Continuous Attributes in C4.5.” Journal of Artificial Intelligence Research, 4, 77–90.

  • Quinlan R, Rivest R (1989). “Inferring Decision Trees Using the Minimum Description Length Principle.” Information and Computation, 80(3), 227–248.

  • Ruczinski I, Kooperberg C, Leblanc M (2003). “Logic Regression.” Journal of Computational and Graphical Statistics, 12(3), 475–511.

  • Schapire R (1990). “The Strength of Weak Learnability.” Machine Learning, 45, 197–227.

  • Shannon C (1948). “A Mathematical Theory of Communication.” The Bell System Technical Journal, 27(3), 379–423.

  • Valiant L (1984). “A Theory of the Learnable.” Communications of the ACM, 27, 1134–1142.

  • Wallace C (2005). Statistical and Inductive Inference by Minimum Message Length. Springer–Verlag.

  • Zeileis A, Hothorn T, Hornik K (2008). “Model-Based Recursive Partitioning.” Journal of Computational and Graphical Statistics, 17(2), 492–514.



Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Kuhn, M., Johnson, K. (2013). Classification Trees and Rule-Based Models. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_14
