Classification Trees and Rule-Based Models

  • Max Kuhn
  • Kjell Johnson


Classification trees fall within the family of tree-based models and, like regression trees (Chapter 8), consist of nested if-then statements. Classification trees and rules are basic partitioning models and are covered in Sections 14.1 and 14.2, respectively. Ensemble methods combine many trees (or rules) into one model and tend to have much better predictive performance than a single tree- or rule-based model. Popular ensemble techniques are bagging (Section 14.3), random forests (Section 14.4), boosting (Section 14.5), and C5.0 (Section 14.6). In Section 14.7 we compare the model results from two different encodings of the categorical predictors. Then in Section 14.8, we demonstrate how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
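The contrast drawn above, between a single partitioning model and an ensemble of many trees, can be sketched in R. This is a minimal illustration, not the chapter's own code: it assumes the `rpart` and `randomForest` packages are installed and uses the built-in `iris` data purely as a stand-in classification problem.

```r
# Minimal sketch (assumed packages: rpart, randomForest; example data: iris)
library(rpart)          # single CART-style classification tree
library(randomForest)   # ensemble of trees via bagging + random predictor selection

data(iris)
set.seed(100)

# A single classification tree: one set of nested if-then splits
cartFit <- rpart(Species ~ ., data = iris)

# A random forest: many trees, each grown on a bootstrap sample,
# with predictions aggregated by majority vote
rfFit <- randomForest(Species ~ ., data = iris, ntree = 500)

# Both objects predict classes with the same interface
head(predict(cartFit, iris, type = "class"))
head(predict(rfFit, iris))
```

The ensemble typically trades the single tree's interpretability for lower variance and better accuracy, which is the theme the later sections of the chapter develop.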


Keywords: Random Forest · Classification Tree · Terminal Node · Gini Index · Variable Importance



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Max Kuhn (1)
  • Kjell Johnson (2)
  1. Division of Nonclinical Statistics, Pfizer Global Research and Development, Groton, USA
  2. Arbor Analytics, Saline, USA
