
Pruning decision trees with misclassification costs

  • Jeffrey P. Bradford
  • Clayton Kunz
  • Ron Kohavi
  • Cliff Brunk
  • Carla E. Brodley
Decision Trees
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1398)

Abstract

We describe an experimental study of pruning methods for decision tree classifiers when the goal is minimizing loss rather than error. In addition to two common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and a pruning variant based on the Laplace correction. We perform an empirical comparison of these methods and evaluate them with respect to loss. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results on par with the other methods in terms of loss. The main advantage of pruning was the reduction in decision tree size, sometimes by a factor of ten. No method dominated the others on all datasets, and even within the same domain, different pruning mechanisms are better suited to different loss matrices.
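The abstract combines two ideas that a short sketch can make concrete: Laplace-corrected class probability estimates at the leaves, and choosing the leaf prediction that minimizes expected loss under a misclassification-cost (loss) matrix. The sketch below is an illustrative reading of those ideas, not the paper's MLC++ implementation; the class counts, class names, and loss matrix are assumptions made up for the example.

```python
# Illustrative sketch only (not the authors' MLC++ code): Laplace-corrected
# leaf probabilities plus expected-loss label selection under a loss matrix.

def laplace_probabilities(counts):
    """Laplace correction: (n_k + 1) / (N + K) instead of raw frequencies n_k / N."""
    total = sum(counts.values())
    k = len(counts)
    return {c: (n + 1) / (total + k) for c, n in counts.items()}

def min_loss_label(probs, loss):
    """Pick the prediction minimizing expected loss: sum_c P(c) * loss[c][prediction]."""
    return min(loss, key=lambda pred: sum(p * loss[true][pred] for true, p in probs.items()))

# Hypothetical leaf with 8 negatives and 1 positive, and a loss matrix in which
# a missed positive costs 10 times as much as a false alarm.
counts = {"neg": 8, "pos": 1}
loss = {"neg": {"neg": 0, "pos": 1},   # loss[true][predicted]
        "pos": {"neg": 10, "pos": 0}}

probs = laplace_probabilities(counts)   # {'neg': 9/11 ≈ 0.818, 'pos': 2/11 ≈ 0.182}
print(min_loss_label(probs, loss))      # 'pos': expected loss ≈ 0.818 vs ≈ 1.818 for 'neg'
```

With this illustrative 10:1 loss matrix, the Laplace-corrected leaf predicts the minority class rather than the majority class, which is why loss, rather than error, is the evaluation criterion throughout the paper.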

Keywords

Tree Size · Loss Matrix · Pruning Algorithm · Misclassification Cost · Decision Tree Classifier
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Bradford, J. P., Kunz, C., Kohavi, R., Brunk, C. & Brodley, C. E. (1998), Pruning decision trees with misclassification costs (long). http://robotics.stanford.edu/~ronnyk/prune-long.ps.gz
  2. Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984), Classification and Regression Trees, Wadsworth International Group.
  3. Cestnik, B. (1990), Estimating probabilities: A crucial task in machine learning, in L. C. Aiello, ed., ‘Proceedings of the Ninth European Conference on Artificial Intelligence', pp. 147–149.
  4. Draper, B. A., Brodley, C. E. & Utgoff, P. E. (1994), ‘Goal-directed classification using linear machine decision trees', IEEE Transactions on Pattern Analysis and Machine Intelligence 16(9), 888–893.
  5. Good, I. J. (1965), The Estimation of Probabilities: An Essay on Modern Bayesian Methods, M.I.T. Press.
  6. Kohavi, R., Sommerfield, D. & Dougherty, J. (1996), Data mining using MLC++: A machine learning library in C++, in ‘Tools with Artificial Intelligence', IEEE Computer Society Press, pp. 234–245. http://www.sgi.com/Technology/mlc
  7. Merz, C. J. & Murphy, P. M. (1997), UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html
  8. Oates, T. & Jensen, D. (1997), The effects of training set size on decision tree complexity, in D. Fisher, ed., ‘Machine Learning: Proceedings of the Fourteenth International Conference', Morgan Kaufmann, pp. 254–262.
  9. Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T. & Brunk, C. (1994), Reducing misclassification costs, in ‘Machine Learning: Proceedings of the Eleventh International Conference', Morgan Kaufmann.
  10. Quinlan, J. R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, California.
  11. Turney, P. (1997), Cost-sensitive learning. http://ai.iit.nrc.ca/bibliographies/cost-sensitive.html

Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Jeffrey P. Bradford (1)
  • Clayton Kunz (2)
  • Ron Kohavi (2)
  • Cliff Brunk (2)
  • Carla E. Brodley (1)
  1. School of Electrical Engineering, Purdue University, West Lafayette
  2. Data Mining and Visualization, Silicon Graphics, Inc., Mountain View
