Abstract
Boosting is a powerful method for improving the predictive accuracy of classifiers. The AdaBoost algorithm of Freund and Schapire has been applied successfully to many domains [2, 10, 12], and its combination with the C4.5 decision tree algorithm has been called the best off-the-shelf learning algorithm in practice. Unfortunately, in some applications the number of decision trees AdaBoost requires to reach a reasonable accuracy is enormously large, making the ensemble very space consuming. This problem was first studied by Margineantu and Dietterich [7], who proposed an empirical method, called Kappa pruning, that prunes a boosted ensemble of decision trees without sacrificing much accuracy. In this work-in-progress we propose a potential improvement to the Kappa pruning method and also study the boosting pruning problem from a theoretical perspective. We show that the boosting pruning problem is intractable even to approximate, and we suggest a margin-based theoretical heuristic for it.
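The Kappa pruning idea referenced above can be illustrated with a minimal sketch. The function names (`kappa`, `kappa_prune`) and greedy pair-selection details below are illustrative assumptions, not the authors' exact procedure: the kappa statistic measures agreement between two classifiers beyond chance, and pruning keeps the members of the most-disagreeing (most diverse) pairs.

```python
from itertools import combinations

def kappa(preds_a, preds_b, n_classes):
    """Cohen's kappa between two classifiers' predictions on the same data:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(preds_a)
    # observed agreement: fraction of examples where the two classifiers agree
    theta1 = sum(a == b for a, b in zip(preds_a, preds_b)) / n
    # chance agreement: product of each classifier's marginal label frequencies
    theta2 = 0.0
    for c in range(n_classes):
        pa = sum(p == c for p in preds_a) / n
        pb = sum(p == c for p in preds_b) / n
        theta2 += pa * pb
    return (theta1 - theta2) / (1 - theta2)

def kappa_prune(all_preds, n_classes, keep):
    """Greedily select classifiers from the pairs with lowest pairwise kappa
    (i.e., the most diverse pairs) until `keep` classifiers are chosen."""
    pairs = sorted(
        combinations(range(len(all_preds)), 2),
        key=lambda ij: kappa(all_preds[ij[0]], all_preds[ij[1]], n_classes),
    )
    chosen = []
    for i, j in pairs:
        for k in (i, j):
            if k not in chosen and len(chosen) < keep:
                chosen.append(k)
        if len(chosen) >= keep:
            break
    return sorted(chosen)
```

For example, given three classifiers' predictions on a validation set, `kappa_prune(preds, n_classes, 2)` keeps the two that disagree most; in practice the pruned ensemble would then vote with the original AdaBoost weights.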
References
Y. Freund and R.E. Schapire. A decision-theoretic generalization of online learning and an application to boosting. J. Comp. System Sciences, 55(1):119–139, 1997.
Y. Freund and R.E. Schapire. Experiments with a new boosting algorithm. Proc. 13th Int. Conf. on Machine Learning, 148–156, 1996.
M.R. Garey and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, 1979.
D. Hochbaum. Approximation Algorithms for NP-hard Problems. PWS Publishing Company, 1997.
W. Hoeffding. Probability inequalities for sums of bounded random variables. J. American Stat. Assoc., 58:13–30, 1963.
V. Kann. Polynomially bounded minimization problems that are hard to approximate. Nordic Journal of Computing, 1:317–331, 1994.
D. Margineantu and T.G. Dietterich. Pruning adaptive boosting. Proc. 14th Int. Conf. Machine Learning, 211–218, 1997.
C.J. Merz and P.M. Murphy. UCI Repository of Machine Learning Databases. Tech. Report, U.C. Irvine, CA.
J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
J.R. Quinlan. Bagging, boosting, and C4.5. Proc. 13th Nat. Conf. Artificial Intelligence, 725–730, 1996.
R.E. Schapire, Y. Freund, P. Bartlett, and W.S. Lee. Boosting the margin: a new explanation for the effectiveness of voting methods. The Annals of Statistics, 26(5):1651–1686, 1998.
R.E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Proc. 11th Ann. Conf. Comp. Learning Theory, 80–91, 1998.
© 2000 Springer-Verlag Berlin Heidelberg
Cite this paper
Tamon, C., Xiang, J. (2000). On the Boosting Pruning Problem. In: López de Mántaras, R., Plaza, E. (eds) Machine Learning: ECML 2000. ECML 2000. Lecture Notes in Computer Science(), vol 1810. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45164-1_41
Print ISBN: 978-3-540-67602-7
Online ISBN: 978-3-540-45164-8