Abstract
We consider a class of algorithms for classification based on sequential greedy minimization of a convex upper bound on the 0-1 loss function. A large class of recently popular algorithms falls within the scope of this approach, including many variants of Boosting. The basic question addressed in this paper is the statistical consistency of such approaches. We provide precise conditions under which sequential greedy procedures are consistent, and we establish rates of convergence under the assumption that the Bayes decision boundary belongs to a certain class of smooth functions. The results rely on a form of regularization that constrains the search space at each iteration of the algorithm. A particularly interesting conclusion of our work is that Boosting based on the logistic loss yields faster rates of convergence than Boosting based on the exponential loss used in AdaBoost.
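To fix ideas, here is a minimal sketch of the setup (the notation is ours; the abstract itself does not fix symbols). Writing the margin as $u = y f(x)$, the 0-1 loss $\mathbf{1}[u \le 0]$ is bounded above by convex surrogates such as
\[
\phi_{\exp}(u) = e^{-u} \quad \text{(AdaBoost)}, \qquad
\phi_{\log}(u) = \ln\bigl(1 + e^{-u}\bigr) \quad \text{(logistic Boosting)}.
\]
Sequential greedy minimization then augments the current combination $f_{t-1}$ by a single weighted base hypothesis at each step,
\[
(\alpha_t, h_t) \in \arg\min_{\alpha,\, h \in \mathcal{H}}
\frac{1}{n} \sum_{i=1}^{n} \phi\bigl(y_i \,[\, f_{t-1}(x_i) + \alpha h(x_i) \,]\bigr),
\qquad f_t = f_{t-1} + \alpha_t h_t,
\]
with the search over $(\alpha, h)$ constrained at each iteration; that constraint is the form of regularization referred to above, whose exact specification appears in the body of the paper rather than the abstract.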
Cite this paper
Mannor, S., Meir, R., Zhang, T. (2002). The Consistency of Greedy Algorithms for Classification. In: Kivinen, J., Sloan, R.H. (eds) Computational Learning Theory. COLT 2002. Lecture Notes in Computer Science, vol. 2375. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45435-7_22