A Consistent Strategy for Boosting Algorithms

  • Gábor Lugosi
  • Nicolas Vayatis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2375)


The probability of error is investigated for classification methods in which “boosting” algorithms form convex combinations of simple base classifiers. The main result of the paper is that certain regularized boosting algorithms yield Bayes-risk consistent classifiers under the sole assumption that the Bayes classifier can be approximated by a convex combination of the base classifiers. Non-asymptotic, distribution-free bounds are also developed. They offer new insight into how boosting works and help explain its success in practical classification problems.
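The abstract does not spell out the regularized algorithm, so the following is only a minimal sketch under stated assumptions. The classifier is sign(f), where f = λ Σ_j w_j h_j is a convex combination of base classifiers h_j (weights w_j ≥ 0 summing to 1) scaled by a fixed λ > 0, and f is chosen to minimize an empirical convex cost. The exponential cost (the one AdaBoost uses), one-dimensional decision stumps as the base class, and the Frank-Wolfe optimizer are illustrative assumptions rather than the paper's stated choices, and the names `make_stump` and `regularized_boost` are hypothetical. In this sketch λ acts as the regularization parameter: a small λ confines f to a small set, while a large λ lets the training cost be driven lower.

```python
import math

def make_stump(threshold, polarity):
    """Hypothetical base classifier with values in {-1, +1}: polarity * sign(x - threshold)."""
    return lambda x: polarity * (1 if x > threshold else -1)

def regularized_boost(xs, ys, lam, n_iter=200):
    """Frank-Wolfe minimization of the empirical exponential cost
        A_n(f) = (1/n) * sum_i exp(-y_i * f(x_i))
    over the scaled convex hull f = lam * sum_j w_j h_j, w_j >= 0, sum_j w_j = 1.
    Illustrative sketch only; cost, base class and optimizer are assumptions."""
    n = len(xs)
    pts = sorted(set(xs))
    # Assumed base class: stumps at midpoints between distinct sample points
    # (plus one below all points), with both polarities.
    thresholds = [pts[0] - 1.0] + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    base = [make_stump(t, p) for t in thresholds for p in (1, -1)]
    H = [[h(x) for x in xs] for h in base]  # H[j][i] = h_j(x_i)

    w = [0.0] * len(base)  # the first step (gamma = 1) jumps to a simplex vertex
    f = [0.0] * n          # f(x_i) for the current weights
    for k in range(n_iter):
        # Per-example weights exp(-y_i f(x_i)), analogous to AdaBoost's reweighting.
        d = [math.exp(-y * fi) for y, fi in zip(ys, f)]
        # dA_n/dw_j = -(lam/n) * sum_i d_i y_i h_j(x_i), so the vertex with the most
        # negative gradient is the base classifier with the largest weighted edge.
        edges = [sum(di * y * hij for di, y, hij in zip(d, ys, Hj)) for Hj in H]
        j_star = max(range(len(base)), key=edges.__getitem__)
        gamma = 2.0 / (k + 2)  # standard Frank-Wolfe step size
        w = [(1 - gamma) * wj for wj in w]
        w[j_star] += gamma
        # f is linear in w, so it can be updated the same way.
        f = [(1 - gamma) * fi + gamma * lam * hi for fi, hi in zip(f, H[j_star])]

    def predict(x):
        score = lam * sum(wj * h(x) for wj, h in zip(w, base))
        return 1 if score >= 0 else -1

    risk = sum(math.exp(-y * fi) for y, fi in zip(ys, f)) / n
    return predict, risk

# Usage on a separable toy sample: labels switch from -1 to +1 at x = 1.0.
xs = [i / 10 for i in range(20)]
ys = [1 if x >= 1.0 else -1 for x in xs]
predict, risk = regularized_boost(xs, ys, lam=5.0)
print(predict(0.3), predict(1.7))  # -1 1
```

On this sample the first selected stump already separates the data, so every later Frank-Wolfe step keeps all the weight on it and the empirical cost equals exp(-λ). Frank-Wolfe is used here because each step moves toward a single base classifier, which mirrors how boosting adds one base classifier per round, while the iterates never leave the scaled convex hull.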


Keywords (machine-generated, not chosen by the authors): Cost Function; Generalization Error; Consistent Strategy; Oracle Inequality; Additive Logistic Regression


Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Gábor Lugosi (1)
  • Nicolas Vayatis (1)
  1. Department of Economics, Pompeu Fabra University, Barcelona, Spain