• Richard A. Berk
Part of the Springer Texts in Statistics book series (STS)


In this chapter we continue our treatment of fitting ensembles, focusing on boosting with an emphasis on classifiers. The unifying idea is that the statistical learning procedure makes many passes through the data and constructs a set of fitted values on each pass. With each pass, observations that were fit poorly on the previous pass are given more weight, so the algorithm works harder on the hard-to-fit observations. In the end, the sets of fitted values are combined in an averaging process that serves as a regularizer. Boosting can be a very effective statistical learning procedure.
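The reweighting-and-averaging idea can be made concrete with a minimal sketch of AdaBoost using decision stumps as the weak classifier. This is an illustrative toy implementation, not the chapter's own code; the data, the stump learner, and the number of passes are all hypothetical choices made for the example.

```python
import numpy as np

def stump_fit(x, y, w):
    """Find the threshold stump minimizing weighted error on 1-D data."""
    best = None
    for thr in np.unique(x):
        for sign in (1, -1):
            pred = np.where(x <= thr, sign, -sign)
            err = np.sum(w * (pred != y))
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best  # (weighted error, threshold, sign)

def adaboost(x, y, rounds):
    """AdaBoost: each pass upweights the observations fit poorly so far."""
    n = len(x)
    w = np.full(n, 1.0 / n)                    # start with equal weights
    stumps = []
    for _ in range(rounds):
        err, thr, sign = stump_fit(x, y, w)
        err = max(err, 1e-10)                  # guard against a perfect fit
        alpha = 0.5 * np.log((1 - err) / err)  # this stump's voting weight
        pred = np.where(x <= thr, sign, -sign)
        w *= np.exp(-alpha * y * pred)         # upweight misclassified cases
        w /= w.sum()                           # renormalize to a distribution
        stumps.append((alpha, thr, sign))
    return stumps

def predict(stumps, x):
    """Combine the passes: a weighted vote over all stumps."""
    score = sum(a * np.where(x <= thr, s, -s) for a, thr, s in stumps)
    return np.sign(score)

# Hypothetical toy data: the labels flip twice along x, so no single
# stump can classify them all; three boosting passes fit them exactly.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([1, 1, -1, -1, -1, -1, 1, 1])
model = adaboost(x, y, rounds=3)
print((predict(model, x) == y).mean())  # 1.0
```

Note how the combined classifier succeeds where every individual stump fails: each pass concentrates weight on the observations the previous stumps misclassified, and the weighted vote reconciles the passes.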



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Richard A. Berk
  1. Department of Criminology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, USA
