Part of the book series: Springer Texts in Statistics (STS)

Abstract

In this chapter, we turn to boosting. Boosting is an ensemble procedure that is a legitimate competitor with random forests. In some ways, it is far more flexible. A much wider range of response variable types can be used, all within the same basic algorithmic structure. Just as for random forests, there are several useful ways to study the output, and excellent software exists within R.
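
As a brief illustration of that software, the sketch below fits a boosted classification model with the gbm package in R. It is a minimal, hypothetical example rather than an analysis from this chapter: the data frame dat, the 0/1 response y, the predictors, and the tuning values shown are all assumptions.

    # Minimal, hypothetical sketch of stochastic gradient boosting with gbm().
    # 'dat' is an assumed data frame with a 0/1 response y and predictors x1-x3.
    library(gbm)

    fit <- gbm(y ~ x1 + x2 + x3,
               data = dat,
               distribution = "bernoulli",   # loss for a binary response
               n.trees = 3000,               # number of boosting iterations
               interaction.depth = 3,        # size of each tree
               shrinkage = 0.01,             # learning rate
               bag.fraction = 0.5,           # fraction sampled without replacement for each tree
               n.minobsinnode = 10)          # minimum observations in a terminal node

    best <- gbm.perf(fit, method = "OOB")    # iterations suggested by the out-of-bag estimate
    summary(fit, n.trees = best)             # relative influence of each predictor

Other response types are handled by changing the distribution argument (e.g., "gaussian" or "poisson"), which is part of the flexibility referred to above.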

The original version of this chapter was revised: See the “Chapter Note” section at the end of this chapter for details. The erratum to this chapter is available at https://doi.org/10.1007/978-3-319-44048-4_10.

Notes

  1.

    A more general definition is provided by Schapire and his colleagues (2008: 1697). A wonderfully rich and more recent discussion of the central role of margins in boosting can be found in Schapire and Freund (2012). From a computer science perspective, the book is a remarkable mix of formal mathematics and very accessible discussion of what the mathematics means.

  2.

    In computer science parlance, an “example” is an observation or case.

  3.

    For very large datasets, there is a scalable version of tree boosting called XGBoost() (Chen and Guestrin 2016) that can provide remarkable speed improvements but has yet to include the range of useful loss functions found in gbm(). More will be said about XGBoost() when “deep learning” is briefly considered in Chap. 8.

  4.

    Other initializations, such as least squares regression, could be used, depending on the loss function (e.g., for a quantitative response variable).

  5.

    For gbm(), the data not selected for each tree are called "out-of-bag" data, although that is not fully consistent with the usual definition because in gbm() the sampling is without replacement.

  6.

    Even when the minimum number of observations allowed in a terminal node is specified, more observations can produce larger trees: more splits can occur before the minimum is reached.

  7.

    For these analyses, the work was done on an iMac with a single core. The processor was a 3.4 GHz Intel Core i7.

  8.

    If forecasting were on the table, it might have been useful to try a much larger number of iterations to reduce generalization error.

  9.

    The plots are shown just as gbm() builds them, and very few options are provided. But just as with random forests, the underlying data can be stored and then used to construct new plots more responsive to the preferences of data analysts; a minimal sketch appears after these notes.

  10.

    Because both inputs are integers, the transition from one value to the next is the midpoint between the two.

  11.

    It is not appropriate to compare the overall error rate in the two tables (.18–.21) because the errors are not weighted by costs. In Table 6.2, classification errors for those who perished are about 5 times more costly.

  12.

    The out-of-bag approach was not available in gbm() for boosted quantile regression; one alternative is sketched after these notes.

  13.

    The size of the correlation is substantially determined by actual fares over $200. They are still fit badly, but not a great deal worse.
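
With respect to note 9, the sketch below shows one hypothetical way to store the values that gbm() plots and to redraw them. It assumes the fitted object fit, the iteration count best, and the predictor x1 from the earlier sketch.

    # Hypothetical sketch for note 9: recover the partial dependence grid
    # that gbm() uses for its plot and redraw it with base graphics.
    pd <- plot(fit, i.var = "x1", n.trees = best, return.grid = TRUE)
    plot(pd$x1, pd$y, type = "l",
         xlab = "x1", ylab = "Fitted value (link scale)",
         main = "Partial dependence on x1")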

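With respect to note 12, cross-validation is one alternative when the out-of-bag estimate is unavailable. The sketch below is again hypothetical, using the assumed data frame dat, here with a quantitative response y, to fit a boosted quantile regression for the 0.50 quantile.

    # Hypothetical sketch for note 12: boosted quantile regression with
    # cross-validation used to choose the number of iterations.
    qfit <- gbm(y ~ x1 + x2 + x3,
                data = dat,
                distribution = list(name = "quantile", alpha = 0.50),
                n.trees = 3000,
                interaction.depth = 3,
                shrinkage = 0.01,
                cv.folds = 5)
    best.q <- gbm.perf(qfit, method = "cv")  # iterations chosen by cross-validation
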
Author information

Correspondence to Richard A. Berk.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Berk, R.A. (2016). Boosting. In: Statistical Learning from a Regression Perspective. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-44048-4_6
