On Cesáro Averages for Weighted Trees in the Random Forest

Abstract

The random forest is a popular and effective classification method. It uses a combination of bootstrap resampling and subspace sampling to construct an ensemble of decision trees that are then averaged for a final prediction. In this paper, we propose a potential improvement on the random forest that can be thought of as applying a weight to each tree before averaging. The new method is motivated by the potential instability of averaging predictions of trees that may be of highly variable quality, and because of this, we replace the regular average with a Cesáro average. We provide both a theoretical analysis that gives exact conditions under which the new approach outperforms the traditional random forest, and numerical analysis that shows the new approach is competitive when training a classification model on numerous realistic data sets.

This is a preview of subscription content, log in to check access.

References

  1. Apostol, T. (1976). Introduction to analytic number theory, Berlin Germany. New York: Springer.

    Google Scholar 

  2. Bache, K. , & Lichman, M. UCI machine learning repository. http://archive.ics.uci.edu/ml.

  3. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.

    Article  Google Scholar 

  4. Daho, M.E.H., Settouti, N., Lazouni, M.E., Chikh, M.E.A. (2014). Weighted vote for trees aggregation in random forest. In Intl Conference on Multimedia Computing Systems (ICMCS) (pp. 438–443).

  5. Friedman, J.H. (2006). Recent advances in predictive (machine) learning. Journal of Classification, 23, 175–197.

    MathSciNet  Article  Google Scholar 

  6. Hendricks, P. (2015). titanic: Titanic passenger survival data set. R package version 0.1.0. https://CRAN.R-project.org/package=titanic.

  7. Li, H.B., Wang, W., Ding, H.W., Dong, J. (2010). Trees weighting random forest method for classifying high-dimensional noisy data. In Proc. IEEE 7th Int. Conf. e-Business Eng. (ICEBE) (pp. 160–163).

  8. Naghibi, S.A., Pourghasemi, H.R., Dixon, B. (2016). GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environmental Monitoring and Assessment, 188, 44.

    Article  Google Scholar 

  9. Ronao, C.A., & Cho, S.B. (2015). Random forests with weighted voting for anomalous query access detection in relational databases. Artificial Intelligence and Soft Computing, 9120, 36–48.

    Article  Google Scholar 

  10. Stein, E., & Shakarchi, R. (2003). Fourier analysis: an introduction Princeton. New Jersey: Princeton University Press.

    Google Scholar 

  11. Subasi, A., Alickovic, E., Kevric, J. (2017). Diagnosis of chronic kidney disease by using random forest. CMBEBIH, 62, 589–594.

    Article  Google Scholar 

  12. Weisstein, E.W. (2004). Harmonic series. http://mathworld.wolfram.com/HarmonicSeries.html.

  13. Winham, S.J., Freimuth, R.R., Biernacka, J.M. (2013). A weighted random forests approach to improve predictive performance. Statistical Analysis and Data Mining, 6, 496–505.

    MathSciNet  Article  Google Scholar 

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Hieu Pham.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Pham, H., Olafsson, S. On Cesáro Averages for Weighted Trees in the Random Forest. J Classif 37, 223–236 (2020). https://doi.org/10.1007/s00357-019-09322-8

Download citation

Keywords

  • Classification
  • Machine learning
  • Random forest