On Cesáro Averages for Weighted Trees in the Random Forest

  • Hieu Pham
  • Sigurður Olafsson

Abstract

The random forest is a popular and effective classification method. It uses a combination of bootstrap resampling and subspace sampling to construct an ensemble of decision trees whose predictions are then averaged for a final prediction. In this paper, we propose a potential improvement on the random forest that can be thought of as applying a weight to each tree before averaging. The new method is motivated by the potential instability of averaging predictions from trees of highly variable quality; for this reason, we replace the ordinary average with a Cesáro average. We provide both a theoretical analysis that gives exact conditions under which the new approach outperforms the traditional random forest, and a numerical analysis showing that the new approach is competitive when training classification models on numerous realistic data sets.
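
To make the idea concrete, the sketch below shows one way a Cesáro average of per-tree predictions could be layered on top of a standard scikit-learn random forest. This is an illustrative sketch, not the authors' implementation: the use of scikit-learn, the breast-cancer data set, and the ordering of trees from best to worst by accuracy on a held-out validation split are all assumptions made for this example. The weights themselves come from rewriting the Cesáro mean of an ordered sequence as a weighted average with weights w_k = (1/N) * sum_{n=k}^{N} 1/n, which decrease in k and sum to one.

```python
# Minimal sketch (not the authors' code): Cesáro-weighted averaging of the
# per-tree predictions of a scikit-learn random forest. Ordering trees by
# validation accuracy is an assumption made for this illustration; the
# paper's exact weighting and ordering may differ.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def cesaro_weights(n_trees):
    """Weight of the k-th tree (0-indexed) under Cesáro averaging:
    w_k = (1/N) * sum_{n=k+1}^{N} 1/n; weights decrease in k and sum to 1."""
    harmonic_tail = np.cumsum(1.0 / np.arange(1, n_trees + 1)[::-1])[::-1]
    return harmonic_tail / n_trees

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_fit, y_fit)

# Order trees from best to worst on a validation split (an illustrative proxy
# for "tree quality"); the Cesáro weights then favour the better trees.
val_acc = [tree.score(X_val, y_val) for tree in forest.estimators_]
order = np.argsort(val_acc)[::-1]
weights = cesaro_weights(len(order))

# Weighted average of per-tree class-probability estimates.
proba = sum(w * forest.estimators_[i].predict_proba(X_te)
            for w, i in zip(weights, order))
pred = forest.classes_[np.argmax(proba, axis=1)]
print("Cesáro-weighted accuracy:", np.mean(pred == y_te))
print("Plain random forest accuracy:", forest.score(X_te, y_te))
```

Because the weights decrease along the ordering, trees judged to be of higher quality contribute more to the final prediction than under the plain average; the intended benefit is robustness when tree quality varies widely across the ensemble.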

Keywords

Classification · Machine learning · Random forest

Copyright information

© The Classification Society 2019

Authors and Affiliations

  1. Department of Industrial and Manufacturing Systems Engineering, Iowa State University, Ames, USA