
An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests

  • Conference paper
  • First Online:
Engineering Applications of Neural Networks (EANN 2016)

Abstract

Random Forest (RF) is an ensemble classification technique developed by Breiman over a decade ago. Compared with other ensemble techniques, it has proved highly accurate and robust. Many researchers, however, believe there is still room to improve its predictive accuracy, which explains why, over the past decade, many extensions of RF have appeared, each employing different techniques and strategies to improve certain aspects of RF. Since it has been shown empirically that ensembles tend to yield better results when there is significant diversity among the constituent models, the objective of this paper is twofold. First, it investigates how an unsupervised learning technique, namely Local Outlier Factor (LOF), can be used to identify diverse trees in an RF. Second, the trees with the highest LOF scores are used to create a new forest, termed LOFB-DRF, that is much smaller than the original RF yet performs at least as well as it, and in most cases achieves higher accuracy. This second step is an instance of a known technique called ensemble pruning. Experimental results on 10 real datasets demonstrate the superiority of the proposed method over the traditional RF. Unprecedented pruning levels, reaching as high as 99%, have been achieved while boosting the predictive accuracy of the ensemble. Such extreme pruning makes the technique a good candidate for real-time applications.
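To make the idea concrete, the following is a minimal sketch of LOF-based forest pruning, not the authors' implementation: each tree is represented by its vector of predictions on the training data, LOF scores those vectors, and the most outlying (i.e. most diverse) trees form the pruned forest. The dataset, the number of neighbours, and the retained count k are illustrative assumptions.

```python
# Minimal sketch of LOF-based Random Forest pruning (an approximation of
# the paper's LOFB-DRF idea, not the authors' implementation). The
# dataset, n_neighbors, and k below are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# Represent each tree by its prediction vector on the training set, so
# that LOF measures how differently a tree behaves from its peers.
tree_preds = np.array([tree.predict(X_train) for tree in rf.estimators_])

# negative_outlier_factor_ is -LOF: the most outlying (most diverse)
# trees have the smallest values, so keep the k most outlying trees.
lof = LocalOutlierFactor(n_neighbors=20).fit(tree_preds)
k = 5  # keep 1% of 500 trees, mirroring the paper's extreme pruning levels
keep = np.argsort(lof.negative_outlier_factor_)[:k]

# Majority vote over the pruned sub-forest (binary 0/1 labels here).
votes = np.array([rf.estimators_[i].predict(X_test) for i in keep])
pruned_pred = (votes.mean(axis=0) >= 0.5).astype(int)

print("full RF accuracy :", accuracy_score(y_test, rf.predict(X_test)))
print("pruned accuracy  :", accuracy_score(y_test, pruned_pred))
```

In this sketch the prediction vectors act as a behavioural fingerprint of each tree; other representations (e.g. class-probability vectors or predictions on a held-out validation set) would fit the same scheme.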



Author information

Corresponding author: Mohamed Medhat Gaber.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fawagreh, K., Gaber, M.M., Elyan, E. (2016). An Outlier Ranking Tree Selection Approach to Extreme Pruning of Random Forests. In: Jayne, C., Iliadis, L. (eds) Engineering Applications of Neural Networks. EANN 2016. Communications in Computer and Information Science, vol 629. Springer, Cham. https://doi.org/10.1007/978-3-319-44188-7_20


  • DOI: https://doi.org/10.1007/978-3-319-44188-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44187-0

  • Online ISBN: 978-3-319-44188-7

  • eBook Packages: Computer Science, Computer Science (R0)
