
Rank Forest: Systematic Attribute Sub-spacing in Decision Forest

  • Conference paper
  • Published in: Data Mining (AusDM 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 845)


Abstract

Decision trees are well-known classification algorithms that are also appreciated for their capacity for knowledge discovery. Two major shortcomings of decision trees have been pointed out in the literature: (1) instability, and (2) high computational cost. These problems have been addressed to some extent through ensemble learning techniques such as Random Forest. Unlike decision trees, where the whole attribute space of a dataset is searched to find the best test attribute for a node, Random Forest first selects a random subspace of attributes and then identifies the test attribute for the node from that subspace. Because the subspace is chosen at random, it may consist entirely (or largely) of poor-quality attributes, resulting in an individual tree with low accuracy. Therefore, in this paper we propose a probabilistic selection of attributes (instead of a random selection) in which the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques take the same approach. While in this paper we use mutual information as the measure of attribute quality, the existing techniques use the information gain ratio and a t-test as their measures. The proposed technique has been evaluated on nine different datasets and shows stable performance in terms of accuracy (both ensemble accuracy and individual tree accuracy) and efficiency.
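To illustrate the core idea, the sketch below shows one way to draw an attribute subspace with selection probabilities proportional to each attribute's mutual information with the class. It is a minimal sketch, not the authors' Rank Forest implementation: it assumes numeric data in NumPy arrays, uses scikit-learn's mutual_info_classif as the quality measure, and all function and variable names (e.g. weighted_subspace, subspace_size) are illustrative.

```python
# Hedged sketch: probability-proportional attribute subspace selection.
# An attribute's chance of entering the subspace is proportional to its
# mutual information with the class label. Illustration only; not the
# authors' exact Rank Forest implementation.
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def weighted_subspace(X, y, subspace_size, rng=None):
    """Return indices of `subspace_size` attributes sampled without
    replacement, with probability proportional to mutual information."""
    rng = np.random.default_rng(rng)
    mi = mutual_info_classif(X, y)          # quality score per attribute
    total = mi.sum()
    # Guard against an all-zero score vector (fall back to uniform selection).
    probs = mi / total if total > 0 else np.full(len(mi), 1 / len(mi))
    return rng.choice(len(mi), size=subspace_size, replace=False, p=probs)


if __name__ == "__main__":
    # Toy, hypothetical data: 100 records, 10 attributes; the class
    # depends only on attributes 0 and 3, so those should be favoured.
    X = np.random.rand(100, 10)
    y = (X[:, 0] + X[:, 3] > 1).astype(int)
    print(weighted_subspace(X, y, subspace_size=3, rng=0))
```

In a forest built this way, each tree node (or each tree, depending on the design) would call such a routine instead of a uniform random draw, so high-quality attributes appear in subspaces more often while low-quality attributes are not excluded outright.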


Notes

  1. http://archive.ics.uci.edu/ml/.
  2. http://sci2s.ugr.es/keel/category.php?cat=clas.


Author information

Correspondence to Zaheer Babar or Md Zahidul Islam.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Babar, Z., Islam, M.Z., Mansha, S. (2018). Rank Forest: Systematic Attribute Sub-spacing in Decision Forest. In: Boo, Y., Stirling, D., Chi, L., Liu, L., Ong, KL., Williams, G. (eds) Data Mining. AusDM 2017. Communications in Computer and Information Science, vol 845. Springer, Singapore. https://doi.org/10.1007/978-981-13-0292-3_2

  • DOI: https://doi.org/10.1007/978-981-13-0292-3_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0291-6

  • Online ISBN: 978-981-13-0292-3

  • eBook Packages: Computer Science, Computer Science (R0)
