
Rank Forest: Systematic Attribute Sub-spacing in Decision Forest

  • Conference paper
  • Published in: Data Mining (AusDM 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 845)


Abstract

Decision trees are well-known classification algorithms that are also appreciated for their capacity for knowledge discovery. Two major shortcomings of decision trees have been pointed out in the literature: (1) instability, and (2) high computational cost. These problems have been addressed to some extent through ensemble learning techniques such as Random Forest. Unlike decision trees, where the whole attribute space of a dataset is searched to find the best test attribute for a node, Random Forest first selects a random subspace of attributes and then identifies the test attribute for the node from that subspace. Because the subspace is chosen at random, it may consist entirely (or largely) of poor-quality attributes, resulting in an individual tree with low accuracy. Therefore, in this paper we propose a probabilistic selection of attributes (instead of a random selection) in which the probability of selecting an attribute is proportional to its quality. Although we developed this approach independently, after the research was completed we discovered that some existing techniques take the same approach. While in this paper we use mutual information as the measure of attribute quality, the existing techniques use the information gain ratio and a t-test as their measures. The proposed technique has been evaluated on nine different datasets and shows stable performance in terms of accuracy (both ensemble accuracy and individual tree accuracy) and efficiency.
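To illustrate the core idea, the sketch below shows one way to draw an attribute subspace with selection probabilities proportional to each attribute's mutual information with the class. It is a minimal sketch, not the authors' Rank Forest implementation: it assumes numeric data in NumPy arrays, uses scikit-learn's mutual_info_classif as the quality measure, and all function and variable names (e.g. weighted_subspace, subspace_size) are illustrative.

```python
# Hedged sketch: probability-proportional attribute subspace selection.
# An attribute's chance of entering the subspace is proportional to its
# mutual information with the class label. Illustration only; not the
# authors' exact Rank Forest implementation.
import numpy as np
from sklearn.feature_selection import mutual_info_classif


def weighted_subspace(X, y, subspace_size, rng=None):
    """Return indices of `subspace_size` attributes sampled without
    replacement, with probability proportional to mutual information."""
    rng = np.random.default_rng(rng)
    mi = mutual_info_classif(X, y)          # quality score per attribute
    total = mi.sum()
    # Guard against an all-zero score vector (fall back to uniform selection).
    probs = mi / total if total > 0 else np.full(len(mi), 1 / len(mi))
    return rng.choice(len(mi), size=subspace_size, replace=False, p=probs)


if __name__ == "__main__":
    # Toy, hypothetical data: 100 records, 10 attributes; the class
    # depends only on attributes 0 and 3, so those should be favoured.
    X = np.random.rand(100, 10)
    y = (X[:, 0] + X[:, 3] > 1).astype(int)
    print(weighted_subspace(X, y, subspace_size=3, rng=0))
```

In a forest built this way, each tree node (or each tree, depending on the design) would call such a routine instead of a uniform random draw, so high-quality attributes appear in subspaces more often while low-quality attributes are not excluded outright.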


Notes

  1. http://archive.ics.uci.edu/ml/.
  2. http://sci2s.ugr.es/keel/category.php?cat=clas.


Author information

Correspondence to Zaheer Babar or Md Zahidul Islam.


Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Babar, Z., Islam, M.Z., Mansha, S. (2018). Rank Forest: Systematic Attribute Sub-spacing in Decision Forest. In: Boo, Y., Stirling, D., Chi, L., Liu, L., Ong, KL., Williams, G. (eds) Data Mining. AusDM 2017. Communications in Computer and Information Science, vol 845. Springer, Singapore. https://doi.org/10.1007/978-981-13-0292-3_2

  • DOI: https://doi.org/10.1007/978-981-13-0292-3_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-0291-6

  • Online ISBN: 978-981-13-0292-3

  • eBook Packages: Computer Science, Computer Science (R0)
