
1 Introduction

Random forest (RF) is an ensemble-based, supervised machine learning algorithm proposed by Leo Breiman [6]. It consists of numerous randomized decision trees and can solve both classification and regression problems. In RF, the decision trees are constructed independently, so the forest can be built and executed as parallel threads, which makes it fast and easy to implement. It has been used in various domains such as brain tumor segmentation, Alzheimer's detection, face recognition, human pose detection, and object detection [7].

A decision tree in RF is built during the training phase using the bagging concept. A decision tree has several important parameters, such as the splitting criterion, the tree depth, and the number of elements at a leaf node. However, the best choice of these parameters has not yet been answered precisely [7, 10]. This has motivated various heuristic approaches to building the decision trees and hence the RF. The method proposed by Paul et al. [15] converges with a reduced set of important features and derives a bound on the number of trees. In addition, several researchers have worked on proving the consistency of RF and on leveraging its dependency on the data [4, 5, 9, 16]. Denil et al. [9] used a Poisson distribution for feature selection while growing a tree, whereas Wang et al. [16] proposed a Bernoulli Random Forest (BRF) framework that incorporates a Bernoulli distribution for feature and splitting-point selection.

The conventional RF assigns equal weights to the votes cast by each individual tree [6]; the prediction is then made by majority voting. However, in real-life scenarios a dataset may have a huge number of features while the proportion of truly informative features is small. Decision trees whose nodes are populated by less informative attributes therefore contribute less, and not all trees in a forest contribute equally to better classification [8]. Hence, instead of assigning a fixed weight to each decision tree, a dynamic weight should be assigned. Paul et al. [13] proposed a method that computes the weights during the training phase and assigns a fixed weight to each decision tree. The mechanisms proposed by Winham et al. [17] and Liu et al. [12] both compute the weight either from the performance of a tree on OOB samples or through a feature weighting scheme. Akash et al. [2] compute a confidence value, used as a weight in RF, from the entropy or Gini score calculated during tree construction. However, these methods do not consider the relationship between the weights and the test samples. Therefore, a dynamic weighting scheme is proposed in this paper. It computes the similarity between a test case and each decision tree using an exponential distribution. The resulting forest is named the Exponentially Weighted Random Forest (EWRF).

The remainder of this paper is organized as follows: Sect. 2 describes RF as a classifier and a regressor and the problem associated with conventional RF. Section 3 presents the proposed EWRF approach. Section 4 discusses the implementation details and performance. Section 5 concludes the paper.

2 Random Forest

Random forest is built upon decision trees as atomic units. Each decision tree behaves either as a classifier for classification or as a regressor that predicts the output for a regression task. Given a dataset \(\mathbbm {D} = \{(X_{1},C_{1}),(X_{2},C_{2}),\ldots ,(X_{M},C_{M})\}\) with M instances such that \(X_{i} \in \mathbbm {R}^{N}\) with N attributes, let the class labels be \({C}_{i} \in \{Y_{1},Y_{2},\ldots ,Y_{C}\}\). Initially, the dataset \(\mathbbm {D}\) is partitioned into a training set \(\mathbbm {D}_{1}\) with \({M'}\) instances \({(M' < M)}\) and a testing set \(\mathbbm {D}_{2}\) with the remaining instances. Decision trees are constructed from the training samples using bootstrap sampling (random sampling with replacement) as described in [6].
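
To make the bagging step concrete, the following is a minimal sketch in Python, assuming NumPy arrays for the data; the synthetic dataset and the function name are only illustrative, not part of the original method.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw as many instances as in X, with replacement, as in bagging [6]."""
    idx = rng.integers(0, len(X), size=len(X))  # random sampling with replacement
    return X[idx], y[idx]

# Usage: one bootstrap replicate per decision tree (toy data for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # M' = 100 training instances, N = 5 attributes
y = rng.integers(0, 3, size=100)     # class labels from {Y_1, Y_2, Y_3}
X_boot, y_boot = bootstrap_sample(X, y, rng)
```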

2.1 Random Forest as Classifier

Random forest assigns the class value based on the proportion of the individual class values present at the leaf node.

The class distribution for the \({j^{th}}\) class at the terminal node h, in the decision tree t, for the test case X, can be represented as:

$$\begin{aligned} p_{j,h}^{t} = \frac{1}{n_{h}}\sum _{{i}\in {h}}\mathbb {I}{(Y_{i}=j)} \end{aligned}$$
(1)

here, \({n_{h}}\) is the total number of instances in the terminal node h, and \(\mathbb {I}(\cdot )\) is an indicator function.

Based on the maximum class distribution, the class value j is assigned by the decision tree t for the test case X using the following equation:

$$\begin{aligned} \hat{Y}_{j}^{t} = \underset{1 \le j \le C}{\arg \max }\{p_{j,h}^{t}\} \end{aligned}$$
(2)

To assign the final class value by majority voting in conventional RF, the votes for class j cast by the individual decision trees for the test case X are first counted using the following equation:

$$\begin{aligned} C(Y = j) = \sum _{t = 1}^{n_{tree}} \mathbbm {1}\cdot \mathbb {J}{(\hat{Y}_{j}^{t})} \end{aligned}$$
(3)

here, \(\mathbb {J}(\cdot )\) is an indicator function. Finally, based on majority voting, RF assigns the final class value using Eq. (4).

$$\begin{aligned} \hat{Y} = \underset{1 \le j \le C}{\arg \max }\{C(Y = j)\} \end{aligned}$$
(4)
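
The aggregation described by Eqs. (1)–(4) can be sketched as follows, assuming that tree traversal has already been performed and that `leaf_labels_per_tree` (a hypothetical name) holds, for each tree, the training labels at the leaf reached by the test case X.

```python
from collections import Counter

def tree_prediction(leaf_labels):
    """Eqs. (1)-(2): class proportions at the reached leaf, then the most frequent class."""
    counts = Counter(leaf_labels)
    # p_{j,h}^t = counts[j] / len(leaf_labels); the argmax does not need the division
    return counts.most_common(1)[0][0]

def rf_majority_vote(leaf_labels_per_tree):
    """Eqs. (3)-(4): count the per-tree votes (each with weight 1) and take the mode."""
    votes = Counter(tree_prediction(labels) for labels in leaf_labels_per_tree)
    return votes.most_common(1)[0][0]

# Usage with three toy trees: per-tree predictions are 0, 1, 0, so the forest outputs 0.
print(rf_majority_vote([[0, 0, 1], [1, 1, 1], [0, 2, 0]]))
```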

2.2 Random Forest as Regressor

In a regression task, the decision trees have to predict a real-valued outcome: the value associated with each instance is a single real number, i.e. \(Y_{i}\in \mathbb {R}\). To construct RF as a regressor, the Mean Squared Error (MSE) is used as the splitting criterion. Once all the decision trees are constructed, the test instance is passed to each decision tree; based on the node values, it follows either the left or the right subtree until it reaches a leaf node. The value predicted for a test case \(\mathbf {X}\) at a terminal node h by the decision tree t is the mean of the instance values present within that leaf node. It can be calculated as:

$$\begin{aligned} \hat{Y}_{h}^{t} = \frac{1}{n_{h}}\sum _{{i}\in {h}}{Y_{i}} \end{aligned}$$
(5)

Finally, the value predicted by the RF is the average of the values predicted by the individual trees. Hence, the overall prediction made by the forest can be computed as:

$$\begin{aligned} \hat{Y} = \frac{1}{n_{tree}}\sum _{t = 1}^{n_{tree}} {{\mathbbm {1}}} \cdot \hat{Y}_{h}^{t} \end{aligned}$$
(6)
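
Analogously, a minimal sketch of Eqs. (5)–(6), again assuming hypothetical precomputed leaf contents per tree:

```python
def rf_regression(leaf_values_per_tree):
    """Eq. (5): mean outcome at each reached leaf; Eq. (6): average over the trees."""
    tree_preds = [sum(v) / len(v) for v in leaf_values_per_tree]   # Eq. (5)
    return sum(tree_preds) / len(tree_preds)                       # Eq. (6)

# Usage: three toy trees predicting 2.0, 2.0 and 2.0, so the forest outputs 2.0.
print(rf_regression([[1.0, 3.0], [2.0], [4.0, 0.0]]))
```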

2.3 Problem with Conventional Random Forest

For a random forest classifier to be effective, each decision tree must have reasonably good classification performance, and the trees must be diverse and weakly correlated [14]. Diversity is obtained by randomly choosing training instances and attributes for each tree. However, a decision tree cannot always contribute effectively to each and every test instance. For a dataset with a high proportion of less informative attributes, the performance of RF is significantly affected, because the decision trees contribute equally during majority voting. In such cases, performance can be improved by reducing the contribution of decision trees whose nodes are populated by non-informative attributes, i.e. by assigning a dynamic weight to the decision trees [3, 11].

3 Proposed Method

The proposed EWRF consists of two steps. In the first step, decision trees are constructed as in conventional RF [6]. In the second step, an exponential weight score is calculated as described in the following subsections.

3.1 Exponential Weight Score Calculation

During the testing phase, test samples are passed to each and every decision tree in the forest. Let \(F_{i}\) be the feature value used for splitting at an internal node of a decision tree t. A test sample \({X} = \{a_{1},a_{2},\ldots ,a_{j},\ldots ,a_{N}\}\) is passed to a decision tree; it is guided either to the left \((a_{j}^{X} \le \tau )\) or the right \((a_{j}^{X} > \tau )\) subtree, based on the threshold \(\tau \), and moves down until it reaches a leaf node of the decision tree t. The sum of the distances between the corresponding attribute values of the test sample X and the participating nodes \(F_{i}\) along the path of the decision tree t is calculated as follows:

$$\begin{aligned} {d} = \sum { ||F_{i}-a_{j}^{X}||_{2} }; \forall {F_{i} \in t; a_{j} \in X} \end{aligned}$$
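
A minimal sketch of this path-distance computation, assuming a simple hypothetical node representation (splitting feature index, splitting value \(F_i\), left/right children) and interpreting \(||\cdot ||_{2}\) as the absolute difference for scalar attribute values; this is an illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[int] = None       # index j of the splitting attribute
    threshold: Optional[float] = None   # splitting value F_i / threshold tau (None for a leaf)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def path_distance(root, x):
    """Accumulate |F_i - a_j^X| over the internal nodes visited by test sample x."""
    d, node = 0.0, root
    while node.threshold is not None:
        a_j = x[node.feature]
        d += abs(node.threshold - a_j)                    # contribution of this node
        node = node.left if a_j <= node.threshold else node.right
    return d

# Usage on a toy two-level tree: |0.5 - 0.8| + |2.0 - 1.5| = 0.8
tree = Node(0, 0.5, left=Node(), right=Node(1, 2.0, left=Node(), right=Node()))
print(path_distance(tree, [0.8, 1.5]))
```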

Thus we obtain distances \(\{d_{1},d_{2},\ldots ,d_{n_{tree}}\}\) for each test sample, one with respect to each decision tree. The smaller the value of d for a decision tree, the higher the similarity between the tree and the test case along that path, and hence the higher the corresponding weight value. This is illustrated in Fig. 1. In the proposed EWRF, the weight associated with each decision tree is directly proportional to the similarity between the test instance and the decision tree. Hence, the weight of a decision tree is computed using an exponential distribution measure, which maintains this relationship. In this way, the weight of each decision tree may vary for each incoming test case. The exponential tree weight score is calculated as follows:

$$\begin{aligned} W_\mathbf {X}^{t} = \frac{1}{Z}\exp \left\{ -\frac{\sum { ||F_{i}-a_{j}^{X}||_{2} }}{\alpha } \right\} \end{aligned}$$
(7)

where Z is a normalizing term, the sum of the weights of all decision trees, and \(\alpha \) is a hyper-parameter that controls the weight score. For classification, Eq. (3) becomes:

$$\begin{aligned} C(Y = j) = \sum _{t = 1}^{n_{tree}} (W_\mathbf {X}^{t})\cdot \mathbb {J}{(\hat{Y}_{j}^{t})} \end{aligned}$$
(8)

For regression, Eq. (6) becomes:

$$\begin{aligned} \hat{Y} = \frac{1}{n_{tree}}\sum _{t = 1}^{n_{tree}} (W_\mathbf {X}^{t}) \cdot \hat{Y}_{h}^{t} \end{aligned}$$
(9)

Finally, weighted voting is performed using Eqs. (8) and (9) for predicting the output in classification and regression tasks respectively, as shown in Fig. 2. The pseudo code for predicting the class or regression value is provided in Algorithm 1.
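
As a complement to Algorithm 1, the following is a minimal sketch of the weighting and aggregation steps in Eqs. (7)–(9), assuming the per-tree distances \(d_{t}\) and per-tree predictions have already been computed as described above; the function names and toy inputs are illustrative only.

```python
import math
from collections import defaultdict

def exponential_weights(distances, alpha):
    """Eq. (7): w_t proportional to exp(-d_t / alpha), normalized by Z = sum of raw weights."""
    raw = [math.exp(-d / alpha) for d in distances]
    z = sum(raw)
    return [w / z for w in raw]

def ewrf_classify(distances, tree_classes, alpha=0.45):
    """Eq. (8): weighted vote over the classes predicted by the individual trees."""
    weights = exponential_weights(distances, alpha)
    score = defaultdict(float)
    for w, c in zip(weights, tree_classes):
        score[c] += w
    return max(score, key=score.get)

def ewrf_regress(distances, tree_values, alpha=0.75):
    """Eq. (9): weighted combination of tree outputs, scaled by 1/n_tree as in the paper."""
    weights = exponential_weights(distances, alpha)
    n_tree = len(tree_values)
    return sum(w * v for w, v in zip(weights, tree_values)) / n_tree

# Usage: three trees; the closest tree (smallest distance) dominates the weighted vote.
print(ewrf_classify([0.2, 1.5, 3.0], ['a', 'b', 'b']))   # -> 'a'
print(ewrf_regress([0.2, 1.5, 3.0], [1.0, 2.0, 3.0]))
```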

Fig. 1.

An example showing the calculation of the distance during testing. In this example, a test instance X follows the path marked with bold blue lines \(( F_{e},\) \(F_{l}\) and \(F_{p} )\) to reach the leaf node. The distance is calculated at each corresponding node along the path followed by the test case. At the root node, all the distances are summed up to obtain the final distance between the test case and the decision tree. (Color figure online)

Fig. 2.

The proposed EWRF method, showing how the exponentially weighted score is calculated by the different decision trees for a given test instance. Weighted voting is then performed for the final prediction.

4 Experimental Results

This section describes the datasets, the implementation details, and the performance analysis of EWRF in comparison with conventional RF and state-of-the-art methods.

4.1 Datasets and Implementation Details

The experiments have been conducted on benchmark datasets that are publicly available in the UCI repository [1]. These datasets come from a variety of domains and have different combinations of numerical attribute values. They vary in terms of the number of classes, features, and instances, allowing rigorous testing of the proposed method.

There are five main parameters for the experiments: (1) the number of trees \({n_{tree}}\), (2) the minimum number of instances at a leaf node \({n_{min}}\), (3) the ratio in which the dataset is divided into training and test sets, (4) the maximum tree depth \({T_{depth}}\), and (5) the value of \(\alpha \) for computing the exponential weighting score. The value of \({n_{tree}}\) is decided empirically: experiments on the Vehicle, Wine, and Abalone datasets with \({n_{tree}}\) in the range 10 to 100 (step size 10) show that the accuracy saturates beyond \({n_{tree}} = 50\), so it is kept at 50 in all experiments. The \({n_{min}}\) is kept at 5 and the ratio for dividing the datasets into training and testing sets is kept at 0.5; these values are taken from the state-of-the-art methods for a fair comparison. Experiments have been run with different values of \({T_{depth}}\), and the results are reported for the depth that gives the best accuracy among the trials. The value of \(\alpha \) is chosen as 0.45 for classification and 0.75 for regression, based on experiments with \(\alpha \in \{0.15, 0.45, 0.75, 1.0\}\). Each experiment is repeated 10 times with a random selection of the training and testing subsets.
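
For reference, these settings can be collected in a small configuration block; the following sketch uses hypothetical variable names and is not taken from the authors' experimental scripts.

```python
# Illustrative experiment configuration, following the values reported above.
EWRF_CONFIG = {
    "n_tree": 50,          # trees per forest; accuracy saturates beyond 50
    "n_min": 5,            # minimum number of instances at a leaf node
    "train_ratio": 0.5,    # half the instances for training, half for testing
    "alpha": {"classification": 0.45, "regression": 0.75},
    "alpha_grid": [0.15, 0.45, 0.75, 1.0],   # values tried when tuning alpha
    "n_repeats": 10,       # random train/test splits per dataset
}
```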

Table 1. MSE comparison between state-of-the-art methods and the proposed EWRF, averaged over 10 iterations (the lowest value is the best)
Table 2. Classification accuracy comparison between state-of-the-art methods and the proposed EWRF, averaged over 10 iterations (the highest value is the best)

4.2 Performance Analysis

The results of the proposed EWRF are compared with the conventional RF [6] and with state-of-the-art methods, i.e. four variants of random forest, Biau08 [5], Biau12 [4], Denil [9], and BRF [16], on the regression and classification datasets. The best learning performance in each comparison is marked in boldface for each dataset.

For regression, it can be observed from Table 1 that EWRF achieves a significant reduction in MSE on seven out of ten datasets. In particular, for the Concrete dataset, Biau08 [5], Biau12 [4], Denil [9], and BRF [16] have almost the same MSE. Moreover, there is a reduction of more than \(50\%\) in MSE for the Yacht, Concrete, and Housing datasets. The proposed method also shows improvement for datasets with a large number of classes, such as Student and Automobile. From Table 1, it is clear that the proposed method offers a considerable improvement over the compared state-of-the-art methods.

For classification, the comparison between the existing state-of-the-art methods and the proposed EWRF is shown in Table 2. EWRF shows an improvement over Biau08 [5], Biau12 [4], and Denil [9] for all the classification datasets except Spambase. In comparison with BRF [16], the proposed method shows an improvement on seven out of eleven datasets, and in comparison with conventional RF, on nine out of eleven datasets.

5 Conclusion

The conventional Random Forest (RF) assigns equal weights to the votes cast by each individual tree, and the approaches proposed in the past assign weights to the decision trees only during the training phase. In this paper, we have explored the dynamic relationship between test samples and decision trees, based on which aggregation/weighted voting is performed. Thus, the weights derived in EWRF are dynamic in nature. The proposed method has been tested on various heterogeneous datasets and compared with state-of-the-art competitors, showing improvement for both regression and classification tasks.