Synonyms

Bootstrap estimation; Bootstrap sampling

Definition

The bootstrap is a statistical method for estimating the performance (e.g., accuracy) of classification or regression methods. It is based on the statistical procedure of sampling with replacement. Unlike other estimation methods such as cross-validation, the bootstrap may select the same object or tuple for the training set more than once. That is, each time a tuple is selected, it is equally likely to be selected again and re-added to the training set.
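Sampling with replacement can be illustrated with a minimal sketch in Python (the data values here are arbitrary placeholders):

```python
import random

random.seed(42)
data = ["a", "b", "c", "d", "e"]

# Sampling with replacement: each draw is independent of earlier draws,
# so the same tuple can be selected more than once.
sample = random.choices(data, k=len(data))
print(sample)
```

Because every draw considers the full dataset, duplicates in `sample` are expected, which is exactly what distinguishes the bootstrap from cross-validation's disjoint folds.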

Historical Background

Bootstrap sampling was developed by Bradley Efron in 1979, and was mainly used for estimating statistical parameters such as means and standard errors [2]. A meta-classification method using the bootstrap, called bootstrap aggregating (or bagging), was proposed by Leo Breiman in 1994 to improve classification by combining the classifications of randomly generated training sets [1].

Foundations

This section discusses a commonly used bootstrap method, the 0.632 bootstrap. Given a dataset of N tuples, the dataset is sampled N times, with replacement, resulting in a bootstrap sample, or training set, of N tuples. It is very likely that some of the original data tuples will occur more than once in the training set. The data tuples that were not sampled into the training set form the test set. If this process is repeated multiple times, on average 63.2 % of the original data tuples will end up in the training set and the remaining 36.8 % will form the test set (hence the name, 0.632 bootstrap).
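The sampling procedure above can be sketched as follows; the empirical fractions should come out near 63.2 % and 36.8 % for a reasonably large N (N = 1000 here is an arbitrary choice):

```python
import random

random.seed(0)
N = 1000
data = list(range(N))

# Draw N times with replacement to form the bootstrap (training) sample.
train = [random.choice(data) for _ in range(N)]

# Tuples that were never drawn form the test set.
test = set(data) - set(train)

print(f"distinct training tuples: {len(set(train)) / N:.3f}")
print(f"test-set fraction:        {len(test) / N:.3f}")
```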

The figure 63.2 % comes from the following argument. On each draw, every tuple has a probability of 1∕N of being selected, so the probability of not being chosen on a given draw is (1 − 1∕N). The sampling is done N times, so the probability that a tuple is never chosen is (1 − 1∕N)^N. If N is large, this probability approaches e^−1 ≈ 0.368. Thus, on average 36.8 % of the tuples will not be selected for training and thereby end up in the test set, and the remaining 63.2 % will form the training set.
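The convergence of (1 − 1∕N)^N to e^−1 can be checked numerically with a short script:

```python
import math

# Probability that a given tuple is never drawn in N samples with replacement.
for N in (10, 100, 1000, 10000):
    p_never = (1 - 1 / N) ** N
    print(f"N = {N:5d}: (1 - 1/N)^N = {p_never:.4f}")

print(f"limit: e^-1 = {math.exp(-1):.4f}")
```

Already at N = 100 the probability is within about 0.002 of e^−1 ≈ 0.3679, so the 63.2 %/36.8 % split is a good approximation for all but very small datasets.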

The above procedure can be repeated k times, where in each iteration, the current test set is used to obtain an accuracy estimate of the model obtained from the current bootstrap sample. The overall accuracy of the model is then estimated as

$$ Acc(M)=\frac{1}{k}{\displaystyle \sum_{i=1}^{k}\big(0.632\times Acc{\left({M}_i\right)}_{test\hbox{\_}set}+0.368\times Acc{\left({M}_i\right)}_{train\hbox{\_}set}\big)}, $$
(1)

where Acc(Mi)train_set and Acc(Mi)test_set are the accuracies of the model obtained from bootstrap sample i when it is applied to the training set and the test set of iteration i, respectively.
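The 0.632 accuracy estimate can be sketched in Python as below. The function signature and the trivial majority-class "model" are illustrative assumptions, not part of the original method; `fit` and `accuracy` stand in for whatever learner and scoring function are being evaluated, and the per-iteration weighted accuracies are averaged over the k repetitions:

```python
import random

def bootstrap_632_accuracy(data, labels, fit, accuracy, k=10, seed=0):
    """Sketch of the 0.632 bootstrap accuracy estimate.

    Hypothetical interface: fit(X, y) -> model,
    accuracy(model, X, y) -> float in [0, 1].
    """
    rng = random.Random(seed)
    N = len(data)
    total, runs = 0.0, 0
    for _ in range(k):
        # Sample N indices with replacement -> bootstrap training set.
        idx = [rng.randrange(N) for _ in range(N)]
        chosen = set(idx)
        # Out-of-bag tuples (never drawn) form the test set.
        oob = [i for i in range(N) if i not in chosen]
        if not oob:            # vanishingly rare; skip degenerate iteration
            continue
        X_tr = [data[i] for i in idx];  y_tr = [labels[i] for i in idx]
        X_te = [data[i] for i in oob];  y_te = [labels[i] for i in oob]
        model = fit(X_tr, y_tr)
        total += (0.632 * accuracy(model, X_te, y_te)
                  + 0.368 * accuracy(model, X_tr, y_tr))
        runs += 1
    return total / runs

# Usage with a toy majority-class classifier on made-up data:
def fit(X, y):
    return max(set(y), key=y.count)       # "model" = most frequent label

def acc(model, X, y):
    return sum(1 for t in y if t == model) / len(y)

data = list(range(20))
labels = [0] * 12 + [1] * 8
print(bootstrap_632_accuracy(data, labels, fit, acc))
```

The 0.368 weight on training-set accuracy compensates for the optimism of evaluating on tuples the model has already seen, while the 0.632 weight reflects the expected fraction of distinct tuples in each bootstrap sample.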

Key Applications

The bootstrap method is particularly useful for estimating performance when the dataset is relatively small.

Cross-References