# Performance improvement via bagging in probabilistic prediction of chaotic time series using similarity of attractors and LOOCV predictable horizon


## Abstract

Recently, we presented a method of probabilistic prediction of chaotic time series. The method employs learning machines involving strong learners capable of making predictions with desirably long predictable horizons; however, the usual ensemble mean for producing a representative prediction is not effective when there are predictions with shorter predictable horizons. Thus, the method selects a representative prediction from the predictions generated by a number of learning machines involving strong learners as follows: first, it obtains plausible predictions holding large similarity of attractors with the training time series, and then it selects the representative prediction with the largest predictable horizon estimated via LOOCV (leave-one-out cross-validation). The method is also capable of providing average and/or safe estimation of the predictable horizon of the representative prediction. In our previous study, we used CAN2s (competitive associative nets) for learning piecewise linear approximations of nonlinear functions as strong learners; this paper employs bagging (bootstrap aggregating) to improve the performance, which enables us to analyze the validity and effectiveness of the method.

## Keywords

Probabilistic prediction of chaotic time series · Long-term unpredictability · Attractors of chaotic time series · Leave-one-out cross-validation · Estimation of predictable horizon

## 1 Introduction

So far, a number of methods for time series prediction have been studied (cf. [1, 2]), and our methods were awarded 3rd and 2nd places in the time series prediction competitions held at IJCNN’04 [3] and ESTSP’07 [4], respectively. Those methods used model selection based on the MSE (mean square prediction error) for holdout and/or cross-validation datasets. Recently, we have developed several model selection methods for chaotic time series prediction [5, 6]. The method in [5] utilizes moments of predictive deviation as ensemble diversity measures for model selection in time series prediction and achieves better performance, from the point of view of the MSE, than the conventional holdout method. The method in [6] uses direct multistep-ahead (DMS) prediction to apply the out-of-bag (OOB) estimate of the MSE. Although both methods select models that generate good predictions on average, they have not always provided good predictions, especially when the horizon to be predicted is large. This is owing mainly to the fact that the MSE of a set of predictions is largely affected by a small number of predictions with short predictable horizons, even if most of the predictions have long predictable horizons, because the prediction error of a chaotic time series increases exponentially with time after the predictable horizon (see [6] for the analysis and [1] for properties of chaotic time series).

Instead of using model selection methods employing the estimation of the MSE, we have developed a method of probabilistic prediction of chaotic time series [7]. According to [8], probabilistic prediction has come to dominate the science of weather and climate forecasting, mainly because the theory of chaos at the heart of meteorology shows that, for a simple set of nonlinear equations (such as Lorenz’s equations) with initial conditions changed by minute perturbations, there is no longer a single deterministic solution, and hence all forecasts must be treated as probabilistic. Although most of the methods shown in [8] use the ensemble mean as the representative forecast, our method in [7] (see below for details) uses an individual prediction selected from a set of plausible predictions as the representative, because our method employs learning machines involving strong learners capable of making predictions with small error for a desirably long duration, and the ensemble mean does not work when the set of predictions involves a prediction with a short predictable horizon. This is owing mainly to the exponential increase in the prediction error of chaotic time series after the predictable horizon (see Sect. 3.2 for details).
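The sensitive dependence on initial conditions described above can be reproduced with a short numerical experiment. The following sketch (the classical Lorenz parameter values and the perturbation size \(10^{-6}\) are illustrative assumptions, not taken from this paper) integrates Lorenz’s equations with a fourth-order Runge–Kutta scheme for two minutely perturbed initial conditions:

```python
import numpy as np

SIGMA, RHO, BETA = 10.0, 28.0, 8.0 / 3.0   # classical Lorenz parameters (assumed)

def lorenz_rhs(s):
    """Right-hand side of Lorenz's equations: dx/dt, dy/dt, dz/dt."""
    x, y, z = s
    return np.array([SIGMA * (y - x), x * (RHO - z), x * y - BETA * z])

def rk4_step(s, dt):
    """One classical 4th-order Runge-Kutta step."""
    k1 = lorenz_rhs(s)
    k2 = lorenz_rhs(s + 0.5 * dt * k1)
    k3 = lorenz_rhs(s + 0.5 * dt * k2)
    k4 = lorenz_rhs(s + dt * k3)
    return s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

def trajectory(init, n_steps, dt=1e-2):
    s = np.array(init, dtype=float)
    out = [s.copy()]
    for _ in range(n_steps):
        s = rk4_step(s, dt)
        out.append(s.copy())
    return np.array(out)

# Two runs whose initial conditions differ by a minute perturbation (1e-6).
a = trajectory([1.0, 1.0, 1.0], 2000)
b = trajectory([1.0 + 1e-6, 1.0, 1.0], 2000)
gap = np.abs(a[:, 0] - b[:, 0])   # divergence of the x-coordinate over time
```

After a dozen or so time units the two runs are effectively unrelated, which is why a single deterministic forecast becomes meaningless past the predictable horizon.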

Thus, instead of using the ensemble mean, our method in [7] first selects plausible predictions by evaluating the similarity of attractors between the training and predicted time series, and then obtains the representative prediction by means of LOOCV (leave-one-out cross-validation), selecting the prediction with a longer predictable horizon. Compared with our previous methods using the MSE for model selection [5, 6], the method in [7] has the advantage that it is capable of selecting the representative prediction from plausible predictions for each start time of prediction and of providing an estimation of the predictable horizon. Furthermore, it has achieved long predictable horizons on average. However, there are several cases where the method selects a representative prediction with a short predictable horizon, although there are plausible predictions with longer predictable horizons.

To overcome this problem, this paper tries to improve the performance of the learning machines by using the bagging (bootstrap aggregating) method [11] and shows an analysis of the LOOCV predictable horizon. Bagging is known to reduce the variance of the predictions of single learning machines by taking the ensemble mean, so we can expect the performance of time series prediction to become more stable and higher. Note that, in this paper, the bagging ensemble is employed for iterated one-step-ahead (IOS) prediction of time series, while we deal with probabilistic prediction as an ensemble of longer-term predictions. Furthermore, we use the CAN2 (competitive associative net 2) as a learning machine (see [3] for the details of the CAN2). The CAN2 has been introduced for learning piecewise linear approximations of nonlinear functions, and its performance has been shown in the Evaluating Predictive Uncertainty Challenge [9], where our method was awarded first place in regression problems. The CAN2 has also been used in our methods [3, 4] for the time series prediction competitions mentioned above.

We show the present method of probabilistic prediction of chaotic time series in Sect. 2, experimental results and analysis in Sect. 3, and the conclusion in Sect. 4.

## 2 Probabilistic prediction of chaotic time series

### 2.1 IOS prediction of chaotic time series

Suppose a chaotic time series \(y_t\;(t=0,1,2,\ldots )\) is generated by a chaotic differential dynamical system, and let \(\varvec{x}_{t}=(y_{t-1},y_{t-2},\ldots ,y_{t-k})^{\mathrm{T}}\) be the input vector obtained via *k*-dimensional delay embedding from the series (see [1] for the theory of chaotic time series). Here, \(y_t\) is obtained not analytically but numerically, and then \(y_t\) involves an error \(e(\varvec{x}_t)\) owing to the finite precision of executable calculation. This indicates that there are a number of plausible target functions \(r(\varvec{x}_t)\) with allowable error \(e(\varvec{x}_t)\). Furthermore, in general, a time series generated with higher precision has small prediction error for a longer duration from the initial time of prediction. Thus, we let a time series generated with high precision (128-bit precision; see Sect. 3 for details) be the ground truth time series \({y}^{[{\mathrm{\tiny gt}}]}_{t}\), while we examine predictions generated with standard 64-bit precision.

We denote by \(y_{t:h}=(y_{t},y_{t+1},\ldots ,y_{t+h-1})\) a time series with the start time *t* and the horizon *h*. For a given training time series \(y_{t_{\mathrm{\tiny g}}:h_{\mathrm{\tiny g}}}(={y}^{[{\mathrm{\tiny train}}]}_{t_{\mathrm{\tiny g}}:h_{\mathrm{\tiny g}}})\), we are supposed to predict the succeeding time series \(y_{t_{\mathrm{\tiny p}}:h_{\mathrm{\tiny p}}}\) for \(t_{\mathrm{\tiny p}}\ge t_{\mathrm{\tiny g}}+h_{\mathrm{\tiny g}}\). Then, we make the training dataset \({D}^{[{\mathrm{\tiny train}}]}_{}=\{(\varvec{x}_{t},y_{t})\mid t \in {I}^{[{\mathrm{\tiny train}}]}_{}\}\) for \({I}^{[{\mathrm{\tiny train}}]}_{}=\{t\mid t_{\mathrm{\tiny g}}+k\le t< t_{\mathrm{\tiny g}}+h_{\mathrm{\tiny g}}\}\) to train a learning machine. After the learning, the machine executes IOS prediction by \({\hat{y}}_{t}={f}^{[\theta _N]}_{}(\varvec{x}_{t})\) for \(t=t_{\mathrm{\tiny p}},t_{\mathrm{\tiny p}}+1,\ldots\), where the elements \(y_{s}\) of the input vector \(\varvec{x}_{t}\) are replaced by the predicted values \({\hat{y}}_{s}\) for \(s\ge t_{\mathrm{\tiny p}}\).
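The embedding and IOS procedure above can be sketched as follows. A plain least-squares learner stands in for the CAN2, and the noiseless linear recursion \(y_t = 1.5\,y_{t-1} - 0.7\,y_{t-2}\) is a hypothetical target used only to make the sketch self-checking:

```python
import numpy as np

def make_embedding_dataset(y, k):
    """Training pairs (x_t, y_t) with x_t = (y_{t-1}, ..., y_{t-k})."""
    X = np.array([y[t - k:t][::-1] for t in range(k, len(y))])
    return X, np.asarray(y[k:])

def ios_predict(f, x0, h):
    """Iterated one-step-ahead (IOS) prediction: each predicted value is fed
    back as the newest element of the next input vector."""
    x = np.array(x0, dtype=float)
    preds = []
    for _ in range(h):
        y_hat = f(x)
        preds.append(y_hat)
        x = np.concatenate(([y_hat], x[:-1]))  # shift: newest value first
    return np.array(preds)

# Toy series from the hypothetical recursion y_t = 1.5 y_{t-1} - 0.7 y_{t-2}.
y = [0.5, -0.3]
for _ in range(50):
    y.append(1.5 * y[-1] - 0.7 * y[-2])
y = np.array(y)

# Fit a stand-in learner (least squares, not a CAN2) on the embedded dataset.
X, tgt = make_embedding_dataset(y, k=2)
A = np.c_[np.ones(len(X)), X]
w, *_ = np.linalg.lstsq(A, tgt, rcond=None)
f = lambda x: w[0] + w[1] * x[0] + w[2] * x[1]

# IOS prediction of the 10 steps succeeding the training series.
preds = ios_predict(f, x0=[y[-1], y[-2]], h=10)
```

Because the toy target is exactly linear, the IOS prediction reproduces the true continuation; for a chaotic target, the fed-back errors grow exponentially instead, which is the origin of the finite predictable horizon.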

### 2.2 Single CAN2 and the bagging for IOS prediction

A single CAN2 has *N* units. The \(j\)th unit has a weight vector \(\varvec{w}_{j}\triangleq (w_{{j}1},\ldots ,w_{{j}k})^{\mathrm{T}}\in {\mathbb {R}}^{k\times 1}\) and an associative matrix (or a row vector) \(\varvec{M}_{j}\,\triangleq \,(M_{{j}0},M_{{j}1},\ldots ,M_{{j}k})\in {\mathbb {R}}^{1\times (k+1)}\) for \({j}\in I^{N} \triangleq \{1,2,\ldots ,N\}\). The CAN2 after learning the training dataset \({D}^{[\mathrm{\tiny train}]}_{}=\{(\varvec{x}_t,y_t)\mid t\in {I}^{[{\mathrm{\tiny train}}]}_{}\}\) approximates the target function \(r(\varvec{x}_t)\) by

\({\hat{y}}_{t}={\tilde{y}}_{t}\triangleq \varvec{M}_{c(t)}\tilde{\varvec{x}}_{t},\)

where \(\tilde{\varvec{x}}_{t}\triangleq (1,\varvec{x}_{t}^{\mathrm{T}})^{\mathrm{T}}\) is the extended input vector and \(c(t)\) is the index of the selected, or \(c(t)\)th, unit of the CAN2. The index \(c(t)\) indicates the unit which has the weight vector \(\varvec{w}_{c(t)}\) closest to the input vector \(\varvec{x}_t\), or \(c(t)\triangleq \mathop{{\mathrm{argmin}}}\nolimits_{{j}\in I^N} \Vert \varvec{x}_t-\varvec{w}_{j}\Vert.\) Note that the above prediction performs piecewise linear approximation of \(y=r(\varvec{x})\), and *N* indicates the number of piecewise linear regions. We use the learning algorithm shown in [10], whose high performance in regression problems has been shown in the Evaluating Predictive Uncertainty Challenge [9].
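A minimal sketch of the prediction step of such a piecewise linear network follows, assuming already-trained \(\varvec{w}_j\) and \(\varvec{M}_j\) (the competitive learning that produces them, described in [10], is omitted; the class name and the two-unit example are illustrative, not from the paper):

```python
import numpy as np

class PiecewiseLinearNet:
    """CAN2-style predictor: N units, each with a center w_j in input space and
    an associative row vector M_j; prediction uses the unit whose w_j is
    closest to the input, yielding a piecewise linear approximation."""

    def __init__(self, W, M):
        self.W = np.asarray(W, dtype=float)   # (N, k) weight vectors (centers)
        self.M = np.asarray(M, dtype=float)   # (N, k+1) local affine maps

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        # c(t) = argmin_j ||x - w_j||: index of the selected unit
        c = np.argmin(np.linalg.norm(self.W - x, axis=1))
        # affine prediction M_c @ (1, x^T)^T of the selected unit
        return float(self.M[c] @ np.concatenate(([1.0], x)))

# Usage: two units reproduce y = |x| exactly on each half-line.
net = PiecewiseLinearNet(W=[[-1.0], [1.0]], M=[[0.0, -1.0], [0.0, 1.0]])
```

With N units, the input space is partitioned into N nearest-neighbor (Voronoi) regions, each carrying its own affine map, which is exactly the piecewise linear approximation described above.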

For the bagging [11], let \({D}^{[n\alpha \sharp ]}_{j}\) denote the \(j\)th bag, i.e., a bootstrap sample of size \(n\alpha\) drawn with replacement from the training dataset of size *n*, for \({j}\in {J}^{[{\mathrm{\tiny bag}}]}_{}\). Using the CAN2 with *N* units after learning \({D}^{[n\alpha \sharp ]}_{j}\), which we denote \({\theta }^{[j]}_{N} \,(\in {\varTheta }_{N}\triangleq \{{\theta }^{[j]}_{N}\mid {j}\in {J}^{[{\mathrm{\tiny bag}}]}_{}\})\), the bagging for predicting the target value \(r_{t_c}=r(\varvec{x}_{t_c})\) is done by

\({\hat{y}}^{[{\mathrm{bag}}]}_{t_c}\triangleq {\left\langle {\hat{y}}^{[j]}_{t_c}\right\rangle }_{{j}\in {J}^{[{\mathrm{bag}}]}_{}},\)

where \({\hat{y}}^{[j]}_{t_c}\) denotes the prediction of \(r_{t_c}\) by the \(j\)th machine \({\theta }^{[j]}_{N}\). The angle brackets \(\left\langle \cdot \right\rangle\) indicate the mean, and the subscript \({j}\in {J}^{[{\mathrm{bag}}]}_{}\) indicates the range of the mean. For simplicity of expression, we sometimes use \({\langle \cdot \rangle }_{j}\) instead of \({\langle \cdot \rangle }_{{j}\in {J}^{[{\mathrm{bag}}]}_{}}\) in the following.
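The bagging scheme above can be sketched as below; `fit_linear` is a stand-in learner (the paper uses CAN2s), and the bag-size factor `alpha` follows the variable-bag-size idea of [12] as an assumption:

```python
import numpy as np

def fit_linear(Xb, yb):
    """Stand-in learner (ordinary least squares); the paper uses CAN2s instead."""
    A = np.c_[np.ones(len(Xb)), Xb]
    w, *_ = np.linalg.lstsq(A, yb, rcond=None)
    return lambda x: float(w @ np.concatenate(([1.0], np.atleast_1d(x))))

def bagging_fit(X, y, fit, n_bags=20, alpha=1.0, seed=0):
    """Train one machine theta^[j] per bag D^[n*alpha,#]_j: a bootstrap sample
    of size alpha * n drawn with replacement from the training data."""
    rng = np.random.default_rng(seed)
    n = len(X)
    machines = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=int(alpha * n))  # bootstrap indices
        machines.append(fit(X[idx], y[idx]))
    return machines

def bagging_predict(machines, x):
    """Bagging prediction: the ensemble mean <y_hat^[j]>_j over the machines."""
    return np.mean([f(x) for f in machines])

# Usage on hypothetical noisy 1-D data y = 2x + noise.
rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = 2.0 * X[:, 0] + 0.01 * rng.normal(size=200)
machines = bagging_fit(X, y, fit_linear)
pred = bagging_predict(machines, 0.5)
```

Averaging over bags reduces the variance contributed by each individual machine, which is the stabilizing effect this paper exploits for the one-step-ahead predictor inside the IOS loop.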

### 2.3 Probabilistic prediction and estimation of predictable horizon

#### 2.3.1 Similarity of attractors to select plausible predictions

Let \(\theta _{N}\,(\in \varTheta _{})\) denote a learning machine with the number *N* of units, where \(\varTheta _{}\) indicates the set of all learning machines. We employ single and bagging CAN2s, which we denote \({\theta }^{[{\mathrm {single}}]}_{N}\) and \({\theta }^{[{\mathrm{bag}}]}_{N}\), respectively, if necessary. We suppose that there are a number of plausible prediction functions \(f(\cdot )={f}^{[{\theta }_{N}]}_{}(\cdot )\), and we have to remove implausible ones. To have this done, we select the set of plausible predictions whose attractors are similar to that of the training time series, or

\(\left\{ {y}^{[\theta _N]}_{t_{\mathrm{\tiny p}}:h_{\mathrm{\tiny p}}}\;\Big |\; S\!\left( {y}^{[\theta _N]}_{t_{\mathrm{\tiny p}}:h_{\mathrm{\tiny p}}},{y}^{[{\mathrm{\tiny train}}]}_{t_{\mathrm{\tiny g}}:h_{\mathrm{\tiny g}}}\right) \ge S_{\mathrm{\tiny th}}\right\},\)

where \(S(\cdot ,\cdot )\) denotes the similarity of attractors defined in [7] and \(S_{\mathrm{\tiny th}}\) is a threshold. In the definition of \(S(\cdot ,\cdot )\), the indicator takes 1 if the condition *z* is true and 0 if *z* is false, and \(\lfloor \cdot \rfloor\) indicates the floor function.
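The exact definition of \(S\) is given in [7]; as a hedged illustration of how an attractor similarity involving the floor function can be computed, the following sketch compares the occupied cells of a discretized two-dimensional delay plot \((y_{t-1},y_t)\). The grid range, bin count, and intersection-over-union form are all assumptions, not the paper's definition:

```python
import numpy as np

def occupied_cells(y, lo=-30.0, hi=30.0, n_bins=20):
    """Cells of the discretized (y_{t-1}, y_t) delay plot visited by the
    series; the floor function maps each point to its cell index."""
    y = np.asarray(y, dtype=float)
    pts = np.column_stack((y[:-1], y[1:]))
    ij = np.floor((pts - lo) / (hi - lo) * n_bins).astype(int)
    ij = np.clip(ij, 0, n_bins - 1)
    return {tuple(c) for c in ij}

def attractor_similarity(y_pred, y_train, **kw):
    """S in [0, 1] as the fraction of jointly occupied cells (intersection
    over union) -- an assumed form, not necessarily the paper's exact one."""
    a, b = occupied_cells(y_pred, **kw), occupied_cells(y_train, **kw)
    return len(a & b) / len(a | b)
```

A prediction that collapses to a fixed point or drifts off the training attractor occupies very different cells and scores a low S, which is what allows implausible predictions to be filtered out before any horizon is estimated.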

#### 2.3.2 LOOCV measure to estimate predictable horizons

#### 2.3.3 Probabilistic prediction involving longer LOOCV predictable horizons

#### 2.3.4 Representative prediction and estimation of predictable horizon

## 3 Numerical experiments and analysis

### 3.1 Experimental settings

Using \({y}^{[{\mathrm{\tiny train}}]}_{t_g:h_g}={y}^{[{\mathrm{\tiny gt}}]}_{0:2000}\), we make the training dataset \({D}^{[{\mathrm{\tiny train}}]}_{}=\{({\varvec{x}}^{[{\mathrm{\tiny gt}}]}_{t},{y}^{[{\mathrm{\tiny gt}}]}_{t})\mid t \in {I}^{[{\mathrm{\tiny train}}]}_{}\}\) for \({I}^{[{\mathrm{\tiny train}}]}_{}=\{10 \,(=k),11,\ldots ,1999\}\) and \({\varvec{x}}^{[{\mathrm{\tiny gt}}]}_{t}=({y}^{[{\mathrm{\tiny gt}}]}_{t-1},\ldots ,{y}^{[{\mathrm{\tiny gt}}]}_{t-k})^{\mathrm{T}}\). For learning machines \(\theta _N\), we have employed single CAN2s \({\theta }^{[{\mathrm {single}}]}_{N}\) and bagging CAN2s \({\theta }^{[{\mathrm{bag}}]}_{N}\) with the number of units \(N=5+20i\,(i=0,1,2,\ldots ,14)\). After the training, we execute IOS prediction \({\hat{y}}_{t}={f}^{[\theta _N]}_{}(\varvec{x}_{t})\) for \(t=t_p,t_p+1,\ldots\) with the initial input vector \(\varvec{x}_{t_{\mathrm{\tiny p}}}=({y}^{[{\mathrm{\tiny gt}}]}_{t_{\mathrm{\tiny p}}-1},\ldots ,{y}^{[{\mathrm{\tiny gt}}]}_{t_{\mathrm{\tiny p}}-k})\) for prediction start time \(t_{\mathrm{\tiny p}}\in T_{\mathrm{\tiny p}}=\{2000+100i\mid i=0,1,2,\ldots ,29\}\) and prediction horizon \(h_p=500\). We show experimental results for the embedding dimension \(k=10\) and the threshold in (8) \(e_y=10\) (see [7] for the result with \(k=8\), which is slightly but not significantly different).

In order to estimate the accuracy of \({y}^{[{\mathrm{\tiny gt}}]}_{t}\), we have obtained an average predictable horizon \(\left\langle h\left( {y}^{[{\mathrm{\tiny gt}}]}_{t:500},{y}^{[\Delta t=10^{-5},r=128]}_{t:500}\right) \right\rangle _{t\in T_{\mathrm{p}}}=230\) steps (= 5.75 s/25 ms) for the time series \({y}^{[\Delta t=10^{-5},r=128]}_{t:500}\) generated with step size \(\Delta t=10^{-5}\) and \(r=128\)-bit precision via the Runge–Kutta method. This indicates that \({y}^{[{\mathrm{\tiny gt}}]}_{t}\) with \(\Delta t=10^{-4}\) and \(r=128\) can be considered accurate for 230 steps on average, because we have observed that the predictable horizon between two time series generated by the Runge–Kutta method with step sizes \(\Delta t=10^{-n}\) and \(10^{-n-1}\) for \(n=3,4,5,6,7\) increases monotonically as the step size decreases (i.e., as *n* increases).
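A predictable horizon \(h(y,y')\) of the kind used above can be formalized as the first step at which the error exceeds the threshold \(e_y\) (here \(e_y=10\), as in Sect. 3.1; treating the full remaining length as the horizon when the threshold is never exceeded is an assumption of this sketch):

```python
import numpy as np

def predictable_horizon(y_hat, y_gt, e_y=10.0):
    """h(y_hat, y_gt): number of steps before the absolute prediction error
    first exceeds the threshold e_y; if the threshold is never exceeded,
    the full series length counts as the horizon (an assumed convention)."""
    err = np.abs(np.asarray(y_hat, dtype=float) - np.asarray(y_gt, dtype=float))
    over = np.nonzero(err > e_y)[0]
    return int(over[0]) if over.size else len(err)
```

Averaging this quantity over the start times \(t_{\mathrm{p}}\in T_{\mathrm{p}}\) yields the mean predictable horizons (e.g., 230, 172, 142 steps) compared in the text.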

Here, note that we have executed several experiments using the parameters \(\theta =(N,k)\) for \(k=6\), 8, 10, 12 and so on, and we have not found any critically different results, although we would like to execute and show the results of a comparative study in our future research.

### 3.2 Results and analysis

In Fig. 2b, we can see that, at \(t=2799\), single CAN2s have a larger number of predictions with the similarity *S* smaller than \(S_{\mathrm{\tiny th}}=0.8\) than bagging CAN2s, and those predictions are not selected as plausible predictions. A detailed analysis of the similarity is shown below.

The representative prediction \({y}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\) (green) shown in (c) is chosen by selecting the largest LOOCV predictable horizon \({\tilde{h}}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\) shown in (d). From (d), we can see that the single CAN2 (left) has an actual predictable horizon \({h}^{[\theta _N]}_{t_p:h_p}\) larger than 200 but an LOOCV predictable horizon \({\tilde{h}}^{[\theta _N]}_{t_p:h_p}\) smaller than 100, actually \(({h}^{[\theta _{N}]}_{t_p:h_p},{\tilde{h}}^{[\theta _{N}]}_{t_p:h_p})=(209,72.1)\). Since the present method selects the prediction with the largest \({\tilde{h}}^{[\theta _{N}]}_{t_p:h_p}\), the prediction with \({h}^{[\theta _{N}]}_{t_p:h_p}=209\) could not be selected. On the other hand, we can see that the bagging CAN2 (right in (d)) successfully selects a prediction with \({h}^{[\theta _{N}]}_{t_p:h_p}\) larger than 100, actually \(({h}^{[\theta _{N}]}_{t_p:h_p},{\tilde{h}}^{[\theta _{N}]}_{t_p:h_p})=(183,191)\). More precisely, bagging CAN2s have successfully provided a large \({\tilde{h}}^{[\theta _{N}]}_{t_p:h_p}=191\) because there are a number of predictions with long predictable horizons around \({h}^{[\theta _{N}]}_{t_p:h_p}=200\), shown as the group of points neighboring \({h}^{[\theta _{N}]}_{t_p:h_p}=200\) in (d) on the right-hand side. Incidentally, from (c), we can see that the ensemble mean does not seem appropriate for producing the representative prediction in long-term prediction of chaotic time series.

In Fig. 3, we show the results of actual and estimated predictable horizons. Note that we have obtained \(\left\langle h\left( {y}^{[{\mathrm{\tiny gt}}]}_{t:500},{y}^{[\Delta t=5\times 10^{-4},r=64]}_{t:500}\right) \right\rangle _{t\in T_p}=172\) steps (= 4.3 s/25 ms) and \(\left\langle h\left( {y}^{[{\mathrm{\tiny gt}}]}_{t:500},{y}^{[\Delta t=10^{-3},r=64]}_{t:500}\right) \right\rangle _{t\in T_p}=142\) steps (= 3.55 s/25 ms), and the former is almost the same as the mean predictable horizons achieved by the single and bagging CAN2s, being 170 and 175 steps, respectively. This indicates that single and bagging CAN2s, after learning the training data generated via the Runge–Kutta method with the step size \(\Delta t=10^{-4}\), have almost the same prediction performance as the Runge–Kutta method with \(\Delta t=5\times 10^{-4}\). Although we have no general measure to evaluate time series prediction so far, the above evaluation using the step size of the Runge–Kutta method and the mean predictable horizon seems reasonable. In Fig. 3a, we can see that the stability of prediction by the single CAN2 is improved by the bagging CAN2, in the sense that the former has four actual predictable horizons \({h}^{[{\theta }^{[{\mathrm {single}}]}_{\sigma (1)}]}_{t_p:h_p}\) smaller than 100 among all predictions for \(t_{\mathrm{\tiny p}}\in T_{\mathrm{\tiny p}}\), whereas the bagging CAN2 has achieved \({h}^{[{\theta }^{[{\mathrm{bag}}]}_{\sigma (1)}]}_{t_p:h_p}\) larger than 100 for all. From (b), we can see that the estimated predictable horizon \({\hat{h}}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\) with \(H_{\mathrm{\tiny th}}=0.5\) is almost the same as the actual predictable horizon \({h}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\), while \(H_{\mathrm{\tiny th}}=0.9\) has achieved safe estimation, or \({\hat{h}}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\le {h}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\).

In order to analyze the properties of the method, we show the attractor distribution of the training and representative time series in Fig. 4. We can see that the similarity of attractors \(S({y}^{[{\theta }^{[{\mathrm {single}}]}_{\sigma (1)}]}_{t_p:h_p},{y}^{[{\mathrm{\tiny train}}]}_{t_g:h_g})=0.859\) obtained by the single CAN2 is smaller than \(S({y}^{[{\theta }^{[{\mathrm{bag}}]}_{\sigma (1)}]}_{t_p:h_p},{y}^{[{\mathrm{\tiny train}}]}_{t_g:h_g})=0.939\) obtained by the bagging CAN2. From the result on the left in Fig. 2b, we can see that there is a prediction with a similarity larger than 0.859 for the single CAN2; actually, the maximum similarity of the single CAN2s is 0.931. The prediction \({y}^{[{\theta }_{\sigma _S(1)}]}_{t_p:h_p}\) with the maximum similarity of attractors among the plausible predictions could also be used as the representative prediction, where \(\theta _{\sigma _S(1)}\) indicates the learning machine with the maximum similarity. The comparison between \({h}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\) and \({h}^{[\theta _{\sigma _S(1)}]}_{t_p:h_p}\) is shown in Fig. 5a, where \({h}^{[\theta _{\sigma _S(1)}]}_{t_p:h_p}\) seems competitive with \({h}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\) for the single CAN2, but worse for the bagging CAN2. To analyze this further, we have examined the correlation \(r({S}^{[\theta _N]}_{t_p:h_p},{h}^{[{\theta }_{N}]}_{t_p:h_p})\) between the similarity \({S}^{[\theta _N]}_{t_p:h_p}=S({y}^{[{\theta }_{N}]}_{t_p:h_p},{y}^{[{\mathrm{\tiny train}}]}_{t_g:h_g})\) and the predictable horizon \({{h}}^{[\theta _{N}]}_{t_p:h_p}=h({y}^{[{\theta }_{N}]}_{t_p:h_p},{y}^{[{\mathrm{\tiny gt}}]}_{t_p:h_p})\), as well as the correlation \(r({\tilde{h}}^{[\theta _N]}_{t_p:h_p},{h}^{[{\theta }_{\sigma _S(1)}]}_{t_p:h_p})\), as shown in Fig. 5b. From this result, we can see a number of cases with low positive or negative correlations.
In particular, the correlation of similarity, \(r({S}^{[\theta _N]}_{t_p:h_p},{h}^{[{\theta }_{N}]}_{t_p:h_p})\), has few cases with values larger than 0.5 for both single and bagging CAN2s. This suggests that selecting the representative prediction by the similarity measure is not so reliable. On the other hand, the bagging CAN2 has a larger number of cases with correlations larger than 0.5, as shown by the thick line of \(r({\tilde{h}}^{[\theta _N]}_{t_p:h_p},{h}^{[{\theta }_{\sigma _S(1)}]}_{t_p:h_p})\) on the right-hand side in Fig. 5b. Furthermore, we can see that there are several cases of \(t_p\) with negative correlations \(r({\tilde{h}}^{[\theta _N]}_{t_p:h_p},{h}^{[{\theta }_{\sigma _S(1)}]}_{t_p:h_p})\) in (b), and the corresponding predictable horizons \({h}^{[\theta _{\sigma (1)}]}_{t_p:h_p}\) in (a) are shorter than the neighboring (w.r.t. \(t_p\)) horizons. This correspondence seems reasonable because a negative correlation does not contribute to the selection of the prediction with a large predictable horizon. Thus, we have to remove the cases of negative correlations. So far, we have two approaches: one is to improve the performance of the learning machines further, as we have done with the bagging method in this paper, and the other is to refine the selection method by modifying the LOOCV predictable horizon or developing new methods. In fact, there are predictions with much longer predictable horizons (not shown in this paper), but so far we cannot select them without knowing the ground truth time series.
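The selection failure discussed above can be illustrated in a few lines. The horizon values below are hypothetical (only the first pair echoes the single-CAN2 case \((h,{\tilde{h}})=(209,72.1)\) from the text); they show that maximizing the LOOCV horizon need not pick the model with the best actual horizon when the correlation between the two is imperfect:

```python
import numpy as np

# Hypothetical (LOOCV, actual) predictable horizons for five learning machines.
h_loocv = np.array([72.1, 150.0, 191.0, 120.0, 60.0])
h_actual = np.array([209.0, 160.0, 183.0, 140.0, 90.0])

r = np.corrcoef(h_loocv, h_actual)[0, 1]  # Pearson correlation r(h~, h)
selected = int(np.argmax(h_loocv))        # model chosen by the present method
best = int(np.argmax(h_actual))           # model an oracle would choose
```

Here the method selects machine 2 (actual horizon 183) while the oracle choice is machine 0 (actual horizon 209); the lower the correlation \(r\), the more often such mismatches occur, which is exactly the failure mode the negative-correlation cases in Fig. 5b represent.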

## 4 Conclusion

We have presented a performance improvement of the method for probabilistic prediction of chaotic time series by means of using bagging learning machines. The method obtains a set of plausible predictions by using the similarity of attractors between the training and predicted time series, and then provides the representative prediction with the longest LOOCV predictable horizon. Through numerical experiments using single and bagging CAN2s, we have shown that the bagging CAN2 improves the performance of the single CAN2, and we have analyzed the relationship between the LOOCV and actual predictable horizons. In future research, we would like to overcome the problem of negative correlation between the achieved predictable horizon and the LOOCV predictable horizon, or to refine the measure for selecting the representative prediction.


## Compliance with ethical standards

## Conflict of interest

The authors declare no conflicts of interest associated with this article.

## References

- 1. Aihara K (2000) Theories and applications of chaotic time series analysis. Sangyo Tosho, Tokyo
- 2. Lendasse A, Oja E (2004) Time series prediction competition: the CATS benchmark. Proc IJCNN 2004:1615–1620
- 3. Kurogi S, Ueno T, Sawa M (2007) Time series prediction of the CATS benchmark using Fourier bandpass filters and competitive associative nets. Neurocomputing 70(13–15):2354–2362
- 4. Kurogi S, Tanaka S, Koyama R (2007) Combining the predictions of a time series and the first-order difference using bagging of competitive associative nets. In: Proceedings of the European symposium on time series prediction (ESTSP) 2007, pp 123–131
- 5. Kurogi S, Ono K, Nishida T (2013) Experimental analysis of moments of predictive deviations as ensemble diversity measures for model selection in time series prediction. In: Proceedings of ICONIP 2013, Part III, LNCS 8228. Springer, Heidelberg
- 6. Kurogi S, Shigematsu R, Ono K (2014) Properties of direct multi-step ahead prediction of chaotic time series and out-of-bag estimate for model selection. In: Proceedings of ICONIP 2014, Part II, LNCS 8835. Springer, Heidelberg
- 7. Kurogi S, Toidani M, Shigematsu R, Matsuo K (2015) Prediction of chaotic time series using similarity of attractors and LOOCV predictable horizons for obtaining plausible predictions. In: Proceedings of ICONIP 2015, LNCS 9491, pp 72–81
- 8. Slingo J, Palmer T (2011) Uncertainty in weather and climate prediction. Phil Trans R Soc A 369:4751–4767
- 9. Quiñonero-Candela J, Rasmussen CE, Sinz FH, Bousquet O, Schölkopf B (2006) Evaluating predictive uncertainty challenge. In: Quiñonero-Candela J et al (eds) MLCW 2005, LNAI 3944. Springer, Heidelberg, pp 1–27
- 10. Kurogi S, Sawa M, Tanaka S (2006) Competitive associative nets and cross-validation for estimating predictive uncertainty on regression problems. Lecture Notes in Artificial Intelligence (LNAI) 3944:78–94
- 11. Breiman L (1996) Bagging predictors. Mach Learn 26:123–140
- 12. Kurogi S (2009) Improving generalization performance via out-of-bag estimate using variable size of bags. J Jpn Neural Netw Soc 16(2):81–92
- 13. Efron B, Tibshirani R (1997) Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 92:548–560

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.