Abstract
Resampling-based methods such as k-fold cross-validation or repeated splitting into training and test sets are routinely used in supervised statistical learning to assess the prediction performance of prediction methods on real data sets. In this paper, we consider methodological issues related to comparison studies of prediction methods that involve several real data sets and use resampling-based error estimators as the evaluation criteria. In the literature, papers often claim that, say, “Method 1 performs better than Method 2 on real data” without applying any proper statistical inference approach to support their claims and without clearly explaining what they mean by “perform better.” We recently proposed a new statistical testing framework that provides a statistically correct formulation of such paired tests—often performed in the machine learning community—to compare the performance of two methods on several real data sets. However, the behavior of the different available resampling-based error estimation procedures within this statistical framework is unknown. In this paper we empirically assess this behavior through an exemplary benchmark study based on 50 microarray data sets and, in light of the results, formulate tentative recommendations regarding the choice of resampling-based error estimation procedures.
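As a minimal illustration of the two resampling schemes named above—k-fold cross-validation and repeated splitting into training and test sets—the following sketch estimates the misclassification error of an arbitrary classifier. It is a simplified illustration only, not the PLS-LDA pipeline or the testing framework studied in the paper; the toy nearest-class-mean classifier and all function names are inventions for this example.

```python
import random

def kfold_error(X, y, fit, predict, k=5, seed=0):
    """Estimate misclassification error by k-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]          # k disjoint test folds
    errs = []
    for fold in folds:
        fold_set = set(fold)
        train = [i for i in idx if i not in fold_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in fold)
        errs.append(wrong / len(fold))
    return sum(errs) / k                           # average over folds

def repeated_split_error(X, y, fit, predict, n_rep=20, test_frac=0.3, seed=0):
    """Estimate error by repeatedly splitting into training and test sets."""
    rng = random.Random(seed)
    n_test = max(1, int(test_frac * len(X)))
    errs = []
    for _ in range(n_rep):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        wrong = sum(predict(model, X[i]) != y[i] for i in test)
        errs.append(wrong / len(test))
    return sum(errs) / n_rep                       # average over repetitions

# Toy classifier (hypothetical stand-in for PLS-LDA): nearest class mean on 1-D data.
def fit_mean(Xtr, ytr):
    return {c: sum(x for x, yy in zip(Xtr, ytr) if yy == c) /
               sum(1 for yy in ytr if yy == c)
            for c in set(ytr)}

def predict_mean(means, x):
    return min(means, key=lambda c: abs(x - means[c]))

# Two well-separated classes: both estimators should report near-zero error.
X = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
y = [0] * 10 + [1] * 10
print(kfold_error(X, y, fit_mean, predict_mean))
print(repeated_split_error(X, y, fit_mean, predict_mean))
```

The key practical difference the paper's benchmark probes is visible even here: the k-fold estimate uses each observation exactly once as a test case, whereas repeated splitting reuses observations across overlapping test sets, which affects the variance of the estimator.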
Acknowledgements
We thank Rory Wilson for helpful comments.
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Boulesteix, AL. (2016). Which Resampling-Based Error Estimator for Benchmark Studies? A Power Analysis with Application to PLS-LDA. In: Abdi, H., Esposito Vinzi, V., Russolillo, G., Saporta, G., Trinchera, L. (eds) The Multiple Facets of Partial Least Squares and Related Methods. PLS 2014. Springer Proceedings in Mathematics & Statistics, vol 173. Springer, Cham. https://doi.org/10.1007/978-3-319-40643-5_4
Print ISBN: 978-3-319-40641-1
Online ISBN: 978-3-319-40643-5
eBook Packages: Mathematics and Statistics (R0)