Abstract
Statistical estimation in multivariate data sets presents myriad challenges when the form of the regression function linking the outcome and explanatory variables is unknown. Our study seeks to understand the computational challenges of the optimization problem underlying regression estimation and to design intelligent procedures for this setting. We begin by analyzing the size of the parameter space in polynomial regression in terms of the number of variables and the constraints on the polynomial degree and the number of interacting explanatory variables. We then propose a new procedure for statistical estimation that relies upon cross-validation to select the optimal parameter subspace and an evolutionary algorithm to minimize risk within this subspace based upon the available data. This general-purpose procedure performs well in a variety of challenging multivariate estimation settings. It is sufficiently flexible to allow the user to incorporate known causal structures into the estimate and to adjust computational parameters, such as the population mutation rate, according to the problem's specific challenges. Moreover, the procedure can be shown to converge asymptotically to the globally optimal estimate. We compare this evolutionary algorithm to a variety of competitors in simulation studies and in the context of a study of disease progression in diabetes patients.
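As an illustration of the parameter-space analysis the abstract mentions (this sketch is not reproduced from the chapter), the number of candidate basis terms in a polynomial regression over n variables, with total degree at most d and at most k interacting variables per term, can be counted in closed form: each term with s distinct variables contributes C(n, s) choices of variables times C(d, s) admissible exponent patterns. The function names below are hypothetical.

```python
from math import comb
from itertools import product

def count_terms(n, d, k):
    """Count polynomial basis terms in n variables with total degree
    at most d and at most k interacting variables per term.
    The leading 1 accounts for the intercept."""
    return 1 + sum(comb(n, s) * comb(d, s) for s in range(1, k + 1))

def count_terms_brute(n, d, k):
    """Brute-force check: enumerate exponent vectors directly and keep
    those with total degree <= d and support (nonzero entries) <= k."""
    return sum(
        1
        for e in product(range(d + 1), repeat=n)
        if sum(e) <= d and sum(x > 0 for x in e) <= k
    )
```

Even modest settings give a sizable search space: `count_terms(10, 3, 2)` yields 166 candidate terms, and with k = n the count reduces to the familiar C(n + d, d) for unconstrained polynomials of degree at most d.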
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
Cite this chapter
Shilane, D., Liang, R.H., Dudoit, S. (2010). Loss-Based Estimation with Evolutionary Algorithms and Cross-Validation. In: Tenne, Y., Goh, CK. (eds) Computational Intelligence in Expensive Optimization Problems. Adaptation Learning and Optimization, vol 2. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10701-6_18
DOI: https://doi.org/10.1007/978-3-642-10701-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10700-9
Online ISBN: 978-3-642-10701-6
eBook Packages: Engineering, Engineering (R0)