Abstract
We discuss some computationally efficient procedures for robust variable selection in linear regression. A key component in these procedures is the computation of robust correlations between pairs of variables. We show that the robust variable selection procedures can easily handle missing data under the assumption that data are missing completely at random.
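As a rough illustration of the idea, the sketch below computes a robust correlation for each pair of variables using only the rows where both variables are observed, which is reasonable when data are missing completely at random. The estimator used here (standardize by median and MAD, clip at a cutoff `c`, then take the ordinary correlation) is a simple stand-in chosen for the example, not necessarily the estimator used in the paper; the function names and the default `c=2.0` are assumptions.

```python
import numpy as np

def winsorize(x, c=2.0):
    """Standardize by median and MAD, then clip at +/- c (assumed cutoff)."""
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # consistency factor for normal data
    if mad == 0:
        return np.zeros_like(x, dtype=float)
    return np.clip((x - med) / mad, -c, c)

def robust_corr(x, y, c=2.0):
    """Correlation of winsorized values on pairwise-complete observations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = ~(np.isnan(x) | np.isnan(y))  # keep rows where both values are observed
    if ok.sum() < 3:
        return np.nan
    u, v = winsorize(x[ok], c), winsorize(y[ok], c)
    su, sv = u.std(), v.std()
    if su == 0 or sv == 0:
        return np.nan
    return float(np.mean((u - u.mean()) * (v - v.mean())) / (su * sv))

def robust_corr_matrix(X, c=2.0):
    """Matrix of pairwise robust correlations; NaN marks a missing entry in X."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            R[j, k] = R[k, j] = robust_corr(X[:, j], X[:, k], c)
    return R

# Small demo: a linear relation with one gross outlier and one missing value.
x = np.arange(20.0)
y = 2 * x + 1
y[10] = 1e6      # outlier
y[3] = np.nan    # missing value
R = robust_corr_matrix(np.column_stack([x, y]))
```

Because each entry is computed from a different subset of rows, a matrix assembled this way is not guaranteed to be positive semidefinite, so a selection procedure that relies on it may need a correction step.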
Copyright information
© 2008 Physica-Verlag Heidelberg
About this paper
Cite this paper
Van Aelst, S., Khan, J.A., Zamar, R.H. (2008). Fast Robust Variable Selection. In: Brito, P. (eds) COMPSTAT 2008. Physica-Verlag HD. https://doi.org/10.1007/978-3-7908-2084-3_30
DOI: https://doi.org/10.1007/978-3-7908-2084-3_30
Publisher Name: Physica-Verlag HD
Print ISBN: 978-3-7908-2083-6
Online ISBN: 978-3-7908-2084-3
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)