
Outlier Detection and Robust Variable Selection for Least Angle Regression

  • Shirin Shahriari
  • Susana Faria
  • A. Manuela Gonçalves
  • Stefan Van Aelst
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8581)

Abstract

Selecting a parsimonious subset of variables from a large number of predictors in a regression model is an important problem. When the data contain vertical outliers and/or leverage points, outlier detection and variable selection are inseparable problems, so a robust method that can detect outliers and select variables simultaneously is needed. An outlier detection and robust variable selection method is introduced that combines robust least angle regression with least trimmed squares regression on jackknife subsets. In a second stage, the detected outliers are removed and standard least angle regression is applied to the cleaned data to robustly sequence the predictor variables in order of importance. The performance of the method is evaluated in simulations with vertical outliers and high-leverage points. The results of the simulation study show that the method performs well in both outlier detection and robust variable selection.
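The abstract does not spell out the exact jackknife scheme, so the following is only a minimal sketch of the first (outlier-flagging) stage under stated assumptions, not the authors' procedure. It skips the robust LARS screening step (assuming few enough predictors to fit LTS on all of them), approximates LTS with random elemental starts plus concentration steps, splits the rows into five hypothetical jackknife groups, and flags a held-out row when its residual exceeds 2.5 times a MAD-type scale. The function names, the group count, the coverage `h` and the 2.5 cutoff are illustrative choices; only the standard library is used so the sketch is self-contained.

```python
import random
import statistics


def fit_ols(X, y):
    """Least-squares coefficients from the normal equations, solved by
    Gauss-Jordan elimination with partial pivoting. Rows of X include a leading 1."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        if abs(A[c][c]) < 1e-12:
            raise ValueError("singular design")
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def residuals(X, y, beta):
    """Row-wise residuals y - X beta."""
    return [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]


def fit_lts(X, y, n_starts=30, max_csteps=20, seed=0):
    """Approximate least trimmed squares: random elemental starts, each refined by
    concentration steps (refit OLS on the h rows with the smallest squared residuals)."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    h = (n + p + 1) // 2  # common maximal-breakdown choice of coverage
    best_beta, best_obj = None, float("inf")
    for _ in range(n_starts):
        idx = rng.sample(range(n), p)
        try:
            beta = fit_ols([X[i] for i in idx], [y[i] for i in idx])
            prev = None
            for _ in range(max_csteps):
                r = residuals(X, y, beta)
                keep = set(sorted(range(n), key=lambda i: r[i] ** 2)[:h])
                if keep == prev:
                    break  # the h-subset no longer changes
                prev = keep
                beta = fit_ols([X[i] for i in keep], [y[i] for i in keep])
        except ValueError:
            continue  # skip degenerate subsets
        obj = sum(sorted(e * e for e in residuals(X, y, beta))[:h])
        if obj < best_obj:
            best_beta, best_obj = beta, obj
    if best_beta is None:
        raise ValueError("all starts were degenerate")
    return best_beta


def jackknife_flags(X, y, n_groups=5, cutoff=2.5, seed=0):
    """Hypothetical rule: leave out one group at a time, fit LTS on the remaining rows,
    and flag held-out rows whose residual exceeds `cutoff` robust scale units."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    flagged = set()
    for g in range(n_groups):
        held = order[g::n_groups]
        held_set = set(held)
        train = [i for i in range(len(X)) if i not in held_set]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        beta = fit_lts(Xt, yt, seed=seed + g)
        # MAD-type scale of training residuals; 1.4826 targets sigma under normal errors
        scale = 1.4826 * statistics.median([abs(e) for e in residuals(Xt, yt, beta)])
        held_res = residuals([X[i] for i in held], [y[i] for i in held], beta)
        for i, e in zip(held, held_res):
            if abs(e) > cutoff * scale:
                flagged.add(i)
    return sorted(flagged)


if __name__ == "__main__":
    # Toy data: y = 1 + 2*x1 - x2 + noise, with vertical outliers in rows 0-4
    rng = random.Random(1)
    X, y = [], []
    for i in range(100):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([1.0, x1, x2])
        y.append(1 + 2 * x1 - x2 + rng.gauss(0, 0.5) + (15 if i < 5 else 0))
    print(jackknife_flags(X, y))
```

Following the two-stage scheme described in the abstract, the flagged rows would then be dropped and the cleaned data passed to a standard LARS routine (for example, the R package `lars` or scikit-learn's `lars_path`) to obtain the order in which the predictors enter the model.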

Keywords

Outlier Detection, Robust Variable Selection, Least Angle Regression

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Shirin Shahriari (1)
  • Susana Faria (1)
  • A. Manuela Gonçalves (1)
  • Stefan Van Aelst (2, 3)

  1. DMA-Department of Mathematics and Applications, CMAT-Centre of Mathematics, University of Minho, Guimarães, Portugal
  2. Department of Mathematics, K.U. Leuven, Leuven, Belgium
  3. Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium

Personalised recommendations