Extending AIC to best subset regression

  • J. G. Liao
  • Joseph E. Cavanaugh
  • Timothy L. McMurry
Original Paper

Abstract

The Akaike information criterion (AIC) is routinely used for model selection in best subset regression. The standard AIC, however, generally under-penalizes model complexity in the best subset setting, potentially leading to grossly overfit models. Recently, Zhang and Cavanaugh (Comput Stat 31(2):643–669, 2015) made significant progress toward addressing this problem by introducing an effective multistage model selection procedure. In this paper, we present a rigorous and coherent conceptual framework for extending AIC to best subset regression. A new model selection algorithm derived from our framework possesses well-understood and desirable asymptotic properties and consistently outperforms the procedure of Zhang and Cavanaugh in simulation studies. It provides an effective tool for combating the pervasive overfitting that detrimentally impacts best subset regression analysis, so that the selected models contain fewer irrelevant predictors and predict future observations more accurately.
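The under-penalization described in the abstract is easy to reproduce numerically. The sketch below is our own illustration, not the paper's proposed algorithm: it regresses a pure-noise response on every subset of p = 10 pure-noise predictors and scores each size's best (minimum-RSS) subset with the standard AIC. Because the minimum RSS at each size is optimistically biased by the search over many candidate subsets, the standard penalty 2(k + 2) is often too weak, and AIC tends to favor a nonempty model even though the true model is empty.

```python
# Illustration (not the paper's method): best subset search under a null model.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))   # 10 irrelevant predictors
y = rng.standard_normal(n)        # pure-noise response: the true model is empty

def rss(cols):
    """Residual sum of squares of OLS on an intercept plus the given columns."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

# Minimum RSS over all subsets of each size k = 0, ..., p (2^p fits in total),
# scored by the standard Gaussian AIC with sigma^2 profiled out:
#   AIC(k) = n * log(RSS/n) + 2 * (k + 2)   # k slopes + intercept + variance
best_rss = {k: min(rss(c) for c in itertools.combinations(range(p), k))
            for k in range(p + 1)}
aic_by_size = {k: n * np.log(best_rss[k] / n) + 2 * (k + 2) for k in best_rss}
selected_k = min(aic_by_size, key=aic_by_size.get)
print("subset size chosen by standard AIC:", selected_k)
```

The exhaustive enumeration here is feasible only for small p; the point of the sketch is that the selection-induced optimism in `best_rss[k]` is what the standard AIC penalty fails to account for.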

Keywords

Akaike information criterion · Expected optimism · Model selection · Overfitting

References

  1. Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Proceedings of the Second International Symposium on Information Theory, pp 267–281
  2. Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control AC-19:716–723
  3. Bengtsson T, Cavanaugh JE (2006) An improved Akaike information criterion for state-space model selection. Comput Stat Data Anal 50(10):2635–2654
  4. Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann Stat 44(2):813–852
  5. Efron B (1983) Estimating the error rate of a prediction rule: improvement on cross-validation. J Am Stat Assoc 78(382):316–331
  6. Efron B (1986) How biased is the apparent error rate of a prediction rule? J Am Stat Assoc 81(394):461–470
  7. Fujikoshi Y (1983) A criterion for variable selection in multiple discriminant analysis. Hiroshima Math J 13:203–214
  8. Hurvich CM, Tsai C-L (1989) Regression and time series model selection in small samples. Biometrika 76(2):297–307
  9. Hurvich CM, Shumway R, Tsai C-L (1990) Improved estimators of Kullback–Leibler information for autoregressive model selection in small samples. Biometrika 77(4):709–719
  10. Kitagawa G, Konishi S (2008) Information criteria and statistical modeling. Springer, New York
  11. Liao J, McGee D (2003) Adjusted coefficients of determination for logistic regression. Am Stat 57:161–165
  12. Schwarz GE (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464
  13. Shibata R (1976) Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika 63(1):117–126
  14. Sugiura N (1978) Further analysis of the data by Akaike's information criterion and the finite corrections. Commun Stat 7(1):13–26
  15. Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
  16. White H (1982) Maximum likelihood estimation of misspecified models. Econometrica 50(1):1–25
  17. Ye J (1998) On measuring and correcting the effects of data mining and model selection. J Am Stat Assoc 93(441):120–131
  18. Zhang T, Cavanaugh JE (2015) A multistage algorithm for best-subset model selection based on the Kullback–Leibler discrepancy. Comput Stat 31(2):643–669

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • J. G. Liao (1)
  • Joseph E. Cavanaugh (2)
  • Timothy L. McMurry (3)
  1. Penn State University, Hershey, USA
  2. University of Iowa, Iowa City, USA
  3. University of Virginia, Charlottesville, USA
