Model Selection and Biased Estimation

Chapter in Advanced Statistics for the Behavioral Sciences

Abstract

In Chap. 3 we learned that multiple regression is a powerful tool for modeling the contribution a variable makes to the prediction of a criterion, holding other variables constant. Given this ability, it might seem that adding predictors to a regression model is always beneficial, but this is not the case. Part of the problem is collinearity. As discussed in Chap. 5, when the predictors are strongly related, their regression coefficients become unstable and their standard errors become inflated. But even when collinearity is not an issue, adding predictors to a regression equation can sometimes do more harm than good. To understand why, we turn to a discussion of prediction error and model complexity.

Notes

  1. Not all models produce precisely the pattern depicted in Fig. 8.1, but the general pattern is typical.

  2. There is no objective basis for choosing the number of folds, but it is customary to use K = 5 or K = 10 (a short cross-validation sketch follows these notes).

  3. The data in Table 8.1 will be described more fully in Sect. 8.2.

  4. With a linear model, AIC is sometimes computed as n log[(∑e²)/n] + 2p (a sketch illustrating this formula follows these notes).

  5. Minimizing AIC is equivalent to minimizing a related metric known as Mallows’ Cp (Stone, 1977).

  6. Some statistical packages in R omit the constants for information criteria measures, producing different values. These differences don’t matter as long as comparisons are made within the same method.

  7. Bias is introduced unless the deleted variables are orthogonal to the retained variables or their associated regression coefficients are 0.

  8. Although the data are fabricated, the pattern has support; see Richardson, Abraham, and Bond (2012).

  9. Equivalently, we can say that IQ has the largest zero-order correlation with college performance.

  10. Correspondence between the two methods is not guaranteed.

  11. Some versions of the sweep operator reverse the sign of the pivot value. In this case, the values in the upper left-hand corner of the swept matrix need to be negated to match the ones reported here.

  12. Dividing each value of S by n − 1 yields a complete covariance matrix.

  13. The branch and bound strategy is also known as the leaps and bounds algorithm (Furnival & Wilson, 1974).

  14. Equivalently, the R² of a model can never increase when a variable is removed.

  15. The number of models of size k from p predictors is found as p!/(k!(p − k)!).

  16. The R function uses R² rather than RSS when implementing the algorithm because R² is easier to obtain from the sweep operator.

  17. See Hastie et al. (2009) for a discussion of the geometry of ridge regression and the lasso.

  18. Equation (8.23) is appropriate only when the number of predictors is less than the number of observations. When this is not the case, a p-length vector of 1′s can be used as an initial estimate.

  19. Cross-validation folds are formed randomly, so the precise value of the tuning parameter will vary unless a seed is set to control the randomization.
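
To make Notes 4 and 6 concrete, here is a minimal R sketch. It is not taken from the chapter; the built-in mtcars data and the two models are stand-ins. It computes AIC from the formula in Note 4 and compares it with R's built-in AIC(), which retains additional constants: the values differ, but the two versions order candidate models identically.

    # AIC from the formula in Note 4: n * log(RSS/n) + 2p (constants omitted)
    aic_formula <- function(fit) {
      e <- residuals(fit)            # residuals from the fitted linear model
      n <- length(e)                 # number of observations
      p <- length(coef(fit))         # number of estimated coefficients
      n * log(sum(e^2) / n) + 2 * p
    }

    # Stand-in models on built-in data (not the chapter's example)
    m1 <- lm(mpg ~ wt, data = mtcars)
    m2 <- lm(mpg ~ wt + hp, data = mtcars)

    # R's AIC() keeps the 2*pi term and counts the error variance as a
    # parameter, so its values differ from the formula by a constant; the
    # difference between models, and hence the model ranking, is the same.
    aic_formula(m1) - aic_formula(m2)
    AIC(m1) - AIC(m2)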
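
The next sketch, also not from the chapter, illustrates Notes 2 and 19 with a bare-bones K-fold cross-validation routine for a linear model. The fold assignment is random, so the cross-validated error (and, in penalized regression, the selected tuning parameter) changes from run to run unless a seed is fixed; K = 5 and K = 10 are the conventional choices. The function name, data, and model are hypothetical stand-ins.

    set.seed(123)                                 # fix the random fold assignment (Note 19)

    # Bare-bones K-fold cross-validation for a linear model (illustrative only)
    cv_mse <- function(formula, data, K = 10) {
      n     <- nrow(data)
      folds <- sample(rep(1:K, length.out = n))   # random, roughly equal-sized folds
      errs  <- numeric(K)
      for (k in 1:K) {
        fit     <- lm(formula, data = data[folds != k, ])   # train on K - 1 folds
        test    <- data[folds == k, ]                       # hold out the k-th fold
        pred    <- predict(fit, newdata = test)
        y       <- test[[all.vars(formula)[1]]]             # observed response
        errs[k] <- mean((y - pred)^2)                       # fold-specific MSE
      }
      mean(errs)                                  # cross-validated prediction error
    }

    cv_mse(mpg ~ wt + hp, data = mtcars, K = 5)   # K = 5, per the convention in Note 2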

References

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second International Symposium on Information Theory (pp. 267–281). Budapest: Akademiai Kiado.

  • Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7, 397–416.

  • Furnival, G. M., & Wilson Jr., R. W. (1974). Regressions by leaps and bounds. Technometrics, 16, 499–511.

  • Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21, 215–223.

  • Goodnight, J. H. (1979). A tutorial on the sweep operator. The American Statistician, 33, 149–158.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). New York: Springer.

  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.

  • Murphy, K. P. (2012). Machine learning: A probabilistic perspective. Cambridge, MA: MIT Press.

  • Richardson, M., Abraham, C., & Bond, R. (2012). Psychological correlates of university students’ academic performance: A systematic review and meta-analysis. Psychological Bulletin, 138, 353–387.

  • Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion. Journal of the Royal Statistical Society: Series B, 39, 44–47.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58, 267–288.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this chapter

Brown, J.D. (2018). Model Selection and Biased Estimation. In: Advanced Statistics for the Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-93549-2_8
