Abstract
In Chap. 3 we learned that multiple regression is a powerful tool for modeling the contribution a variable makes to the prediction of a criterion while holding other variables constant. Given this ability, it might seem that adding predictors to a regression model is always beneficial, but this is not the case. Part of the problem is collinearity: as discussed in Chap. 5, when the predictors are strongly related, their regression coefficients become unstable and their standard errors become inflated. But even when collinearity is not an issue, adding predictors to a regression equation can sometimes do more harm than good. To understand why, we turn to a discussion of prediction error and model complexity.
Notes
- 1.
Not all models produce precisely the pattern depicted in Fig. 8.1, but the general pattern is typical.
- 2.
There is no objective basis for choosing the number of folds, but it is customary to use K = 5 or K = 10.
- 3.
- 4.
With a linear model, AIC is sometimes computed as \( n \log\!\left[\left(\textstyle\sum e^2\right)/n\right] + 2p \).
- 5.
Minimizing AIC is equivalent to minimizing a related metric known as Mallows’ Cp (Stone, 1977).
- 6.
Some statistical packages in \( \mathcal{R} \) omit the constants for information criteria measures, producing different values. These differences don’t matter as long as comparisons are made within the same method.
- 7.
Bias is introduced unless the deleted variables are orthogonal to the retained variables or their associated regression coefficients are 0.
- 8.
Although the data are fabricated, the pattern has support; see Richardson, Abraham, and Bond (2012).
- 9.
Equivalently, we can say that IQ has the largest zero-order correlation with college performance.
- 10.
Correspondence between the two methods is not guaranteed.
- 11.
Some versions of the sweep operator reverse the sign for the pivot value. In this case, the values in the upper left-hand corner of the swept matrix need to be negated to match the ones reported here.
- 12.
Dividing each value of S by n − 1 yields a complete covariance matrix.
- 13.
The branch and bound strategy is also known as the leaps and bounds algorithm (Furnival & Wilson Jr, 1974).
- 14.
Equivalently, the R2 of a model can never increase when a variable is removed.
- 15.
The number of models of size k from p predictors is found as \( p!/\left(k!\,(p-k)!\right) \).
- 16.
The \( \mathcal{R} \) function uses R2 rather than RSS when implementing the algorithm because R2 is easier to obtain from the sweep operator.
- 17.
See Hastie et al. (2009) for a discussion of the geometry of ridge regression and the lasso.
- 18.
Equation (8.23) is appropriate only when the number of predictors is less than the number of observations. When this is not the case, a p-length vector of 1s can be used as an initial estimate.
- 19.
Cross-validation folds are formed randomly, so the precise value of the tuning parameter will vary unless a seed is set to control the randomization.
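Note 4's linear-model AIC variant is easy to verify numerically. The sketch below is a minimal illustration using NumPy; the residual vector and parameter count are hypothetical, invented only for the example:

```python
import numpy as np

def aic_linear(residuals, p):
    """Linear-model AIC variant from note 4: n * log(sum(e^2) / n) + 2p,
    where n is the sample size and p the number of estimated parameters."""
    n = len(residuals)
    rss = np.sum(np.square(residuals))  # residual sum of squares
    return n * np.log(rss / n) + 2 * p

# Hypothetical residuals from a fitted model with p = 3 parameters
e = np.array([0.5, -1.2, 0.3, 0.9, -0.4])
print(aic_linear(e, 3))
```

Because the penalty term 2p grows with the number of parameters, a larger model must reduce the residual sum of squares enough to offset the penalty before its AIC improves.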
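The count in note 15, \( p!/(k!(p-k)!) \), is the binomial coefficient, which is why all-subsets searches grow so quickly. A short check (the values p = 10, k = 3 are arbitrary examples):

```python
from math import comb, factorial

def n_models(p, k):
    """Number of distinct size-k models from p candidate predictors:
    p! / (k! (p - k)!), per note 15."""
    return factorial(p) // (factorial(k) * factorial(p - k))

# e.g., the number of 3-predictor models available from 10 predictors
print(n_models(10, 3))  # 120, identical to math.comb(10, 3)
```

Summing over all k gives 2^p candidate models, which is the combinatorial explosion that branch-and-bound strategies are designed to tame.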
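Notes 2 and 19 together describe how K-fold cross-validation partitions the data and why a random seed matters for reproducibility. A minimal sketch using NumPy (the function name and the sample sizes are illustrative, not from the text):

```python
import numpy as np

def kfold_indices(n, k, seed=None):
    """Shuffle the n row indices and split them into k roughly equal folds.
    With a fixed seed the folds -- and hence any tuning-parameter value
    selected by cross-validation -- are reproducible (note 19)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(20, 5, seed=42)  # K = 5, a customary choice (note 2)
# Every observation lands in exactly one fold
assert sorted(np.concatenate(folds).tolist()) == list(range(20))
```

Running the function twice with the same seed returns identical folds; omitting the seed gives a fresh random partition each run, which is why cross-validated tuning values drift between runs.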
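Note 12's relation between the sums-of-squares-and-cross-products matrix S and the covariance matrix can be confirmed directly. The data matrix below is made up for illustration:

```python
import numpy as np

# Hypothetical 4 x 2 data matrix
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 5.0],
              [4.0, 7.0]])
n = X.shape[0]

Xc = X - X.mean(axis=0)   # center each column
S = Xc.T @ Xc             # sums of squares and cross-products matrix
cov = S / (n - 1)         # note 12: dividing S by n - 1 gives the covariances

# Agrees with NumPy's covariance routine (which also divides by n - 1)
assert np.allclose(cov, np.cov(X, rowvar=False))
```

The diagonal of S holds each variable's sum of squared deviations, so dividing by n − 1 recovers the familiar sample variances along the diagonal and covariances off it.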
References
Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second International Symposium on Information Theory (pp. 267–281). Budapest: Akademiai Kiado.
Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7, 397–416.
Furnival, G. M., & Wilson Jr., R. W. (1974). Regressions by leaps and bounds. Technometrics, 16, 499–511.
Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21, 215–223.
Goodnight, J. H. (1979). A tutorial on the sweep operator. The American Statistician, 33, 149–158.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). New York: Springer.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. Cambridge, MA: MIT.
Richardson, M., Abraham, C., & Bond, R. (2012). Psychological correlates of university students’ academic performance: A systematic review and meta-analysis. Psychological Bulletin, 138, 353–387.
Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion. Journal of the Royal Statistical Society: Series B., 39, 44–47.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58, 267–288.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Brown, J.D. (2018). Model Selection and Biased Estimation. In: Advanced Statistics for the Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-93549-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93547-8
Online ISBN: 978-3-319-93549-2
eBook Packages: Mathematics and Statistics (R0)