Abstract
In Chap. 3 we learned that multiple regression is a powerful tool for modeling the contribution a variable makes to the prediction of a criterion while holding other variables constant. Given this ability, it might seem that adding predictors to a regression model is always beneficial, but this is not the case. Part of the problem is collinearity: as discussed in Chap. 5, when the predictors are strongly related, their regression coefficients become unstable and their standard errors become inflated. But even when collinearity is not an issue, adding predictors to a regression equation can sometimes do more harm than good. To understand why, we turn to a discussion of prediction error and model complexity.
Notes
- 1.
Not all models produce precisely the pattern depicted in Fig. 8.1, but the general pattern is typical.
- 2.
There is no objective basis for choosing the number of folds, but it is customary to use K = 5 or K = 10.
- 3.
- 4.
With a linear model, AIC is sometimes computed as \( n \log\!\left[\left(\textstyle\sum e^2\right)/n\right] + 2p \).
- 5.
Minimizing AIC is equivalent to minimizing a related metric known as Mallows’ Cp (Stone, 1977).
- 6.
Some statistical packages in \( \mathcal{R} \) omit the constants for information criteria measures, producing different values. These differences don’t matter as long as comparisons are made within the same method.
- 7.
Bias is introduced unless the deleted variables are orthogonal to the retained variables or their associated regression coefficients are 0.
- 8.
Although the data are fabricated, the pattern has support; see Richardson, Abraham, and Bond (2012).
- 9.
Equivalently, we can say that IQ has the largest zero-order correlation with college performance.
- 10.
Correspondence between the two methods is not guaranteed.
- 11.
Some versions of the sweep operator reverse the sign for the pivot value. In this case, the values in the upper left-hand corner of the swept matrix need to be negated to match the ones reported here.
- 12.
Dividing each value of S by n − 1 yields a complete covariance matrix.
- 13.
The branch and bound strategy is also known as the leaps and bounds algorithm (Furnival & Wilson Jr, 1974).
- 14.
Equivalently, the R2 of a model can never increase when a variable is removed.
- 15.
The number of models of size k from p predictors is found as \( p!/\left(k!\,(p-k)!\right) \).
- 16.
The \( \mathcal{R} \) function uses R2 rather than RSS when implementing the algorithm because R2 is easier to obtain from the sweep operator.
- 17.
See Hastie et al. (2009) for a discussion of the geometry of ridge regression and the lasso.
- 18.
Equation (8.23) is appropriate only when the number of predictors is less than the number of observations. When this is not the case, a p-length vector of 1s can be used as an initial estimate.
- 19.
Cross-validation folds are formed randomly, so the precise value of the tuning parameter will vary unless a seed is set to control the randomization.
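Note 4's linear-model AIC variant is easy to verify numerically. The sketch below is a minimal illustration using NumPy; the residual vector and parameter count are hypothetical, invented only for the example:

```python
import numpy as np

def aic_linear(residuals, p):
    """Linear-model AIC variant from note 4: n * log(sum(e^2) / n) + 2p,
    where n is the sample size and p the number of estimated parameters."""
    n = len(residuals)
    rss = np.sum(np.square(residuals))  # residual sum of squares
    return n * np.log(rss / n) + 2 * p

# Hypothetical residuals from a fitted model with p = 3 parameters
e = np.array([0.5, -1.2, 0.3, 0.9, -0.4])
print(aic_linear(e, 3))
```

Because the penalty term 2p grows with the number of parameters, a larger model must reduce the residual sum of squares enough to offset the penalty before its AIC improves.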
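The count in note 15, \( p!/(k!(p-k)!) \), is the binomial coefficient, which is why all-subsets searches grow so quickly. A short check (the values p = 10, k = 3 are arbitrary examples):

```python
from math import comb, factorial

def n_models(p, k):
    """Number of distinct size-k models from p candidate predictors:
    p! / (k! (p - k)!), per note 15."""
    return factorial(p) // (factorial(k) * factorial(p - k))

# e.g., the number of 3-predictor models available from 10 predictors
print(n_models(10, 3))  # 120, identical to math.comb(10, 3)
```

Summing over all k gives 2^p candidate models, which is the combinatorial explosion that branch-and-bound strategies are designed to tame.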
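Notes 2 and 19 together describe how K-fold cross-validation partitions the data and why a random seed matters for reproducibility. A minimal sketch using NumPy (the function name and the sample sizes are illustrative, not from the text):

```python
import numpy as np

def kfold_indices(n, k, seed=None):
    """Shuffle the n row indices and split them into k roughly equal folds.
    With a fixed seed the folds -- and hence any tuning-parameter value
    selected by cross-validation -- are reproducible (note 19)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    return np.array_split(idx, k)

folds = kfold_indices(20, 5, seed=42)  # K = 5, a customary choice (note 2)
# Every observation lands in exactly one fold
assert sorted(np.concatenate(folds).tolist()) == list(range(20))
```

Running the function twice with the same seed returns identical folds; omitting the seed gives a fresh random partition each run, which is why cross-validated tuning values drift between runs.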
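Note 12's relation between the sums-of-squares-and-cross-products matrix S and the covariance matrix can be confirmed directly. The data matrix below is made up for illustration:

```python
import numpy as np

# Hypothetical 4 x 2 data matrix
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 5.0],
              [4.0, 7.0]])
n = X.shape[0]

Xc = X - X.mean(axis=0)   # center each column
S = Xc.T @ Xc             # sums of squares and cross-products matrix
cov = S / (n - 1)         # note 12: dividing S by n - 1 gives the covariances

# Agrees with NumPy's covariance routine (which also divides by n - 1)
assert np.allclose(cov, np.cov(X, rowvar=False))
```

The diagonal of S holds each variable's sum of squared deviations, so dividing by n − 1 recovers the familiar sample variances along the diagonal and covariances off it.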
References
Akaike, H. (1973). Information theory as an extension of the maximum likelihood principle. In B. N. Petrov & F. Csaki (Eds.), Second International Symposium on Information Theory (pp. 267–281). Budapest: Akademiai Kiado.
Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. Journal of Computational and Graphical Statistics, 7, 397–416.
Furnival, G. M., & Wilson Jr., R. W. (1974). Regressions by leaps and bounds. Technometrics, 16, 499–511.
Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21, 215–223.
Goodnight, J. H. (1979). A tutorial on the sweep operator. The American Statistician, 33, 149–158.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). New York: Springer.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. New York: Springer.
Murphy, K. P. (2012). Machine learning: A probabilistic perspective. Cambridge, MA: MIT.
Richardson, M., Abraham, C., & Bond, R. (2012). Psychological correlates of university students’ academic performance: A systematic review and meta-analysis. Psychological Bulletin, 138, 353–387.
Stone, M. (1977). An asymptotic equivalence of choice of model by cross-validation and Akaike’s criterion. Journal of the Royal Statistical Society: Series B., 39, 44–47.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58, 267–288.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this chapter
Cite this chapter
Brown, J.D. (2018). Model Selection and Biased Estimation. In: Advanced Statistics for the Behavioral Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-93549-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93547-8
Online ISBN: 978-3-319-93549-2
eBook Packages: Mathematics and Statistics (R0)