
Linear Model Selection and Regularization


Part of the book series: Springer Texts in Statistics (STS, volume 103)

Abstract

In the regression setting, the standard linear model

$$Y = \beta_{0} + \beta_{1}X_{1} + \cdots + \beta_{p}X_{p} + \epsilon \tag{6.1}$$

is commonly used to describe the relationship between a response Y and a set of variables \(X_{1},X_{2},\ldots,X_{p}\). We have seen in Chapter 3 that one typically fits this model using least squares.
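
As a brief illustration of the least squares fit mentioned above, the sketch below uses R's lm() function on a small simulated data set. The simulated variables and coefficient values are assumptions made purely for illustration; they do not come from the chapter.

    ## Fitting the standard linear model (6.1) by least squares -- a minimal
    ## sketch on simulated data (variable names and coefficients are assumed).
    set.seed(1)
    n  <- 100
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
    y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)   # x3 is pure noise

    # Least squares estimates of beta_0, beta_1, ..., beta_p (here p = 3)
    fit <- lm(y ~ x1 + x2 + x3)
    summary(fit)$coefficients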


Notes

  1.

    Though forward stepwise selection considers \(p(p + 1)/2 + 1\) models, it performs a guided search over model space, and so the effective model space considered contains substantially more than \(p(p + 1)/2 + 1\) models (see the sketch following these notes).

  2.

    Like forward stepwise selection, backward stepwise selection performs a guided search over model space, and so effectively considers substantially more than \(1 + p(p + 1)/2\) models.

  3.

    Mallow’s \(C_{p}\) is sometimes defined as \(C_{p}^{\prime} = \mathrm{RSS}/\hat{\sigma}^{2} + 2d - n\). This is equivalent to the definition given above in the sense that \(C_{p} = \frac{1}{n}\hat{\sigma}^{2}(C_{p}^{\prime} + n)\), and so the model with smallest \(C_{p}\) also has smallest \(C_{p}^{\prime}\) (see the check following these notes).

  4.

    More details can be found in Section 3.5 of Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.

  5.

    In order for glmnet() to yield the exact least squares coefficients when λ = 0, we use the argument exact=T when calling the predict() function. Otherwise, the predict() function will interpolate over the grid of λ values used in fitting the glmnet() model, yielding approximate results. When we use exact=T, there remains a slight discrepancy in the third decimal place between the output of glmnet() when λ = 0 and the output of lm(); this is due to numerical approximation on the part of glmnet(). A short illustration of this comparison follows these notes.
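
The following sketch illustrates the model counts in Notes 1 and 2: forward stepwise selection fits the null model plus \(p - k\) candidate models at each step \(k = 0,\ldots,p-1\), for \(1 + p(p+1)/2\) fits in total, far fewer than the \(2^{p}\) models examined by best subset selection. The simulated data below are an assumption for illustration; regsubsets() is from the leaps package used in the chapter's lab.

    ## Counting the models fit by forward stepwise selection (Notes 1 and 2).
    ## The simulated data are illustrative assumptions only.
    library(leaps)
    set.seed(1)
    n <- 100; p <- 10
    x <- matrix(rnorm(n * p), n, p)
    y <- rowSums(x[, 1:3]) + rnorm(n)
    dat <- data.frame(y = y, x)

    1 + p * (p + 1) / 2   # 56 models fit by forward stepwise, versus 2^p = 1024 for best subset

    # The guided forward search itself
    fwd <- regsubsets(y ~ ., data = dat, nvmax = p, method = "forward")
    summary(fwd)$which[1:3, ]   # variables in the best 1-, 2-, and 3-variable models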
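
A one-line check of the equivalence claimed in Note 3, assuming the chapter's definition \(C_{p} = \frac{1}{n}(\mathrm{RSS} + 2d\hat{\sigma}^{2})\) (equation (6.2) in the chapter, which is not reproduced in this preview):

$$\frac{1}{n}\,\hat{\sigma}^{2}\left(C_{p}^{\prime} + n\right) = \frac{1}{n}\,\hat{\sigma}^{2}\left(\frac{\mathrm{RSS}}{\hat{\sigma}^{2}} + 2d - n + n\right) = \frac{1}{n}\left(\mathrm{RSS} + 2d\,\hat{\sigma}^{2}\right) = C_{p},$$

so the two criteria differ only by an increasing affine transformation and are minimized by the same model.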
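
The comparison described in Note 5 can be reproduced along the lines of the sketch below. The simulated data here are an assumption for illustration, and recent versions of glmnet() additionally require the original x and y to be supplied when exact = TRUE is used with predict().

    ## Note 5: least squares via glmnet() at lambda = 0 versus lm().
    ## The simulated data are illustrative assumptions only.
    library(glmnet)
    set.seed(1)
    n <- 100; p <- 5
    x <- matrix(rnorm(n * p), n, p)
    y <- drop(x %*% c(3, -2, 0, 0, 1)) + rnorm(n)

    grid <- 10^seq(10, -2, length = 100)          # grid of lambda values
    ridge.mod <- glmnet(x, y, alpha = 0, lambda = grid)

    # exact = TRUE refits at lambda = 0 instead of interpolating over the grid;
    # newer glmnet versions also need the original data passed via x and y
    coef.glmnet <- predict(ridge.mod, s = 0, exact = TRUE,
                           type = "coefficients", x = x, y = y)

    # Least squares fit for comparison; agreement is typically to about
    # three decimal places, as described in Note 5
    coef.lsq <- coef(lm(y ~ x))
    cbind(glmnet = as.vector(coef.glmnet), lm = as.vector(coef.lsq))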


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). Linear Model Selection and Regularization. In: An Introduction to Statistical Learning. Springer Texts in Statistics, vol 103. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7138-7_6
