
Linear Model Selection and Regularization


Part of the book series: Springer Texts in Statistics (STS, volume 103)

Abstract

In the regression setting, the standard linear model

$$Y = \beta_{0} + \beta_{1}X_{1} + \cdots + \beta_{p}X_{p} + \epsilon \tag{6.1}$$

is commonly used to describe the relationship between a response Y and a set of variables \(X_{1},X_{2},\ldots,X_{p}\). We have seen in Chapter 3 that one typically fits this model using least squares.
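
As a brief illustration of the least squares fit mentioned above, the sketch below uses R's lm() function on a small simulated data set. The simulated variables and coefficient values are assumptions made purely for illustration; they do not come from the chapter.

    ## Fitting the standard linear model (6.1) by least squares -- a minimal
    ## sketch on simulated data (variable names and coefficients are assumed).
    set.seed(1)
    n  <- 100
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
    y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)   # x3 is pure noise

    # Least squares estimates of beta_0, beta_1, ..., beta_p (here p = 3)
    fit <- lm(y ~ x1 + x2 + x3)
    summary(fit)$coefficients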


Notes

  1.

    Though forward stepwise selection considers \(p(p + 1)/2 + 1\) models, it performs a guided search over model space, and so the effective model space considered contains substantially more than \(p(p + 1)/2 + 1\) models (see the sketch following these notes).

  2.

    Like forward stepwise selection, backward stepwise selection performs a guided search over model space, and so effectively considers substantially more than \(1 + p(p + 1)/2\) models.

  3.

    Mallow’s \(C_{p}\) is sometimes defined as \(C_{p}^{\prime} = \mathrm{RSS}/\hat{\sigma}^{2} + 2d - n\). This is equivalent to the definition given above in the sense that \(C_{p} = \frac{1}{n}\hat{\sigma}^{2}(C_{p}^{\prime} + n)\), and so the model with smallest \(C_{p}\) also has smallest \(C_{p}^{\prime}\) (see the check following these notes).

  4.

    More details can be found in Section 3.5 of Elements of Statistical Learning by Hastie, Tibshirani, and Friedman.

  5.

    In order for glmnet() to yield the exact least squares coefficients when λ = 0, we use the argument exact=T when calling the predict() function. Otherwise, the predict() function will interpolate over the grid of λ values used in fitting the glmnet() model, yielding approximate results. When we use exact=T, there remains a slight discrepancy in the third decimal place between the output of glmnet() when λ = 0 and the output of lm(); this is due to numerical approximation on the part of glmnet(). A short illustration of this comparison follows these notes.
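
The following sketch illustrates the model counts in Notes 1 and 2: forward stepwise selection fits the null model plus \(p - k\) candidate models at each step \(k = 0,\ldots,p-1\), for \(1 + p(p+1)/2\) fits in total, far fewer than the \(2^{p}\) models examined by best subset selection. The simulated data below are an assumption for illustration; regsubsets() is from the leaps package used in the chapter's lab.

    ## Counting the models fit by forward stepwise selection (Notes 1 and 2).
    ## The simulated data are illustrative assumptions only.
    library(leaps)
    set.seed(1)
    n <- 100; p <- 10
    x <- matrix(rnorm(n * p), n, p)
    y <- rowSums(x[, 1:3]) + rnorm(n)
    dat <- data.frame(y = y, x)

    1 + p * (p + 1) / 2   # 56 models fit by forward stepwise, versus 2^p = 1024 for best subset

    # The guided forward search itself
    fwd <- regsubsets(y ~ ., data = dat, nvmax = p, method = "forward")
    summary(fwd)$which[1:3, ]   # variables in the best 1-, 2-, and 3-variable models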
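
A one-line check of the equivalence claimed in Note 3, assuming the chapter's definition \(C_{p} = \frac{1}{n}(\mathrm{RSS} + 2d\hat{\sigma}^{2})\) (equation (6.2) in the chapter, which is not reproduced in this preview):

$$\frac{1}{n}\,\hat{\sigma}^{2}\left(C_{p}^{\prime} + n\right) = \frac{1}{n}\,\hat{\sigma}^{2}\left(\frac{\mathrm{RSS}}{\hat{\sigma}^{2}} + 2d - n + n\right) = \frac{1}{n}\left(\mathrm{RSS} + 2d\,\hat{\sigma}^{2}\right) = C_{p},$$

so the two criteria differ only by an increasing affine transformation and are minimized by the same model.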
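
The comparison described in Note 5 can be reproduced along the lines of the sketch below. The simulated data here are an assumption for illustration, and recent versions of glmnet() additionally require the original x and y to be supplied when exact = TRUE is used with predict().

    ## Note 5: least squares via glmnet() at lambda = 0 versus lm().
    ## The simulated data are illustrative assumptions only.
    library(glmnet)
    set.seed(1)
    n <- 100; p <- 5
    x <- matrix(rnorm(n * p), n, p)
    y <- drop(x %*% c(3, -2, 0, 0, 1)) + rnorm(n)

    grid <- 10^seq(10, -2, length = 100)          # grid of lambda values
    ridge.mod <- glmnet(x, y, alpha = 0, lambda = grid)

    # exact = TRUE refits at lambda = 0 instead of interpolating over the grid;
    # newer glmnet versions also need the original data passed via x and y
    coef.glmnet <- predict(ridge.mod, s = 0, exact = TRUE,
                           type = "coefficients", x = x, y = y)

    # Least squares fit for comparison; agreement is typically to about
    # three decimal places, as described in Note 5
    coef.lsq <- coef(lm(y ~ x))
    cbind(glmnet = as.vector(coef.glmnet), lm = as.vector(coef.lsq))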


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). Linear Model Selection and Regularization. In: An Introduction to Statistical Learning. Springer Texts in Statistics, vol 103. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7138-7_6
