
Linear Regression

An Introduction to Statistical Learning

Part of the book series: Springer Texts in Statistics ((STS,volume 103))

Abstract

This chapter is about linear regression, a very simple approach for supervised learning. In particular, linear regression is a useful tool for predicting a quantitative response. Linear regression has been around for a long time and is the topic of innumerable textbooks. Though it may seem somewhat dull compared to some of the more modern statistical learning approaches described in later chapters of this book, linear regression is still a useful and widely used statistical learning method.


Notes

  1.

    The assumption of linearity is often a useful working model. However, despite what many textbooks might tell us, we seldom believe that the true relationship is linear.

  2.

    This formula holds provided that the n observations are uncorrelated.

  3.

    Approximately for several reasons. Equation 3.10 relies on the assumption that the errors are Gaussian. Also, the factor of 2 in front of the \(\mathrm{SE}(\hat{\beta}_{1})\) term will vary slightly depending on the number of observations n in the linear regression. To be precise, rather than the number 2, (3.10) should contain the 97.5% quantile of a t-distribution with n − 2 degrees of freedom. Details of how to compute the 95% confidence interval precisely in R will be provided later in this chapter.

  4.

    In Table 3.1, a small p-value for the intercept indicates that we can reject the null hypothesis that \(\beta_{0} = 0\), and a small p-value for TV indicates that we can reject the null hypothesis that \(\beta_{1} = 0\). Rejecting the latter null hypothesis allows us to conclude that there is a relationship between TV and sales. Rejecting the former allows us to conclude that in the absence of TV expenditure, sales are non-zero.

  5.

    We note that in fact, the right-hand side of (3.18) is the sample correlation; thus, it would be more correct to write \(\widehat{\mathrm{Cor}(X,Y)}\); however, we omit the “hat” for ease of notation.

  6.

    Even if the errors are not normally distributed, the F-statistic approximately follows an F-distribution provided that the sample size n is large.

  7.

    The square of each t-statistic is the corresponding F-statistic.

  8.

    In other words, if we collect a large number of data sets like the Advertising data set, and we construct a confidence interval for the average sales on the basis of each data set (given $100,000 in TV and $20,000 in radio advertising), then 95% of these confidence intervals will contain the true value of average sales.
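Note 3 replaces the rough factor of 2 with the 97.5% quantile of a t-distribution with n − 2 degrees of freedom. The chapter computes this in R; the sketch below is a stand-in in Python using only the standard library (approximating the quantile by numerical integration and bisection), and the sample size n = 30 is an invented value for illustration.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=8000):
    """P(T <= x), approximated by trapezoidal integration from -50."""
    lo = -50.0
    h = (x - lo) / steps
    total = 0.5 * (t_pdf(lo, df) + t_pdf(x, df))
    for i in range(1, steps):
        total += t_pdf(lo + i * h, df)
    return total * h

def t_quantile(p, df):
    """Invert the CDF by bisection on [-50, 50]."""
    lo, hi = -50.0, 50.0
    for _ in range(40):
        mid = (lo + hi) / 2.0
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n = 30  # hypothetical number of observations, for illustration only
q = t_quantile(0.975, n - 2)
print(round(q, 3))  # a little larger than 2 (and than the Gaussian 1.96)
```

As n grows, this quantile shrinks toward the Gaussian value 1.96, which is why 2 is a serviceable approximation for moderate sample sizes.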
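Note 7's identity — in simple linear regression, the square of the slope's t-statistic equals the F-statistic — can be verified numerically. The sketch below fits least squares by hand in Python on a small made-up data set (the book itself works in R).

```python
# Made-up data with a roughly linear trend, for illustration only.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                       # least-squares slope estimate
b0 = ybar - b1 * xbar                # intercept estimate
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
tss = sum((yi - ybar) ** 2 for yi in y)

sigma2 = rss / (n - 2)               # residual variance estimate
se_b1 = (sigma2 / sxx) ** 0.5        # standard error of the slope
t_stat = b1 / se_b1
f_stat = (tss - rss) / sigma2        # F-statistic with 1 and n - 2 df

print(abs(t_stat ** 2 - f_stat) < 1e-8)  # True: t^2 equals F
```

The equality is exact algebraically: with one predictor, the explained sum of squares TSS − RSS reduces to \(s_{xy}^2 / s_{xx}\), which is precisely \(\hat{\beta}_1^2 \, s_{xx}\).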
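Note 8's repeated-sampling interpretation can be illustrated with a small simulation: generate many data sets from a known linear model, build an approximate 95% interval for the mean response at a point each time, and count how often the truth is covered. The sketch is in Python rather than R, and the true coefficients, noise level, sample size, and evaluation point x0 are all invented for illustration.

```python
import random

random.seed(0)
beta0, beta1, sigma = 5.0, 2.0, 1.0   # hypothetical true model
n, x0, trials = 100, 3.0, 2000
true_mean = beta0 + beta1 * x0
covered = 0

for _ in range(trials):
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in xs]
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys)) / sxx
    b0 = ybar - b1 * xbar
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(xs, ys))
    s2 = rss / (n - 2)
    # standard error of the fitted mean response at x0
    se = (s2 * (1 / n + (x0 - xbar) ** 2 / sxx)) ** 0.5
    fit = b0 + b1 * x0
    # 1.984 approximates the 97.5% t-quantile with n - 2 = 98 df
    if abs(fit - true_mean) <= 1.984 * se:
        covered += 1

coverage = covered / trials
print(coverage)  # close to 0.95
```

The observed coverage hovers near 95%, which is exactly the frequency claim the note makes about confidence intervals.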


Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). Linear Regression. In: An Introduction to Statistical Learning. Springer Texts in Statistics, vol 103. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7138-7_3
