Abstract
So far in this book, we have mostly focused on linear models. Linear models are relatively simple to describe and implement, and have advantages over other approaches in terms of interpretation and inference. However, standard linear regression can have significant limitations in terms of predictive power. This is because the linearity assumption is almost always an approximation, and sometimes a poor one. In Chapter 6 we see that we can improve upon least squares using ridge regression, the lasso, principal components regression, and other techniques. In that setting, the improvement is obtained by reducing the complexity of the linear model, and hence the variance of the estimates. But we are still using a linear model, which can only be improved so far! In this chapter we relax the linearity assumption while still attempting to maintain as much interpretability as possible. We do this by examining very simple extensions of linear models like polynomial regression and step functions, as well as more sophisticated approaches such as splines, local regression, and generalized additive models.
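The simplest of these extensions, polynomial regression, is still linear least squares: the design matrix just contains powers of the predictor as columns. A minimal NumPy sketch on synthetic data (the data-generating function and all numbers here are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a non-linear truth that a straight line fits poorly.
x = np.linspace(-2, 2, 100)
y = np.sin(1.5 * x) + rng.normal(scale=0.2, size=x.size)

# Polynomial regression is ordinary least squares on the columns
# 1, x, x^2, x^3, x^4 of the design matrix.
X = np.vander(x, N=5, increasing=True)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_poly = np.sum((y - X @ beta) ** 2)

# Compare with a plain linear fit: the polynomial fit can only do better
# on the training data, since the linear model is a special case.
X_lin = np.vander(x, N=2, increasing=True)
beta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
rss_lin = np.sum((y - X_lin @ beta_lin) ** 2)
```

Because the model is linear in the coefficients, everything from the earlier chapters (standard errors, hypothesis tests, and so on) carries over unchanged.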
Notes
- 1.
If \(\hat{\mathbf{C}}\) is the \(5 \times 5\) covariance matrix of the \(\hat{\beta}_{j}\), and if \(\boldsymbol{\ell}_{0}^{T} = (1,x_{0},x_{0}^{2},x_{0}^{3},x_{0}^{4})\), then \(\mathrm{Var}[\hat{f}(x_{0})] = \boldsymbol{\ell}_{0}^{T}\hat{\mathbf{C}}\boldsymbol{\ell}_{0}\).
- 2.
We exclude \(C_{0}(X)\) as a predictor in (7.5) because it is redundant with the intercept. This is similar to the fact that we need only two dummy variables to code a qualitative variable with three levels, provided that the model will contain an intercept. The decision to exclude \(C_{0}(X)\) instead of some other \(C_{k}(X)\) in (7.5) is arbitrary. Alternatively, we could include \(C_{0}(X), C_{1}(X), \ldots, C_{K}(X)\), and exclude the intercept.
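The dummy-variable coding for a step function can be sketched as follows; the cut points and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 300)
y = np.where(x < 5, 1.0, 3.0) + rng.normal(scale=0.5, size=x.size)

# Cut the range of x into K + 1 = 4 bins; bin 0 corresponds to C_0(X).
cuts = np.array([2.5, 5.0, 7.5])
bin_id = np.digitize(x, cuts)  # values 0, 1, 2, 3

# Design matrix: an intercept plus dummies C_1, ..., C_K.  C_0 is dropped
# because the intercept already plays its role.
X = np.column_stack(
    [np.ones_like(x)] + [(bin_id == k).astype(float) for k in (1, 2, 3)]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta[0] is the mean of y in bin 0; beta[k] is bin k's offset from bin 0.
```

Including all of \(C_{0}, \ldots, C_{K}\) together with the intercept would make the columns sum to the intercept column, so the design matrix would be rank-deficient.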
- 3.
Cubic splines are popular because most human eyes cannot detect the discontinuity at the knots.
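One way to see this is to fit a cubic spline with the truncated power basis: each knot \(\xi\) contributes a term \((x-\xi)_{+}^{3}\) whose value, first derivative, and second derivative are all continuous at \(\xi\), so only the (invisible) third derivative jumps there. A sketch, with illustrative knots and data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

knots = np.array([2.5, 5.0, 7.5])

def cubic_spline_basis(x, knots):
    # Truncated power basis: 1, x, x^2, x^3, plus (x - knot)_+^3 per knot.
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

B = cubic_spline_basis(x, knots)
beta, *_ = np.linalg.lstsq(B, y, rcond=None)

# The fitted curve and its first two derivatives are continuous at every
# knot; only the third derivative jumps there, which the eye cannot see.
grid = np.linspace(0, 10, 1001)
fit = cubic_spline_basis(grid, knots) @ beta
```

With 4 polynomial columns plus one per knot, the basis has \(4 + K\) columns, matching the \(4 + K\) degrees of freedom of a cubic spline with \(K\) knots.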
- 4.
There are actually five knots, including the two boundary knots. A cubic spline with five knots would have nine degrees of freedom. But natural cubic splines have two additional natural constraints at each boundary to enforce linearity, resulting in \(9 - 4 = 5\) degrees of freedom. Since this includes a constant, which is absorbed in the intercept, we count it as four degrees of freedom.
- 5.
The exact formulas for computing \(\hat{g}(x_{i})\) and \(\mathbf{S}_{\lambda}\) are very technical; however, efficient algorithms are available for computing these quantities.
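Although the exact formulas are technical, the structure \(\hat{\mathbf{g}} = \mathbf{S}_{\lambda}\mathbf{y}\) of a linear smoother is easy to illustrate. The sketch below uses a ridge penalty on the knot coefficients of a truncated-power basis as a stand-in for the true integrated-squared-second-derivative penalty, so it is not a smoothing spline proper; the basis, knots, and \(\lambda\) are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# A rich basis of truncated cubics (the knot grid is an arbitrary choice).
knots = np.linspace(0.05, 0.95, 20)
B = np.column_stack(
    [np.ones_like(x), x, x**2, x**3]
    + [np.clip(x - k, 0.0, None) ** 3 for k in knots]
)

# Stand-in penalty: ridge on the knot coefficients only.  (A real
# smoothing spline penalizes the integrated squared second derivative.)
Omega = np.diag([0.0] * 4 + [1.0] * len(knots))

lam = 1.0
S = B @ np.linalg.solve(B.T @ B + lam * Omega, B.T)  # smoother matrix S_lambda
g_hat = S @ y                                         # fitted values S_lambda @ y
eff_df = np.trace(S)                                  # effective degrees of freedom
```

Because the fit is linear in \(\mathbf{y}\), the effective degrees of freedom are simply the trace of \(\mathbf{S}_{\lambda}\), and they shrink toward the number of unpenalized basis functions as \(\lambda\) grows.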
- 6.
A partial residual for \(X_{3}\), for example, has the form \(r_{i} = y_{i} - f_{1}(x_{i1}) - f_{2}(x_{i2})\). If we know \(f_{1}\) and \(f_{2}\), then we can fit \(f_{3}\) by treating this residual as a response in a non-linear regression on \(X_{3}\).
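This is the idea behind backfitting: hold all but one function fixed, refit that function to the current partial residual, and cycle until the fits stabilize. A sketch using a cubic polynomial fit as a stand-in smoother (the data-generating model and iteration count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
X = rng.normal(size=(n, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + X[:, 2] + rng.normal(scale=0.3, size=n)

def smooth(xj, r):
    # Stand-in smoother: fit a cubic polynomial of x_j to the residual r.
    P = np.vander(xj, N=4, increasing=True)
    coef, *_ = np.linalg.lstsq(P, r, rcond=None)
    return P @ coef

# Backfitting: cycle over the predictors, refitting each f_j to the
# partial residual that removes the other functions' current fits.
alpha = y.mean()
f = np.zeros((n, 3))
for _ in range(20):
    for j in range(3):
        partial_resid = y - alpha - f.sum(axis=1) + f[:, j]
        f[:, j] = smooth(X[:, j], partial_resid)
        f[:, j] -= f[:, j].mean()  # center each f_j for identifiability

fitted = alpha + f.sum(axis=1)
```

Any univariate smoother can be substituted for `smooth` (a spline, local regression, and so on), which is what makes the additive-model framework so flexible.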
© 2013 Springer Science+Business Media New York
Cite this chapter
James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). Moving Beyond Linearity. In: An Introduction to Statistical Learning. Springer Texts in Statistics, vol 103. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7138-7_7
Print ISBN: 978-1-4614-7137-0
Online ISBN: 978-1-4614-7138-7