Errors and Residuals


Abstract

In previous chapters, you learned that the errors in a linear regression model are assumed to be independent and identically distributed normal random variables with mean 0 and variance σ²:

$$ \varepsilon \sim \mathrm{NID}\left(0, \sigma^2\right) $$

We discussed the normality assumption in Chap. 6, learning ways to determine whether the errors are normally distributed (e.g., QQ plot, Jarque-Bera test) and the steps we can take to normalize them if they are not (e.g., Box-Cox transformation). In this chapter, we will consider whether the errors are independent and identically distributed, learning ways to assess these assumptions and correct violations of them when they arise.
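
As a bridge into the chapter's two questions, here is a minimal sketch, not taken from the book, of how the identical-distribution (constant variance) assumption might be checked with the Breusch-Pagan test (Breusch & Pagan, 1979). It assumes NumPy and statsmodels are available, and the data are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Simulate data whose error spread grows with the predictor,
# deliberately violating the identical-distribution assumption.
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 100)
y = 2 + 0.5 * x + rng.normal(0, 0.1 + 0.3 * x)

X = sm.add_constant(x)                 # design matrix with an intercept
resid = sm.OLS(y, X).fit().resid       # ordinary least squares residuals

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(resid, X)
print(f"Breusch-Pagan LM = {lm_stat:.3f}, p = {lm_pval:.4f}")
# A small p-value casts doubt on the assumption of identically
# distributed (homoscedastic) errors.
```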


Notes

  1. In some textbooks, heteroscedasticity is spelled heteroskedasticity.

  2. The raw residuals are plotted against raw income levels in Fig. 7.1 for illustrative purposes. Ordinarily, we would plot a scaled residual against the standardized fitted values, as we learned to do in Chap. 6.
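
As a rough illustration of the plot note 2 describes, the sketch below (my own, not the book's) plots internally studentized residuals against standardized fitted values; it assumes NumPy, statsmodels, and matplotlib, and the "income" data are fabricated.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(6)
income = rng.uniform(20, 100, 60)                    # made-up income values
y = 5 + 0.2 * income + rng.normal(0, 0.05 * income)  # spread grows with income

fit = sm.OLS(y, sm.add_constant(income)).fit()
scaled = fit.get_influence().resid_studentized_internal  # a scaled residual

fv = fit.fittedvalues
fv_std = (fv - fv.mean()) / fv.std()                 # standardized fitted values

plt.scatter(fv_std, scaled)
plt.axhline(0, linestyle="--")
plt.xlabel("Standardized fitted value")
plt.ylabel("Studentized residual")
plt.show()
```

A fan shape in this plot, with the residuals spreading out as the fitted values grow, is the visual signature of heteroscedasticity.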

  3. To illustrate, if we had three original predictors, we would have nine predictors for White's test (i.e., three original predictors, three squared predictors, and three pairwise cross-product terms). Note that we compute only pairwise cross-product terms, not higher-order ones, and we do not square dummy-coded variables (see Chap. 11). In general, there will be \( 2k + \frac{k(k-1)}{2} \) predictors in the equation.
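
A small sketch of this counting rule, assuming only NumPy (the data are arbitrary): it builds the auxiliary regressors for White's test from k = 3 continuous predictors and confirms there are 2k + k(k − 1)/2 = 9 of them.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))          # 50 cases, k = 3 original predictors
k = X.shape[1]

squares = X ** 2                      # the squared predictors
crosses = np.column_stack([X[:, i] * X[:, j]   # pairwise cross products
                           for i, j in combinations(range(k), 2)])
aux = np.column_stack([X, squares, crosses])   # regressors for White's test

assert aux.shape[1] == 2 * k + k * (k - 1) // 2   # 3 + 3 + 3 = 9 predictors
```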

  4. A third approach, which should always be considered before a statistical one, is to rethink our regression model to be sure we have included all relevant variables.

  5. Because we have not included a true intercept, we do not interpret the R² value from the transformed data.

  6. In some textbooks, autocorrelations are referred to as "serial correlations." There is no substantive difference between the terms, but I prefer "autocorrelation" because it underscores that the residuals are correlated with themselves.

  7. Notice that we cannot find the difference score for the first observation because there is no preceding error to subtract. Consequently, we will have N − 1 observations when performing this test.

  8. Spreadsheet functions can be used to perform these operations. To illustrate, if the vector of residuals lies in cells e3:e14, the following formula produces the Durbin-Watson statistic: =SUMXMY2(e3:e13,e4:e14)/SUMSQ(e3:e14).
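
The same computation, sketched in Python rather than a spreadsheet (NumPy assumed; durbin_watson is a helper written for this note, and the residuals are placeholders): the statistic is the sum of squared successive differences of the residuals divided by their sum of squares.

```python
import numpy as np

def durbin_watson(e):
    """Sum of squared successive differences over the sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(3)
e = rng.normal(size=12)                # stand-in for the residuals in e3:e14
print(f"DW = {durbin_watson(e):.4f}")  # values near 2 suggest no autocorrelation
```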

  9. Notice that we have entered a 0 for our first lagged residual. Alternatively, we can omit the first observation entirely, performing the analysis on t − 1 observations. In this case, we multiply R² by t − 1. Using this method with our data, we find that χ² = 5.8925, p = .0152.
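
A minimal sketch of the omit-the-first-observation variant, assuming statsmodels and SciPy; the residual series is simulated here, so the numbers will not match the χ² = 5.8925 reported above. With a single lagged residual, the statistic is referred to a χ² distribution with 1 degree of freedom.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(4)
e = np.zeros(30)
for t in range(1, 30):                 # simulate autocorrelated residuals
    e[t] = 0.6 * e[t - 1] + rng.normal()

fit = sm.OLS(e[1:], sm.add_constant(e[:-1])).fit()  # regress e_t on e_{t-1}
chi2 = (len(e) - 1) * fit.rsquared                  # (t - 1) * R-squared
p = stats.chi2.sf(chi2, df=1)                       # upper-tail p-value, 1 df
print(f"chi2 = {chi2:.4f}, p = {p:.4f}")
```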

  10. The procedure goes by a variety of other names, including estimated generalized least squares, the Cochrane-Orcutt method, and the Yule-Walker method. There are slight differences among these procedures, but they all estimate the magnitude of the autocorrelation parameter.
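
As a rough illustration of what these procedures have in common, here is a sketch of a single Cochrane-Orcutt-style step, estimating the autocorrelation parameter ρ from the OLS residuals and refitting on quasi-differenced data. It assumes NumPy and statsmodels, uses simulated data, and is not the book's implementation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = np.linspace(0, 10, 40)
e = np.zeros(40)
for t in range(1, 40):                       # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + e

resid = sm.OLS(y, sm.add_constant(x)).fit().resid
rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)  # estimate rho

y_star = y[1:] - rho * y[:-1]                # quasi-differenced response
x_star = x[1:] - rho * x[:-1]                # quasi-differenced predictor
fit = sm.OLS(y_star, sm.add_constant(x_star)).fit()
print(f"rho = {rho:.3f}, slope = {fit.params[1]:.3f}")
```

In a full Cochrane-Orcutt procedure, this step would be repeated, re-estimating ρ from the new residuals, until the estimate stabilizes.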

  11. I have left the top half of each matrix empty to make it easier to see how the weights are constructed, but each matrix is symmetric, so the top half is the transpose of the bottom half.
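
A tiny NumPy sketch of the symmetry this note describes (the weights below are arbitrary numbers, not the book's): given only the bottom half of a matrix, the full matrix is recovered by adding its transpose and removing the doubled diagonal.

```python
import numpy as np

bottom = np.array([[1.00, 0.00, 0.00],
                   [0.50, 1.00, 0.00],
                   [0.25, 0.50, 1.00]])              # only the bottom half filled
full = bottom + bottom.T - np.diag(np.diag(bottom))  # mirror across the diagonal
assert np.allclose(full, full.T)                     # symmetric, as the note says
```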

References

  • Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47, 1287–1294.


  • Fox, J. (2008). Applied regression analysis and generalized linear models (2nd ed.). Newbury Park: Sage.


  • Hayes, A. F., & Cai, L. (2007). Using heteroskedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation. Behavior Research Methods, 39, 709–722.


  • Long, J. S., & Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. The American Statistician, 54, 217–224.


  • Newey, W. K., & West, K. D. (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703–708.


  • White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817–838.



Copyright information

© 2014 Springer International Publishing Switzerland

Cite this chapter

Brown, J.D. (2014). Errors and Residuals. In: Linear Models in Matrix Form. Springer, Cham. https://doi.org/10.1007/978-3-319-11734-8_7
