# Separating explained and error variance

## Abstract

In simple regression analysis, we begin with two variables: an independent variable, x, and a dependent variable, y. In the course of the analysis, however, we create two new variables. One of these variables, ŷ, is the predicted value of the dependent variable using the linear regression equation and the independent variable. The other variable, e, is the error of prediction obtained using this regression equation. This error value is sometimes referred to as a residual value because it is simply the difference between the observed y value and the predicted y value. These two variables, the predicted value of the dependent variable, ŷ, and the error of prediction, e, are defined mathematically, in vector notation, as follows:
$$\begin{gathered} \hat{y} = ua + xb \\ e = y - \hat{y} \end{gathered}$$
As noted earlier, the errors of prediction are very important in regression analysis because they tell us how well the model “fits” the data. Recall also that the objective of least-squares regression is to obtain a regression coefficient and an intercept that minimize the sum of the squared errors of prediction and, therefore, the variance of these errors.
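
The definitions above can be illustrated with a small numerical sketch. It assumes that u is a vector of ones (so that ua contributes the intercept to every case) and uses the standard closed-form least-squares estimates for simple regression. The data values and the function names `fit_simple_regression` and `predict_and_residuals` are hypothetical, chosen only for illustration.

```python
import numpy as np


def fit_simple_regression(x, y):
    """Return the least-squares intercept a and slope b for y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()
    y_dev = y - y.mean()
    # Slope: sum of cross-products of deviations over sum of squared x deviations.
    b = (x_dev @ y_dev) / (x_dev @ x_dev)
    # Intercept: forces the fitted line through the point of means.
    a = y.mean() - b * x.mean()
    return a, b


def predict_and_residuals(x, y, a, b):
    """Compute y_hat = u*a + x*b and e = y - y_hat."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u = np.ones_like(x)  # assumption: u is a vector of ones
    y_hat = u * a + x * b
    e = y - y_hat
    return y_hat, e


# Hypothetical data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

a, b = fit_simple_regression(x, y)
y_hat, e = predict_and_residuals(x, y, a, b)

# Sum of squared errors at the least-squares solution.
sse = float(e @ e)

# Any other slope (here b + 0.1, same intercept) gives a larger sum of squares.
_, e_other = predict_and_residuals(x, y, a, b + 0.1)
sse_other = float(e_other @ e_other)

# Sums of squares: total, explained (predicted values), and error.
ss_total = float(((y - y.mean()) ** 2).sum())
ss_explained = float(((y_hat - y.mean()) ** 2).sum())

print(f"a = {a:.3f}, b = {b:.3f}")
print(f"SSE = {sse:.3f}, SSE with perturbed slope = {sse_other:.3f}")
print(f"total = {ss_total:.3f}, explained + error = {ss_explained + sse:.3f}")
```

In this sketch the residuals sum to zero and the total sum of squares equals the explained sum of squares plus the error sum of squares, which is the separation the chapter title refers to.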

## Keywords

Public Health; Linear Regression; Regression Analysis; Regression Model; Linear Relationship