# Regression Analysis

## Abstract

Regression analysis, often referred to simply as regression, is an important tool in statistical analysis. The concept first appeared in an 1877 study on sweet-pea seeds by Sir Francis Galton (1822–1911). He used the idea of regression again in a later study on the heights of fathers and sons. He found that sons of tall fathers tend to be tall but somewhat shorter than their fathers, while sons of short fathers tend to be short but somewhat taller than their fathers. In other words, body height tends toward the mean. Galton called this process a *regression*, literally a step back or decline.

We can compute a correlation to measure the association between the heights of sons and fathers. We can also infer the *causal direction of the association*: the height of sons depends on the height of fathers and not the other way around. Galton indicated causal direction by referring to the height of sons as the *dependent variable* and the height of fathers as the *independent variable*. But take heed: regression does not by itself prove the causality of the association. The direction of effect must be derived theoretically before it can be supported empirically with regression. Sometimes the direction of causality cannot be determined, as, for example, between the ages of couples getting married. Does the age of the groom determine the age of the bride or vice versa? Or do the groom's age and the bride's age determine each other mutually? Sometimes the causality is obvious. For instance, blood pressure has no influence on age, but age has an influence on blood pressure. Body height has an influence on weight, but the reverse association is unlikely (Swoboda 1971, p. 308).
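The tendency toward the mean described above can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the heights are synthetic, and the mean (175 cm), standard deviation (7 cm), and father-son correlation (0.5) are made-up values, not Galton's data. The names `simulate_heights` and `fit_line` are hypothetical helpers introduced here. The son's height is treated as the dependent variable and the father's height as the independent variable.

```python
import random


def simulate_heights(n=5000, mean=175.0, sd=7.0, r=0.5, seed=1):
    """Draw synthetic father/son heights in cm (all parameters are assumed values)."""
    rng = random.Random(seed)
    fathers, sons = [], []
    for _ in range(n):
        father = rng.gauss(mean, sd)
        # The son inherits a fraction r of the father's deviation from the mean,
        # plus independent noise scaled so that sons have the same spread as fathers.
        noise = rng.gauss(0.0, sd * (1 - r ** 2) ** 0.5)
        son = mean + r * (father - mean) + noise
        fathers.append(father)
        sons.append(son)
    return fathers, sons


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


fathers, sons = simulate_heights()
intercept, slope = fit_line(fathers, sons)  # sons regressed on fathers


def predict_son(father_height):
    """Predicted son height for a given father height, from the fitted line."""
    return intercept + slope * father_height


print(f"slope = {slope:.2f}")
print(f"father 190 cm -> predicted son {predict_son(190.0):.1f} cm")
print(f"father 160 cm -> predicted son {predict_son(160.0):.1f} cm")
```

With a fitted slope between 0 and 1, the predicted son of a 190 cm father is taller than average but shorter than his father, and the predicted son of a 160 cm father is shorter than average but taller than his father. The fit itself says nothing about causal direction; regressing fathers on sons would produce a line just as readily.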

## References

- Hair, J. et al. (2006). *Multivariate data analysis*, 6th Edition. Upper Saddle River, NJ: Prentice Hall International.
- Swoboda, H. (1971). *Exakte Geheimnisse: Knaurs Buch der modernen Statistik*. Munich, Zurich: Knaur.