Abstract
In Chap. 2 you learned that ordinary least squares (OLS) estimation minimizes the sum of squared discrepancies between observed and fitted values. This procedure is primarily a descriptive tool, as it identifies the weights that best predict y from x in our sample. Sample description is not the only function that regression analyses serve, however; they can also be used to estimate population parameters. As it happens, the least squares solution coincides with a more general method for estimating population parameters known as maximum-likelihood estimation (MLE). With MLE, we ask, “What population parameters are most likely to have produced our sample data?” In this chapter, you will learn about MLE and its application to linear regression.
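As a minimal sketch of this coincidence (not taken from the chapter; the data, seed, and starting values below are invented for illustration), the following Python snippet maximizes the normal log-likelihood of a small regression numerically and confirms that the resulting coefficients match the least squares solution:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: the true intercept, slope, and error SD are made up.
rng = np.random.default_rng(seed=1)
x = rng.normal(size=40)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=40)
X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept column
n = len(y)

def neg_log_likelihood(params):
    # params = (b0, b1, sigma); return -ln L for a regression with normal errors
    b, sigma = params[:2], params[2]
    resid = y - X @ b
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + (resid @ resid) / (2 * sigma**2)

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0, 1.0],
               bounds=[(None, None), (None, None), (1e-6, None)])
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(fit.x[:2])  # ML coefficients
print(b_ols)      # OLS coefficients: identical to optimizer precision
print(fit.x[2]**2, np.sum((y - X @ b_ols)**2) / n)  # ML variance divides by N
```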
Notes
1. The first term in Eq. (3.1) is sometimes written as
$$ \frac{1}{\sigma \sqrt{2\pi}} $$
and the final term can be written in three mathematically equivalent ways:
$$ \frac{(x-\mu)^2}{2\sigma^2} = \frac{0.5(x-\mu)^2}{\sigma^2} = \frac{1}{2\sigma^2}(x-\mu)^2 $$
2. For purposes of illustration, I have kept the standard deviation at 15 for all five possibilities.
3. The log-likelihood function is a monotonic transformation of the original likelihood function. Because a monotonic transformation preserves order, the two functions attain their maximum at the same parameter values.
4. To confuse matters even more, the derivative can be written in different ways. For example, the notation f′(a) is also used to express the derivative evaluated at a specific value a.
5. As of this writing, an online derivative calculator can be found at http://www.derivative-calculator.net/
6. These rules are a subset of the differentiation rules of calculus and are expressed in language that I believe will best convey their implementation. Please consult a calculus textbook for more information or when differentiating functions other than the ones used in this book.
7. Because we are trying to identify the population values that are most likely to have produced our sample data, the standard deviations for these samples are calculated using N in the denominator rather than N − 1 (see the sketch following these notes).
8. As noted throughout this chapter, although the maximum-likelihood estimate of the population variance (3.5103) differs from the sample variance (3.2124), the two values are asymptotically equivalent.
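Notes 7 and 8 turn on the difference between these two variance formulas. As a minimal sketch (with invented data values, not the samples discussed in the chapter), the following Python snippet contrasts the maximum-likelihood variance, which divides by N, with the unbiased sample variance, which divides by N − 1:

```python
import numpy as np

sample = np.array([4.0, 6.0, 5.0, 7.0, 3.0])  # hypothetical values
n = len(sample)
deviations = sample - sample.mean()

var_ml = np.sum(deviations**2) / n            # ML estimate: N in the denominator
var_sample = np.sum(deviations**2) / (n - 1)  # unbiased sample variance: N - 1
print(var_ml, var_sample)  # differ by the factor (n - 1)/n, which -> 1 as n grows
```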
3.1 Electronic Supplementary Material
3.2 Appendix
First Partial Derivatives of the Log-Likelihood Function for a Normal Distribution
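The appendix itself is not reproduced here. For reference, a sketch of the standard result, assuming the usual normal log-likelihood $\ln L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(x_i-\mu)^2$:
$$ \frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(x_i-\mu), \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{N}(x_i-\mu)^2 $$
Setting both partials to zero yields $\hat{\mu} = \bar{x}$ and $\hat{\sigma}^2 = \sum(x_i-\bar{x})^2/N$, consistent with Note 7.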
Second Partial Derivatives of the Log-Likelihood Function for a Normal Distribution
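Again as a sketch of the standard result under the same log-likelihood:
$$ \frac{\partial^2 \ln L}{\partial \mu^2} = -\frac{N}{\sigma^2}, \qquad \frac{\partial^2 \ln L}{\partial \mu\,\partial \sigma^2} = -\frac{1}{\sigma^4}\sum_{i=1}^{N}(x_i-\mu), \qquad \frac{\partial^2 \ln L}{\partial (\sigma^2)^2} = \frac{N}{2\sigma^4} - \frac{1}{\sigma^6}\sum_{i=1}^{N}(x_i-\mu)^2 $$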
First Partial Derivatives of the Log-Likelihood Function for Linear Regression
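A sketch of the standard result in matrix form; the symbols $\mathbf{b}$, $\mathbf{X}$, and $\mathbf{y}$ follow common conventions and are assumptions here, with $\ln L = -\frac{N}{2}\ln(2\pi) - \frac{N}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{Xb})'(\mathbf{y}-\mathbf{Xb})$:
$$ \frac{\partial \ln L}{\partial \mathbf{b}} = \frac{1}{\sigma^2}\mathbf{X}'(\mathbf{y}-\mathbf{Xb}), \qquad \frac{\partial \ln L}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4}(\mathbf{y}-\mathbf{Xb})'(\mathbf{y}-\mathbf{Xb}) $$
Setting the first partial to zero gives $\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, the least squares solution noted in the abstract.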
Second Partial Derivatives of the Log-Likelihood Function for Linear Regression
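And a sketch of the corresponding second partials under the same assumptions:
$$ \frac{\partial^2 \ln L}{\partial \mathbf{b}\,\partial \mathbf{b}'} = -\frac{1}{\sigma^2}\mathbf{X}'\mathbf{X}, \qquad \frac{\partial^2 \ln L}{\partial \mathbf{b}\,\partial \sigma^2} = -\frac{1}{\sigma^4}\mathbf{X}'(\mathbf{y}-\mathbf{Xb}), \qquad \frac{\partial^2 \ln L}{\partial (\sigma^2)^2} = \frac{N}{2\sigma^4} - \frac{1}{\sigma^6}(\mathbf{y}-\mathbf{Xb})'(\mathbf{y}-\mathbf{Xb}) $$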
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Brown, J.D. (2014). Maximum-Likelihood Estimation. In: Linear Models in Matrix Form. Springer, Cham. https://doi.org/10.1007/978-3-319-11734-8_3
DOI: https://doi.org/10.1007/978-3-319-11734-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11733-1
Online ISBN: 978-3-319-11734-8
eBook Packages: Mathematics and Statistics (R0)