Abstract
In the preceding chapter we derived the OLS estimator of the (coefficient) parameters of the GLM and proved that a number of properties can be ascribed to it. In so doing, we have not assumed any specific form for the distribution of the error process. It was, generally, more than sufficient in that context to assert that the error process was one of i.i.d. random variables with zero mean and finite variance. However, even though unbiasedness, consistency, and efficiency could be proved, the distributional properties of such estimators could not be established. Consequently, tests of significance could not be formulated. In subsequent discussion we shall introduce an explicit assumption regarding the distribution of the error process, and determine what additional implications this might entail for the OLS estimators. In particular, recall that the assumptions under which estimation was carried out were:
-
(A.1)
The explanatory variables are nonstochastic and linearly independent, i.e., if X = (x ti ) , t = 1 , 2 , … , T , i = 0 , 1 , 2 , … , n, is the matrix of observations on the explanatory variables then X is nonstochastic and rank (X) = n + 1;
-
(A.2)
The limit
Notes
- 1.
The discussion in this section may be bypassed without loss of continuity. It represents a digression in several aspects of the distribution of quadratic forms. The reader need only know the conclusions of the various propositions. The proofs and ancillary discussion are not essential to the understanding of subsequent sections.
- 2.
We remind the reader that a maintained hypothesis is one about whose validity we are certain or, at any rate, one whose validity we do not question—whether this is due to certainty or convenience is another matter.
- 3.
Strictly speaking we are seeking a saddle point of the Lagrangian, but commonly one speaks of “maximizing.”
- 4.
- 5.
One should examine and analyze residuals, as suggested by Anscombe and Tukey [4], who advocated examining standardized residuals as a numerical procedure for detecting outliers. They cautioned that it is generally unsafe to apply a numerical procedure until outliers have been screened out.
- 6.
- 7.
- 8.
Guerard, Rachev, and Shao [161] reported the usefulness of the DFBETA, the studentized residual, CookD, and COVRATIO calculations performed with SAS and the GLER data during the 1997–2011 time period.
- 9.
- 10.
This amounts to considering what Raiffa and Schlaifer [33] have called the preposterior distribution.
- 11.
At least this is the motivation inherited from the physical sciences.
References
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., & Tukey, J. W. (1972). Robust estimates of location: Survey and advances. Princeton, NJ: Princeton University Press.
Anscombe, F. J., & Tukey, J. W. (1963). The examination and analysis of residuals. Technometrics, 5, 141–169.
Ansley, C. F. (1979). An algorithm for the exact likelihood of a mixed autoregressive moving average process. Biometrika, 66, 59–65.
Basu, S. (1977). Investment performance of common stocks in relation to their price earnings ratios: A test of market efficiency. Journal of Finance, 32, 663–682.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York: John Wiley & Sons.
Beyer, W. H. (1960). Handbook of tables for probability and statistics. Cleveland: The Chemical Rubber Co.
Blin, J. M., Bender, S., & Guerard, J. B. Jr. (1997). Earnings forecasts, revisions and momentum in the estimation of efficient market-neutral Japanese and U.S. portfolios. In A. Chen (Ed.), Research in Finance, 15.
Brealey, R. A., Myers, S. C., & Allen, F. (2006). Principles of corporate finance (8th ed.). New York: McGraw-Hill/Irwin.
Cook, R. D. (1977). Detection of Influential observations in linear regression. Technometrics, 19, 15–18.
Cook, R. D. (1998). Regression graphics. New York: Wiley.
Dhrymes, P. J., & Guerard Jr., J. B. (2017). Returns, risk, portfolio selection, and evaluation. In J. Guerard (Ed.), Portfolio construction, measurement, and efficiency: Essays in honor of Jack Treynor. New York: Springer.
Graham, B., & Dodd, D. (1934). Security analysis: Principles and technique. New York: McGraw-Hill Book Company.
Guerard Jr., J. B., Rachev, S. T., & Shao, B. (2013). Efficient global portfolios: Big data and investment universe. IBM Journal of Research and Development, 57(5), 11.
Gunst, R. F., & Mason, R. L. (1980). Regression analysis and its application. New York: Marcel Dekker, Inc.
Hendry, D. F., & Doornik, J. A. (2014). Empirical model discovery and theory evaluation. Cambridge: MIT Press.
Huber, P. J. (1981). Robust statistics. Cambridge, MA: Harvard University Press.
Koopmans, T. C., & Hood, W. C. (1953). The estimation of simultaneous linear economic relationships. In W. C. Hood & T. C. Koopmans (Eds.), Studies in econometric method (Chap. 6). New York: Wiley.
Maronna, R. A., Martin, R. D., & Yohai, V. J. (2006). Robust statistics: Theory and methods. New York: Wiley.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Scheffé, H. (1977). A note on a formulation of the S-method of multiple comparison. Journal of the American Statistical Association, 72, 143–146.
Webster, J. T., Gunst, R. F., & Mason, R. L. (1974). Latent root regression analysis. Technometrics, 16, 513–522.
Appendix
The discussion in this appendix has two objectives:
-
(i)
to examine the power aspects of tests of significance, i.e., the probability of rejecting the null hypothesis when the alternative is true;
-
(ii)
to present the multiple comparison test as a complement to the usual F-test for testing, simultaneously, a number of hypotheses.
We remind the reader that if we wish to test the hypothesis
as against the alternative
where β is, say, an (n + 1)-element vector, then:
If the hypothesis is accepted we “know” that, within the level of significance specified, all elements of β are zero;
If the hypothesis is rejected, however, then all we “know” is that at least one element of β is nonzero. It would be desirable to go beyond this and determine which of the elements are and which are not, zero.
It is this function that is performed by the multiple comparison test.
We recall that designing a test for a hypothesis is equivalent to specifying a critical region. The latter is a subset of the sample space such that, if the (sample) observations fall therein, we reject the hypothesis, and if they do not, we accept it. The probability assigned to the critical region (as a function of the unknown parameters of the underlying distribution) is called the power function associated with the test (or the critical region).

The reader may have observed that when we discussed various tests in the preceding chapter we had always derived the distribution of the test statistic on the assumption that the null hypothesis is correct. The usefulness of tests in discriminating between true and false hypotheses is, in part, judged by the level of significance—which is the value assumed by the power function on the assumption that the null hypothesis represents the truth. This gives us some indication of the frequency with which such procedures will reject true null hypotheses. Another criterion, however, is the power of the test—which is the value assumed by the power function on the assumption that the alternative hypothesis represents the truth. In order to evaluate the latter, however, we need the distribution of the test statistic when the alternative represents the truth.

The perceptive reader may have wondered why, in our discussions, we had always derived the distribution of the various test statistics on the assumption that the null hypothesis is true. In part, this is consistent with the nature and inherent logic of the procedure, as well as the presumption that an investigator will resort to a statistical test when his substantive knowledge of a phenomenon has progressed to the stage where, having mastered nearly all its ramifications, he has an inkling that his entire conception of the phenomenon is consistent with a specific set of parametric values.
This he formulates as the null hypothesis; and, indeed, he would be very surprised if the test were to lead to its rejection (see Note 10). In part, however, the practice is also due to the fact that the distributions under the alternative hypothesis are appreciably more difficult to deal with.
Insofar as we have dealt earlier with the chi-square, t-, and F-distributions, we shall now deal with the noncentral chi-square, t-, and F-distributions. The latter are the distributions of the (appropriate) test statistics, dealt with previously, when the alternative hypothesis is true.
1.1 Noncentral Chi Square
We recall that the chi-square variable with m degrees of freedom (typically denoted by \( {\chi}_m^2 \) and more appropriately called central chi square) has the density function
where Γ(⋅) is the gamma function (See Problem 15). We also recall that if y ∼ N(0, I), then
The noncentral chi-square distribution arises when, in the relation above, the means of (some of) the basic normal variables are nonzero. Thus, the problem to be examined is as follows: if x is an n-element (column) random vector such that
what can we say about the distribution of
To handle this problem we employ a transformation that reduces the situation as closely as possible to the standard chi-square distribution—also called the central chi-square distribution.
We note that there exists an orthogonal matrix A such that its first row is (θ ′/δ) where
(In this connection see also Problem 16.) Now put
and observe that
Since
we see that the desired random variable (x ′ x) has been expressed as the sum of the square of an N(δ, 1) random variable \( \left({y}_1^2\right) \) and a \( {\chi}_m^2\left(m=n-1\right) \) random variable (u). Moreover we know that \( {y}_1^2 \) and u are mutually independent.
To find the distribution of x ′ x we can begin with the joint density of (y 1, u), which is
From this, applying the results of Proposition 4 of Chap. 10, we can hope to find the density of \( {y}_1^2+u \). Thus, consider the transformation
The Jacobian of this transformation is z 1/2. Hence, the joint density of (z, t) is
The right member above is obtained by expanding
To obtain the density of z we integrate out the irrelevant variable t, i.e., we obtain from the joint density of (z, t) the marginal density of z. The integration involved is
and we observe that the integral vanishes for odd r. Hence in the series above we should replace r by 2r, the range of r still being {0, 1, …}. Now, for even powers we deal with
and the integral may be shown to be (see Problem 17)
Hence, remembering that terms containing odd r vanish and substituting the results above in the infinite sum, we find that the density function of z is given by
where: h n + 2r (⋅) is the density function of a central chi-square variable with n + 2r degrees of freedom;
and (see Problem 15)
We shall now show that
thus expressing the noncentral chi-square density as a weighted average of central chi-square densities (with degrees of freedom n + 2r). It is a weighted average since the weights e −λ(λ r/r!) sum to unity, where \( \lambda =\frac{1}{2}{\delta}^2 \). In fact the weights are simply the ordinates of the mass function of a Poisson distributed (discrete) random variable.
Now, observe that
Also
Thus
as claimed, and we can now write
We, therefore, have the following:
Proposition A.1
Let x ∼ N(θ, I) , x being n × 1; the density function of
is given by
where
and
i.e., the density of z is a convex combination, with Poisson weights of central chi-square distributions with parameters n + 2r , r = 0 , 1 , 2 , ….
Remark A.1
The variable z above is said to have the noncentral chi-square distribution with parameters n and \( \lambda =\frac{1}{2}{\theta}^{\prime}\theta \). The latter is said to be the noncentrality parameter, while the former is called the degrees of freedom parameter. Such a variable is denoted by \( {\chi}_n^2\left(\lambda \right) \). Note that \( {\chi}_n^2(0) \) is the usual central chi-square variable.
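Proposition A.1 can be checked numerically. In the sketch below (Python with SciPy; the values of n and λ are arbitrary choices for illustration) the Poisson-weighted sum of central chi-square densities is compared with SciPy's noncentral chi-square density; note that SciPy parameterizes the latter by nc = θ′θ = 2λ:

```python
import numpy as np
from scipy import stats

n, lam = 4, 2.5                 # degrees of freedom and lambda = theta'theta/2 (illustrative)
z = np.linspace(0.1, 25.0, 200)

# Poisson-weighted average of central chi-square densities with n + 2r degrees of freedom;
# the Poisson(lambda) tail beyond r = 60 is negligible for lambda = 2.5
mix = sum(stats.poisson.pmf(r, lam) * stats.chi2.pdf(z, n + 2 * r) for r in range(60))

# SciPy's noncentrality parameter is nc = theta'theta = 2*lambda
direct = stats.ncx2.pdf(z, n, 2 * lam)
print(np.max(np.abs(mix - direct)))   # the two densities coincide
```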
1.2 Noncentral F-Distributions
We remind the reader that the (central) F-distribution with n 1 , n 2 degrees of freedom is the distribution of the ratio
where \( {w}_i\sim {\chi}_{n_i}^2,\kern1em i=1,2 \), and the two random variables are mutually independent. We also recall that the (central) t-distribution is the distribution of the ratio
where u ∼ N(0, 1) , w 2 is \( {\chi}_{n_2}^2 \), and the two variables are mutually independent. Hence the square of a t-distributed variable with n 2 degrees of freedom is distributed according to the F-distribution with 1 and n 2 degrees of freedom. For symmetric t-tests, which are almost universal in applied econometric work, we may most conveniently employ the central F-distribution. Consequently, in the discussion to follow we shall only deal with noncentral F-distributions.
Thus, let w i , i = 1 , 2, be two mutually independent noncentral chi-square variables
We seek the distribution of
Instead of the distribution of F, however, we shall first find the distribution of
The reason why u is considered first instead of F is that there exist tabulations for the distribution of u (for selected parameter values) while no such tabulations exist for the distribution of F. Noting that
we see that operating with u is perfectly equivalent to operating with F.
To find the distribution of u we employ the same procedure as employed in the previous section. The joint density of w 1 , w 2 is
where
Use the transformation
The Jacobian of the transformation is
Upon substitution, the typical term of the infinite series above—apart from r 2 , r 1—becomes
where
and B(s 1, s 2) is the beta function with parameters s 1 and s 2. To find the density of u we integrate out w; to this effect make the change in variable
and observe that, apart from a factor of proportionality, we have to integrate with respect to w ∗
The integral of the bracketed expression is unity, since it is recognized as the integral of the (central) chi-square density with 2(s 1 + s 2) degrees of freedom. Consequently, the density of u is given by
which is recognized as a convex combination of beta distributions with parameters s i , i = 1 , 2 , s i = (n i + 2r i )/2. If we wish to obtain the density function of
we need only observe that
The Jacobian of this transformation is
while
Thus, substituting above, we have the density function of F,
We therefore have
Proposition A.2
Let
be mutually independent. Then the density function of
is given by
The density function of
is given by
In either case the density is uniquely determined by four parameters: n 1 , n 2, which are the degrees of freedom parameters, and λ 1 , λ 2, which are the noncentrality parameters (of w 1 and w 2 respectively).
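A small simulation illustrates Proposition A.2 for the case λ2 = 0, the singly noncentral F, which is the case tabulated in standard software (the parameter values below are illustrative assumptions). The empirical distribution of the ratio of independent scaled chi-square variables is compared with SciPy's noncentral F, whose noncentrality parameter equals 2λ1 in the notation of this appendix:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n1, n2, lam1 = 3, 20, 2.0     # illustrative values; lambda2 = 0 (singly noncentral case)
N = 200_000

w1 = stats.ncx2.rvs(n1, 2 * lam1, size=N, random_state=rng)  # noncentral chi-square numerator
w2 = stats.chi2.rvs(n2, size=N, random_state=rng)            # central chi-square denominator
F = (w1 / n1) / (w2 / n2)

# SciPy's ncf takes nc = 2*lambda1, the sum of squared means of the numerator normals
for q in (1.0, 2.0, 4.0):
    print(q, (F <= q).mean(), stats.ncf.cdf(q, n1, n2, 2 * lam1))
```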
The results of Proposition A.2 may be specialized to the various tests considered in the preceding chapter.
Example A.1
Consider again the test given in Proposition 8. Suppose the null hypothesis is in fact false and
Define
The square of the statistic developed there is
and is, thus, distributed as a noncentral F-distribution with parameters
Hence, given the critical region the power of the test may be evaluated on the basis of the noncentral F-distribution with parameters as above. Unfortunately, power calculations are not performed very frequently in applied econometric work, although the necessary tables do exist. (See e.g., [30].)
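With modern software, such power evaluations no longer require printed tables. The sketch below (Python with SciPy; the degrees of freedom and noncentrality values are illustrative assumptions) computes the power of an F-test for several noncentrality parameters λ1, taking λ2 = 0 and recalling that SciPy's noncentrality parameter equals 2λ1:

```python
import numpy as np
from scipy import stats

k, df2, alpha = 3, 40, 0.05               # illustrative degrees of freedom and level
F_crit = stats.f.ppf(1 - alpha, k, df2)   # critical value of the central F-test

# Power = P(F > F_crit) under the noncentral F with noncentrality lambda1
# (SciPy's nc equals 2*lambda1 in the notation of this appendix); as
# lambda1 -> 0 the power declines to the level of significance alpha.
for lam in (0.5, 2.5, 10.0):
    print(lam, round(stats.ncf.sf(F_crit, k, df2, 2 * lam), 3))
```

As expected, the power increases with the noncentrality parameter, which is the formal counterpart of the remark about collinearity in Remark A.2.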
Remark A.2
The preceding example affords us an opportunity to assess, somewhat heuristically, the impact of multicollinearity on hypothesis testing. We shall examine collinearity in a subsequent chapter. For the moment it is sufficient to say that it is relevant to the extent that one or more of the explanatory variables in a GLM can be “explained” by the others.
In the context of this example note that if by \( {\widehat{x}}_{\cdot i} \) we denote that part of the variable x ⋅i that can be explained by (its regression on) the remaining explanatory variables and by s ⋅i , the vector of residuals, we can write
Note that
the x ⋅j being the observations on the remaining variables of the GLM.
One can then show that
where, of course,
and X 1 is the submatrix of X obtained when from the latter we suppress x ⋅i .
If the hypothesis
is false then the quantity
has mean
Hence, the noncentrality parameter will be
If the sample is relatively collinear with respect to x ⋅i then, even though λ 2 may be large, the noncentrality parameter λ 1 would tend to be “smaller” owing to the fact that the sum of squared residuals, \( {s}_{\cdot i}^{\prime }{s}_{\cdot i} \), would tend to be (appreciably) smaller relative to the situation that would prevail if the sample were not collinear. Since, generally, we would expect the power of the test to increase with the noncentrality parameter, it follows that collinearity would exert an undesirable effect in this context.
Example A.2
A common situation is that of testing the returns to scale parameter in production function studies. If, for example, we deal with the Cobb–Douglas production function
(U representing the error process) we may estimate \( \widehat{\alpha}+\widehat{\beta} \). If the variance of the sum is of the form \( {\widehat{\sigma}}^2r \) then r will be known and \( {\widehat{\sigma}}^2 \) would be proportional to a \( {\chi}_{T-3}^2 \)-variable. Thus, on the null hypothesis
we would have
Suppose, however,
Then the statistic above is noncentral F with parameters
Suppose that
From the relevant tables in [5] we see that the power of such a test (with level of significance 0.05) is approximately 0.3. But this means that the returns to scale parameter may be quite high and, depending on r, the probability of rejecting the null hypothesis will still only be 0.3. We will have such a situation if λ 1 = 0.7 and σ 2 r = 0.5. Thus, such procedures lead to the acceptance of false null hypotheses alarmingly frequently. Of course, if λ 1 were even closer to zero the power would have been even lower. This is to warn the reader that adjacent hypotheses cannot be distinguished with great confidence when the sample on which the inference is based is not very large. Thus, in such procedures as that given in this example, if a given moderate-sized sample (say 30 or 40) leads us to accept the constant returns to scale hypothesis, the reader ought to bear in mind that in all likelihood it would also lead us to accept the null hypothesis
as against the alternative
The economic implications of constant, as against increasing, returns to scale, however, are very different indeed!
1.3 Multiple Comparison Tests
Consider the test of the hypothesis
as against
in the context of the GLM
where
in the usual notation of the chapter.
To carry out the test we proceed as follows. Letting σ 2 Q ∗ be the covariance matrix of the OLS estimator of β ∗ we form the quantity
it being understood that there are T observations, that β ∗ contains n parameters, and that \( \widehat{u} \) is the vector of OLS residuals. Under the null hypothesis (A.1) becomes
The quantity in (A.2) is now completely determined, given the data, i.e., it is a statistic. The distribution of (A.2) under H0 is central F with n and T − n − 1 degrees of freedom.
The mechanics of the test are these. From the tables of the F n , T − n − 1-distribution we determine a number, say F α , such that
where α is the level of significance, say α = 0.05, α = 0.025, or α = 0.01.
The acceptance region is
while the rejection region is
The geometric aspects of the test are, perhaps, most clearly brought out if we express the acceptance region somewhat differently. Thus, we may write
where, evidently,
and where for the sake of generality we have written the hypothesis as
as against
In the customary formulation of such problems one takes
In the preceding, of course, the elements of \( {\overline{\beta}}_{\ast } \) are numerically specified. For greater clarity we may illustrate the considerations entering the test procedure, in the two-dimensional case, as in Fig. A.1.
The relation in (A.3) represents an ellipsoid with center at \( {\overline{\beta}}_{\ast } \). If the statistic
falls within the ellipsoid we accept H0, while if it falls outside we accept H1. If the statistic is represented by p 1 we will reject H0, but it may well be that what is responsible for this is the fact that we “ought” to accept
Similarly, if the statistic is represented by p 2 we will reject H0 but it may well be that this is so because we “ought” to accept
If the statistic is represented by p 3 then, perhaps, we “ought” to accept
The F-test, however, does not give any indication as to which of these alternatives may be appropriate.
The configurations above belong to the class of functions of the form
It would be desirable to find a way in which tests for the hypotheses in (A.4) may be linked to the acceptance ellipsoid of the F-test. If this is accomplished, then perhaps upon rejection of a hypothesis by the F-test we may be able to find the “parameters responsible” for the rejection. Needless to say this will shed more light on the empirical implications of the test results.
The connection alluded to above is provided by the relation of the planes of support of an ellipsoid to the “supported” ellipsoid. Some preliminary geometric concepts are necessary before we turn to the issues at hand. In part because of this, the discussion will be somewhat more formal than in the earlier part of this appendix.
1.4 Geometric Preliminaries
Definition A.1
Let \( x\in {\mathbb{E}}_n \), where \( {\mathbb{E}}_n \) is the n-dimensional Euclidean space. The set of points
where c is a positive constant and M a positive definite matrix is said to be an ellipsoid with center at a.
Remark A.3
It entails no loss of generality to take c = 1. Thus, in the definition above, dividing through by c we have
If M is positive definite and c > 0 then, clearly,
is also a positive definite matrix. In the definitions to follow we will always take c = 1.
Remark A.4
The special case
is referred to as an ellipsoid in canonical or standard form.
Definition A.2
Let \( x\in {\mathbb{E}}_n \). The set of points
where c > 0 is said to be an n-dimensional sphere, with center at a. The special case
is referred to as a sphere in canonical or standard form, or simply a unit sphere.
Remark A.5
Notice that an ellipsoid in canonical form is simply a unit sphere whose coordinates, x i , i = 1 , 2 , …, n, have been stretched or contracted, respectively, by the factors m i , i = 1 , 2 , … , n.
Lemma A.1
Every ellipsoid
-
(i)
can be put in canonical form, and
-
(ii)
can be transformed into the unit sphere.
Proof
Let
be an ellipsoid. Let R ∗ , Λ be, respectively, the matrices of characteristic vectors and roots of M. Put
and rewrite (A.5) as
This proves the first part of the lemma. For the second part, we put
and note that E is transformed to the unit sphere
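The transformations in the proof of Lemma A.1 can be verified numerically. In the sketch below (Python with NumPy; the matrix M, the center a, and the chosen boundary point are illustrative assumptions) a boundary point of the ellipsoid (x − a)′M(x − a) = 1 is mapped first into canonical coordinates and then onto the unit sphere:

```python
import numpy as np

# Ellipsoid (x - a)'M(x - a) = 1 with a positive definite M (illustrative values)
M = np.array([[4.0, 1.0], [1.0, 3.0]])
a = np.array([1.0, -2.0])

lam, R = np.linalg.eigh(M)          # M = R diag(lam) R', lam > 0

# A boundary point: with M = L L' (Cholesky), x = a + L'^{-1} v has
# (x - a)'M(x - a) = v'v, so any unit vector v gives a boundary point
L = np.linalg.cholesky(M)
x = a + np.linalg.solve(L.T, np.array([0.6, 0.8]))
print((x - a) @ M @ (x - a))        # = 1, on the boundary

y = R.T @ (x - a)                   # canonical coordinates: sum_i lam_i y_i^2 = 1
print(lam @ y**2)                   # = 1
z = np.sqrt(lam) * y                # unit-sphere coordinates: z'z = 1
print(z @ z)                        # = 1
```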
Remark A.6
The relationship of the coordinates of the sphere (to which the ellipsoid is transformed) to the coordinates of the original ellipsoid is
or
Definition A.3
Let \( x\in {\mathbb{E}}_n \). The set of points
is said to be a plane through a, orthogonal to the vector h.
Remark A.7
The plane above can be thought of as the set of vectors, measured from a, that are orthogonal to the vector h. It is obvious that a ∈ P.
Definition A.4
Let
be the plane through a orthogonal to h, and put
The planes described by
such that
are said to be parallel to P and to each other.
Remark A.8
Parallel planes do not have points in common. Thus if P 1 , P 2 are two parallel planes, let x 0 be a point on both of them. Since x 0 ∈ P 1 we have
Since x 0 ∈ P 2 we have
But this implies c 1 = c 2, which is a contradiction.
Remark A.9
The planes through a and −a , ∣a∣ ≠ 0, orthogonal to a vector h are parallel. Thus, the first plane is described by the equation
while the second is described by
Rewriting slightly we have
Provided c ≠ 0, it is evident that the two planes are parallel.
Definition A.5
Let \( x\in {\mathbb{E}}_n \) and let E be the ellipsoid
The plane (through the point x 0 and orthogonal to the vector h)
is said to be a plane of support of the ellipsoid if
-
(a)
E and P have one point in common, and
-
(b)
E lies entirely on one side of P.
Remark A.10
To fully understand what “lies on one side of P” means, consider the special case of a line. Thus, if n = 2 and, for simplicity, x 0 = 0, we have
Notice that the equation above divides E 2 into three sets of points:
The equation for the set given by (A.7) is
That for P − is
and that for P + is
Suppose, for definiteness in our discussion,
The set P lies on the line in (A.10); the set P − consists of all lines as in (A.11) with c 1 > 0 ; P + consists of all lines as in (A.12) with c 1 > 0. For any c 1 > 0 it is clear that the lines described in (A.11) lie below the line describing P, and the lines described by (A.12) lie above the line describing P. In this sense, P − lies on one side of P (below) while P + lies on the other side (above). The directions “above” and “below” will of course be reversed for
Now suppose we solve the problem of finding for a given ellipsoid the two parallel planes of support that are orthogonal to some vector h. By varying h it should be possible to describe the ellipsoid by its planes of support. This, then, will do exactly what we had asked at the beginning of this discussion, viz., produce a connection between the acceptance region of an F-test (an ellipsoid) and a number of linear hypotheses of the form h ′ β ∗ = c (its planes of support). We have
Lemma A.2
Let \( x\in {\mathbb{E}}_n \), and let E be the ellipsoid
and h a given vector. Let x 0 be a point on the boundary of E, i.e., x 0 obeys
Then the planes
are parallel planes of support for E (at the points x 0 and −x 0 + 2a, respectively,) orthogonal to the vector h.
Proof
Let E and h be given. By Lemma A.1, E can be transformed to the unit sphere
where
and R is such that
The plane
is a plane of support for S, where z 0 lies on the boundary of S and thus
To see this, define
Clearly P and S have z 0 in common. Let z ∈ S. Then z obeys
Since
we have
so that if z ∈ S then either z ∈ P or z ∈ P −; hence P is indeed a plane of support. Similarly, the plane
is parallel to P and is a plane of support for S. First, −z 0 lies on the boundary of S since
moreover, −z 0 ∈ P ∗ since
Second, define
and note that if z ∈ S then
Consequently,
which shows that S lies on one side of \( {P}_{+}^{\ast } \); hence, the latter is a plane of support. The equation for P is, explicitly,
and that for P ∗ is
so that, indeed, they are parallel.
Let us now refer the discussion back to the original coordinates of the ellipsoid. From (A.13) we have that the equations for P and P ∗ are, respectively,
Since we are seeking planes that are orthogonal to a given vector h we must have
where r is a constant. Alternatively,
But since
we conclude that
and the desired planes are
We now show explicitly that
is a plane of support of E through the point
orthogonal to the vector h.
Noting Eq. (A.16) and related manipulations we can write (A.18) more usefully as
since
Now substituting from (A.19) in (A.20) we verify that x 0 lies on the plane described by (A.18). But x 0 lies also on E since
Moreover, if x ∈ E then
so that
Consequently,
which shows that (A.18) represents a plane of support for E. The plane parallel to that in (A.18) can be written, using the notation in (A.20), as
With x 0 as in (A.19) it is evident that
also lies on the plane. Moreover,
so that it lies on E as well. It remains only to show that E lies on one side of the plane in (A.22). To see this let x be any point in E. As before, we have
Consequently,
which shows that E lies on one side of the plane in (A.22). q.e.d.
Corollary A.1
The equation of the strip between the two parallel planes of support for E, say at x 0 and −x 0 + 2a, that are orthogonal to a vector h is given by
Proof
Obvious from the lemma.
Remark A.11
It is clear that the ellipsoid is contained in the strip above. Indeed, if we determine all strips between parallel planes of support, the ellipsoid E can be represented as the intersection of all such strips. Hence, a point x belongs to the ellipsoid E if and only if it is contained in the intersection.
Remark A.12
If the ellipsoid is centered at zero to begin with then
and the results above are somewhat simplified. For such a case the strip is given by
and the parallel planes of support have the points x 0 and −x 0, respectively, in common with the ellipsoid
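For the centered case the strip has a simple closed form: x′Mx ≤ 1 implies |h′x| ≤ (h′M⁻¹h)^{1/2} for every h, with equality attained at x0 = M⁻¹h/(h′M⁻¹h)^{1/2}. The sketch below (Python with NumPy; M and h are illustrative assumptions) verifies both the attainment of the bound and its validity over random points of the ellipsoid:

```python
import numpy as np

rng = np.random.default_rng(2)
M = np.array([[4.0, 1.0], [1.0, 3.0]])   # ellipsoid x'Mx <= 1 (illustrative)
h = np.array([1.0, 2.0])

Minv = np.linalg.inv(M)
bound = np.sqrt(h @ Minv @ h)            # half-width of the strip |h'x| <= bound

# The maximizer x0 = M^{-1} h / bound lies on the boundary and attains the bound
x0 = Minv @ h / bound
print(x0 @ M @ x0)                       # = 1 (on the boundary)
print(h @ x0, bound)                     # equal

# Random points of the ellipsoid all satisfy |h'x| <= bound
u = rng.standard_normal((10_000, 2))
u /= np.linalg.norm(u, axis=1, keepdims=True)   # directions on the unit circle
pts = u * rng.uniform(size=(10_000, 1))         # points inside the unit disc
L = np.linalg.cholesky(M)
xs = np.linalg.solve(L.T, pts.T).T              # map the disc into the ellipsoid
print(np.max(np.abs(xs @ h)) <= bound)          # True
```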
1.5 Multiple Comparison Tests—The S-Method
In this section we develop the S-method, first suggested by Scheffé (see [282, 283]) and thus named after him. The method offers a solution to the following problem: upon rejection of a hypothesis on a set of parameters, to find the parameter(s) responsible for the rejection.
Consider the GLM under the standard assumptions and further assume normality for the errors. Thus the model is
and we have the OLS estimator
Let β ∗ be a subvector of β containing k ≤ n + 1 elements. Thus, in the obvious notation,
where Q ∗ is the submatrix of (X ′ X)−1 corresponding to the elements of β ∗. We are interested in testing, say,
as against the alternative
First, we recall that for the true parameter vector, say \( {\beta}_{\ast}^0 \),
where
and F k , T − n − 1 is a central F-distributed variable with k and T − n − 1 degrees of freedom.
The mechanics of the test are as follows. Given the level of significance, say α, we find a number, say F α , such that
In the terminology of this appendix we consider the ellipsoid E with center \( {\widehat{\beta}}_{\ast } \);
If the point specified by the null hypothesis lies in E, i.e., if
we accept H0, while if
we accept H1.
Let us rewrite the ellipsoid slightly to conform with the conventions of this appendix. Thus
where
The test then is as follows:
In the previous discussion, however, we have established that a point belongs to an ellipsoid E if (and only if) it is contained in the intersection of the strips between all parallel planes of support. The strip between two parallel planes of support to E orthogonal to a vector h is described by
Hence a point, say \( {\overline{\beta}}_{\ast } \), obeys
if and only if for any vector \( h\in {\mathbb{E}}_k \) it obeys
We are now in a position to prove
Theorem A.1
Consider the GLM
under the standard assumptions, and suppose further that
Let
where β has n + 1 elements. Let β ∗ be a subvector of β containing k ≤ n + 1 elements and \( {\widehat{\beta}}_{\ast } \) its OLS estimator, so that
where Q ∗ is the submatrix of (X ′ X)−1 corresponding to the elements of β ∗. Further, let there be a test of the hypothesis
as against the alternative
Then the probability is 1 − α that simultaneously, for all vectors \( h\in {\mathbb{E}}_k \), the intervals
will contain the true parameter point, where
and F α is a number such that
F k , T − n − 1 being a central F-distributed variable with k and T − n − 1 degrees of freedom.
Proof
From the preceding discussion we have determined that the mechanics of carrying out the F-test on the hypothesis above involve the construction of the ellipsoid E with center \( {\widehat{\beta}}_{\ast } \) obeying
where
and α is the specified level of significance. We accept
and we accept
Another implication of the construction above is that the ellipsoid E will contain the true parameter point with probability 1 − α. But a point lies in the ellipsoid above if it lies in the intersection of the strips
for all \( h\in {\mathbb{E}}_k \), where
Since the probability is 1−α that the ellipsoid contains the true parameter point, it follows that the probability is 1 − α that the intersection of all strips
for \( h\in {\mathbb{E}}_k \) will contain the true parameter point. Alternatively, we may say that the probability is 1 − α that simultaneously, for all vectors \( h\in {\mathbb{E}}_k \), the intervals
will contain the true parameter point. q.e.d.
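The intervals of the theorem lend themselves to direct computation. The sketch below is an illustration, not part of the text: the function name `scheffe_interval` and its arguments are hypothetical labels for the quantities defined above, namely the direction vector h, the OLS estimates \( {\widehat{\beta}}_{\ast } \), the estimate \( {\widehat{\sigma}}^2 \), the submatrix Q ∗ of (X ′ X)−1, and the degrees of freedom k and T − n − 1.

```python
import numpy as np
from scipy.stats import f

def scheffe_interval(h, beta_hat_star, sigma2_hat, Q_star, k, df_resid, alpha=0.05):
    """Simultaneous (1 - alpha) Scheffe interval for h' beta_*.

    h             : direction vector of length k
    beta_hat_star : OLS estimates of the k coefficients under test
    sigma2_hat    : OLS-induced estimate of sigma^2
    Q_star        : k x k submatrix of (X'X)^{-1} for those coefficients
    df_resid      : T - n - 1 residual degrees of freedom
    """
    # F_alpha is the number such that Pr{F_{k, df_resid} <= F_alpha} = 1 - alpha
    F_alpha = f.ppf(1 - alpha, k, df_resid)
    centre = h @ beta_hat_star
    half_width = np.sqrt(k * F_alpha) * np.sqrt(sigma2_hat) * np.sqrt(h @ Q_star @ h)
    return centre - half_width, centre + half_width
```

Taking h to be the i-th unit vector recovers the simultaneous interval for the i-th element of β ∗ alone; the theorem guarantees that all such intervals, over every choice of h, hold jointly with probability 1 − α.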
Remark A.13
The result above is quite substantially more powerful than the usual F-test. If it is desired to test the hypothesis stated in the theorem we proceed to check whether the point \( {\overline{\beta}}_{\ast } \) lies in the ellipsoid, i.e., whether
If so we accept H0; if not we accept H1. In the latter case, however, we can only conclude that at least one element of β ∗ is different from the corresponding element in \( {\overline{\beta}}_{\ast } \). But which, we cannot tell. Nor can we tell whether more than one such element differs. This aspect is perhaps best illustrated by a simple example. Consider the case where
If the F-test rejects H0 we may still wish to ascertain whether we should accept:
or
or
The standard practice is to use the t-test on each of the relevant parameters. But proceeding in this sequential fashion means that the nominal levels of significance we claim for these tests are not correct. In particular, we shall proceed to test, e.g.,
only if the initial F-test leads us to accept
Then, the t-test above is a conditional test and the level of significance could not be what we would ordinarily claim. The theorem above ensures that we can carry out these tests simultaneously, at the α level of significance. Thus, consider the vectors
and define
If \( {\widehat{\sigma}}^2 \) is the OLS induced estimate of σ 2 we have
The intervals induced by the S-method are
where F α is a number such that
Remark A.14
The common practice in testing the hypotheses above is to apply the t-test seriatim. Let t α be a number such that
where t T − n − 1 is a central t-variable with T − n − 1 degrees of freedom. The intervals based on the t-statistic are
It is not correct to say that the true level of significance of tests based on these intervals is the stated one, and we certainly cannot claim that the probability is 1 − α that the three intervals above simultaneously contain the true parameter point.
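A crude illustration of why seriatim intervals lose their nominal coverage: under the simplifying (and generally false for OLS, whose estimates are correlated) assumption that the k estimates are independent, the joint coverage of k separate 95% intervals is only 0.95 to the k-th power.

```python
# Idealized illustration: joint coverage of k separate 95% intervals,
# assuming (contrary to the usual OLS case) that the estimates are independent.
k = 3
marginal = 0.95
joint = marginal ** k
print(round(joint, 4))  # prints 0.8574, well below the nominal 0.95
```

With correlated estimates the joint coverage differs from 0.95**k, but in general it still falls short of 1 − α, which is precisely the defect the S-method repairs.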
Remark A.15
To make the comparison between intervals given by the S-method and those yielded by the t-statistic concrete, let us use a specific example. Thus, take
so that
Suppose that a sample yields
We can easily establish that any combination of estimates \( {\widehat{\beta}}_1,{\widehat{\beta}}_2 \) obeying
will result in rejection of the null hypothesis
Further, we obtain
The intervals based on the S-method are
The intervals based on (bilateral) t-tests, all at the nominal significance level of 5%, are
The reader should note that the intervals based on the S-method are appreciably wider than those based on bilateral t-tests. This is, indeed, one of the major arguments employed against the multiple comparisons test. In the general case, the comparison of the width of these two sets of intervals depends on the comparison of
Since
it follows that the comparison may also be said to rest on the difference
Moreover, it may be verified (even by a casual look at tables of the F-distribution) that, for T − n − 1 in the vicinity of 30, the difference above grows with k; hence the more parameters we deal with, the wider the intervals based on the S-method become relative to those implied by the (bilateral) t-tests.
Remark A.16
It is clear that in the context of the example in Remark A.15 and basing our conclusion on bilateral t-tests, any estimates obeying
will lead us to accept
Any estimates obeying
will lead us to accept
while any statistics obeying
will lead us to accept
Using the S-method, however, would require for the cases enumerated above (respectively):
It is worth noting that if the parameter estimates were, in fact,
and \( {\widehat{\sigma}}^2 \) and Q ∗ as in Remark A.15, the relevant F-statistic would have been
Since, for α = .05, \( F_{\alpha ;2,30}=3.32 \), the hypothesis
would have been rejected. If, subsequently, we were to use a series of bilateral t-tests each at the nominal level of significance of 5% we could not reject the hypothesis
since the estimates
will define the intervals
On the other hand, if we employ the S-method of multiple comparisons we could not reject
This is so since the estimates will define the intervals
and the conclusions reached by the two methods will differ.
In view of the fact that the nominal levels of significance of the t-test are incorrect, it might be better to rely more extensively, in empirical work, on the S-method for multiple comparisons.
Remark A.17
It is important to stress that the S-method is not to be interpreted as a sequential procedure, i.e., we should not think that the multiple tests procedure is to be undertaken only if the F-test rejects the null hypothesis, say
If we followed this practice we would obviously have a conditional test, just as in the case of the sequential t-tests. In such a context the multiple tests could not have the stated level of significance. Their correct significance level may be considerably lower and will generally depend on unknown parameters. In this connection see the exchange between H. Scheffé and R. A. Olshen [38].
The proper application of the S-method requires that the type of comparisons desired be formulated prior to estimation rather than be formulated and carried out as an afterthought following the rejection of the null hypothesis by the F-test.
© 2017 Springer International Publishing AG
Dhrymes, P. (2017). The General Linear Model II. In: Introductory Econometrics. Springer, Cham. https://doi.org/10.1007/978-3-319-65916-9_2