Abstract
In the preceding chapter we derived the OLS estimator of the (coefficient) parameters of the GLM and proved that a number of properties can be ascribed to it. In so doing, we have not assumed any specific form for the distribution of the error process. It was, generally, more than sufficient in that context to assert that the error process was one of i.i.d. random variables with zero mean and finite variance. However, even though unbiasedness, consistency, and efficiency could be proved, the distributional properties of such estimators could not be established. Consequently, tests of significance could not be formulated. In subsequent discussion we shall introduce an explicit assumption regarding the distribution of the error process, and determine what additional implications this might entail for the OLS estimators. In particular, recall that the assumptions under which estimation was carried out were:
-
(A.1)
The explanatory variables are nonstochastic and linearly independent, i.e., if X = (x ti ) , t = 1 , 2 , … , T , i = 0 , 1 , 2 , … , n, is the matrix of observations on the explanatory variables then X is nonstochastic and rank (X) = n + 1;
-
(A.2)
The limit
Notes
- 1.
The discussion in this section may be bypassed without loss of continuity. It represents a digression in several aspects of the distribution of quadratic forms. The reader need only know the conclusions of the various propositions. The proofs and ancillary discussion are not essential to the understanding of subsequent sections.
- 2.
We remind the reader that a maintained hypothesis is one about whose validity we are certain or, at any rate, one whose validity we do not question—whether this is due to certainty or convenience is another matter.
- 3.
Strictly speaking we are seeking a saddle point of the Lagrangian, but commonly one speaks of “maximizing.”
- 4.
- 5.
One should examine and analyze residuals, as suggested by Anscombe and Tukey [4], who advocated examining standardized residuals as a numerical procedure for detecting outliers. They cautioned that it is generally unsafe to apply a numerical procedure until outliers have been screened out.
- 6.
- 7.
- 8.
Guerard, Rachev, and Shao [161] reported the usefulness of the DFBETA, the studentized residual, CookD, and COVRATIO calculations performed with SAS and the GLER data during the 1997–2011 time period.
- 9.
- 10.
This amounts to considering what Raiffa and Schlaifer [33] have called the preposterior distribution.
- 11.
At least this is the motivation inherited from the physical sciences.
References
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., & Tukey, J. W. (1972). Robust estimates of location: Survey and advances. Princeton, NJ: Princeton University Press.
Anscombe, F. J., & Tukey, J. W. (1963). The examination and analysis of residuals. Technometrics, 5, 141–169.
Ansley, C. F. (1979). An algorithm for the exact likelihood of a mixed autoregressive moving average process. Biometrika, 66, 59–65.
Basu, S. (1977). Investment performance of common stocks in relation to their price earnings ratios: A test of market efficiency. Journal of Finance, 32, 663–682.
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York: John Wiley & Sons.
Beyer, W. H. (1960). Handbook of tables for probability and statistics. Cleveland: The Chemical Rubber Co.
Blin, J. M., Bender, S., & Guerard, J. B. Jr. (1997). Earnings forecasts, revisions and momentum in the estimation of efficient market-neutral Japanese and U.S. portfolios. In A. Chen (Ed.), Research in Finance, 15.
Brealey, R. A., Myers, S. C., & Allen, F. (2006). Principles of corporate finance (8th ed.). New York: McGraw-Hill/Irwin.
Cook, R. D. (1977). Detection of Influential observations in linear regression. Technometrics, 19, 15–18.
Cook, R. D. (1998). Regression graphics. New York: Wiley.
Dhrymes, P. J., & Guerard Jr., J. B. (2017). Returns, risk, portfolio selection, and evaluation. In J. Guerard (Ed.), Portfolio construction, measurement, and efficiency: Essays in honor of Jack Treynor. New York: Springer.
Graham, B., & Dodd, D. (1934). Security analysis: Principles and technique. New York: McGraw-Hill Book Company.
Guerard Jr., J. B., Rachev, S. T., & Shao, B. (2013). Efficient global portfolios: Big data and investment universe. IBM Journal of Research and Development, 57(5), 11.
Gunst, R. F., & Mason, R. L. (1980). Regression analysis and its application. New York: Marcel Dekker, Inc.
Hendry, D. F., & Doornik, J. A. (2014). Empirical model discovery and theory evaluation. Cambridge: MIT Press.
Huber, P. J. (1981). Robust statistics. Cambridge, MA: Harvard University Press.
Koopmans, T. C., & Hood, W. C. (1953). The estimation of simultaneous linear economic relationships. In W. C. Hood & T. C. Koopmans (Eds.), Studies in econometric method (Chap. 6). New York: Wiley.
Maronna, R. A., Martin, R. D., & Yohai, V. J. (2006). Robust statistics: Theory and methods. New York: Wiley.
Scheffé, H. (1959). The analysis of variance. New York: Wiley.
Scheffé, H. (1977). A note on a formulation of the S-method of multiple comparison. Journal of the American Statistical Association, 72, 143–146.
Webster, J. T., Gunst, R. F., & Mason, R. L. (1974). Latent root regression analysis. Technometrics, 16, 513–522.
Appendix
The discussion in this appendix has two objectives:
-
(i)
to examine the power aspects of tests of significance, i.e., the probability of rejecting the null hypothesis when the alternative is true;
-
(ii)
to present the multiple comparison test as a complement to the usual F-test for testing, simultaneously, a number of hypotheses.
We remind the reader that if we wish to test the hypothesis
as against the alternative
where β is, say, an (n + 1)-element vector, then:
If the hypothesis is accepted we “know” that, within the level of significance specified, all elements of β are zero;
If the hypothesis is rejected, however, then all we “know” is that at least one element of β is nonzero. It would be desirable to go beyond this and determine which of the elements are and which are not, zero.
It is this function that is performed by the multiple comparison test.
We recall that designing a test for a hypothesis is equivalent to specifying a critical region. The latter is a subset of the sample space such that, if the (sample) observations fall therein, we reject the hypothesis, and if they do not, we accept it. The probability assigned to the critical region (as a function of the unknown parameters of the underlying distribution) is called the power function associated with the test (or the critical region).

The reader may have observed that when we discussed various tests in the preceding chapter we had always derived the distribution of the test statistic on the assumption that the null hypothesis is correct. The usefulness of tests in discriminating between true and false hypotheses is, in part, judged by the level of significance—which is the value assumed by the power function on the assumption that the null hypothesis represents the truth. This gives us some indication of the frequency with which such procedures will reject true null hypotheses. Another criterion, however, is the power of the test—which is the value assumed by the power function on the assumption that the alternative hypothesis represents the truth. In order to evaluate the latter, however, we need the distribution of the test statistic when the alternative represents the truth.

The perceptive reader may have wondered why, in our discussions, we had always derived the distribution of the various test statistics on the assumption that the null hypothesis is true. In part, this is consistent with the nature and inherent logic of the procedure, as well as the presumption that an investigator will resort to a statistical test when his substantive knowledge of a phenomenon has progressed to the stage where, having mastered nearly all its ramifications, he has an inkling that his entire conception of the phenomenon is consistent with a specific set of parametric values.
This he formulates as the null hypothesis; and, indeed, he would be very surprised if the test were to lead to its rejection (see Note 10). In part, however, the practice is also due to the fact that the distributions under the alternative hypothesis are appreciably more difficult to deal with.
Insofar as we have dealt earlier with the chi-square, t-, and F-distributions, we shall now deal with the noncentral chi-square, t-, and F-distributions. The latter are the distributions of the (appropriate) test statistics, dealt with previously, when the alternative hypothesis is true.
1.1 Noncentral Chi Square
We recall that the chi-square variable with m degrees of freedom (typically denoted by \( {\chi}_m^2 \) and more appropriately called central chi square) has the density function
where Γ(⋅) is the gamma function (See Problem 15). We also recall that if y ∼ N(0, I), then
The noncentral chi-square distribution arises when, in the relation above, the means of (some of) the basic normal variables are nonzero. Thus, the problem to be examined is as follows: if x is an n-element (column) random vector such that
what can we say about the distribution of
To handle this problem we employ a transformation that reduces the situation as closely as possible to the standard chi-square distribution—also called the central chi-square distribution.
We note that there exists an orthogonal matrix A such that its first row is (θ ′/δ) where
(In this connection see also Problem 16.) Now put
and observe that
Since
we see that the desired random variable (x ′ x) has been expressed as the sum of the square of an N(δ, 1) random variable \( \left({y}_1^2\right) \) and a \( {\chi}_m^2\left(m=n-1\right) \) random variable (u). Moreover we know that \( {y}_1^2 \) and u are mutually independent.
To find the distribution of x ′ x we can begin with the joint density of (y 1, u), which is
From this, applying the results of Proposition 4 of Chap. 10, we can hope to find the density of \( {y}_1^2+u \). Thus, consider the transformation
The Jacobian of this transformation is z 1/2. Hence, the joint density of (z, t) is
The right member above is obtained by expanding
To obtain the density of z we integrate out the irrelevant variable t, i.e., we obtain from the joint density of (z, t) the marginal density of z. The integration involved is
and we observe that the integral vanishes for odd r. Hence in the series above we should replace r by 2r, the range of r still being {0, 1, …}. Now, for even powers we deal with
and the integral may be shown to be (see Problem 17)
Hence, remembering that terms containing odd r vanish and substituting the results above in the infinite sum, we find that the density function of z is given by
where: h n + 2r (⋅) is the density function of a central chi-square variable with n + 2r degrees of freedom;
and (see Problem 15)
We shall now show that
thus expressing the noncentral chi-square density as a weighted average of central chi-square densities (with degrees of freedom n + 2r). It is a weighted average since the weights e −λ(λ r/r!) sum to unity, where \( \lambda =\frac{1}{2}{\delta}^2 \). In fact the weights are simply the ordinates of the mass function of a Poisson distributed (discrete) random variable.
Now, observe that
Also
Thus
as claimed, and we can now write
We, therefore, have the following:
Proposition A.1
Let x ∼ N(θ, I) , x being n × 1; the density function of
is given by
where
and
i.e., the density of z is a convex combination, with Poisson weights of central chi-square distributions with parameters n + 2r , r = 0 , 1 , 2 , ….
Remark A.1
The variable z above is said to have the noncentral chi-square distribution with parameters n and \( \lambda =\frac{1}{2}{\theta}^{\prime}\theta \). The latter is said to be the noncentrality parameter, while the former is called the degrees of freedom parameter. Such a variable is denoted by \( {\chi}_n^2\left(\lambda \right) \). Note that \( {\chi}_n^2(0) \) is the usual central chi-square variable.
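Proposition A.1 can be checked numerically. In the sketch below (Python with SciPy; the values of n and λ are arbitrary choices for illustration) the Poisson-weighted sum of central chi-square densities is compared with SciPy's noncentral chi-square density; note that SciPy parameterizes the latter by nc = θ′θ = 2λ:

```python
import numpy as np
from scipy import stats

n, lam = 4, 2.5                 # degrees of freedom and lambda = theta'theta/2 (illustrative)
z = np.linspace(0.1, 25.0, 200)

# Poisson-weighted average of central chi-square densities with n + 2r degrees of freedom;
# the Poisson(lambda) tail beyond r = 60 is negligible for lambda = 2.5
mix = sum(stats.poisson.pmf(r, lam) * stats.chi2.pdf(z, n + 2 * r) for r in range(60))

# SciPy's noncentrality parameter is nc = theta'theta = 2*lambda
direct = stats.ncx2.pdf(z, n, 2 * lam)
print(np.max(np.abs(mix - direct)))   # the two densities coincide
```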
1.2 Noncentral F-Distributions
We remind the reader that the (central) F-distribution with n 1 , n 2 degrees of freedom is the distribution of the ratio
where \( {w}_i\sim {\chi}_{n_i}^2,\kern1em i=1,2 \), and the two random variables are mutually independent. We also recall that the (central) t-distribution is the distribution of the ratio
where u ∼ N(0, 1) , w 2 is \( {\chi}_{n_2}^2 \), and the two variables are mutually independent. Hence the square of a t-distributed variable with n 2 degrees of freedom is distributed according to the F-distribution with 1 and n 2 degrees of freedom. For symmetric t-tests, which are almost universal in applied econometric work, we may most conveniently employ the central F-distribution. Consequently, in the discussion to follow we shall only deal with noncentral F-distributions.
Thus, let w i , i = 1 , 2, be two mutually independent noncentral chi-square variables
We seek the distribution of
Instead of the distribution of F, however, we shall first find the distribution of
The reason why u is considered first instead of F is that there exist tabulations for the distribution of u (for selected parameter values) while no such tabulations exist for the distribution of F. Noting that
we see that operating with u is perfectly equivalent to operating with F.
To find the distribution of u we employ the same procedure as employed in the previous section. The joint density of w 1 , w 2 is
where
Use the transformation
The Jacobian of the transformation is
Upon substitution, the typical term of the infinite series above—apart from r 2 , r 1—becomes
where
and B(s 1, s 2) is the beta function with parameters s 1 and s 2. To find the density of u we integrate out w; to this effect make the change in variable
and observe that, apart from a factor of proportionality, we have to integrate with respect to w ∗
The integral of the bracketed expression is unity, since it is recognized as the integral of the (central) chi-square density with 2(s 1 + s 2) degrees of freedom. Consequently, the density of u is given by
which is recognized as a convex combination of beta distributions with parameters s i , i = 1 , 2 , s i = (n i + 2r i )/2. If we wish to obtain the density function of
we need only observe that
The Jacobian of this transformation is
while
Thus, substituting above, we have the density function of F,
We therefore have
Proposition A.2
Let
be mutually independent. Then the density function of
is given by
The density function of
is given by
In either case the density is uniquely determined by four parameters: n 1 , n 2, which are the degrees of freedom parameters, and λ 1 , λ 2, which are the noncentrality parameters (of w 1 and w 2 respectively).
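A small simulation illustrates Proposition A.2 for the case λ2 = 0, the singly noncentral F, which is the case tabulated in standard software (the parameter values below are illustrative assumptions). The empirical distribution of the ratio of independent scaled chi-square variables is compared with SciPy's noncentral F, whose noncentrality parameter equals 2λ1 in the notation of this appendix:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n1, n2, lam1 = 3, 20, 2.0     # illustrative values; lambda2 = 0 (singly noncentral case)
N = 200_000

w1 = stats.ncx2.rvs(n1, 2 * lam1, size=N, random_state=rng)  # noncentral chi-square numerator
w2 = stats.chi2.rvs(n2, size=N, random_state=rng)            # central chi-square denominator
F = (w1 / n1) / (w2 / n2)

# SciPy's ncf takes nc = 2*lambda1, the sum of squared means of the numerator normals
for q in (1.0, 2.0, 4.0):
    print(q, (F <= q).mean(), stats.ncf.cdf(q, n1, n2, 2 * lam1))
```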
The results of Proposition A.2 may be specialized to the various tests considered in the preceding chapter.
Example A.1
Consider again the test given in Proposition 8. Suppose the null hypothesis is in fact false and
Define
The square of the statistic developed there is
and is, thus, distributed as a noncentral F-distribution with parameters
Hence, given the critical region the power of the test may be evaluated on the basis of the noncentral F-distribution with parameters as above. Unfortunately, power calculations are not performed very frequently in applied econometric work, although the necessary tables do exist. (See e.g., [30].)
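With modern software, such power evaluations no longer require printed tables. The sketch below (Python with SciPy; the degrees of freedom and noncentrality values are illustrative assumptions) computes the power of an F-test for several noncentrality parameters λ1, taking λ2 = 0 and recalling that SciPy's noncentrality parameter equals 2λ1:

```python
import numpy as np
from scipy import stats

k, df2, alpha = 3, 40, 0.05               # illustrative degrees of freedom and level
F_crit = stats.f.ppf(1 - alpha, k, df2)   # critical value of the central F-test

# Power = P(F > F_crit) under the noncentral F with noncentrality lambda1
# (SciPy's nc equals 2*lambda1 in the notation of this appendix); as
# lambda1 -> 0 the power declines to the level of significance alpha.
for lam in (0.5, 2.5, 10.0):
    print(lam, round(stats.ncf.sf(F_crit, k, df2, 2 * lam), 3))
```

As expected, the power increases with the noncentrality parameter, which is the formal counterpart of the remark about collinearity in Remark A.2.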
Remark A.2
The preceding example affords us an opportunity to assess, somewhat heuristically, the impact of multicollinearity on hypothesis testing. We shall examine collinearity in a subsequent chapter. For the moment it is sufficient to say that it is relevant to the extent that one or more of the explanatory variables in a GLM can be “explained” by the others.
In the context of this example note that if by \( {\widehat{x}}_{\cdot i} \) we denote that part of the variable x ⋅i that can be explained by (its regression on) the remaining explanatory variables and by s ⋅i , the vector of residuals, we can write
Note that
the x ⋅j being the observations on the remaining variables of the GLM.
One can then show that
where, of course,
and X 1 is the submatrix of X obtained when from the latter we suppress x ⋅i .
If the hypothesis
is false then the quantity
has mean
Hence, the noncentrality parameter will be
If the sample is relatively collinear with respect to x ⋅i then, even though λ 2 may be large, the noncentrality parameter λ 1 would tend to be “smaller” owing to the fact that the sum of squared residuals, \( {s}_{\cdot i}^{\prime }{s}_{\cdot i} \), would tend to be (appreciably) smaller relative to the situation that would prevail if the sample were not collinear. Since, generally, we would expect the power of the test to increase with the noncentrality parameter, it follows that collinearity would exert an undesirable effect in this context.
Example A.2
A common situation is that of testing the returns to scale parameter in production function studies. If, for example, we deal with the Cobb–Douglas production function
(U representing the error process) we may estimate \( \widehat{\alpha}+\widehat{\beta} \). If the variance of the sum is of the form \( {\widehat{\sigma}}^2r \) then r will be known and \( {\widehat{\sigma}}^2 \) would be proportional to a \( {\chi}_{T-3}^2 \)-variable. Thus, on the null hypothesis
we would have
Suppose, however,
Then the statistic above is noncentral F with parameters
Suppose that
From the relevant tables in [5] we see that the power of such a test (with level of significance 0.05) is approximately 0.3. But this means that the returns to scale parameter may be quite high and, depending on r, the probability of rejecting the null hypothesis will still only be 0.3. We will have such a situation if λ 1 = 0.7 and σ 2 r = 0.5. Thus, such procedures lead to the acceptance of false null hypotheses alarmingly frequently. Of course, if λ 1 were even closer to zero the power would have been even lower. This is to warn the reader that adjacent hypotheses cannot be distinguished with great confidence when the sample on which the inference is based is not very large. Thus, in such procedures as that given in this example, if a given moderate-sized sample (say 30 or 40) leads us to accept the constant returns to scale hypothesis, the reader ought to bear in mind that in all likelihood it would also lead us to accept the null hypothesis
as against the alternative
The economic implications of constant, as against increasing, returns to scale, however, are very different indeed!
1.3 Multiple Comparison Tests
Consider the test of the hypothesis
as against
in the context of the GLM
where
in the usual notation of the chapter.
To carry out the test we proceed as follows. Letting σ 2 Q ∗ be the covariance matrix of the OLS estimator of β ∗ we form the quantity
it being understood that there are T observations, that β ∗ contains n parameters, and that \( \widehat{u} \) is the vector of OLS residuals. Under the null hypothesis (A.1) becomes
The quantity in (A.2) is now completely determined, given the data, i.e., it is a statistic. The distribution of (A.2) under H0 is central F with n and T − n − 1 degrees of freedom.
The mechanics of the test are these. From the tables of the F n , T − n − 1-distribution we determine a number, say F α , such that
where α is the level of significance, say α = 0.05, α = 0.025, or α = 0.01.
The acceptance region is
while the rejection region is
The geometric aspects of the test are, perhaps, most clearly brought out if we express the acceptance region somewhat differently. Thus, we may write
where, evidently,
and where for the sake of generality we have written the hypothesis as
as against
In the customary formulation of such problems one takes
In the preceding, of course, the elements of \( {\overline{\beta}}_{\ast } \) are numerically specified. For greater clarity we may illustrate the considerations entering the test procedure, in the two-dimensional case, as in Fig. A.1.
The relation in (A.3) represents an ellipsoid with center at \( {\overline{\beta}}_{\ast } \). If the statistic
falls within the ellipsoid we accept H0, while if it falls outside we accept H1. If the statistic is represented by p 1 we will reject H0, but it may well be that what is responsible for this is the fact that we “ought” to accept
Similarly, if the statistic is represented by p 2 we will reject H0 but it may well be that this is so because we “ought” to accept
If the statistic is represented by p 3 then, perhaps, we “ought” to accept
The F-test, however, does not give any indication as to which of these alternatives may be appropriate.
The configurations above belong to the class of functions of the form
It would be desirable to find a way in which tests for the hypotheses in (A.4) may be linked to the acceptance ellipsoid of the F-test. If this is accomplished, then perhaps upon rejection of a hypothesis by the F-test we may be able to find the “parameters responsible” for the rejection. Needless to say this will shed more light on the empirical implications of the test results.
The connection alluded to above is provided by the relation of the planes of support of an ellipsoid to the “supported” ellipsoid. Some preliminary geometric concepts are necessary before we turn to the issues at hand. In part because of this, the discussion will be somewhat more formal than in the earlier part of this appendix.
1.4 Geometric Preliminaries
Definition A.1
Let \( x\in {\mathbb{E}}_n \), where \( {\mathbb{E}}_n \) is the n-dimensional Euclidean space. The set of points
where c is a positive constant and M a positive definite matrix is said to be an ellipsoid with center at a.
Remark A.3
It entails no loss of generality to take c = 1. Thus, in the definition above, dividing through by c we have
If M is positive definite and c > 0 then, clearly,
is also a positive definite matrix. In the definitions to follow we will always take c = 1.
Remark A.4
The special case
is referred to as an ellipsoid in canonical or standard form.
Definition A.2
Let \( x\in {\mathbb{E}}_n \). The set of points
where c > 0 is said to be an n-dimensional sphere, with center at a. The special case
is referred to as a sphere in canonical or standard form, or simply a unit sphere.
Remark A.5
Notice that an ellipsoid in canonical form is simply a unit sphere whose coordinates, x i , i = 1 , 2 , …, n, have been stretched or contracted, respectively, by the factors m i , i = 1 , 2 , … , n.
Lemma A.1
Every ellipsoid
-
(i)
can be put in canonical form, and
-
(ii)
can be transformed into the unit sphere.
Proof
Let
be an ellipsoid. Let R ∗ , Λ be, respectively, the matrices of characteristic vectors and roots of M. Put
and rewrite (A.5) as
This proves the first part of the lemma. For the second part, we put
and note that E is transformed to the unit sphere
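The transformations in the proof of Lemma A.1 can be verified numerically. In the sketch below (Python with NumPy; the matrix M, the center a, and the chosen boundary point are illustrative assumptions) a boundary point of the ellipsoid (x − a)′M(x − a) = 1 is mapped first into canonical coordinates and then onto the unit sphere:

```python
import numpy as np

# Ellipsoid (x - a)'M(x - a) = 1 with a positive definite M (illustrative values)
M = np.array([[4.0, 1.0], [1.0, 3.0]])
a = np.array([1.0, -2.0])

lam, R = np.linalg.eigh(M)          # M = R diag(lam) R', lam > 0

# A boundary point: with M = L L' (Cholesky), x = a + L'^{-1} v has
# (x - a)'M(x - a) = v'v, so any unit vector v gives a boundary point
L = np.linalg.cholesky(M)
x = a + np.linalg.solve(L.T, np.array([0.6, 0.8]))
print((x - a) @ M @ (x - a))        # = 1, on the boundary

y = R.T @ (x - a)                   # canonical coordinates: sum_i lam_i y_i^2 = 1
print(lam @ y**2)                   # = 1
z = np.sqrt(lam) * y                # unit-sphere coordinates: z'z = 1
print(z @ z)                        # = 1
```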
Remark A.6
The relationship of the coordinates of the sphere (to which the ellipsoid is transformed) to the coordinates of the original ellipsoid is
or
Definition A.3
Let \( x\in {\mathbb{E}}_n \). The set of points
is said to be a plane through a, orthogonal to the vector h.
Remark A.7
The plane above can be thought of as the set of vectors, measured from a, that are orthogonal to the vector h. It is obvious that a ∈ P.
Definition A.4
Let
be the plane through a orthogonal to h, and put
The planes described by
such that
are said to be parallel to P and to each other.
Remark A.8
Parallel planes do not have points in common. Thus if P 1 , P 2 are two parallel planes, let x 0 be a point on both of them. Since x 0 ∈ P 1 we have
Since x 0 ∈ P 2 we have
But this implies c 1 = c 2, which is a contradiction.
Remark A.9
The planes through a and −a , ∣a∣ ≠ 0, orthogonal to a vector h are parallel. Thus, the first plane is described by the equation
while the second is described by
Rewriting slightly we have
Provided c ≠ 0, it is evident that the two planes are parallel.
Definition A.5
Let \( x\in {\mathbb{E}}_n \) and let E be the ellipsoid
The plane (through the point x 0 and orthogonal to the vector h)
is said to be a plane of support of the ellipsoid if
-
(a)
E and P have one point in common, and
-
(b)
E lies entirely on one side of P.
Remark A.10
To fully understand what “lies on one side of P” means, consider the special case of a line. Thus, if n = 2 and, for simplicity, x 0 = 0, we have
Notice that the equation above divides E 2 into three sets of points:
The equation for the set given by (A.7) is
That for P − is
and that for P + is
Suppose, for definiteness in our discussion,
The set P lies on the line in (A.10); the set P − consists of all lines as in (A.11) with c 1 > 0 ; P + consists of all lines as in (A.12) with c 1 > 0. For any c 1 > 0 it is clear that the lines described in (A.11) lie below the line describing P, and the lines described by (A.12) lie above the line describing P. In this sense, P − lies on one side of P (below) while P + lies on the other side (above). The directions “above” and “below” will of course be reversed for
Now suppose we solve the problem of finding for a given ellipsoid the two parallel planes of support that are orthogonal to some vector h. By varying h it should be possible to describe the ellipsoid by its planes of support. This, then, will do exactly what we had asked at the beginning of this discussion, viz., produce a connection between the acceptance region of an F-test (an ellipsoid) and a number of linear hypotheses of the form h ′ β ∗ = c (its planes of support). We have
Lemma A.2
Let \( x\in {\mathbb{E}}_n \), and let E be the ellipsoid
and h a given vector. Let x 0 be a point on the boundary of E, i.e., x 0 obeys
Then the planes
are parallel planes of support for E (at the points x 0 and −x 0 + 2a, respectively,) orthogonal to the vector h.
Proof
Let E and h be given. By Lemma A.1, E can be transformed to the unit sphere
where
and R is such that
The plane
is a plane of support for S, where z 0 lies on the boundary of S and thus
To see this, define
Clearly P and S have z 0 in common. Let z ∈ S. Then z obeys
Since
we have
so that if z ∈ S then either z ∈ P or z ∈ P −; hence P is indeed a plane of support. Similarly, the plane
is parallel to P and is a plane of support for S. First, −z 0 lies on the boundary of S since
moreover, −z 0 ∈ P ∗ since
Second, define
and note that if z ∈ S then
Consequently,
which shows that S lies on one side of \( {P}_{+}^{\ast } \); hence, the latter is a plane of support. The equation for P is, explicitly,
and that for P ∗ is
so that, indeed, they are parallel.
Let us now refer the discussion back to the original coordinates of the ellipsoid. From (A.13) we have that the equations for P and P ∗ are, respectively,
Since we are seeking planes that are orthogonal to a given vector h we must have
where r is a constant. Alternatively,
But since
we conclude that
and the desired planes are
We now show explicitly that
is a plane of support of E through the point
orthogonal to the vector h.
Noting Eq. (A.16) and related manipulations we can write (A.18) more usefully as
since
Now substituting from (A.19) in (A.20) we verify that x 0 lies on the plane described by (A.18). But x 0 lies also on E since
Moreover, if x ∈ E then
so that
Consequently,
which shows that (A.18) represents a plane of support for E. The plane parallel to that in (A.18) can be written, using the notation in (A.20), as
With x 0 as in (A.19) it is evident that
also lies on the plane. Moreover,
so that it lies on E as well. It remains only to show that E lies on one side of the plane in (A.22). To see this let x be any point in E. As before, we have
Consequently,
which shows that E lies on one side of the plane in (A.22). q.e.d.
Corollary A.1
The equation of the strip between the two parallel planes of support for E, say at x 0 and −x 0 + 2a, that are orthogonal to a vector h is given by
Proof
Obvious from the lemma.
Remark A.11
It is clear that the ellipsoid is contained in the strip above. Indeed, if we determine all strips between parallel planes of support, the ellipsoid E can be represented as the intersection of all such strips. Hence, a point x belongs to the ellipsoid E if and only if it is contained in the intersection.
Remark A.12
If the ellipsoid is centered at zero to begin with then
and the results above are somewhat simplified. For such a case the strip is given by
and the parallel planes of support have the points x 0 and −x 0, respectively, in common with the ellipsoid
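For the centered case the strip has a simple closed form: x′Mx ≤ 1 implies |h′x| ≤ (h′M⁻¹h)^{1/2} for every h, with equality attained at x0 = M⁻¹h/(h′M⁻¹h)^{1/2}. The sketch below (Python with NumPy; M and h are illustrative assumptions) verifies both the attainment of the bound and its validity over random points of the ellipsoid:

```python
import numpy as np

rng = np.random.default_rng(2)
M = np.array([[4.0, 1.0], [1.0, 3.0]])   # ellipsoid x'Mx <= 1 (illustrative)
h = np.array([1.0, 2.0])

Minv = np.linalg.inv(M)
bound = np.sqrt(h @ Minv @ h)            # half-width of the strip |h'x| <= bound

# The maximizer x0 = M^{-1} h / bound lies on the boundary and attains the bound
x0 = Minv @ h / bound
print(x0 @ M @ x0)                       # = 1 (on the boundary)
print(h @ x0, bound)                     # equal

# Random points of the ellipsoid all satisfy |h'x| <= bound
u = rng.standard_normal((10_000, 2))
u /= np.linalg.norm(u, axis=1, keepdims=True)   # directions on the unit circle
pts = u * rng.uniform(size=(10_000, 1))         # points inside the unit disc
L = np.linalg.cholesky(M)
xs = np.linalg.solve(L.T, pts.T).T              # map the disc into the ellipsoid
print(np.max(np.abs(xs @ h)) <= bound)          # True
```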
1.5 Multiple Comparison Tests—The S-Method
In this section we develop the S-method, first suggested by Scheffé (see [282, 283]) and thus named after him. The method offers a solution to the following problem: upon rejection of a hypothesis on a set of parameters, to find the parameter(s) responsible for the rejection.
Consider the GLM under the standard assumptions and further assume normality for the errors. Thus the model is
and we have the OLS estimator
Let β ∗ be a subvector of β containing k ≤ n + 1 elements. Thus, in the obvious notation,
where Q ∗ is the submatrix of (X ′ X)−1 corresponding to the elements of β ∗. We are interested in testing, say,
as against the alternative
First, we recall that for the true parameter vector, say \( {\beta}_{\ast}^0 \),
where
and F k , T − n − 1 is a central F-distributed variable with k and T − n − 1 degrees of freedom.
The mechanics of the test are as follows. Given the level of significance, say α, we find a number, say F α , such that
In the terminology of this appendix we consider the ellipsoid E with center \( {\widehat{\beta}}_{\ast } \);
If the point specified by the null hypothesis lies in E, i.e., if
we accept H0, while if
we accept H1.
Let us rewrite the ellipsoid slightly to conform with the conventions of this appendix. Thus
where
The test then is as follows:
In the previous discussion, however, we have established that a point belongs to an ellipsoid E if (and only if) it is contained in the intersection of the strips between all parallel planes of support. The strip between two parallel planes of support to E orthogonal to a vector h is described by
Hence a point, say \( {\overline{\beta}}_{\ast } \), obeys
if and only if for any vector \( h\in {\mathbb{E}}_k \) it obeys
We are now in a position to prove
Theorem A.1
Consider the GLM
under the standard assumptions, and suppose further that
Let
where β has n + 1 elements. Let β ∗ be a subvector of β containing k ≤ n + 1 elements and \( {\widehat{\beta}}_{\ast } \) its OLS estimator, so that
where Q ∗ is the submatrix of (X ′ X)−1 corresponding to the elements of β ∗. Further, let there be a test of the hypothesis
as against the alternative
Then the probability is 1 − α that simultaneously, for all vectors \( h\in {\mathbb{E}}_k \), the intervals
will contain the true parameter point, where
and F α is a number such that
F k , T − n − 1 being a central F-distributed variable with k and T − n − 1 degrees of freedom.
Proof
From the preceding discussion we have determined that the mechanics of carrying out the F-test on the hypothesis above involve the construction of the ellipsoid E with center \( {\widehat{\beta}}_{\ast } \) obeying
where
and α is the specified level of significance. We accept
and we accept
Another implication of the construction above is that the ellipsoid E will contain the true parameter point with probability 1 − α. But a point lies in the ellipsoid above if it lies in the intersection of the strips
for all \( h\in {\mathbb{E}}_k \), where
Since the probability is 1−α that the ellipsoid contains the true parameter point, it follows that the probability is 1 − α that the intersection of all strips
for \( h\in {\mathbb{E}}_k \) will contain the true parameter point. Alternatively, we may say that the probability is 1 − α that simultaneously, for all vectors \( h\in {\mathbb{E}}_k \), the intervals
will contain the true parameter point. q.e.d.
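The intervals of the theorem lend themselves to direct computation. The sketch below is an illustration, not part of the text: the function name `scheffe_interval` and its arguments are hypothetical labels for the quantities defined above, namely the direction vector h, the OLS estimates \( {\widehat{\beta}}_{\ast } \), the estimate \( {\widehat{\sigma}}^2 \), the submatrix Q ∗ of (X ′ X)−1, and the degrees of freedom k and T − n − 1.

```python
import numpy as np
from scipy.stats import f

def scheffe_interval(h, beta_hat_star, sigma2_hat, Q_star, k, df_resid, alpha=0.05):
    """Simultaneous (1 - alpha) Scheffe interval for h' beta_*.

    h             : direction vector of length k
    beta_hat_star : OLS estimates of the k coefficients under test
    sigma2_hat    : OLS-induced estimate of sigma^2
    Q_star        : k x k submatrix of (X'X)^{-1} for those coefficients
    df_resid      : T - n - 1 residual degrees of freedom
    """
    # F_alpha is the number such that Pr{F_{k, df_resid} <= F_alpha} = 1 - alpha
    F_alpha = f.ppf(1 - alpha, k, df_resid)
    centre = h @ beta_hat_star
    half_width = np.sqrt(k * F_alpha) * np.sqrt(sigma2_hat) * np.sqrt(h @ Q_star @ h)
    return centre - half_width, centre + half_width
```

Taking h to be the i-th unit vector recovers the simultaneous interval for the i-th element of β ∗ alone; the theorem guarantees that all such intervals, over every choice of h, hold jointly with probability 1 − α.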
Remark A.13
The result above is quite substantially more powerful than the usual F-test. If it is desired to test the hypothesis stated in the theorem we proceed to check whether the point \( {\overline{\beta}}_{\ast } \) lies in the ellipsoid, i.e., whether
If so we accept H0; if not we accept H1. In the latter case, however, we can only conclude that at least one element of β ∗ is different from the corresponding element in \( {\overline{\beta}}_{\ast } \). But which, we cannot tell. Nor can we tell whether more than one such element differs. This aspect is perhaps best illustrated by a simple example. Consider the case where
If the F-test rejects H0 we may still wish to ascertain whether we should accept:
or
or
The standard practice is to use the t-test on each of the relevant parameters. But proceeding in this sequential fashion means that the nominal levels of significance we claim for these tests are not correct. In particular, we shall proceed to test, e.g.,
only if the initial F-test leads us to accept
Then, the t-test above is a conditional test and the level of significance could not be what we would ordinarily claim. The theorem above ensures that we can carry out these tests simultaneously, at the α level of significance. Thus, consider the vectors
and define
If \( {\widehat{\sigma}}^2 \) is the OLS induced estimate of σ 2 we have
The intervals induced by the S-method are
where F α is a number such that
Remark A.14
The common practice in testing the hypotheses above is to apply the t-test seriatim. Let t α be a number such that
where t T − n − 1 is a central t-variable with T − n − 1 degrees of freedom. The intervals based on the t-statistic are
It is not correct to say that the true level of significance of tests based on these intervals is the stated one, and we certainly cannot claim that the probability is 1 − α that the three intervals above simultaneously contain the true parameter point.
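A crude illustration of why seriatim intervals lose their nominal coverage: under the simplifying (and generally false for OLS, whose estimates are correlated) assumption that the k estimates are independent, the joint coverage of k separate 95% intervals is only 0.95 to the k-th power.

```python
# Idealized illustration: joint coverage of k separate 95% intervals,
# assuming (contrary to the usual OLS case) that the estimates are independent.
k = 3
marginal = 0.95
joint = marginal ** k
print(round(joint, 4))  # prints 0.8574, well below the nominal 0.95
```

With correlated estimates the joint coverage differs from 0.95**k, but in general it still falls short of 1 − α, which is precisely the defect the S-method repairs.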
Remark A.15
To make the comparison between intervals given by the S-method and those yielded by the t-statistic concrete, let us use a specific example. Thus, take
so that
Suppose that a sample yields
We can easily establish that any combination of estimates \( {\widehat{\beta}}_1,{\widehat{\beta}}_2 \) obeying
will result in rejection of the null hypothesis
Further, we obtain
The intervals based on the S-method are
The intervals based on (bilateral) t-tests, all at the nominal significance level of 5%, are
The reader should note that the intervals based on the S-method are appreciably wider than those based on bilateral t-tests. This is, indeed, one of the major arguments employed against the multiple comparisons test. In the general case, the comparison of the width of these two sets of intervals depends on the comparison of
Since
it follows that the comparison may also be said to rest on the difference
Moreover, it may be verified (even by a casual look at tables of the F-distribution) that, for T − n − 1 in the vicinity of 30, the difference above grows with k; hence the more parameters we deal with, the wider the intervals based on the S-method become relative to those implied by the (bilateral) t-tests.
Remark A.16
It is clear that in the context of the example in Remark A.15 and basing our conclusion on bilateral t-tests, any estimates obeying
will lead us to accept
Any estimates obeying
will lead us to accept
while any statistics obeying
will lead us to accept
Using the S-method, however, would require for the cases enumerated above (respectively):
It is worth noting that if the parameter estimates were, in fact,
and \( {\widehat{\sigma}}^2 \) and Q ∗ as in Remark A.15, the relevant F-statistic would have been
Since, for α = .05, \( F_{\alpha ;2,30}=3.32 \), the hypothesis
would have been rejected. If, subsequently, we were to use a series of bilateral t-tests each at the nominal level of significance of 5% we could not reject the hypothesis
since the estimates
will define the intervals
On the other hand, if we employ the S-method of multiple comparisons we could not reject
This is so since the estimates will define the intervals
and the conclusions reached by the two methods will differ.
In view of the fact that the nominal levels of significance of the t-test are incorrect, it might be better to rely more extensively, in empirical work, on the S-method for multiple comparisons.
Remark A.17
It is important to stress that the S-method is not to be interpreted as a sequential procedure, i.e., we should not think that the multiple tests procedure is to be undertaken only if the F-test rejects the null hypothesis, say
If we followed this practice we would obviously have a conditional test, just as in the case of the sequential t-tests. In such a context the multiple tests could not have the stated level of significance. Their correct significance level may be considerably lower and will generally depend on unknown parameters. In this connection see the exchange between H. Scheffé and R. A. Olshen [38].
The proper application of the S-method requires that the type of comparisons desired be formulated prior to estimation rather than be formulated and carried out as an afterthought following the rejection of the null hypothesis by the F-test.
© 2017 Springer International Publishing AG
Dhrymes, P. (2017). The General Linear Model II. In: Introductory Econometrics. Springer, Cham. https://doi.org/10.1007/978-3-319-65916-9_2