Abstract
In the two preceding chapters we have set forth, in some detail, the estimation of parameters and the properties of the resulting estimators in the context of the standard GLM. We recall that rather stringent assumptions were made regarding the error process and the explanatory variables. Now that the exposition has been completed, it behooves us to inquire what happens when some, or all, of these assumptions are violated. The motivation is at least twofold. First, situations may in fact arise in which some nonstandard assumption is appropriate; in such a case we would want to know how to handle the problem. Second, we would like to know the cost, in terms of the properties of the resulting estimators, of operating under standard assumptions that turn out not to be valid. Thus, even if we do not know that the standard assumptions are in part violated, we would like to know what the cost would be if they were.
Notes
- 1.
Actually more pervasive forms of dependence will materialize due to the budget restriction imposed on a household’s consumption activities and the utility maximization hypothesis.
- 2.
Note that if Σ is a positive definite matrix we can always write it as \( \Sigma = \sigma^2\Phi \), where \( \sigma^2 > 0 \) and Φ is positive definite. This involves no sacrifice of generality whatever. Only when we assert that Φ is known do we impose (significant) restrictions on the generality of the results.
- 3.
Such estimators are more appropriately called minimum chi-square (MCS) estimators.
- 4.
Not all random variables have density functions. Strictly speaking, this statement should be phrased “... a random variable having distribution function....” A distribution function need not be differentiable. Thus, the density function need not exist. But in this book all (continuous) random variables are assumed to have density functions.
- 5.
This section may be omitted without essential loss of continuity.
- 6.
Clearly, Φ must be specified a bit more precisely. If left as a general positive definite matrix then it would, generally, contain more than T (independent) unknown parameters, and thus its elements could not be consistently estimated through a sample of T observations.
- 7.
In this context only, and for notational convenience, the dimension of X is set at T × n; elsewhere, we shall continue to take X as T × (n + 1).
- 8.
Thank you to Professor David Hendry for this point.
- 9.
It is for this reason that A. Zellner, who studied the problem of this section quite extensively, termed it the problem of seemingly unrelated regressions [45].
- 10.
Heuristically we may approach the problem as follows: since \( e^{i2\pi s} = 1 \), s = 1, 2, …, we may write \( r^{2T} = 1 \) as \( r = e^{i2s\pi /2T} \). In some sense this is a solution to the equation \( r^{2T} = 1 \), since if we raise both sides to the 2T power we get back the equation. Cancelling the factor 2 we get the solution \( e^{is\pi /T} \), s = 0, 1, 2, …, T. Extending the index s beyond T simply repeats the roots above.
- 11.
If one wished, one could seek to determine coefficients γ and δ such that \( u_{r+k+2} - \gamma u_{r+k+1} - \delta u_r \) satisfies conditions similar to those above.
References
Anderson, T. W. (1948). On the theory of testing serial correlation. Skandinavisk Aktuarietidskrift, 31, 88–116.
Anderson, T. W. (1971). The statistical analysis of time series. New York: Wiley.
Arnott, R. (1985). The use and misuse of consensus earnings. Journal of Portfolio Management, 12, 18–27.
Ashley, R., & Patterson, D. M. (2010). Apparent long memory in time series as an artifact of a time-varying mean: Considering alternatives to the fractionally integrated model. Macroeconomic Dynamics, 14, 59–87.
Bunn, D. (1989). Editorial: Forecasting with more than one model. Journal of Forecasting, 8, 161–166.
Cochrane, D., & Orcutt, G. H. (1949). Applications of least squares to relations containing autocorrelated error terms. Journal of the American Statistical Association, 44, 32–61.
Dhrymes, P. J. (1971). Distributed lags: Problems of estimation and formulation. San Francisco: Holden–Day.
Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression, I. Biometrika, 37, 408–428.
Sargan, J. D. (1964). Wages and prices in the United Kingdom: A study in econometric methodology. In P. E. Hart et al. (Eds.), Econometric analysis for national economic planning. London: Butterworths.
Appendices
These tables are reproduced with the kind permission of the publisher Marcel Dekker Inc., and the author H. D. Vinod. Tables 3.4, 3.5, and 3.6 first appeared in H. D. Vinod, “Generalization of the Durbin–Watson Statistic for Higher Order Autoregressive Processes,” Communications in Statistics, vol. 2, 1973, pp. 115–144.
Appendix
1.1 Durbin–Watson Theory
Here we explore, in somewhat greater detail, the issues involved in the derivation and use of the Durbin–Watson statistic.
Derivation. From Eq. (3.18) of this chapter we have
where it is assumed that
and
For notational simplicity put
and note that, since N is idempotent,
i.e., the two matrices
commute. We shall take advantage of this fact in order to greatly simplify the expression in (A.1). Because N is a symmetric idempotent matrix there exists an orthogonal matrix B such that
the identity matrix being of dimension equal to the rank of N, which is T − n − 1. Define
and partition its rows and columns conformably with respect to D, i.e.,
where \( C_{22} \) is (T − n − 1) × (T − n − 1) and \( C_{11} \) is (n + 1) × (n + 1). Observe that
However,
The relations above clearly imply
so that consequently
Since A is a positive semidefinite (symmetric) matrix, C 11 and C 22 will have similar properties. Let E i be the orthogonal matrix of characteristic vectors of C ii , i = 1 , 2, and Θ i be the (diagonal) matrix of characteristic roots of C ii , i = 1 , 2, i.e.,
It is clear that
are (respectively) the matrices of characteristic vectors and roots for the matrix C. Thus
Bearing in mind what C is we have
From
we also see that
Defining
we note that
i.e., Q is orthogonal; moreover, (A.4) and (A.5) imply that
where Θ is the diagonal matrix of the characteristic roots of NAN.
If we put
we note that ξ ∼ N(0, I) and
Bounds on Characteristic Roots
The difficulty with the representation in (A.7) is that the numerator depends on the characteristic roots of NAN and thus, ultimately, on the data. A way out of this is found through the bounds, d L and d U, discussed earlier in the chapter. Let us now see how these bounds are established. We begin with
Lemma A.1
The characteristic roots of the matrix A, as exhibited in Eq. (3.16), are given by
Proof
We can write
where
Since
where
it follows that if we determine the characteristic roots of A ∗, say
then the (corresponding) characteristic roots of A are given by
If μ and w are, respectively, a characteristic root and the corresponding characteristic vector of A ∗, they satisfy the following set of equations:
In order to obtain an expression for μ and the elements of w, we note that the second set above may be rewritten as
which is recognized as a second-order difference equation. The desired characteristic root and vector are related to the solution of the equation in (A.13). Its characteristic equation is
whose solutions are
Since
we conclude that
Thus for notational simplicity we shall denote the two roots by
From the general theory of solution of difference equations (see Sect. 2.5 of Mathematics for Econometrics), we know that the solution to (A.13) may be written as
where c 1 and c 2 are constants to be determined by Eqs. (A.10) and (A.12). From (A.10) we find
After considerable simplification this yields
which implies
Substituting in (A.12) and canceling c 1 yields
which implies \( r^{2T} = 1 \), i.e., the solutions to (A.17) are the 2T roots of unity, plus the root r = 1. As is well known, the 2T roots of unity (Footnote 10) are given by, say,
The roots of the matrix are, thus,
Since
it follows that the only distinct roots correspond to
Moreover, the root
is inadmissible since the characteristic vector corresponding to it is
which is inadmissible. Consequently, the characteristic roots of the matrix A ∗ are given by
and the corresponding characteristic roots of A by
Corollary A.1
Let λ s , as in (A.19), be the sth characteristic root of A. Then
is the corresponding characteristic vector.
Proof
We first note that if μ s , as in (A.18), is the sth characteristic root of A ∗ then
is the corresponding characteristic vector. If we choose
then
and it can be shown easily that
Thus, we see that the vectors corresponding to the roots r s are mutually orthogonal. If we wished we could have taken
in which case we would have determined
and thus ensured that
Let
the elements being as just defined above. We see that
where
But (A.20) shows that W is the matrix of characteristic vectors of A. q.e.d.
Remark A.1
Notice that since the roots above may be defined as
and
we can conclude that
Notice further that they are arranged in increasing order of magnitude, i.e.,
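The closed forms of Lemma A.1 and Remark A.1 lend themselves to a quick numerical check. The sketch below is illustrative only: it assumes the standard expression \( \lambda_s = 2[1-\cos (s\pi /T)] \), s = 0, 1, …, T − 1, for the roots of the matrix A of Eq. (3.16), builds A as the matrix of the quadratic form \( \sum_{t=2}^{T}(u_t-u_{t-1})^2 \), and verifies that the computed roots agree with the closed form and are arranged in increasing order, the smallest being zero with the constant vector as its characteristic vector.

```python
import numpy as np

T = 8
# A is the matrix of the quadratic form sum_{t=2}^{T} (u_t - u_{t-1})^2:
# tridiagonal, with 2 on the diagonal (1 in the two corners) and -1 off it.
D = np.diff(np.eye(T), axis=0)        # (T-1) x T first-difference matrix
A = D.T @ D

# Assumed closed form of the roots, s = 0, 1, ..., T-1; already increasing.
lam = 2.0 * (1.0 - np.cos(np.arange(T) * np.pi / T))

assert np.allclose(np.sort(np.linalg.eigvalsh(A)), lam)
assert np.all(np.diff(lam) >= 0.0)    # arranged in increasing order

# The smallest root is zero, with the constant vector as characteristic vector.
assert np.allclose(A @ np.ones(T), 0.0)
```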
Let us now turn to the relation between the roots of A, as established above, and those of
Since (X ′ X)−1 is positive definite there exists a nonsingular matrix G such that
Define
where W is the matrix of characteristic vectors of A. We have first.
Lemma A.2
The matrix
is a T × (n + 1) matrix of rank (n + 1), where G is as in (A.21) and W is the matrix of characteristic vectors of A. Moreover, its columns are mutually orthogonal.
Proof
The assertion that P is T × (n + 1) is evident. We further note
Now consider the roots of NAN, i.e., consider
But
Hence the roots of NAN are exactly those of
where P is as defined in (A.22). It turns out that we can simplify this aspect considerably. Thus,
Lemma A.3
Let p ⋅i be the ith column of P and let
Then
Proof
since the columns of P are orthogonal. Moreover, the P i are symmetric idempotent, i.e.,
It follows therefore that
Since
we conclude
A very useful consequence of the lemma is that the problem may now be posed as follows: what is the relation of the roots of
to those of A, i.e., to the elements of Λ? Moreover, since
it follows that the problem may be approached recursively, by first asking: what are the relations between the elements of Λ and the roots of P1ΛP1? If we answer that question then we have automatically answered the question: what are the relations between the roots of
and those of
Hence, repeating the argument we can determine the relation between the roots of NAN and those of A. Before we take up these issues we state a very useful result.
Lemma A.4
Let D be a nonsingular diagonal matrix of order m. Let α be a scalar and a , b be two m-element column vectors, and put
Then
Proof
See Proposition 31 of Mathematics for Econometrics.
Lemma A.5
The characteristic roots of
arranged in increasing order, are the solution of
and obey
Proof
The characteristic roots of P 1ΛP 1 are the solutions of
Taking
applying Lemma A.4, and noting that
we conclude
where
We note that the characteristic equation of P 1ΛP 1 as exhibited in the two equations above is a polynomial of degree T. Since P 1ΛP 1 will, generally, be of rank T − 1, the polynomial equation
will not have a zero root. Indeed,
unless
We remind the reader that in the preceding we employ notation
so that the roots of A are arranged as
Now the roots of
are the one obvious zero root (associated with the factor θ) and the T − 1 (nonzero) roots of
But for any r ≥ 2,
provided p r1 ≠ 0. Assuming this to be so we have that if, say, f(λ r ) > 0, then f (λ r + 1) < 0. Thus, between the two roots of A , λ r and λ r + 1, lies a root of P 1ΛP 1. Denote such roots by
What the preceding states is that
The following Lemma is also important in the chain of our argumentation.
Lemma A.6
Let
Then Q j has at least j zero roots.
Proof
By definition
and, for k ≥ j, we have
which means that Q j must have at least j zero roots. q.e.d.
Lemma A.7
Let Q j − 1 be defined as in Lemma A.6. Let M j − 1 , Θ(j − 1) be its associated matrices of characteristic vectors and roots respectively. Let Q j and Θ(j) be similarly defined. Then
Proof
The first assertion is a restatement of the conclusion of Lemma A.6. For the second, consider
where
We note that
Thus
where now
Since we know that ψ(θ) has (at least) j zero roots we therefore know that f(θ) must have (at least) j − 1 zero roots. Hence
which need not imply
Consequently, we can write
and, for k ≥ j, we have
In general,
provided \( {\overline{p}}_{kj}\ne 0 \). Thus if, e.g., \( f\left({\theta}_k^{\left(j-1\right)}\right)>0 \), then
Consequently, a root of Q j , say \( {\theta}_{k+1}^{(j)} \), lies between \( {\theta}_k^{\left(j-1\right)} \) and \( {\theta}_{k+1}^{\left(j-1\right)} \). This is so since if
then
Thus, the first nonzero root of Q j , viz., \( {\theta}_{j+1}^{(j)} \) obeys
Consequently, we have established
We may now prove.
Theorem A.1
Let
be the characteristic roots of Λ arranged as
Let
be the roots of NAN similarly arranged in increasing order. Then, the following is true:
Proof
The first part of the theorem is evident. For the second part we note that from Lemma A.5 we have
From Lemma A.7 we have that
Again using Lemma A.7,
Combining (A.26) and (A.28) we conclude
Let us now consider certain special cases. Thus, suppose k (<n + 1) columns of X are linear combinations of k characteristic vectors of A, say (for definiteness) those corresponding to the k smallest characteristic roots of A. We shall see that this type of X matrix will have an appreciable impact on the bounds determined in Theorem A.1.
The X matrix we deal with obeys
where B is a nonsingular k × k matrix and W 1 is the matrix containing the characteristic vectors of A corresponding to its k smallest roots. Retracing our argument in the early part of this appendix we note that, in this case,
where C is defined by
Remark A.2
The matrix C will always exist unless X 2 is such that
or is not of full rank. This, however, is to be ruled out since X 2 is not a linear transformation of the first k characteristic vectors of A.
The matrix P of (A.22) is now
where
We verify that P ∗ is (T − k) × (n + 1 − k) and obeys
Hence
We may now state
Theorem A.2
Assume the conditions of Theorem A.1 and, in addition, that the data matrix of the GLM is
where B is k × k nonsingular and W 1 is the T × k matrix containing the characteristic vectors of A corresponding to, say, the k smallest characteristic roots. Then, the following is true regarding the relation between the roots θ i , i = 1 , 2 , … , T, of NAN and λ i , i = 1 , 2 , … , T, of A:
Proof
The roots of NAN are exactly those of
For the special case under consideration,
where I is k × k and P ∗ is (T − k) × (n + 1 − k), its columns being orthogonal. Defining
where
we have that the characteristic equation of NAN is, in this special case,
But it is evident that (A.32) has k zero roots and that the remaining roots are those of
the identity matrices in (A.33) being of order T − k. Let the roots of (A.33) be denoted by
But the problem in (A.33) is one for which Theorem A.1 applies. We thus conclude that
This is so since
is a (T − k) × (T − k) idempotent matrix of rank T − n − 1. Hence
must have (at least)
zero roots. Defining, further, the elements of Λ∗ by
and applying again Theorem A.1 we conclude
Collecting results and translating to the unstarred notation we have
Remark A.3
If k of the columns of X are linear transformations of k characteristic vectors of A—not necessarily those corresponding to the smallest or largest k characteristic roots—we proceed exactly as before, except that now we should renumber the roots of A so that
and
correspond to the specified characteristic vectors of A involved in the representation of the specified columns of X; without loss of generality we may take the latter to be the first k columns. The remaining roots we arrange in increasing order of magnitude, i.e.,
Proceeding as before we will obtain a relation just as in (A.32), from which we shall conclude that the equation there has k zero roots and that the remaining roots are those of the analog of (A.33). The elements of the matrix Λ∗ will be, in the present case,
Putting
we will thus conclude that the roots of NAN, say θ i , obey
where, of course,
Evidently, if we have as in the case of Theorem A.2 that the k roots are the k smallest characteristic roots then
and the bounds are exactly as before. On the other hand, if the characteristic vectors in question are those corresponding to the first and last k − 1 roots of A, i.e., to λ 1 and λ T − k + 2 , λ T − k + 3 , … , λ T , then
Thus the bounds become
For the special case
(A.37) yields
which implies
On the other hand, (A.39) yields
which implies
Remark A.4
The effect of having k vectors of X expressible as linear combinations of k characteristic vectors of A is to make the bounds on the roots of NAN tighter.
Remark A.5
Referring to Theorem A.1 we note that the smallest characteristic root of A is
One can easily verify that
is a characteristic vector corresponding to this root. But in most cases the GLM will contain a constant term, and hence the appropriate bounds can be determined quite easily from (A.37) to be
Remark A.6
In the previous chapter we considered the special case where X is a linear transformation of n + 1 of the characteristic vectors of V −1. It will be recalled that in such a case the OLS and Aitken estimators of the parameter vector β will be the same. It might also be noted in passing that if u is a T-element random vector having the joint density
where K is a suitable constant, α > 0, D is positive definite, A is symmetric, and γ is a scalar such that D + γA is a positive definite matrix, then the uniformly most powerful test of the hypothesis
as against
is provided by r < r ∗ where
and r ∗ is a suitable constant, determined by the desired level of significance. Here it is understood that we deal with the model
and that the columns of X are linear combinations of n + 1 of the characteristic vectors of A. In the definition above we have
This result is due to Anderson [1, 2].
It would appear that for the cases we have considered in this book (i.e., when the errors are normally distributed), taking
and A as in Eq. (3.16) of the chapter we are dealing with autoregressive errors obeying
Thus, testing
as against
is equivalent, in the context of (A.45) and (A.46), to testing
as against
Thus, to paraphrase the Anderson result: if we are dealing with a GLM whose error structure obeys (A.46) then a uniformly most powerful (UMP) test for the hypothesis (A.47a) will exist when the density function of the error obeys (A.44) and, moreover, the data matrix X is a linear transformation of an appropriate submatrix of the matrix of characteristic vectors of A. Furthermore, when these conditions hold the UMP test is the Durbin–Watson test, which uses the d-statistic computed from OLS residuals, as defined in Eq. (3.17) of this chapter. Examining these conditions a bit more closely, however, shows a slight discrepancy. If we make the assignments in (A.45) then
But the (inverse of the) covariance matrix of the error terms obeying (A.46) with the ε’s i.i.d. and normal with variance σ 2 would be
A comparison of the two matrices shows that they differ, although rather slightly, in the upper left- and lower right-hand corner elements. They would coincide, of course, when
We note, further, that
Hence, if W is the matrix of characteristic vectors of A then
This shows:
-
(a)
the characteristic roots of the matrix in (A.48) are given by \( \psi_i = (1-\rho)^2 + \rho\lambda_i \), i = 1, 2, …, T, where the λ i are the corresponding characteristic roots of A;
-
(b)
if W is the matrix of characteristic vectors of A then it is also that of the matrix in (A.48).
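Observations (a) and (b), as well as the corner-element discrepancy discussed above, can be verified numerically. The following sketch assumes the standard tridiagonal form of \( \sigma^2V^{-1} \) for a first-order autoregressive error process (1 + ρ² on the interior diagonal, 1 in the corners, −ρ off the diagonal); the numerical values are illustrative.

```python
import numpy as np

T, rho = 8, 0.5
D = np.diff(np.eye(T), axis=0)
A = D.T @ D                                   # matrix of Eq. (3.16)

M = (1.0 - rho) ** 2 * np.eye(T) + rho * A    # matrix in (A.48)

# (a) the roots of (1-rho)^2 I + rho A are (1-rho)^2 + rho * lambda_i ...
lam = np.sort(np.linalg.eigvalsh(A))
assert np.allclose(np.sort(np.linalg.eigvalsh(M)), (1.0 - rho) ** 2 + rho * lam)

# (b) ... and the characteristic vectors of A diagonalize (A.48) as well.
w, W = np.linalg.eigh(A)
assert np.allclose(W.T @ M @ W, np.diag((1.0 - rho) ** 2 + rho * w))

# sigma^2 V^{-1} for an AR(1) process (assumed standard form).
Vinv = (1.0 + rho ** 2) * np.eye(T) - rho * (np.diag(np.ones(T - 1), 1)
                                             + np.diag(np.ones(T - 1), -1))
Vinv[0, 0] = Vinv[-1, -1] = 1.0

# The two matrices differ only in the (1,1) and (T,T) elements, by -rho(1-rho).
diff = M - Vinv
assert np.isclose(diff[0, 0], -rho * (1.0 - rho))
assert np.isclose(diff[-1, -1], -rho * (1.0 - rho))
diff[0, 0] = diff[-1, -1] = 0.0
assert np.allclose(diff, 0.0)
```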
Remark A.7
What implications emerge from the lengthy remark regarding tests for autocorrelation? We have, in particular, the following:
-
(i)
in a very strict sense, we can never have UMP tests for the case we have considered in this chapter, since the matrix of the quadratic form in (A.44), subject to the parametric assignments given in (A.45), is never the same as the (inverse of the) covariance matrix of the error process in (A.46) when the ε’s are i.i.d. and normal with mean zero and variance σ 2. The difference, however, is small—the (1, 1) and (T, T) elements differ by −ρ(1 − ρ). Thus, the difference is positive when ρ < 0 and negative when ρ > 0. It vanishes when ρ = 0 or ρ = 1;
-
(ii)
if we are prepared to ignore the differences in (i) and thus consider the roots and vectors of \( (1-\rho)^2 I + \rho A \) and \( V^{-1} \) as the same, then a UMP test will exist and will be the Durbin–Watson test only in the special case where the data matrix X is a linear transformation of n + 1 of the characteristic vectors of A—and hence of \( V^{-1} \);
-
(iii)
When X is a linear transformation of n + 1 of the characteristic vectors of V −1, as we established in the current chapter, the OLS and Aitken estimators of β coincide.
Remark A.8
The preceding discussion reveals a very interesting aspect of the problem. Presumably, we are interested in testing for autocorrelation in the error process, because if such is the case OLS will not be an efficient procedure and another (efficient) estimator is called for. The test utilized is the Durbin–Watson test. Yet when this test is optimal, i.e., a UMP test, OLS is an efficient estimator—hence the result of the test would not matter. On the other hand, when the results of the test would matter, i.e., when in the presence of autocorrelation OLS is inefficient, the Durbin–Watson test is not UMP.
Bounds on the Durbin–Watson Statistic
Let us now return to the problem that has motivated much of the discussion above. We recall that for testing the null hypothesis
as against the alternative
where it is understood that the error terms of the GLM obey (A.46) and the ε’s are i.i.d. normal random variables with mean zero and variance σ 2, we use the test statistic
where ξ is a T × 1 vector obeying
and the θ i are the characteristic roots of NAN arranged in increasing order. Noting that
we may thus rewrite the statistic more usefully as
Considering now the bounds as given by Theorem A.2 let us define
and thus conclude
Remark A.9
An important byproduct of the derivations above is that the bounds d L and d U do not depend on the data matrix X. It is the tabulation of the significance points of d L and d U that one actually uses in carrying out tests for the presence of autocorrelation.
Remark A.10
Consider the special cases examined in Remark A.3. If X is a linear transformation of the n + 1 characteristic vectors corresponding to the n + 1 smallest characteristic roots of A, then
and hence, in this case,
On the other hand, when the n + 1 characteristic vectors above correspond to the smallest (zero) and the n largest characteristic roots of A, then the condition in (A.42) holds. Hence, in this special case, given the bounds in (A.39) we have
In these two special cases the test for autocorrelation may be based on the exact distribution of the test (Durbin–Watson) statistic, since the relevant parts of the distribution of d L and d U have been tabulated.
Use of the Durbin–Watson Statistic
Let F L(⋅) , F(⋅), and F U(⋅) be (respectively) the distribution functions of d L , d, and d U and let r be a point in the range of these random variables. Then by definition
Now, it is clear that
since
But the converse is not true. Similarly note that
but that the converse is not true.
This means that
which in turn implies
Combining (A.53), (A.54), and (A.55) we have
But this immediately suggests a way for testing the autocorrelation hypothesis. Let r L be a number such that
where α is the chosen level of significance, say
If F L( ⋅ ) were the appropriate distribution function and d were the Durbin–Watson (hereafter abbreviated D.W.) statistic, in a given instance, the acceptance region would be
and the rejection region
The level of significance of the test would be α. What is the consequence, for the properties of the test, of the inequalities in (A.56)? Well, since
it follows that the number r ∗ such that
obeys
Thus, in using the acceptance region in (A.57a) we are being too conservative, in the sense that we could have
Conversely, let r U be a number such that
Arguing as before we establish
If we define the rejection region as
we again see that we reject conservatively, in the sense that we could have
The application of the D.W. test in practice makes use of conditions (A.57a), (A.57b), and (A.58), i.e., we accept
if, for a given statistic d, (A.57a) is satisfied, and we accept
only if (A.58) is satisfied. In so doing we are being very conservative, in the sense that if
and if
A consequence of this conservatism, however, is that we are left with a region of indeterminacy. Thus, if
then we have no rigorous basis of accepting either
If the desired test is
as against
we proceed somewhat differently. Let α again be the level of significance and choose two numbers say r L and r U, such that
In view of (A.56), the number r ∗ such that
obeys
Now the acceptance region is defined as
while the rejection region is defined as
Just as in the preceding case we are being conservative, and the consequence is that we have, again, a region of indeterminacy
Let us now recapitulate the procedure for carrying out a test of the hypothesis that the error terms in a GLM are a first order autoregressive process.
-
(i)
Obtain the residuals
-
(ii)
Compute the D.W. statistic
-
where A is as defined in Eq. (3.16) of the chapter.
-
(iii)
Choose the level of significance, say α.
-
(a)
If it is desired to test
-
as against
-
determine, from the tabulated distributions, two numbers r L , r U such that F L(r L) = 1 − α , F U(r U) = 1 − α. If d ≤ r L, accept ρ = 0. If d ≥ r U, accept ρ < 0. If r L < d < r U, the result of the test is inconclusive and other means must be found for determining whether ρ = 0 or ρ < 0 is to be accepted as true.
-
(b)
If it is desired to test
-
as against
-
with level of significance α, determine from the tabulated distributions two numbers, say r L and r U, such that F L(r L) = α , F U(r U) = α. If d ≥ r U, accept the hypothesis ρ = 0. If d ≤ r L, accept the hypothesis ρ > 0. If r L < d < r U, the result of the test is indeterminate.
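The recapitulated procedure may be sketched as follows. The residual series and the 5% significance points are illustrative placeholders only; in practice r L and r U come from the tabulations, and the statistic is computed from the OLS residuals as in steps (i) and (ii).

```python
import numpy as np

def durbin_watson(resid):
    """d = u'Au / u'u = sum_t (u_t - u_{t-1})^2 / sum_t u_t^2."""
    u = np.asarray(resid, dtype=float)
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

def dw_test(d, r_L, r_U, alternative="rho > 0"):
    """Bounds test of rho = 0; r_L, r_U are tabulated significance points.
    For the alternative rho < 0 the statistic 4 - d is used (Remark A.11)."""
    stat = d if alternative == "rho > 0" else 4.0 - d
    if stat >= r_U:
        return "rho = 0"
    if stat <= r_L:
        return alternative
    return "inconclusive"

# Hypothetical residuals and illustrative 5% points (not tabulated values).
u = np.array([0.8, 0.5, 0.6, 0.2, -0.1, 0.1, -0.4, -0.3, -0.6, -0.2])
d = durbin_watson(u)
verdict = dw_test(d, r_L=0.88, r_U=1.32)
```

Note that the indeterminate region appears here as the `"inconclusive"` outcome: the bounds test deliberately leaves it undecided, exactly as in cases (a) and (b) above.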
Remark A.11
Tabulations of F L(⋅) and F U(⋅) exist typically in the form of 5% significance points (i.e., values of r L and r U) for varying numbers of observations and explanatory variables (exclusive of the constant term). Such tabulations are constructed from the point of view of the test
as against
It is suggested that when we are interested in the hypothesis
as against
we use 4-d as the test statistic and the r L , r U significance points from the tabulated distributions.
Remark A.12
The existing tabulations assume that the data matrix X contains one column that is (a multiple of) a characteristic vector of A. As we have remarked earlier the vector
is the characteristic vector corresponding to the smallest (zero) characteristic root of A. Consequently, the bounds for the roots of NAN for this case are
and the tabulations are based on
The reader should note, therefore, that if the GLM under consideration does not contain a constant term, then the tabulated percentage points of the D.W. statistic are not applicable. This is so since in this case we are dealing with k = 0 and the lower bound should be defined as
We observe that, since λ 1 = 0,
Thus, the tabulated distribution is inappropriate for the case of the excluded constant term. This can be remedied by running a regression with a constant term and carrying out the test in the usual way. At the cost of being redundant let us stress again that there is nothing peculiar about the D.W. statistic when the GLM does not contain a constant term. It is merely that the existing tabulations are inappropriate for this case insofar as the lower bound is concerned; the tabulations are quite appropriate, however, for the upper bound.
Remark A.13
At the end of this volume we present more recent tabulations of the bounds of the D.W. statistics giving 1%, 2.5%, and 5% significance points. As in the earlier tabulations it is assumed that the GLM model does contain a constant term; thus, these tabulations are inappropriate when there is no constant term.
Remark A.14
Two aspects of the use of D.W. tabulations deserve comment. First, it is conceivable that the test statistic will fall in the region of indeterminacy and hence that the test will be inconclusive. A number of suggestions have been made for this eventuality, the most useful of which is the use of the approximation
where a and b are fitted by the first two moments of d. The virtue of this is that the test is based on existing tabulations of d U. The reader interested in exploring the details of this approach is referred to Durbin and Watson [98].
Remark A.15
An alternative to the D.W. statistic for testing the hypothesis that the error terms of a GLM constitute a first-order autoregression may be based on the asymptotic distribution of the natural estimator of ρ obtained from the residuals. Thus, e.g., in the GLM
let \( {\tilde{u}}_t,\kern0.5em t=1,2,\dots \), T, be the OLS residuals. An obvious estimator of ρ is
obtained by regressing \( {\tilde{u}}_t \) on \( {\tilde{u}}_{t-1} \) and suppressing the constant term. It may be shown that if the GLM does not contain lagged dependent variables then, asymptotically,
Consequently, if the sample is reasonably large a test for the presence of autoregression (of the first order) in the errors may be carried out on the basis of the statistic
which, under the null hypothesis (of no autoregression) will have the distribution
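A sketch of the resulting large-sample test, assuming the standard result that \( \sqrt{T}\,\tilde{\rho} \) is asymptotically N(0, 1) under the null of no autoregression; the simulated residual series and the two-sided 5% critical value 1.96 are illustrative.

```python
import numpy as np

def rho_tilde(resid):
    """Natural estimator of rho: regress u_t on u_{t-1}, no constant term."""
    u = np.asarray(resid, dtype=float)
    return np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)

rng = np.random.default_rng(0)
u = rng.standard_normal(500)        # plays the role of OLS residuals, rho = 0
z = np.sqrt(len(u)) * rho_tilde(u)  # approximately N(0, 1) under the null
reject = bool(abs(z) > 1.96)        # two-sided test at the 5% level
```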
1.2 Gaps in Data
It frequently happens with time series data that observations are noncontiguous. A typical example is the exclusion of a certain period from the sample as representing nonstandard behavior. Thus, in time series studies of consumption behavior one usually excludes observations for the period 1941–1944 or 1945; this is justified by noting the shortages due to price controls during the war years. In such a case we are presented with the following problem: if we have a model with autoregressive errors, what is the appropriate form of the autoregressive transformation and of the D.W. statistic when there are gaps in the sample?
We shall examine the following problem. Suppose in the GLM
observations are available for
so that at “time” r there is a gap of k observations. How do we obtain efficient estimators
-
(a)
when the error process obeys
-
(b)
when the error process obeys
Moreover, how in case (a) do we test the hypothesis ρ = 0? To provide an answer to these problems, we recall that in the standard first-order autoregression the estimation proceeds, conceptually, by determining a matrix M such that Mu consists of uncorrelated—and, in the case of normality, independent—elements, it being understood that u is the vector of errors of the GLM.
What is the analog of M for the process
when observations for t = r + 1 , r + 2 , …, r + k are missing? The usual transformation, through the matrix M referred to above, yields
This is not feasible in the present case since the observations \( u_t \) for r + 1 ≤ t ≤ r + k are missing; in particular, the observation following \( u_r \) is \( u_{r+k+1} \). Remembering that the goal is to replace u by a vector of uncorrelated elements, we note that for 1 ≤ j ≤ k + 1
Thus
We observe that
and that
has variance σ 2 for t ≤ r and t ≥ r + k + 2; thus, we conclude
also has variance σ 2 and is independent of the other terms. Hence the matrix
where \( 1/s = \left[(1-\rho^2)/(1-\rho^{2k})\right]^{1/2} \), implies that \( M_1u \) has covariance matrix \( \sigma^2 I \) and mean zero.
Hence the autoregressive transformation is of the form
with the corresponding transformation on the dependent and explanatory variables. Estimation by the search method proceeds as before, i.e., for given ρ we compute the transformed variables and carry out the OLS procedure. We do this for a set of ρ-values that is sufficiently dense in the interval (−1, 1). The estimator corresponds to the coefficients obtained in the regression exhibiting the smallest sum of squared residuals. Aside from this more complicated transformation, the situation is entirely analogous to the standard case where no observations are missing.
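For the standard (no-gap) case the search method just described may be sketched as follows; the scaling of the first observation by \( (1-\rho^2)^{1/2} \) is one common convention, and all data names are illustrative. With the gap transformation \( M_1 \) of the text, only the construction of the transformed variables changes; the grid search itself is unchanged.

```python
import numpy as np

def ar1_transform(Z, rho):
    """Standard (no-gap) autoregressive transformation of a data matrix."""
    Z = np.atleast_2d(np.asarray(Z, dtype=float).T).T   # columns of data
    out = Z.copy()
    out[0] = np.sqrt(1.0 - rho ** 2) * Z[0]             # first-row scaling
    out[1:] = Z[1:] - rho * Z[:-1]                      # z_t - rho * z_{t-1}
    return out

def search_estimate(y, X, grid=np.linspace(-0.99, 0.99, 199)):
    """For each rho on a dense grid in (-1, 1), transform the data, run OLS,
    and keep the rho / coefficient pair with the smallest sum of squares."""
    best = None
    for rho in grid:
        ys, Xs = ar1_transform(y, rho), ar1_transform(X, rho)
        b = np.linalg.lstsq(Xs, ys, rcond=None)[0]
        ssr = float(np.sum((ys - Xs @ b) ** 2))
        if best is None or ssr < best[0]:
            best = (ssr, rho, b.ravel())
    return best[1], best[2]                             # (rho-hat, beta-hat)
```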
In their original paper Cochrane and Orcutt (C.O.) did not provide a procedure for handling the missing-observations case (see [65]). Carrying on in their framework, however, one might suggest the following. Regress the dependent on the explanatory variables, neglecting the fact that some observations are missing. Obtain the residuals \( {\tilde{u}}_t \). Obtain
\( \tilde{\rho}=\frac{{\sum}^{\prime }{\tilde{u}}_t{\tilde{u}}_{t-1}}{{\sum}^{\prime }{\tilde{u}}_{t-1}^2}, \)
where \( {\sum}^{\prime } \) indicates that the terms for t = r + 1, r + 2, …, r + k have been omitted. Given this \( \tilde{\rho} \) compute \( {y}_t-\tilde{\rho}{y}_{t-1} \)
and similarly for the explanatory variables. Carry out the regression with the transformed variables. Obtain the residuals and, thus, another estimate of ρ. Continue in this fashion until convergence is obtained.
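A single step of this suggested C.O.-type procedure might look as follows (simulated data, illustrative names); the essential point is that \( \tilde{\rho} \) uses only products of residuals at genuinely adjacent dates, omitting every pair that straddles the missing block.

```python
import numpy as np

# Sketch: OLS on the available observations, then a gap-aware estimate of rho
# from the residuals, using only adjacent-date residual products.
rng = np.random.default_rng(1)
T, k, r, rho_true = 300, 10, 120, 0.6
eps = rng.normal(size=T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = rho_true * u[t - 1] + eps[t]
times = np.r_[np.arange(1, r + 1), np.arange(r + k + 1, T + 1)]   # observed dates
x = rng.normal(size=times.size)
y = 1.0 + 2.0 * x + u[times - 1]

X = np.column_stack([np.ones(times.size), x])
res = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]                # OLS residuals

adj = np.diff(times) == 1                 # True only for genuinely adjacent pairs
rho_tilde = (res[1:][adj] * res[:-1][adj]).sum() / (res[:-1][adj] ** 2).sum()
print(round(rho_tilde, 2))                # should be near rho_true = 0.6
```

Iterating, i.e., retransforming with \( \tilde{\rho} \), re-estimating, and recomputing the residuals, continues until successive values of \( \tilde{\rho} \) converge.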
If one wishes to test for the presence of first-order autoregression in the residuals then, following the initial regression above, compute the “adjusted” D.W. statistic
\( {d}^{\ast }=\frac{{\sum}^{\prime }{\left({\tilde{u}}_t-{\tilde{u}}_{t-1}\right)}^2}{\sum_t{\tilde{u}}_t^2} \)  (A.62),
where \( {\sum}^{\prime } \) again omits all terms involving the missing observations.
Clearly, the procedures outlined above apply for a data gap at any point r ≥ 1. Obviously if r = 1 or if r = T − k then the “gap” occasions no problems whatever since all sample observations are contiguous.
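A sketch of the adjusted statistic, computing the numerator only over pairs of adjacent dates; the function name and the white-noise residuals are illustrative, and for white noise the value should be near 2.

```python
import numpy as np

# Sketch of the "adjusted" D.W. statistic for a sample with a gap: the numerator
# sums squared first differences only over genuinely adjacent dates, skipping
# the single difference that would straddle the missing block.
def adjusted_dw(res, times):
    """res: residuals at the observed dates `times` (increasing integers)."""
    adj = np.diff(times) == 1
    return (np.diff(res)[adj] ** 2).sum() / (res ** 2).sum()

rng = np.random.default_rng(2)
times = np.r_[np.arange(1, 101), np.arange(121, 221)]   # gap of 20 after t = 100
res = rng.normal(size=times.size)
print(round(adjusted_dw(res, times), 1))
```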
Remark A.16
It should be pointed out that the statistic in (A.62) cannot be used in conjunction with the usual D.W. tables. This is so since
\( {d}^{\ast }=\frac{{\tilde{u}}^{\prime }{A}^{\ast}\tilde{u}}{{\tilde{u}}^{\prime}\tilde{u}}, \)
just as in the standard D.W. case. However, the matrix A ∗ is not of the same form as the matrix A in the usual case; in fact, if we put A r , A T − k − r for two matrices of order r, T − k − r respectively and of the same form as in the standard case, then
\( {A}^{\ast }=\operatorname{diag}\left({A}_r,\;{A}_{T-k-r}\right). \)
This makes it clear, however, that A ∗ has two zero roots while the matrix A in the usual case has only one zero root. Thus, to test for first-order autoregression in the “gap” case it is better to rely on asymptotic theory if the sample is moderately large. The estimator \( \tilde{\rho} \), as given in the discussion of the C.O. procedure, may be easily obtained from the OLS residuals and, asymptotically, behaves as \( \sqrt{T}\left(\tilde{\rho}-\rho \right)\sim N\left(0,1-{\rho}^2\right). \)
To close this section we derive the appropriate autoregressive transformation when the error process is a second-order autoregression and there is a gap in the data. Such an error specification is often required for quarterly data.
As in the earlier discussion, what we wish is to represent the observation immediately following the gap in terms of the adjacent observation(s). To be concrete, suppose
\( {u}_t={\rho}_1{u}_{t-1}+{\rho}_2{u}_{t-2}+{\varepsilon}_t \)  (A.63),
where
is a sequence of i.i.d. random variables with mean zero and variance σ 2. We require the process above to be stable, i.e., we require that for σ 2 < ∞ the variance of the u t ’s also be finite. Introduce the lag operator L such that, for any function x t , \( L{x}_t={x}_{t-1} \) and, in general, \( {L}^s{x}_t={x}_{t-s},\quad s=1,2,\dots. \)
For s = 0 we set L 0 = I, the identity operator. Polynomials in the lag operator L are isomorphic to polynomials in a real (or complex) indeterminate, i.e., ordinary polynomials like
\( {a}_0+{a}_1t+{a}_2{t}^2+\cdots +{a}_n{t}^n, \)
where the a i , i = 0, 1, …, n, are real numbers and t is the real (or complex) indeterminate (i.e., the “unknown”). This isomorphism means that whatever operations may be performed on ordinary polynomials can also be performed in the same manner on polynomials in the lag operator L. The reader desiring greater detail is referred to Dhrymes [9, Chaps. 1 and 2]. Noting that
\( {u}_t-{\rho}_1{u}_{t-1}-{\rho}_2{u}_{t-2}=\left(I-{\rho}_1L-{\rho}_2{L}^2\right){u}_t, \)
we can write
\( \left(I-{\rho}_1L-{\rho}_2{L}^2\right){u}_t={\varepsilon}_t. \)
Note also that
\( I-{\rho}_1L-{\rho}_2{L}^2=\left(I-{\lambda}_1L\right)\left(I-{\lambda}_2L\right) \)
for
\( {\lambda}_1+{\lambda}_2={\rho}_1,\quad {\lambda}_1{\lambda}_2=-{\rho}_2, \)
i.e., λ 1, λ 2 are the roots of z 2 − ρ 1 z − ρ 2 = 0. Thus, the process in (A.63) can also be represented as
\( {u}_t=\frac{I}{\left(I-{\lambda}_1L\right)\left(I-{\lambda}_2L\right)}{\varepsilon}_t. \)
But I/(I − λ 1 L) behaves like
\( \sum_{i=0}^{\infty }{\lambda}_1^i{L}^i. \)
Hence
\( {u}_t=\sum_{i=0}^{\infty }\sum_{j=0}^{\infty }{\lambda}_1^i{\lambda}_2^j{\varepsilon}_{t-i-j}. \)
Assuming that ∣λ 2∣ ≤ ∣λ 1∣ < 1, we can rewrite the double sum above as
\( {u}_t=\sum_{s=0}^{\infty }{c}_s{\varepsilon}_{t-s} \)  (A.65),
where
\( {c}_s=\sum_{i+j=s}{\lambda}_1^i{\lambda}_2^j=\sum_{i=0}^s{\lambda}_1^i{\lambda}_2^{s-i}. \)
If in the sequence \( \left\{{u}_t: t=1,2,\dots \right\} \) there is a gap of length k at t = r, it means that u r is followed by u r + k + 1, u r + k + 2, and so on. For observations u t , 2 < t ≤ r, we know that the transformation
\( {u}_t-{\rho}_1{u}_{t-1}-{\rho}_2{u}_{t-2}={\varepsilon}_t \)
yields i.i.d. random variables, viz., the ε’s. For t > r, however, we encounter a difficulty. The observation following u r is u r + k + 1. Thus, blindly applying the transformation yields
\( {u}_{r+k+1}-{\rho}_1{u}_r-{\rho}_2{u}_{r-1}. \)
But the expression above is not ε r + k + 1. So the problem may be formulated as: what coefficients should we attach to u r and u r − 1 in order to render the difference a function only of {ε t : t = r + 1, r + 2, …, r + k + 1}?
It is for this purpose that the expression in (A.65) is required. We note
But, putting j = i − (k + 2) yields
Similarly,
Thus,
and we require
This is satisfied by the choice of \( {c}_{k+1} \) as the coefficient of u r and \( {\rho}_2{c}_k \) as the coefficient of u r − 1.
It is, of course, apparent that
and thus
Similarly, if we wish to find quantities γ and δ such that
is a function of at most only {ε t : t = r + 1, r + 2, …, r + k + 2}, we conclude, following the same procedure as above, that
Thus,
Put
To produce the transformation desired we need only derive an expression for the variances and covariances of v 1 , v 2, those of v r + k + 1 , v r + k + 2, as well as an expression for the coefficients c i , appearing immediately below (A.65), involving only ρ 1 and ρ 2. Now,
where
Moreover,
which yields
Thus
In addition,
Now, if x and y are two random variables, it is well known that \( y- bx \) is uncorrelated with x, where, obviously,
\( b=\frac{\operatorname{Cov}\left(x,y\right)}{\operatorname{Var}(x)}. \)
Consequently,
is uncorrelated with u 1. Similarly,
is uncorrelated with v r + k + 1.
Define the lower triangular matrix
where
To express the coefficients c i in terms of ρ 1 , ρ 2 (instead of λ 1 , λ 2, as we did earlier) we proceed as follows. Consider the recursive relation (neglecting the ε’s)
If we put, in general,
we obtain the recursions
with the “initial conditions”
But from (A.66) and (A.67) we easily see that
where the c s ’s are exactly the quantities defined just below (A.65). Computing, recursively, a few of the coefficients c s and taking the initial conditions (A.69) into account, we find
or, in general,
\( {c}_s=\sum_{j=0}^{\left[s/2\right]}\binom{s-j}{j}{\rho}_1^{s-2j}{\rho}_2^j, \)
where [s/2] is the integral part of s/2;
and for all s
while for even s
and the “initial” conditions are
The recursion in (A.51) together with the conditions just enumerated completely describes the coefficients
and thus completely determines the elements of the matrix S in terms of the parameters ρ 1, ρ 2. What this accomplishes is the following. Suppose that \( {u}_t={\rho}_1{u}_{t-1}+{\rho}_2{u}_{t-2}+{\varepsilon}_t \) and that the sequence \( \left\{{\varepsilon}_t: t=0,\pm 1,\pm 2,\dots \right\} \) is one of i.i.d. random variables with mean zero and variance σ 2.
Let \( u={\left({u}_1,{u}_2,\dots, {u}_r,{u}_{r+k+1},\dots, {u}_T\right)}^{\prime }. \)
Then Su is a vector of uncorrelated random elements whose mean is zero and whose (common) variance is σ 2. If we assert that the elements of the ε-sequence obey, in addition, \( {\varepsilon}_t\sim N\left(0,{\sigma}^2\right), \) then
\( Su\sim N\left(0,{\sigma}^2I\right), \)
where I is the identity matrix of order T − k.
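The coefficient sequence underlying this construction can be checked numerically: given the factorization of the lag polynomial, the c s satisfy the recursion c s  = ρ 1 c s − 1 + ρ 2 c s − 2 with c 0 = 1, c 1 = ρ 1, and agree with \( \sum_{i+j=s}{\lambda}_1^i{\lambda}_2^j \). A sketch, with illustrative parameter values:

```python
import numpy as np

# Sketch: verify that the MA coefficients c_s of a stable AR(2) obey the AR
# recursion, using the roots lambda1, lambda2 of z^2 - rho1 z - rho2 = 0.
rho1, rho2 = 0.9, -0.3
lam = np.roots([1.0, -rho1, -rho2])              # lambda1, lambda2
assert np.isclose(lam.sum(), rho1) and np.isclose(lam.prod(), -rho2)

def c_closed(s):
    # c_s = sum over i + j = s of lambda1^i lambda2^j (real for conjugate roots)
    return sum(lam[0]**i * lam[1]**(s - i) for i in range(s + 1)).real

c = [1.0, rho1]                                  # c_0 = 1, c_1 = rho1
for s in range(2, 10):
    c.append(rho1 * c[-1] + rho2 * c[-2])        # the recursion
print(all(np.isclose(c[s], c_closed(s)) for s in range(10)))  # True
```

The recursion holds because λ 1 + λ 2 = ρ 1 and λ 1 λ 2 = −ρ 2, so the lag-polynomial factorization and the AR coefficients carry the same information.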
One could, clearly, estimate the parameters of the GLM, exhibiting a gap of length k, and whose errors are a second-order autoregression, by a search procedure applied to the transformed data Sy , SX, where S is as defined in (A.68). We may summarize the discussion of this section in
Theorem A.3
Consider the GLM \( y=X\beta +u, \) where y is (T − k) × 1, X is (T − k) × (n + 1), u is (T − k) × 1, and there is a gap of length k in the observations as follows: observations are available for t = 1, 2, …, r, r + k + 1, …, T.
Provided
-
(a)
rank(X) = n + 1,
-
(b)
(p)lim T→∞(X ′ X/T) is positive definite,
-
(c)
E(u| X) = 0,
-
the following is true.
-
(i)
If, in addition, Cov(u| X) = σ 2 I, then the OLS estimator of β is consistent, unbiased, and efficient.
-
(ii)
If \( {u}_t=\rho {u}_{t-1}+{\varepsilon}_t \) and {ε t : t = 0, ±1, ±2, …} is a sequence of i.i.d. random variables with mean zero and variance σ 2, the OLS estimator \( \tilde{\beta}={\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }y \) is unbiased, consistent, but inefficient. The (feasible) efficient estimator is obtained as \( \widehat{\beta}={\left({X}^{\prime }{\tilde{M}}_1^{\prime }{\tilde{M}}_1X\right)}^{-1}{X}^{\prime }{\tilde{M}}_1^{\prime }{\tilde{M}}_1y, \)
-
where M 1 is a (T − k) × (T − k) matrix with elements that are all zero except:
-
The matrix \( {\tilde{M}}_1 \) is obtained by substituting for ρ its estimator \( \tilde{\rho} \) . The latter may be obtained by the usual search method applied to the transformed model
-
or by a suitable extension of the C.O. method. In the observation gap model above the OLS residuals, say \( {\tilde{u}}_t \) , may be used to compute a D.W.-like statistic
-
The usual tabulations, however, for the D.W. statistic are not appropriate in this instance and one should, if the sample is moderately large, apply asymptotic theory.
-
(iii)
If \( {u}_t={\rho}_1{u}_{t-1}+{\rho}_2{u}_{t-2}+{\varepsilon}_t \), the sequence {ε t : t = 0, ±1, ±2, …} being one of i.i.d. random variables with mean zero and variance σ 2, and if the process is stable, i.e., the roots of z 2 − ρ 1 z − ρ 2 = 0 are less than unity in absolute value, then the OLS estimator is unbiased and consistent but inefficient. The (feasible) efficient estimator is obtained by the search method, which minimizes \( {\left( Sy- SX\beta \right)}^{\prime}\left( Sy- SX\beta \right) \)
-
over the range ρ 1 ∈ (−2, 2), ρ 2 ∈ (−1, 1). The estimator is \( \widehat{\beta}={\left({X}^{\prime }{\tilde{S}}^{\prime}\tilde{S}X\right)}^{-1}{X}^{\prime }{\tilde{S}}^{\prime}\tilde{S}y, \)
-
where S is a (T − k) × (T − k) matrix all of whose elements are zero except:
-
The coefficients c s above are given by \( {c}_s=\sum_{j=0}^{\left[s/2\right]}\binom{s-j}{j}{\rho}_1^{s-2j}{\rho}_2^j, \)
-
where [s/2] denotes the integral part of s/2, and
-
where
-
and for even s
Tables for Testing Hypotheses on the Autoregressive Structure of the Errors in a GLM
These tables are reproduced with the kind permission of the publisher Marcel Dekker Inc., and the author H. D. Vinod. Tables 3.4, 3.5, and 3.6 first appeared in H. D. Vinod, “Generalization of the Durbin–Watson Statistic for Higher Order Autoregressive Processes,” Communications in Statistics, vol. 2, 1973, pp. 115–144.
These tables are meant to facilitate the test of hypotheses regarding the autoregressive properties of the error term in a general linear model containing a constant term but no lagged dependent variables. The order of the autoregression can be at most four.
Tables 3.1, 3.2, and 3.3 contain upper and lower significance points at the 1%, 2.5%, and 5% levels, respectively, for testing that the first-order autocorrelation is zero. Table 3.4 contains upper and lower significance points at the 5% level for testing that the second-order autocorrelation is zero.
Similarly, Tables 3.5 and 3.6 contain upper and lower significance points at the 5% level for tests on the third- and fourth-order autocorrelation.
Perhaps a word of explanation is in order regarding their use. The tables have been constructed on the basis of the properties of the statistics
\( {d}_j=\frac{\sum_{t=j+1}^T{\left({\widehat{u}}_t-{\widehat{u}}_{t-j}\right)}^2}{\sum_{t=1}^T{\widehat{u}}_t^2},\quad j=1,2,3,4, \)
where the \( {\widehat{u}}_t \) are the residuals of a GLM containing a constant term but not containing lagged dependent variables. If X is the data matrix of the GLM, then \( \widehat{u}= Nu, \) where \( N=I-X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }, \)
and it is assumed that
Hence, if we wish to test a first-order hypothesis, i.e., that in \( {u}_t=\rho {u}_{t-1}+{\varepsilon}_t \) we have \( {H}_0:\ \rho =0, \) as against \( {H}_1:\ \rho >0, \) we can use Tables 3.1, 3.2, or 3.3 exactly as we use the standard Durbin–Watson tables; indeed, they are the same.
If we wish to test for a second-order autoregression of the special form \( {u}_t={\rho}_2{u}_{t-2}+{\varepsilon}_t \)
we can do so using the statistic d 2 and Table 3.4 in exactly the same fashion as one uses the standard Durbin–Watson tables.
Similarly, if we wish to test for a third-order autoregression of the special type \( {u}_t={\rho}_3{u}_{t-3}+{\varepsilon}_t \) or for a fourth-order autoregression of the special type \( {u}_t={\rho}_4{u}_{t-4}+{\varepsilon}_t, \)
we may do so using the statistics d 3 and d 4 and Tables 3.5 and 3.6 respectively.
Again, the tables are used in the same fashion as the standard Durbin–Watson tables, i.e., we accept the hypothesis that the (relevant) autocorrelation coefficient is zero if the statistic d 3 or d 4 exceeds the appropriate upper significance point, and we accept the hypothesis that the (relevant) autocorrelation coefficient is positive if the statistic d 3 or d 4 is less than the lower significance point.
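In the standard form of these statistics, each d j compares residuals j periods apart; a sketch of their computation follows, using white-noise residuals for illustration, for which each d j should be near 2.

```python
import numpy as np

# Sketch of the generalized Durbin-Watson statistics d_1, ..., d_4:
# d_j = sum_{t=j+1}^T (u_t - u_{t-j})^2 / sum_t u_t^2, computed from residuals.
def d_stat(res, j):
    return ((res[j:] - res[:-j]) ** 2).sum() / (res ** 2).sum()

rng = np.random.default_rng(3)
res = rng.normal(size=500)                       # white-noise residuals
print([round(d_stat(res, j), 1) for j in (1, 2, 3, 4)])
```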
Now, it may be shown that if we have two autoregressions of order m and m + 1 respectively and if it is known that these two autoregressions have the same autocorrelations of order 1, 2, …, m, then a certain relationship must exist between the coefficients describing these autoregressions. In particular, it may be shown that if for a fourth-order autoregression, say
\( {u}_t={a}_{41}{u}_{t-1}+{a}_{42}{u}_{t-2}+{a}_{43}{u}_{t-3}+{a}_{44}{u}_{t-4}+{\varepsilon}_t, \)
the autocorrelations of order 1, 2, 3 are zero, then \( {a}_{41}={a}_{42}={a}_{43}=0 \) and thus the process is of the special form \( {u}_t={a}_{44}{u}_{t-4}+{\varepsilon}_t, \)
i.e., a 44 is the autocorrelation of order 4. Similarly, if for the third-order autoregression
\( {u}_t={a}_{31}{u}_{t-1}+{a}_{32}{u}_{t-2}+{a}_{33}{u}_{t-3}+{\varepsilon}_t \)
it is known that the first two autocorrelations are zero, then \( {a}_{31}={a}_{32}=0, \) so that the process is of the special form \( {u}_t={a}_{33}{u}_{t-3}+{\varepsilon}_t, \)
i.e., a 33 is the autocorrelation of order 3. Finally, if for the second-order autoregression
\( {u}_t={a}_{21}{u}_{t-1}+{a}_{22}{u}_{t-2}+{\varepsilon}_t \)
it is known that the first-order autocorrelation is zero, then \( {a}_{21}=0, \) so that the process is of the special form \( {u}_t={a}_{22}{u}_{t-2}+{\varepsilon}_t, \)
i.e., a 22 is the autocorrelation of order 2.
Vinod [?] uses these relations to suggest a somewhat controversial test for the case where we wish to test for autoregression in the error term of the GLM and are willing to limit the alternatives to, at most, the fourth-order autoregression
\( {u}_t={a}_{41}{u}_{t-1}+{a}_{42}{u}_{t-2}+{a}_{43}{u}_{t-3}+{a}_{44}{u}_{t-4}+{\varepsilon}_t. \)
The proposed test is as follows. First test that the first-order autocorrelation is zero, as against the alternative that it is positive, using Tables 3.1, 3.2, or 3.3. If H01 is accepted, then test in the same fashion that the second-order autocorrelation is zero, using Table 3.4. If H02 is also accepted, then test that the third-order autocorrelation is zero, using Table 3.5. If H03 is accepted, then test that the fourth-order autocorrelation is zero, using Table 3.6.
There are a number of problems with this: first, the level of significance of the second, third, and fourth tests cannot be the stated ones, since we proceed to the ith test only conditionally upon accepting the null hypothesis in the (i − 1)th test; second, if at any point we accept the alternative, it is not clear what we should conclude.
Presumably, if we accept H12 (at the second test) we should conclude that the process is at least second order, make allowance for this, in terms of search or Cochrane-Orcutt procedures, and then proceed to test using the residuals of the transformed equation.
An alternative to the tests suggested by Vinod [?] would be simply to regress the residuals \( {\widehat{u}}_t \) on \( {\widehat{u}}_{t-1},{\widehat{u}}_{t-2},{\widehat{u}}_{t-3},{\widehat{u}}_{t-4} \), thus obtaining the estimates \( \widehat{a}={\left({\widehat{a}}_{41},{\widehat{a}}_{42},{\widehat{a}}_{43},{\widehat{a}}_{44}\right)}^{\prime }. \)
Since we desire to test \( {H}_0:\ a=0, \) as against \( {H}_1:\ a\ne 0, \) where a = (a 41, a 42, a 43, a 44)′, we may use the (asymptotic) distribution of \( \widehat{a} \) under the null hypothesis as well as the multiple comparison test, as given in the appendix to Chapter??. Thus, testing the null hypothesis of no autocorrelation in the errors is best approached through the asymptotic distribution, given (under H0) by
\( \sqrt{T}\,\widehat{a}\sim N\left(0,{I}_4\right). \)
This implies the chi-square and associated multiple comparison tests: accept H0 if \( T{\widehat{a}}^{\prime}\widehat{a}\le {\chi}_{\alpha; 4}^2 \), where \( {\chi}_{\alpha; 4}^2 \) is the α significance point of a chi-square variable with four degrees of freedom; otherwise reject H0 and accept any of the hypotheses whose acceptance is implied by the multiple comparison intervals \( -{\left({\chi}_{\alpha; 4}^2{h}^{\prime }h\right)}^{1/2}\le \sqrt{T}{h}^{\prime}\widehat{a}\le {\left({\chi}_{\alpha; 4}^2{h}^{\prime }h\right)}^{1/2} \).
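A sketch of this asymptotic test follows, with residuals simulated under the null hypothesis; the value 9.488 is the 5% point of a chi-square variable with four degrees of freedom.

```python
import numpy as np

# Sketch: regress residuals on their first four lags, form T * a_hat' a_hat,
# and compare it with the chi-square critical value (5%, 4 d.f. = 9.488).
rng = np.random.default_rng(4)
res = rng.normal(size=1000)                      # residuals under H0 (illustrative)
Y = res[4:]
Z = np.column_stack([res[3:-1], res[2:-2], res[1:-3], res[:-4]])  # lags 1..4
a_hat = np.linalg.lstsq(Z, Y, rcond=None)[0]
stat = Y.size * a_hat @ a_hat
print(round(stat, 2))                            # accept H0 when below 9.488
```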
Finally, we illustrate the use of these tables by an example. Suppose in a GLM with five bona fide explanatory variables and thirty observations we have the Durbin-Watson statistic
From Table 3.1 we see that the upper significance point for the 1% level is 1.606. Hence the hypothesis of no autocorrelation will be accepted. For the 2.5% level the upper significance point is 1.727; hence we will not accept it at this level. On the other hand the lower significance point is 0.999, so that the test is indeterminate. For the 5% level the upper significance point is 1.833 while the lower is 1.070; hence at the 5% level the test is indeterminate as well.
© 2017 Springer International Publishing AG
Dhrymes, P. (2017). The General Linear Model III. In: Introductory Econometrics. Springer, Cham. https://doi.org/10.1007/978-3-319-65916-9_3