
The General Linear Model III

Introductory Econometrics

Abstract

In the two preceding chapters we have set forth, in some detail, the estimation of parameters and the properties of the resulting estimators in the context of the standard GLM. We recall that rather stringent assumptions were made relative to the error process and the explanatory variables. Now that the exposition has been completed it behooves us to inquire as to what happens when some, or all, of these assumptions are violated. The motivation is at least twofold. First, situations may, in fact, arise in which some nonstandard assumption may be appropriate. In such a case we would want to know how to handle the problem. Second, we would like to know what the cost is, in terms of the properties of the resulting estimators, if we operate under the standard assumptions that, as it turns out, are not valid. Thus, even though we may not know that the standard assumptions are, in part, violated, we would like to know what the cost is in case they are violated.


Notes

  1.

    Actually more pervasive forms of dependence will materialize due to the budget restriction imposed on a household’s consumption activities and the utility maximization hypothesis.

  2.

    Note that if Σ is a positive definite matrix we can always write it as Σ = σ²Φ, where σ² > 0 and Φ is positive definite. This involves no sacrifice of generality whatever. Only when we assert that Φ is known do we impose (significant) restrictions on the generality of the results.

  3.

    Such estimators are more appropriately called minimum chi-square (MCS) estimators.

  4.

    Not all random variables have density functions. Strictly speaking, this statement should be phrased “... a random variable having distribution function....” A distribution function need not be differentiable. Thus, the density function need not exist. But in this book all (continuous) random variables are assumed to have density functions.

  5.

    This section may be omitted without essential loss of continuity.

  6.

    Clearly, Φ must be specified a bit more precisely. If left as a general positive definite matrix then it would, generally, contain more than T (independent) unknown parameters, and thus its elements could not be consistently estimated through a sample of T observations.

  7.

    In this context only, and for notational convenience, the dimension of X is set at T × n; elsewhere, we shall continue to take X as T × (n + 1).

  8.

    Thank you to Professor David Hendry for this point.

  9.

    It is for this reason that A. Zellner, who studied the problem of this section quite extensively, termed it the problem of seemingly unrelated regressions [45].

  10.

    Heuristically we may approach the problem as follows: since e^{i2πs} = 1, s = 1, 2, …, we may write the solutions of r^{2T} = 1 as r = e^{i2πs/2T}. In some sense this is a solution to the equation r^{2T} = 1, since if we raise both sides to the 2T power we get back the equation. Cancelling the factor 2 we get the solutions e^{iπs/T}, s = 0, 1, 2, …, T. Extending the index s beyond T simply repeats the roots above.

  11.

    If one wished, one could seek to determine coefficients γ and δ such that u_{r+k+2} − γu_{r+k+1} − δu_r satisfies conditions similar to those above.

References

  1. Anderson, T. W. (1948). On the theory of testing serial correlation. Skandinavisk Aktuarietidskrift, 31, 88–116.


  2. Anderson, T. W. (1971). The statistical analysis of time series. New York: Wiley.


  3. Arnott, R. (1985). The use and misuse of consensus earnings. Journal of Portfolio Management, 12, 18–27.


  4. Ashley, R., & Patterson, D. M. (2010). Apparent long memory in time series as an artifact of a time-varying mean: Considering alternatives to the fractionally integrated model. Macroeconomic Dynamics, 14, 59–87.


  5. Bunn, D. (1989). Editorial: Forecasting with more than one model. Journal of Forecasting, 8, 161–166.


  6. Cochrane, D., & Orcutt, G. H. (1949). Applications of least squares to relations containing autocorrelated error terms. Journal of the American Statistical Association, 44, 32–61.


  7. Dhrymes, P. J. (1971). Distributed lags: Problems of estimation and formulation. San Francisco: Holden–Day.


  8. Durbin, J., & Watson, G. S. (1950). Testing for serial correlation in least squares regression, I. Biometrika, 37, 408–428.


  9. Sargan, J. D. (1964). Wages and prices in the United Kingdom: A study in econometric methodology. In P. E. Hart et al. (Eds.), Econometric analysis for National Economic Planning. London: Butterworths.




Appendices

These tables are reproduced with the kind permission of the publisher Marcel Dekker Inc., and the author H. D. Vinod. Tables 3.4, 3.5, and 3.6 first appeared in H. D. Vinod, “Generalization of the Durbin–Watson Statistic for Higher Order Autoregressive Processes,” Communications in Statistics, vol. 2, 1973, pp. 115–144.

Table 3.4 Second-order autoregression: level of significance 5%
Table 3.5 Third-order autoregression: level of significance 5%
Table 3.6 Fourth-order autoregression: level of significance 5%

Appendix

1.1 Durbin–Watson Theory

Here we explore, in somewhat greater detail, the issues involved in the derivation and use of the Durbin–Watson statistic.

Derivation. From Eq. (3.18) of this chapter we have

$$ d=\frac{u^{\prime}\left(I-M\right)A\left(I-M\right)u}{u^{\prime}\left(I-M\right)u}, $$
(A.1)

where it is assumed that

$$ u\sim N\left(0,{\sigma}^2I\right) $$

and

$$ M=X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }. $$

For notational simplicity put

$$ N=I-M $$

and note that, since N is idempotent,

$$ N(NAN)=(NAN)N $$

i.e., the two matrices

$$ N,\kern1em NAN $$

commute. We shall take advantage of this fact in order to greatly simplify the expression in (A.1). Because N is a symmetric idempotent matrix there exists an orthogonal matrix B such that

$$ {B}^{\prime } NB=D,\kern1em D=\operatorname{diag}\left(0,I\right), $$

the identity matrix being of dimension equal to the rank of N, which is T − n − 1. Define

$$ {B}^{\prime }(NAN)B=C $$
(A.2)

and partition its rows and columns conformably with respect to D, i.e.,

$$ C=\left[\begin{array}{ll}{C}_{11}& {C}_{12}\\ {}{C}_{21}& {C}_{22}\end{array}\right], $$

where C 22 is (T − n − 1) × (T − n − 1) and C 11 is (n + 1) × (n + 1). Observe that

$$ {\displaystyle \begin{array}{l} CD={B}^{\prime }(NAN)B{B}^{\prime } NB\\ {}\kern2.5em ={B}^{\prime } NANB\\ {}\kern2.5em ={B}^{\prime } NNANB={B}^{\prime } NB{B}^{\prime } NANB= DC.\\ {}\end{array}} $$

However,

$$ CD=\left[\begin{array}{cc}0& {C}_{12}\\ {}0& {C}_{22}\end{array}\right],\kern1em DC=\left[\begin{array}{cc}0& 0\\ {}{C}_{21}& {C}_{22}\end{array}\right]. $$
(A.3)

The relations above clearly imply

$$ {C}_{12}=0,\kern1em {C}_{21}=0, $$

so that consequently

$$ C=\left[\begin{array}{cc}{C}_{11}& 0\\ {}0& {C}_{22}\end{array}\right]. $$

Since A is a positive semidefinite (symmetric) matrix, C 11 and C 22 will have similar properties. Let E i be the orthogonal matrix of characteristic vectors of C ii  , i = 1 , 2, and Θ i be the (diagonal) matrix of characteristic roots of C ii  ,  i = 1 , 2, i.e.,

$$ {E}_1^{\prime }{C}_{11}{E}_1={\Theta}_1,\kern1em {E}_2^{\prime }{C}_{22}{E}_2={\Theta}_2. $$

It is clear that

$$ E=\operatorname{diag}\left({E}_1,\kern0.5em {E}_2\right),\kern1em \Theta =\operatorname{diag}\left({\Theta}_1,{\Theta}_2\right) $$

are (respectively) the matrices of characteristic vectors and roots for the matrix C. Thus

$$ {E}^{\prime } CE=\Theta . $$

Bearing in mind what C is we have

$$ {E}^{\prime }{B}^{\prime }(NAN) BE=\Theta . $$
(A.4)

From

$$ {B}^{\prime } NB=\left[\begin{array}{cc}0& 0\\ {}0& I\end{array}\right]=D $$

we also see that

$$ {E}^{\prime }{B}^{\prime } NBE=\left[\begin{array}{cc}{E}_1^{\prime }& 0\\ {}0& {E}_2^{\prime}\end{array}\right]\left[\begin{array}{cc}0& 0\\ {}0& I\end{array}\right]\left[\begin{array}{cc}{E}_1& 0\\ {}0& {E}_2\end{array}\right]=\left[\begin{array}{cc}0& 0\\ {}0& I\end{array}\right]. $$
(A.5)

Defining

$$ Q= BE. $$

we note that

$$ {Q}^{\prime }Q=Q{Q}^{\prime }={(BE)}^{\prime }(BE)=(BE){(BE)}^{\prime }=I, $$

i.e., Q is orthogonal; moreover, (A.4) and (A.5) imply that

$$ {\displaystyle \begin{array}{ll}{Q}^{\prime } NANQ& =\Theta, \\ {}{Q}^{\prime } NQ& =D,\end{array}} $$
(A.6)

where Θ is the diagonal matrix of the characteristic roots of NAN.

If we put

$$ \xi ={Q}^{\prime}\left(\frac{u}{\sigma}\right) $$

we note that ξ ∼ N(0,  I) and

$$ d=\frac{\xi^{\prime}\Theta \xi }{\xi^{\prime } D\xi}. $$
(A.7)

Bounds on Characteristic Roots

The difficulty with the representation in (A.7) is that the numerator depends on the characteristic roots of NAN and thus, ultimately, on the data. A way out of this is found through the bounds, d L and d U, discussed earlier in the chapter. Let us now see how these bounds are established. We begin with

Lemma A.1

The characteristic roots of the matrix A, as exhibited in Eq. (3.16), are given by

$$ {\lambda}_j=2\left\{1-\cos \left[\frac{\left(j-1\right)\pi }{T}\right]\right\},\kern1em j=1,2,\dots, T. $$
(A.8)

Proof

We can write

$$ A=2I-2{A}_{\ast } $$

where

$$ {A}_{\ast }=\frac{1}{2}\left[\begin{array}{ccccc}1& 1& 0& \cdots & 0\\ {}1& 0& 1& & \\ {}0& 1& 0& & \vdots \\ {}& & & & \\ {}\vdots & & & & 0\\ {}& & & 0& 1\\ {}0& \cdots & 0& 1& 1\end{array}\right], $$
(A.9)

Since

$$ \mid \lambda I-A\mid \kern0.5em =\kern0.5em \mid \lambda I-2I+2{A}_{\ast}\mid \kern0.5em ={\left(-2\right)}^T\mid \mu I-{A}_{\ast}\mid, $$

where

$$ \mu =\frac{1}{2}\left(2-\lambda \right), $$

it follows that if we determine the characteristic roots of A∗, say

$$ \left\{{\mu}_i:i=1,2,\dots, T\right\}, $$

then the (corresponding) characteristic roots of A are given by

$$ {\lambda}_i=2\left(1-{\mu}_i\right),\kern1em i=1,2,\dots, T. $$

If μ and w are, respectively, a characteristic root and the corresponding characteristic vector of A∗, they satisfy the following set of equations:

$$ \frac{1}{2}\left({w}_1+{w}_2\right)=\mu {w}_1; $$
(A.10)
$$ \frac{1}{2}\left({w}_{i\kern0.5em -\kern0.5em 1}+{w}_{i\kern0.5em +\kern0.5em 1}\right)=\mu {w}_i,\kern0.62em i=2,3,\dots, T-1;\kern0.5em $$
(A.11)
$$ \frac{1}{2}\left({w}_{T-1}+{w}_T\right)=\mu {w}_T.\kern0.5em $$
(A.12)

In order to obtain an expression for μ and the elements of w, we note that the second set above may be rewritten as

$$ {w}_{i\kern0.5em +\kern0.5em 1}-2\mu {w}_i+{w}_{i\kern0.5em -\kern0.5em 1}=0, $$
(A.13)

which is recognized as a second-order difference equation. The desired characteristic root and vector are related to the solution of the equation in (A.13). Its characteristic equation is

$$ {r}^2-2\mu r+1=0, $$

whose solutions are

$$ {r}_1=\mu +\sqrt{\mu^2-1},\kern1em {r}_2=\mu -\sqrt{\mu^2-1}. $$

Since

$$ {r}_1+{r}_2=2\mu, \kern1em {r}_1{r}_2=1, $$
(A.14)

we conclude that

$$ {r}_2=\frac{1}{r_1}. $$
(A.15)

Thus for notational simplicity we shall denote the two roots by

$$ r,\kern1em \frac{1}{r}. $$

From the general theory of solution of difference equations (see Sect. 2.5 of Mathematics for Econometrics), we know that the solution to (A.13) may be written as

$$ {w}_t={c}_1{r}^t+{c}_2{r}^{-t}, $$

where c 1 and c 2 are constants to be determined by Eqs. (A.10) and (A.12). From (A.10) we find

$$ \left(1-r-{r}^{-1}\right)\left({c}_1r+{c}_2{r}^{-1}\right)+{c}_1{r}^2+{c}_2{r}^{-2}=0. $$

After considerable simplification this yields

$$ \left(r-1\right)\left({c}_1-{c}_2{r}^{-1}\right)=0, $$
(A.16)

which implies

$$ {c}_2={c}_1r. $$

Substituting in (A.12) and canceling c 1 yields

$$ \left(1-r\right)\left({r}^T-{r}^{-T}\right)=0, $$
(A.17)

which implies r 2T = 1, i.e., the solutions to (A.17) are the 2T roots of unity, plus the root r = 1. As is well known, the 2T roots of unityFootnote 10 are given by, say,

$$ {e}^{i2\pi s/2T}. $$

The roots of the matrix A∗ are, thus,

$$ \mu =\frac{1}{2}\left(r+{r}^{-1}\right)=\frac{1}{2}\left({e}^{i2\pi s/2T}+{e}^{-i2\pi s/2T}\right) $$

Since

$$ {e}^{i2\pi s/2T}={e}^{i2\pi \left(s-2T\right)/2T} $$

it follows that the only distinct roots correspond to

$$ r={e}^{i\pi s/T},\kern1em s=0,1,2,\dots, T. $$

Moreover, the root

$$ {r}_T={e}^{i\pi}=-1 $$

is inadmissible since the characteristic vector corresponding to it is

$$ {w}_t={c}_1{r}_T^t+{c}_1{r}_T^{-\left(t-1\right)}={c}_1{\left(-1\right)}^t+{c}_1{\left(-1\right)}^{-\left(t-1\right)}=0, $$

i.e., the zero vector. Consequently, the characteristic roots of the matrix A∗ are given by

$$ {\mu}_s=\frac{1}{2}\left({r}_s+{r}_s^{-1}\right)=\frac{1}{2}\left({e}^{i\pi s/T}+{e}^{- i\pi s/T}\right)=\cos \left(\pi s/T\right),\kern1em s=0,1,\dots, T-1, $$
(A.18)

and the corresponding characteristic roots of A by

$$ {\lambda}_s=2\left[1-\cos \left(\frac{\pi s}{T}\right)\right],\kern1em s=0,1,2,\dots, T-1.\kern1.1em q.e.d. $$
(A.19)

Corollary A.1

Let λ s , as in (A.19), be the sth characteristic root of A. Then

$$ {w}_{ts}=\cos \left[\frac{\left(t-\frac{1}{2}\right)\pi s}{T}\right],\kern1em t=1,2,\dots, T $$

is the corresponding characteristic vector.

Proof

We first note that if μ s , as in (A.18), is the sth characteristic root of A∗ then

$$ {w}_{ts}={c}_1{r}_s^t+{c}_1{r}_s^{-t\kern0.5em +\kern0.5em 1},\kern1em t=1,2,\dots, T, $$

is the corresponding characteristic vector. If we choose

$$ {c}_1=\frac{1}{2}{r}_s^{-\left(1/2\right)} $$

then

$$ {w}_{ts}=\frac{1}{2}\left[{r}_s^{t-\left(1/2\right)}+{r}_s^{-\left(t-\left(1/2\right)\right)}\right]=\cos \left[\frac{\left(t-\frac{1}{2}\right)\pi s}{T}\right],\kern1em t=1,2,\dots, T, $$

and it can be shown easily that

$$ \sum \limits_{t=1}^T{w}_{ts}^2=\frac{T}{2},\kern1em \sum \limits_{t=1}^T{w}_{ts}{w}_{t{s}^{\prime }}=0,\kern1em s\ne {s}^{\prime }. $$

Thus, we see that the vectors corresponding to the roots r s are mutually orthogonal. If we wished we could have taken

$$ {c}_1=\frac{1}{2}\left(\sqrt{\frac{2}{T}}\right){r}_s^{-\left(1/2\right)}, $$

in which case we would have determined

$$ {w}_{ts}=\sqrt{\frac{2}{T}}\cos \left[\frac{\left(t-\frac{1}{2}\right)\pi s}{T}\right],\kern1em t=1,2,\dots, T,\kern0.74em s=0,1,2,\dots, T-1, $$

and thus ensured that

$$ \sum \limits_{t=1}^T{w}_{ts}^2=1. $$

Let

$$ W=\left({w}_{ts}\right),\kern1em s=0,1,2,\dots, T-1,\kern1em t=1,2,\dots, T, $$

the elements being as just defined above. We see that

$$ AW=2W-2{A}_{\ast }W=W2\left(I-\overline{M}\right)=W\Lambda, $$
(A.20)

where

$$ \Lambda =\operatorname{diag}\left({\lambda}_1,{\lambda}_2,\dots, {\lambda}_T\right),\kern1em \overline{M}=\operatorname{diag}\left({\mu}_1,{\mu}_2,\dots, {\mu}_T\right). $$

But (A.20) shows that W is the matrix of characteristic vectors of A. q.e.d.

Remark A.1

Notice that since the roots above may be defined as

$$ {\lambda}_j=2\left[1-\cos \left(\frac{\uppi \left(j-1\right)}{T}\right)\right],\kern1em j=1,2,\dots, T, $$

and

$$ {\lambda}_1=2\left[1-\cos 0\right]=0,\kern1em \cos \left(\pi \right)=-1, $$

we can conclude that

$$ {\lambda}_j\in \left[0,4\right). $$

Notice further that they are arranged in increasing order of magnitude, i.e.,

$$ {\lambda}_1<{\lambda}_2<{\lambda}_3<\cdots <{\lambda}_T. $$
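As a purely numerical illustration of Lemma A.1 and Corollary A.1, the short Python sketch below builds the matrix A of Eq. (3.16) (taken here to be the usual Durbin–Watson difference matrix, so that u′Au = Σ(u_t − u_{t−1})²) and checks the closed-form roots and vectors against a numerical eigendecomposition; the helper name dw_A and the value T = 12 are illustrative assumptions.

```python
import numpy as np

def dw_A(T):
    """Matrix A of Eq. (3.16): u'Au = sum_{t=2}^T (u_t - u_{t-1})^2."""
    A = 2.0 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
    A[0, 0] = A[-1, -1] = 1.0
    return A

T = 12
A = dw_A(T)

# Closed-form roots (A.19): lambda_j = 2[1 - cos((j - 1) pi / T)], j = 1, ..., T.
lam = 2.0 * (1.0 - np.cos(np.arange(T) * np.pi / T))

# Closed-form vectors (Corollary A.1): w_{ts} = cos[(t - 1/2) pi s / T], s = 0, ..., T - 1.
t, s = np.arange(1, T + 1), np.arange(T)
W = np.cos(np.outer(t - 0.5, s) * np.pi / T)

assert np.allclose(A @ W, W * lam)                       # A W = W diag(lambda)
assert np.allclose(np.linalg.eigvalsh(A), np.sort(lam))  # agrees with numerical roots
print("increasing roots in [0, 4):", bool(np.all(np.diff(lam) > 0)), lam[0], lam[-1])
```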

Let us now turn to the relation between the roots of A, as established above, and those of

$$ NAN,\kern1em N=I-X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }. $$

Since (X′X)−1 is positive definite there exists a nonsingular matrix G such that

$$ {\left({X}^{\prime }X\right)}^{-1}=G{G}^{\prime } $$
(A.21)

Define

$$ P={W}^{\prime } XG, $$
(A.22)

where W is the matrix of characteristic vectors of A. We have, first:

Lemma A.2

The matrix

$$ P={W}^{\prime } XG $$

is a T × (n + 1) matrix of rank (n + 1), where G is as in (A.21) and W is the matrix of characteristic vectors of A. Moreover, its columns are mutually orthogonal.

Proof

The assertion that P is T × (n + 1) is evident. We further note

$$ {P}^{\prime }P={G}^{\prime }{X}^{\prime }W{W}^{\prime } XG={G}^{\prime }{X}^{\prime } XG=I.\kern1.1em \mathrm{q}.\mathrm{e}.\mathrm{d}. $$

Now consider the roots of NAN, i.e., consider

$$ \mid \theta I- NAN\mid \kern0.5em =\kern0.5em \mid \theta I- NW\Lambda {W}^{\prime }N\mid \kern0.5em =\kern0.5em \mid {W}^{\prime }W\mid \mid \theta I-{N}^{\ast}\Lambda {N}^{\ast}\mid . $$

But

$$ \mid {W}^{\prime }W\mid \kern1em =1,\kern0.98em {N}^{\ast }={W}^{\prime } NW=I-P{P}^{\prime }. $$

Hence the roots of NAN are exactly those of

$$ \left(I\kern0.5em -P{P}^{\prime}\right)\Lambda \left(I-P{P}^{\prime}\right), $$

where P is as defined in (A.22). It turns out that we can simplify this aspect considerably. Thus,

Lemma A.3

Let p i be the ith column of P and let

$$ {P}_i=I-{p}_{\cdot i}{p}_{\cdot i}^{\prime }. $$

Then

$$ I-P{P}^{\prime }=\prod \limits_{i\kern0.5em =\kern0.5em 1}^{n\kern0.5em +\kern0.5em 1}{P}_i. $$

Proof

$$ {P}_i{P}_j=\left(I-{p}_{.i}{p}_{.i}^{\prime}\right)\left(I-{p}_{.j}{p}_{.j}^{\prime}\right)=I-{p}_{.i}{p}_{.i}^{\prime }-{p}_{.j}{p}_{.j}^{\prime } $$

since the columns of P are orthogonal. Moreover, the P i are symmetric idempotent, i.e.,

$$ {P}_i{P}_i^{\prime }={P}_i. $$

It follows therefore that

$$ \prod \limits_{i\kern0.5em =\kern0.5em 1}^{n\kern0.5em +\kern0.5em 1}{P}_i=I-\sum \limits_{i=1}^{n\kern0.5em +\kern0.5em 1}{p}_{\cdot i}{p}_{\cdot i}^{\prime }. $$

Since

$$ P{P}^{\prime }=\sum \limits_{i\kern0.5em =\kern0.5em 1}^{n\kern0.5em +\kern0.5em 1}{p}_{\cdot i}{p}_{\cdot i}^{\prime } $$

we conclude

$$ I-P{P}^{\prime }=\prod \limits_{i=1}^{n+1}{P}_i.\kern1.1em \mathrm{q}.\mathrm{e}.\mathrm{d}. $$
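The identity of Lemma A.3 is easy to verify numerically. The sketch below (illustrative only) uses an arbitrary T × m matrix with orthonormal columns, obtained from a QR factorization, in the role of P.

```python
import numpy as np

rng = np.random.default_rng(0)
T, m = 10, 4                                        # m plays the role of n + 1
P, _ = np.linalg.qr(rng.standard_normal((T, m)))    # columns of P are orthonormal

prod = np.eye(T)
for i in range(m):
    p = P[:, [i]]
    prod = prod @ (np.eye(T) - p @ p.T)             # P_i = I - p_{.i} p_{.i}'

assert np.allclose(prod, np.eye(T) - P @ P.T)       # Lemma A.3: product of the P_i = I - PP'
```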

A very useful consequence of the lemma is that the problem may now be posed as follows: what is the relation of the roots of

$$ \left|\theta I-\left(\prod \limits_{i=1}^{n+1}{P}_i\right)\Lambda \left(\prod \limits_{i=1}^{n+1}{P}_i\right)\right|=0 $$

to those of A, i.e., to the elements of Λ. Moreover, since

$$ \left(\prod \limits_{i=1}^{n+1}\;{P}_i\right)\Lambda \left(\prod \limits_{i=1}^{n+1}\;{P}_i\right)={P}_{n+1}\left({P}_n\cdots {P}_2\left({P}_1\;\Lambda\;{P}_1\right){P}_2\cdots {P}_n\right){P}_{n+1} $$

it follows that the problem may be approached recursively, by first asking: what are the relations between the elements of Λ and the roots of P1ΛP1? If we answer that question then we have automatically answered the question: what are the relations between the roots of

$$ {P}_2{P}_1\kern0.5em \Lambda \kern0.5em {P}_1{P}_2 $$

and those of

$$ {P}_1\Lambda {P}_1? $$

Hence, repeating the argument we can determine the relation between the roots of NAN and those of A. Before we take up these issues we state a very useful result.

Lemma A.4

Let D be a nonsingular diagonal matrix of order m. Let α be a scalar and a , b be two m-element column vectors, and put

$$ H=D+\alpha a{b}^{\prime }. $$

Then

$$ \mid H\mid =\left[1+\alpha \sum \limits_{i\kern0.5em =\kern0.5em 1}^m\left(\frac{a_i{b}_i}{d_{ii}}\right)\right]\prod \limits_{j\kern0.5em =\kern0.5em 1}^m{d}_{jj}. $$

Proof

See Proposition 31 of Mathematics for Econometrics.
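Lemma A.4 is the matrix determinant lemma specialized to a diagonal D; a minimal numerical check, with randomly chosen (purely illustrative) D, a, b, and α, is sketched below.

```python
import numpy as np

rng = np.random.default_rng(1)
m, alpha = 6, 0.7
d = rng.uniform(0.5, 2.0, size=m)                   # diagonal of a nonsingular diagonal D
a, b = rng.standard_normal(m), rng.standard_normal(m)

H = np.diag(d) + alpha * np.outer(a, b)
rhs = (1.0 + alpha * np.sum(a * b / d)) * np.prod(d)
assert np.isclose(np.linalg.det(H), rhs)            # Lemma A.4
```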

Lemma A.5

The characteristic roots of

$$ {P}_1\Lambda {P}_1, $$

arranged in increasing order, are the solution of

$$ 0=\psi \left(\theta \right)=\theta f\left(\theta \right),\kern1em f\left(\theta \right)=\sum \limits_{i=1}^T{p}_{i1}^2\prod \limits_{s\ne i}\left(\theta -{\lambda}_s\right), $$

and obey

$$ {\theta}_1^{(1)}=0,\kern1em {\lambda}_i\le {\theta}_{i+1}^{(1)}\le {\lambda}_{i+1},\kern1em i=1,2,\dots, T-1. $$

Proof

The characteristic roots of P 1ΛP 1 are the solutions of

$$ 0=\kern0.5em \mid \theta I-{P}_1\Lambda {P}_1\mid \kern0.5em =\kern0.5em \mid \theta I-\Lambda {P}_1\mid \kern0.5em =\kern0.5em \mid \theta I-\Lambda +\Lambda {p}_{\cdot 1}{p}_{\cdot 1}^{\prime}\mid . $$

Taking

$$ D=\theta I-\Lambda, \kern1em \alpha =1,\kern1em \Lambda {p}_{\cdot 1}=a,\kern1em b={p}_{\cdot 1}, $$

applying Lemma A.4, and noting that

$$ \sum \limits_{i=1}^T{p}_{i1}^2=1, $$

we conclude

$$ \mid \theta I-{P}_1\Lambda {P}_1\mid =\theta f\left(\theta \right), $$
(A.23)

where

$$ f\left(\theta \right)=\sum \limits_{i=1}^T{p}_{i1}^2\prod \limits_{s\ne i}\left(\theta -{\lambda}_s\right). $$

We note that the characteristic equation of P 1ΛP 1 as exhibited in the two equations above is a polynomial of degree T. Since P 1ΛP 1 will, generally, be of rank T − 1, the polynomial equation

$$ f\left(\theta \right)=0 $$

will not have a zero root. Indeed,

$$ f\left({\lambda}_1\right)={p}_{11}^2\prod \limits_{s\kern0.5em \ne \kern0.5em 1}\left({\lambda}_1-{\lambda}_s\right)={\left(-1\right)}^{T-1}{p}_{11}^2\prod \limits_{s\kern0.5em =\kern0.5em 2}^T{\lambda}_s\ne 0 $$

unless

$$ {p}_{11}=0. $$

We remind the reader that in the preceding we employ the notation

$$ {\lambda}_j=2\left\{1-\cos \left[\frac{\left(j-1\right)\uppi}{T}\right]\right\},\kern1em j=1,2,3,\dots, T, $$

so that the roots of A are arranged as

$$ 0={\lambda}_1<{\lambda}_2<{\lambda}_3<\cdots <{\lambda}_T<4. $$

Now the roots of

$$ \psi \left(\theta \right)=0 $$

are the one obvious zero root (associated with the factor θ) and the T − 1 (nonzero) roots of

$$ f\left(\theta \right)=0. $$

But for any r ≥ 2,

$$ f\left({\lambda}_r\right)={\left(-1\right)}^{T-r}{p}_{r1}^2\prod \limits_{i<r}\left({\lambda}_r-{\lambda}_i\right)\prod \limits_{i>r}\left({\lambda}_i-{\lambda}_r\right)\ne 0 $$

provided p r1 ≠ 0. Assuming this to be so we have that if, say, f(λ r ) > 0, then f (λ r + 1) < 0. Thus, between the two roots of A , λ r and λ r + 1, lies a root of P 1ΛP 1. Denote such roots by

$$ {\theta}_i^{(1)},\kern1em i=1,2,\dots, T. $$

What the preceding states is that

$$ {\theta}_1^{(1)}=0,\kern1em {\lambda}_i\le {\theta}_{i+1}^{(1)}\le {\lambda}_{i+1},\kern1em i=1,2,\dots, T-1.\kern1.1em \mathrm{q}.\mathrm{e}.\mathrm{d}. $$
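The interlacing property asserted by Lemma A.5 can be checked numerically as follows; the unit vector standing in for the first column of P is generated at random purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 12
lam = 2.0 * (1.0 - np.cos(np.arange(T) * np.pi / T))   # roots of A, eq. (A.19), increasing

p1 = rng.standard_normal(T)
p1 /= np.linalg.norm(p1)                               # unit vector: sum_i p_{i1}^2 = 1
P1 = np.eye(T) - np.outer(p1, p1)
theta1 = np.linalg.eigvalsh(P1 @ np.diag(lam) @ P1)    # ascending order

tol = 1e-10
assert abs(theta1[0]) < tol                            # theta_1^{(1)} = 0
assert np.all(lam[:-1] <= theta1[1:] + tol)            # lambda_i <= theta_{i+1}^{(1)}
assert np.all(theta1[1:] <= lam[1:] + tol)             # theta_{i+1}^{(1)} <= lambda_{i+1}
```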

The following Lemma is also important in the chain of our argumentation.

Lemma A.6

Let

$$ {Q}_j=\left(\prod \limits_{i=1}^j{P}_i\right)\Lambda \left(\prod \limits_{i=1}^j{P}_i\right),\kern1em j=1,2,\dots, n+1. $$

Then Q j has at least j zero roots.

Proof

By definition

$$ \prod \limits_{i=1}^j\;{P}_i=I-\sum \limits_{i=1}^j\;{p}_{\cdot i}{p}_{\cdot i}^{\prime }, $$

which is an idempotent matrix of rank T − j. Hence

$$ \operatorname{rank}\left({Q}_j\right)\le T-j,\kern1em j=1,2,3,\dots, n+1, $$

which means that Q j must have at least j zero roots. q.e.d.

Lemma A.7

Let Q j − 1 be defined as in Lemma A.6. Let M j − 1 , Θ(j − 1) be its associated matrices of characteristic vectors and roots respectively. Let Q j and Θ(j) be similarly defined. Then

$$ {\theta}_i^{(j)}=0,\kern1em i=1,2,\dots, j, $$
$$ {\theta}_i^{\left(j-1\right)}\le {\theta}_{i+1}^{(j)}\le {\theta}_{i+1}^{\left(j-1\right)},\kern1em i=j,\kern0.5em j+1,\dots, T-1. $$

Proof

The first assertion is a restatement of the conclusion of Lemma A.6. For the second, consider

$$ 0=\kern0.5em \mid \theta I-{Q}_j\mid \kern0.5em =\kern0.5em \mid \theta I-{P}_j{Q}_{j-1}{P}_j\mid \kern0.5em =\kern0.5em \mid \theta I-{\overline{P}}_j{\Theta}^{\left(j-1\right)}{\overline{P}}_j\mid, $$

where

$$ {\overline{P}}_j={M}_{j-1}^{\prime }{P}_j{M}_{j-1}=I-{\overline{p}}_{\cdot j}{\overline{p}}_{\cdot j}^{\prime },\kern1em {\overline{p}}_{\cdot j}={M}_{j-1}^{\prime }{p}_{\cdot j}. $$

We note that

$$ {\overline{P}}_j{\overline{P}}_j^{\prime }={\overline{P}}_j. $$

Thus

$$ \mid \theta I-{\overline{P}}_j{\Theta}^{\left(j-1\right)}{\overline{P}}_j\mid =\mid \theta I-{\Theta}^{\left(j-1\right)}{\overline{P}}_j\mid =\mid \theta I-{\Theta}^{\left(j-1\right)}+{\Theta}^{\left(j-1\right)}{\overline{p}}_{\cdot j}{\overline{p}}_{\cdot j}^{\prime}\mid =\psi \left(\theta \right), $$

where now

$$ \psi \left(\theta \right)=\theta f\left(\theta \right),\kern1em f\left(\theta \right)=\sum \limits_{i=1}^T{\overline{p}}_{ij}^2\prod \limits_{s\ne i}\left(\theta -{\theta}_s^{\left(j-1\right)}\right). $$

Since we know that ψ(θ) has (at least) j zero roots we therefore know that f(θ) must have (at least) j − 1 zero roots. Hence

$$ f\left({\theta}_k^{\left(j-1\right)}\right)={\overline{p}}_{kj}^2\prod \limits_{s\kern0.5em \ne \kern0.5em k}\left({\theta}_k^{\left(j\kern0.5em -\kern0.5em 1\right)}-{\theta}_s^{\left(j\kern0.5em -\kern0.5em 1\right)}\right)=0,\kern1em k=1,2,\dots, j-1, $$

which need not imply

$$ {\overline{p}}_{kj}=0,\kern1em k=1,2,\dots, j-1. $$

Consequently, we can write

$$ f\left(\theta \right)=\sum \limits_{i=1}^T{\overline{p}}_{ij}^2\prod \limits_{s\ne i}\left(\theta -{\theta}_s^{\left(j-1\right)}\right), $$

and, for k ≥ j, we have

$$ f\left({\theta}_k^{\left(j-1\right)}\right)={\overline{p}}_{kj}^2\prod \limits_{s\kern0.5em \ne \kern0.5em k}\left({\theta}_k^{\left(j-1\right)}-{\theta}_s^{\left(j-1\right)}\right) $$
$$ ={\left(-1\right)}^{T-k}{\overline{p}}_{kj}^2\prod \limits_{s<k}\left({\theta}_k^{\left(j-1\right)}-{\theta}_s^{\left(j-1\right)}\right)\prod \limits_{s>k}\left({\theta}_s^{\left(j-1\right)}-{\theta}_k^{\left(j-1\right)}\right). $$

In general,

$$ f\left({\theta}_k^{\left(j-1\right)}\right)\ne 0,\kern1em k\ge j, $$

provided \( {\overline{p}}_{kj}\ne 0 \). Thus if, e.g., \( f\left({\theta}_k^{\left(j-1\right)}\right)>0 \), then

$$ f\left({\theta}_{k+1}^{\left(j-1\right)}\right)<0. $$

Consequently, a root of Q j , say \( {\theta}_{k+1}^{(j)} \), lies between \( {\theta}_k^{\left(j-1\right)} \) and \( {\theta}_{k+1}^{\left(j-1\right)} \). This is so since if

$$ f\left({\theta}_j^{\left(j-1\right)}\right)>0, $$

then

$$ f\left({\theta}_{j+1}^{\left(j-1\right)}\right)<0. $$

Thus, the first nonzero root of Q j , viz., \( {\theta}_{j+1}^{(j)} \) obeys

$$ {\theta}_j^{\left(j-1\right)}\le {\theta}_{j+1}^{(j)}\le {\theta}_{j+1}^{\left(j-1\right)}. $$

Consequently, we have established

$$ {\theta}_i^{(j)}=0,\kern1em i=1,2,\dots, j, $$
$$ {\theta}_i^{\left(j-1\right)}\le {\theta}_{i+1}^{(j)}\le {\theta}_{i+1}^{\left(j-1\right)},\kern1em i=j,j+1,\dots, T-1.\kern1.1em \mathrm{q}.\mathrm{e}.\mathrm{d}. $$

We may now prove.

Theorem A.1

Let

$$ {\lambda}_i,\kern1em i=1,2,\dots, T, $$

be the characteristic roots of Λ arranged as

$$ 0={\lambda}_1<{\lambda}_2<\cdots <{\lambda}_T<4. $$

Let

$$ {\theta}_i,\kern1em i=1,2,\dots, T, $$

be the roots of NAN similarly arranged in increasing order. Then, the following is true:

$$ {\theta}_i=0,\kern1em i=1,2,\dots, n+1; $$
$$ {\lambda}_j\le {\theta}_{j+n+1}\le {\lambda}_{j+n+1},\kern1em j=1,2,\dots, T-n-1. $$

Proof

The first part of the theorem is evident. For the second part we note that from Lemma A.5 we have

$$ {\lambda}_i\le {\theta}_{i\kern0.5em +\kern0.5em 1}^{(1)}\le {\lambda}_{i\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-1. $$
(A.24)

From Lemma A.7 we have that

$$ {\theta}_{i\kern0.5em +\kern0.5em 1}^{(1)}\le {\theta}_{i\kern0.5em +\kern0.5em 2}^{(2)}\cdots \le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^{\left(n\kern0.5em +\kern0.5em 1\right)}={\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-n-1. $$
(A.25)

Thus (A.24) and (A.25) imply

$$ {\lambda}_i\le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-n-1. $$
(A.26)

Again using Lemma A.7,

$$ {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}={\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^{\left(n\kern0.5em +\kern0.5em 1\right)}\le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^{(n)}\cdots \le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^{(1)}, $$
(A.27)

and (A.27) and (A.24) imply

$$ {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}\le {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-n-1. $$
(A.28)

Combining (A.26) and (A.28) we conclude

$$ {\lambda}_i\le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}\le {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-n-1.\kern1.1em \mathrm{q}.\mathrm{e}.\mathrm{d}. $$

Let us now consider certain special cases. Thus, suppose k (<n + 1) columns of X are linear combinations of k characteristic vectors of A, say (for definiteness) those corresponding to the k smallest characteristic roots of A. We shall see that this type of X matrix will have an appreciable impact on the bounds determined in Theorem A.1.

The X matrix we deal with obeys

$$ X=\left({X}_1,\kern0.5em {X}_2\right),\kern1em {X}_1={W}_1B, $$

where B is a nonsingular k × k matrix and W 1 is the matrix containing the characteristic vectors of A corresponding to its k smallest roots. Retracing our argument in the early part of this appendix we note that, in this case,

$$ G=\left[\begin{array}{cc}{B}^{-1}& -{B}^{-1}{W}_1^{\prime }{X}_2C\\ {}0& C\end{array}\right], $$
(A.29)

where C is defined by

$$ C{C}^{\prime }={\left({X}_2^{\prime }{W}_2{W}_2^{\prime }{X}_2\right)}^{-1}. $$

Remark A.2

The matrix C will always exist unless X 2 is such that

$$ {W}_2^{\prime }{X}_2=0,\kern1em W=\left({W}_1,\kern0.5em {W}_2\right), $$

or is not of full rank. This, however, is to be ruled out since X 2 is not a linear transformation of the first k characteristic vectors of A.

The matrix P of (A.22) is now

$$ P={W}^{\prime } XG=\left[\begin{array}{cc}I& 0\\ {}0& {P}^{\ast}\end{array}\right], $$

where

$$ {P}^{\ast }={W}_2^{\prime }{X}_2C. $$
(A.30)

We verify that P∗ is (T − k) × (n + 1 − k) and obeys

$$ P{\ast}^{\prime }{P}^{\ast }=I. $$

Hence

$$ I-P{P}^{\prime }=\left[\begin{array}{cc}0& 0\\ {}0& I-{P}^{\ast }P{\ast}^{\prime}\end{array}\right]. $$
(A.31)

We may now state

Theorem A.2

Assume the conditions of Theorem A.1 and, in addition, that the data matrix of the GLM is

$$ X=\left({X}_1,\kern0.5em {X}_2\right),\kern1em {X}_1={W}_1B, $$

where B is k × k nonsingular and W 1 is the T × k matrix containing the characteristic vectors of A corresponding to, say, the k smallest characteristic roots. Then, the following is true regarding the relation between the roots θ i  , i = 1 , 2 , … , T, of NAN and λ i  , i = 1 , 2 , … , T, of A:

$$ {\theta}_i=0,\kern1em i=1,2,\dots, n+1; $$
$$ {\lambda}_{j+k}\le {\theta}_{j+n+1}\le {\lambda}_{j+n+1},\kern1em j=1,2,\dots, T-n-1. $$

Proof

The roots of NAN are exactly those of

$$ \left(I\kern0.5em -P{P}^{\prime}\right)\Lambda \left(I-P{P}^{\prime}\right). $$

For the special case under consideration,

$$ P=\left[\begin{array}{ll}I& 0\\ {}0& {P}^{\ast}\end{array}\right], $$

where I is k × k and P∗ is (T − k) × (n + 1 − k), its columns being orthogonal. Defining

$$ \Lambda =\left[\begin{array}{cc}{\Lambda}_1& 0\\ {}0& {\Lambda}^{\ast}\end{array}\right], $$

where

$$ {\Lambda}^{\ast }=\operatorname{diag}\left({\lambda}_{k+1},{\lambda}_{k+2},\dots, {\lambda}_T\right), $$

we have that the characteristic equation of NAN is, in this special case,

$$ \left|\theta I-\left[\begin{array}{cc}0& 0\\ {}0& \left(I-{P}^{\ast }P{\ast}^{\prime}\right){\Lambda}^{\ast}\left(I-{P}^{\ast }P{\ast}^{\prime}\right)\end{array}\right]\right|=0. $$
(A.32)

But it is evident that (A.32) has k zero roots and that the remaining roots are those of

$$ \mid \theta I-\left(I-{P}^{\ast }P{\ast}^{\prime}\right){\Lambda}^{\ast}\left(I-{P}^{\ast }P{\ast}^{\prime}\right)\mid =0, $$
(A.33)

the identity matrices in (A.33) being of order T − k. Let the roots of (A.33) be denoted by

$$ {\theta}_i^{\ast },\kern1em i=1,2,\dots, T-k. $$

But the problem in (A.33) is one for which Theorem A.1 applies. We thus conclude that

$$ {\theta}_i^{\ast }=0,\kern1em i=1,2,\dots, n+1-k. $$
(A.34)

This is so since

$$ I-{P}^{\ast }P{\ast}^{\prime } $$

is a (T − k) × (T − k) idempotent matrix of rank T − n − 1. Hence

$$ \left(I-{P}^{\ast }P{\ast}^{\prime}\right){\Lambda}^{\ast}\left(I-{P}^{\ast }P{\ast}^{\prime}\right) $$

must have (at least)

$$ T-k-\left(T-n-1\right)=n+1-k $$

zero roots. Defining, further, the elements of Λ by

$$ {\lambda}_i^{\ast }={\lambda}_{i+k},\kern1em i=1,2,\dots, T-k, $$
(A.35)

and applying again Theorem A.1 we conclude

$$ {\lambda}_i^{\ast}\le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em k}^{\ast}\le {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em k}^{\ast },\kern1em i=1,2,\dots, T-n-1. $$
(A.36)

Collecting results and translating to the unstarred notation we have

$$ {\theta}_i=0,\kern1em i=1,2,\dots, n+1, $$
(A.37)
$$ {\lambda}_{j\kern0.5em +\kern0.5em k}\le {\theta}_{j\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}\le {\lambda}_{j\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1},\kern1em j=1,2,\dots, T-n-1.\kern1.1em \mathrm{q}.\mathrm{e}.\mathrm{d}. $$

Remark A.3

If k of the columns of X are linear transformations of k characteristic vectors of A—not necessarily those corresponding to the smallest or largest k characteristic roots—we proceed exactly as before, except that now we should renumber the roots of A so that

$$ {\lambda}_i^{\prime },\kern1em i=1,2,\dots, T, $$

and

$$ {\lambda}_i^{\prime },\kern1em i=1,2,\dots, k, $$

correspond to the specified characteristic vectors of A involved in the representation of the specified columns of X; without loss of generality we may take the latter to be the first k columns. The remaining roots we arrange in increasing order of magnitude, i.e.,

$$ {\lambda}_{k+1}^{\prime }<{\lambda}_{k+2}^{\prime }<{\lambda}_{k+3}^{\prime }<\cdots <{\lambda}_T^{\prime }. $$

Proceeding as before we will obtain a relation just as in (A.32), from which we shall conclude that the equation there has k zero roots and that the remaining roots are those of the analog of (A.33). The elements of the matrix Λ will be, in the present case,

$$ {\lambda}_{i+k}^{\prime },\kern1em i=1,2,\dots, T-k. $$

Putting

$$ {\lambda}_i^{\ast }={\lambda}_{i+k}^{\prime } $$

we will thus conclude that the roots of NAN, say θ i , obey

$$ {\displaystyle \begin{array}{c}{\theta}_i=0,\kern1em i=1,2,\dots, n+1,\\ {}{\lambda}_i^{\ast}\le {\theta}_i^{\ast}\le {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em k}^{\ast },\kern1em i=1,2,\dots, T-n-1,\end{array}} $$
(A.38)

where, of course,

$$ {\theta}_i^{\ast }={\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}. $$

Evidently, if we have as in the case of Theorem A.2 that the k roots are the k smallest characteristic roots then

$$ {\lambda}_i^{\ast }={\lambda}_{i\kern0.5em +\kern0.5em k},\kern1em {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em k}^{\ast }={\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}, $$

and the bounds are exactly as before. On the other hand, if the characteristic vectors in question are those corresponding to the first and last k − 1 roots of A, i.e., to λ 1 and λ T − k  +  2 , λ T − k + 3 , … , λ T , then

$$ {\lambda}_i^{\ast }={\lambda}_{i\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-k. $$

Thus the bounds become

$$ {\displaystyle \begin{array}{c}{\theta}_i=0,\kern1em i=1,2,\dots, n+1,\\ {}{\lambda}_{i\kern0.5em +\kern0.5em 1}\le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}\le {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 2\kern0.5em -\kern0.5em k},\kern1em i=1,2,\dots, T-n-1.\\ {}\end{array}} $$
(A.39)

For the special case

$$ k=n+1 $$

(A.37) yields

$$ {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}\le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}\le {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}, $$
(A.40)

which implies

$$ {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}={\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-n-1. $$

On the other hand, (A.39) yields

$$ {\lambda}_{i+1}\le {\theta}_{i+n+1}\le {\lambda}_{i+1},\kern1em i=1,2,\dots, T-n-1, $$
(A.41)

which implies

$$ {\lambda}_{i+1}\kern1em =\kern1em {\theta}_{i+n+1},\kern1em i=1,2,\dots, T-n-1. $$
(A.42)

Remark A.4

The effect of having k vectors of X expressible as linear combinations of k characteristic vectors of A is to make the bounds on the roots of NAN tighter.

Remark A.5

Referring to Theorem A.1 we note that the smallest characteristic root of A is

$$ {\lambda}_1=0. $$

One can easily verify that

$$ e={\left(1,1,\dots, 1\right)}^{\prime } $$

is a characteristic vector corresponding to this root. But in most cases the GLM will contain a constant term, and hence the appropriate bounds can be determined quite easily from (A.37) to be

$$ {\lambda}_{i+1}\le {\theta}_{i+n+1}\le {\lambda}_{i+n+1},\kern1em i=1,2,\dots, T-n-1. $$
(A.43)

Remark A.6

In the previous chapter we considered the special case where X is a linear transformation of n + 1 of the characteristic vectors of V −1. It will be recalled that in such a case the OLS and Aitken estimators of the parameter vector β will be the same. It might also be noted in passing that if u is a T-element random vector having the joint density

$$ K\exp \left\{-\frac{1}{2}\alpha \left[{u}^{\prime}\left(D+\gamma A\right)u\right]\right\}, $$
(A.44)

where K is a suitable constant, α > 0, D is positive definite, A is symmetric, and γ is a scalar such that D + γA is a positive definite matrix, then the uniformly most powerful test of the hypothesis

$$ \gamma =0, $$

as against

$$ \gamma <0, $$

is provided by r < r∗, where

$$ r=\frac{{\tilde{u}}^{\prime }A\tilde{u}}{{\tilde{u}}^{\prime }D\tilde{u}} $$

and r∗ is a suitable constant, determined by the desired level of significance. Here it is understood that we deal with the model

$$ y= X\beta +u $$

and that the columns of X are linear combinations of n + 1 of the characteristic vectors of A. In the definition above we have

$$ \tilde{u}=\left[I-X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime}\right]u= Nu. $$

This result is due to Anderson [1, 2].

It would appear that for the cases we have considered in this book (i.e., when the errors are normally distributed), taking

$$ D=I,\kern1em \alpha =\frac{{\left(1-\rho \right)}^2}{\sigma^2},\kern1em \gamma =\frac{\rho }{{\left(1-\rho \right)}^2}, $$
(A.45)

and A as in Eq. (3.16) of the chapter we are dealing with autoregressive errors obeying

$$ {u}_t=\rho {u}_{t-1}+{\varepsilon}_t. $$
(A.46)

Thus, testing

$$ {\mathrm{H}}_0:\kern0.5em \gamma =0, $$

as against

$$ {\mathrm{H}}_1:\kern0.5em \gamma <0, $$

is equivalent, in the context of (A.45) and (A.46), to testing

$$ {\mathrm{H}}_0:\kern0.5em \rho =0, $$
(A.47a)

as against

$$ {\mathrm{H}}_1:\kern0.5em \rho <0. $$
(A.47b)

Thus, to paraphrase the Anderson result: if we are dealing with a GLM whose error structure obeys (A.46) then a uniformly most powerful (UMP) test for the hypothesis (A.47a) will exist when the density function of the error obeys (A.44) and, moreover, the data matrix X is a linear transformation of an appropriate submatrix of the matrix of characteristic vectors of A. Furthermore, when these conditions hold the UMP test is the Durbin–Watson test, which uses the d-statistic computed from OLS residuals, as defined in Eq. (3.17) of this chapter. Examining these conditions a bit more closely, however, shows a slight discrepancy. If we make the assignments in (A.45) then

$$ \alpha \left[D+\gamma A\right]=\frac{1}{\sigma^2}\left[\begin{array}{ccccc}1+{\rho}^2-\rho & -\rho & 0& \cdots & 0\\ {}-\rho & 1+{\rho}^2& -\rho & & \vdots \\ {}0& -\rho & \ddots & \ddots & 0\\ {}\vdots & & \ddots & 1+{\rho}^2& -\rho \\ {}0& \cdots & 0& -\rho & 1+{\rho}^2-\rho \end{array}\right]. $$

But the (inverse of the) covariance matrix of the error terms obeying (A.46) with the ε’s i.i.d. and normal with variance σ 2 would be

$$ \frac{1}{\sigma^2}{V}^{-1}=\frac{1}{\sigma^2}\left[\begin{array}{cccccc}1& -\rho & 0& & \cdots & 0\\ {}-\rho & 1+{\rho}^2& -\rho & & & \\ {}0& -\rho & 1+{\rho}^2& -\rho & & \vdots \\ {}\vdots & & & & & 0\\ {}& & & -\rho & 1+{\rho}^2& -\rho \\ {}0& & \cdots & 0& -\rho & 1\end{array}\right]. $$

A comparison of the two matrices shows that they differ, although rather slightly, in the upper left- and lower right-hand corner elements. They would coincide, of course, when

$$ \rho =0\kern1em \mathrm{or}\kern1em \rho =1. $$

We note, further, that

$$ {\sigma}^2\alpha \left[D+\gamma A\right]={\left(1-\rho \right)}^2I+\rho A. $$
(A.48)

Hence, if W is the matrix of characteristic vectors of A then

$$ \left[{\left(1-\rho \right)}^2I+\rho A\right]W=W\left[{\left(1-\rho \right)}^2I+\rho \Lambda \right]. $$

This shows:

  (a)

    the characteristic roots of the matrix in (A.48) are given by ψ i  = (1 − ρ)² + ρλ i  , i = 1 , 2 , … , T, where λ i are the corresponding characteristic roots of A;

  (b)

    if W is the matrix of characteristic vectors of A then it is also that of the matrix in (A.48). (A numerical check of (a) and (b) is sketched below.)
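A minimal numerical check of points (a) and (b), and of the corner discrepancy noted in the comparison above, might look as follows (the values T = 8 and ρ = 0.6 are illustrative assumptions):

```python
import numpy as np

T, rho = 8, 0.6
A = 2.0 * np.eye(T) - np.eye(T, k=1) - np.eye(T, k=-1)
A[0, 0] = A[-1, -1] = 1.0
lam = 2.0 * (1.0 - np.cos(np.arange(T) * np.pi / T))
t, s = np.arange(1, T + 1), np.arange(T)
W = np.cos(np.outer(t - 0.5, s) * np.pi / T)             # characteristic vectors of A

B = (1.0 - rho) ** 2 * np.eye(T) + rho * A               # the matrix in (A.48)
psi = (1.0 - rho) ** 2 + rho * lam                        # point (a): its roots
assert np.allclose(B @ W, W * psi)                        # point (b): same vectors W

# sigma^2 times the inverse covariance matrix of the AR(1) errors in (A.46), for comparison:
Vinv = (1.0 + rho ** 2) * np.eye(T) - rho * (np.eye(T, k=1) + np.eye(T, k=-1))
Vinv[0, 0] = Vinv[-1, -1] = 1.0
print(np.round(B - Vinv, 6))   # nonzero only in the (1,1) and (T,T) corners: -rho(1 - rho)
```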

Remark A.7

What implications emerge from the lengthy remark regarding tests for autocorrelation? We have, in particular, the following:

  (i)

    in a very strict sense, we can never have UMP tests for the case we have considered in this chapter since the matrix of the quadratic form in (A.44), subject to the parametric assignments given in (A.45), is never the same as the (inverse of the) covariance matrix of the error process in (A.46) when the ε's are i.i.d. and normal with mean zero and variance σ². The difference, however, is small—the (1, 1) and (T,  T) elements differ by −ρ(1 − ρ). Thus, the difference is positive when ρ < 0 and negative when ρ > 0. It vanishes when ρ = 0 or ρ = 1;

  (ii)

    if we are prepared to ignore the differences in (i) and thus consider the roots and vectors of (1 − ρ)² I + ρA and V −1 as the same, then a UMP test will exist and will be the Durbin–Watson test only in the special case where the data matrix X is a linear transformation of n + 1 of the characteristic vectors of A—and hence of V −1;

  (iii)

    When X is a linear transformation of n + 1 of the characteristic vectors of V −1, as we established in the current chapter, the OLS and Aitken estimators of β coincide.

Remark A.8

The preceding discussion reveals a very interesting aspect of the problem. Presumably, we are interested in testing for autocorrelation in the error process, because if such is the case OLS will not be an efficient procedure and another (efficient) estimator is called for. The test utilized is the Durbin–Watson test. Yet when this test is optimal, i.e., a UMP test, OLS is an efficient estimator—hence the result of the test would not matter. On the other hand, when the results of the test would matter, i.e., when in the presence of autocorrelation OLS is inefficient, the Durbin–Watson test is not UMP.

Bounds on the Durbin–Watson Statistic

Let us now return to the problem that has motivated much of the discussion above. We recall that for testing the null hypothesis

$$ {\mathrm{H}}_0:\kern0.5em \rho =0, $$

as against the alternative

$$ {\mathrm{H}}_1:\kern0.5em \rho <0, $$

where it is understood that the error terms of the GLM obey (A.46) and the ε’s are i.i.d. normal random variables with mean zero and variance σ 2, we use the test statistic

$$ d=\frac{\xi^{\prime}\Theta \xi }{\xi^{\prime } D\xi}, $$

where ξ is a T × 1 vector obeying

$$ \xi \sim N\left(0,\kern0.5em {\sigma}^2I\right), $$
$$ D=\left[\begin{array}{ll}0& 0\\ {}0& {I}_{T-n-1}\end{array}\right], $$
$$ \Theta =\operatorname{diag}\left({\theta}_1,\kern0.5em {\theta}_2,\dots, {\theta}_T\right), $$

and the θ i are the characteristic roots of NAN arranged in increasing order. Noting that

$$ {\theta}_i=0,\kern1em i=1,2,\dots, n+1, $$

we may thus rewrite the statistic more usefully as

$$ d=\frac{\sum_{i=1}^{T-n-1}{\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}{\xi}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}{\sum_{i\kern0.5em =\kern0.5em 1}^{T-n-1}{\xi}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}. $$
(A.49)

Considering now the bounds as given by Theorem A.2 let us define

$$ {d}_{\mathrm{L}}=\frac{\sum_{i=1}^{T-n-1}{\lambda}_{i\kern0.5em +\kern0.5em k}{\xi}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}{\sum_{i=1}^{T-n-1}{\xi}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}. $$
(A.50)
$$ {d}_{\mathrm{U}}=\frac{\sum_{i=1}^{T-n-1}{\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}{\xi}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}{\sum_{i=1}^{T-n-1}{\xi}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}, $$
(A.51)

and thus conclude

$$ {d}_{\mathrm{L}}\le d\le {d}_{\mathrm{U}}. $$
(A.52)

Remark A.9

An important byproduct of the derivations above is that the bounds d L and d U do not depend on the data matrix X. It is the tabulation of the significance points of d L and d U that one actually uses in carrying out tests for the presence of autocorrelation.
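Since d L and d U depend only on T, n, k, and the known roots λ, their significance points can also be approximated by simulation from (A.50) and (A.51). The sketch below is a crude Monte Carlo illustration, not a substitute for the exact tabulations; the choices T = 25, n = 2, k = 1 (a constant term, as in Remark A.5) and the number of replications are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n, k = 25, 2, 1                 # k = 1: X contains a constant term (cf. Remark A.5)
m = T - n - 1
lam = 2.0 * (1.0 - np.cos(np.arange(T) * np.pi / T))

reps = 200_000
xi2 = rng.standard_normal((reps, m)) ** 2                 # the xi_{i+n+1}^2 of (A.50)-(A.51)
dL = xi2 @ lam[k : k + m] / xi2.sum(axis=1)               # eq. (A.50)
dU = xi2 @ lam[n + 1 : n + 1 + m] / xi2.sum(axis=1)       # eq. (A.51)

# Approximate 5% significance points for testing rho = 0 against rho > 0 (lower tail):
print("r_L approx.", round(np.quantile(dL, 0.05), 3),
      " r_U approx.", round(np.quantile(dU, 0.05), 3))
```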

Remark A.10

Consider the special cases examined in Remark A.3. If X is a linear transformation of the n + 1 characteristic vectors corresponding to the n + 1 smallest characteristic roots of A, then

$$ {\lambda}_{i+n+1}={\theta}_{i+n+1} $$

and hence, in this case,

$$ d={d}_{\mathrm{U}}. $$

On the other hand, when the n + 1 characteristic vectors above correspond to the smallest (zero) and the n largest characteristic roots of A, then the condition in (A.42) holds. Hence, in this special case, given the bounds in (A.39) we have

$$ d={d}_{\mathrm{L}}. $$

In these two special cases the test for autocorrelation may be based on the exact distribution of the test (Durbin–Watson) statistic, since the relevant parts of the distribution of d L and d U have been tabulated.

Use of the Durbin–Watson Statistic

Let F L(⋅) , F(⋅), and F U(⋅) be (respectively) the distribution functions of d L , d, and d U and let r be a point in the range of these random variables. Then by definition

$$ \mathit{\Pr}\left\{{d}_{\mathrm{L}}\le r\right\}={F}_{\mathrm{L}}(r), $$
$$ \mathit{\Pr}\left\{d\le r\right\}=F(r), $$
(A.53)
$$ \mathit{\Pr}\left\{{d}_{\mathrm{U}}\le r\right\}={F}_{\mathrm{U}}(r). $$

Now, it is clear that

$$ \mathit{\Pr}\left\{{d}_{\mathrm{U}}\le r\right\}\le \mathit{\Pr}\left\{d\le r\right\} $$
(A.54)

since

$$ {d}_{\mathrm{U}}\le r\kern1em \mathrm{implies}\kern1em d\le r. $$

But the converse is not true. Similarly note that

$$ {d}_{\mathrm{L}}>r\kern1em \mathrm{implies}\kern1em d>r $$

but that the converse is not true.

This means that

$$ 1\kern0.5em -\mathit{\Pr}\left\{{d}_{\mathrm{L}}\le r\right\}=\mathit{\Pr}\left\{{d}_{\mathrm{L}}>r\right\}\le \mathit{\Pr}\left\{d>r\right\}=1-\mathit{\Pr}\left\{d\le r\right\}, $$

which in turn implies

$$ \mathit{\Pr}\left\{d\le r\right\}\le \mathit{\Pr}\left\{{d}_{\mathrm{L}}\le r\right\}. $$
(A.55)

Combining (A.53), (A.54), and (A.55) we have

$$ {F}_{\mathrm{U}}(r)\le F(r)\le {F}_{\mathrm{L}}(r). $$
(A.56)

But this immediately suggests a way for testing the autocorrelation hypothesis. Let r L be a number such that

$$ {F}_{\mathrm{L}}\left({r}_{\mathrm{L}}\right)=1-\alpha, $$

where α is the chosen level of significance, say

$$ \alpha =.10\kern1em \mathrm{or}\kern1em \alpha =.05\kern1em \mathrm{or}\kern1em \alpha =.025. $$

If F L(  ⋅  ) were the appropriate distribution function and d were the Durbin–Watson (hereafter abbreviated D.W.) statistic, in a given instance, the acceptance region would be

$$ d\le {r}_{\mathrm{L}} $$
(A.57a)

and the rejection region

$$ d>{r}_{\mathrm{L}}. $$
(A.57b)

The level of significance of the test would be α. What is the consequence, for the properties of the test, of the inequalities in (A.56)? Well, since

$$ F\left({r}_{\mathrm{L}}\right)\le {F}_{\mathrm{L}}\left({r}_{\mathrm{L}}\right), $$

it follows that the number r∗ such that

$$ F\left({r}^{\ast}\right)=1-\alpha $$

obeys

$$ {r}^{\ast}\ge {r}_{\mathrm{L}}. $$

Thus, in using the acceptance region in (A.57a) we are being too conservative, in the sense that we could have

$$ d>{r}_{\mathrm{L}}\kern1.1em \mathrm{and}\kern0.5em \mathrm{at}\kern0.5em \mathrm{the}\kern0.5em \mathrm{same}\kern0.5em \mathrm{time}\kern1.1em d\le {r}^{\ast }. $$

Conversely, let r U be a number such that

$$ {F}_{\mathrm{U}}\left({r}_{\mathrm{U}}\right)=1-\alpha . $$

Arguing as before we establish

$$ {r}^{\ast}\le {r}_{\mathrm{U}}. $$

If we define the rejection region as

$$ d>{r}_{\mathrm{U}}, $$
(A.58)

we again see that we reject conservatively, in the sense that we could have

$$ d\le {r}_{\mathrm{U}}\kern1.1em \mathrm{but}\kern0.5em \mathrm{at}\kern0.5em \mathrm{the}\kern0.5em \mathrm{same}\kern0.5em \mathrm{time}\kern1.1em d>{r}^{\ast }. $$

The application of the D.W. test in practice makes use of both conditions (A.57a, A.57b and A.58), i.e., we accept

$$ {\mathrm{H}}_0:\kern0.5em \rho =0 $$

if, for a given statistic d, (A.57a) is satisfied, and we accept

$$ {\mathrm{H}}_1:\kern0.5em \rho <0 $$

only if (A.58) is satisfied. In so doing we are being very conservative, in the sense that if

$$ d\le {r}_{\mathrm{L}}\kern1.1em \mathrm{then}\kern0.5em \mathrm{surely}\kern1.1em d\le {r}^{\ast }, $$

and if

$$ d>{r}_{\mathrm{U}}\kern1.1em \mathrm{then}\kern0.5em \mathrm{surely}\kern1.1em d>{r}^{\ast }. $$

A consequence of this conservatism, however, is that we are left with a region of indeterminacy. Thus, if

$$ {r}_{\mathrm{L}}<d<{r}_{\mathrm{U}}, $$

then we have no rigorous basis of accepting either

$$ {\mathrm{H}}_0:\rho =0\kern1em \mathrm{or}\kern1em {\mathrm{H}}_1:\rho <0. $$

If the desired test is

$$ {\mathrm{H}}_0:\kern0.5em \rho =0, $$

as against

$$ {\mathrm{H}}_1:\kern0.5em \rho >0, $$

we proceed somewhat differently. Let α again be the level of significance and choose two numbers, say r L and r U, such that

$$ {F}_{\mathrm{L}}\left({r}_{\mathrm{L}}\right)=\alpha \kern1em \mathrm{and}\kern1em {F}_{\mathrm{U}}\left({r}_{\mathrm{U}}\right)=\alpha . $$

In view of (A.56), the number r∗ such that

$$ F\left({r}^{\ast}\right)=\alpha $$

obeys

$$ {r}_{\mathrm{L}}\le {r}^{\ast}\le {r}_{\mathrm{U}}. $$

Now the acceptance region is defined as

$$ d\ge {r}_{\mathrm{U}} $$
(A.59)

while the rejection region is defined as

$$ d\le {r}_{\mathrm{L}}. $$
(A.60)

Just as in the preceding case we are being conservative, and the consequence is that we have, again, a region of indeterminacy

$$ {r}_{\mathrm{L}}<d<{r}_{\mathrm{U}}. $$

Let us now recapitulate the procedure for carrying out a test of the hypothesis that the error terms in a GLM are a first order autoregressive process.

  (i)

    Obtain the residuals

$$ \tilde{u}=y-X\tilde{\beta},\kern1em \tilde{\beta}={\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }y. $$

  (ii)

    Compute the D.W. statistic

$$ d=\frac{{\tilde{u}}^{\prime }A\tilde{u}}{{\tilde{u}}^{\prime}\tilde{u}}, $$

    where A is as defined in Eq. (3.16) of the chapter.

  (iii)

    Choose the level of significance, say α.

    (a)

      If it is desired to test

$$ {\mathrm{H}}_0:\kern0.5em \rho =0, $$

      as against

$$ {\mathrm{H}}_1:\kern0.5em \rho <0, $$

      determine, from the tabulated distributions, two numbers r L , r U such that F L(r L) = 1 − α ,  F U(r U) = 1 − α. If d ≤ r L, accept ρ = 0. If d ≥ r U, accept ρ < 0. If r L < d < r U, the result of the test is inconclusive and other means must be found for determining whether ρ = 0 or ρ < 0 is to be accepted as true.

    (b)

      If it is desired to test

$$ {\mathrm{H}}_0:\kern0.5em \rho =0, $$

      as against

$$ {\mathrm{H}}_1:\kern0.5em \rho >0, $$

      with level of significance α, determine from the tabulated distributions two numbers, say r L and r U, such that F L(r L) = α ,  F U(r U) = α. If d ≥ r U, accept the hypothesis ρ = 0. If d ≤ r L, accept the hypothesis ρ > 0. If r L < d < r U, the result of the test is indeterminate. (A code sketch of this procedure is given below.)
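A minimal code sketch of steps (i) through (iii) follows. The data are simulated and the bounds r_L, r_U passed to the decision function are placeholders; in practice they must be taken from the tabulated significance points for the given T, number of regressors, and level of significance.

```python
import numpy as np

def dw_statistic(y, X):
    """Steps (i)-(ii): OLS residuals and d = u'Au / u'u."""
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    u = y - X @ beta
    return np.sum(np.diff(u) ** 2) / np.sum(u ** 2)

def dw_decision(d, r_L, r_U, alternative="rho>0"):
    """Step (iii): compare d with the tabulated bounds r_L, r_U supplied by the user."""
    if alternative == "rho>0":                      # case (b)
        if d >= r_U:
            return "accept rho = 0"
        if d <= r_L:
            return "accept rho > 0"
    else:                                           # case (a): alternative rho < 0
        if d <= r_L:
            return "accept rho = 0"
        if d >= r_U:
            return "accept rho < 0"
    return "inconclusive"

rng = np.random.default_rng(5)
T = 30
X = np.column_stack([np.ones(T), rng.standard_normal((T, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.standard_normal(T)
d = dw_statistic(y, X)
# r_L, r_U below are placeholders, NOT values from the published tables.
print(round(d, 3), dw_decision(d, r_L=1.2, r_U=1.7))
```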

Remark A.11

Tabulations of F L(⋅) and F U(⋅) exist typically in the form of 5% significance points (i.e., values of r L and r U) for varying numbers of observations and explanatory variables (exclusive of the constant term). Such tabulations are constructed from the point of view of the test

$$ {\mathrm{H}}_0:\kern0.5em \rho =0, $$

as against

$$ {\mathrm{H}}_1:\kern0.5em \rho >0. $$

It is suggested that when we are interested in the hypothesis

$$ {\mathrm{H}}_0:\kern0.5em \rho =0, $$

as against

$$ {\mathrm{H}}_1:\kern0.5em \rho <0, $$

we use 4 − d as the test statistic and the r L , r U significance points from the tabulated distributions.

Remark A.12

The existing tabulations assume that the data matrix X contains one column that is (a multiple of) a characteristic vector of A. As we have remarked earlier the vector

$$ e={\left(1,1,1,\dots, 1\right)}^{\prime } $$

is the characteristic vector corresponding to the smallest (zero) characteristic root of A. Consequently, the bounds for the roots of NAN for this case are

$$ {\lambda}_{i\kern0.5em +\kern0.5em 1}\le {\theta}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}\le {\lambda}_{i\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1},\kern1em i=1,2,\dots, T-n-1, $$

and the tabulations are based on

$$ {\displaystyle \begin{array}{l}{d}_{\mathrm{L}}=\frac{\sum_{s\kern0.5em =\kern0.5em 1}^{T-n-1}{\lambda}_{s\kern0.5em +\kern0.5em 1}{\xi}_{s\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}{\sum_{s\kern0.5em =\kern0.5em 1}^{T-n-1}{\xi}_{s\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2},\\ {}{d}_{\mathrm{U}}=\frac{\sum_{s\kern0.5em =\kern0.5em 1}^{T-n-1}{\lambda}_{s\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}{\xi}_{s\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}{\sum_{s\kern0.5em =\kern0.5em 1}^{T-n-1}{\xi}_{s\kern0.5em +\kern0.5em n\kern0.5em +\kern0.5em 1}^2}.\end{array}} $$

The reader should note, therefore, that if the GLM under consideration does not contain a constant term, then the tabulated percentage points of the D.W. statistic are not applicable. This is so since in this case we are dealing with k = 0 and the lower bound should be defined as

$$ {d}_{\mathrm{L}}^{\prime }=\frac{\sum_{s=1}^{T-n-1}{\lambda}_s{\xi}_{s+n+1}^2}{\sum_{s=1}^{T-n-1}{\xi}_{s+n+1}^2}. $$

We observe that, since λ 1 = 0,

$$ {d}_{\mathrm{L}}-{d}_{\mathrm{L}}^{\prime }=\frac{\sum_{s=1}^{T-n-1}\left({\lambda}_{s+1}-{\lambda}_s\right){\xi}_{s+n+1}^2}{\sum_{s=1}^{T-n-1}{\xi}_{s+n+1}^2}\ge 0. $$

Thus, the tabulated distribution is inappropriate for the case of the excluded constant term. This can be remedied by running a regression with a constant term and carrying out the test in the usual way. At the cost of being redundant let us stress again that there is nothing peculiar with the D.W. statistic when the GLM does not contain a constant term. It is merely that the existing tabulations are inappropriate for this case in so far as the lower bound is concerned; the tabulations are quite appropriate, however, for the upper bound.

Remark A.13

At the end of this volume we present more recent tabulations of the bounds of the D.W. statistics giving 1%, 2.5%, and 5% significance points. As in the earlier tabulations it is assumed that the GLM model does contain a constant term; thus, these tabulations are inappropriate when there is no constant term.

Remark A.14

Two aspects of the use of D.W. tabulations deserve comment. First, it is conceivable that the test statistic will fall in the region of indeterminacy and hence that the test will be inconclusive. A number of suggestions have been made for this eventuality, the most useful of which is the use of the approximation

$$ d\approx a+{bd}_{\mathrm{U}}, $$

where a and b are fitted by the first two moments of d. The virtue of this is that the test is based on existing tabulations of d U. The reader interested in exploring the details of this approach is referred to Durbin and Watson [98].

Remark A.15

An alternative to the D.W. statistic for testing the hypothesis that the error terms of a GLM constitute a first-order autoregression may be based on the asymptotic distribution of the natural estimator of ρ obtained from the residuals. Thus, e.g., in the GLM

$$ y= X\beta +u $$

let \( {\tilde{u}}_t,\kern0.5em t=1,2,\dots \), T, be the OLS residuals. An obvious estimator of ρ is

$$ \tilde{\rho}=\frac{\sum_{t=2}^T{\tilde{u}}_t{\tilde{u}}_{t-1}}{\sum_{t=1}^{T-1}{\tilde{u}}_t^2}, $$

obtained by regressing \( {\tilde{u}}_t \) on \( {\tilde{u}}_{t-1} \) and suppressing the constant term. It may be shown that if the GLM does not contain lagged dependent variables then, asymptotically,

$$ \sqrt{T}\left(\tilde{\rho}-\rho \right)\sim N\left(0,\kern0.5em 1-{\rho}^2\right). $$

Consequently, if the sample is reasonably large a test for the presence of autoregression (of the first order) in the errors may be carried out on the basis of the statistic

$$ \sqrt{T}\tilde{\rho}, $$

which, under the null hypothesis (of no autoregression) will have the distribution

$$ \sqrt{T}\tilde{\rho}\sim N\left(0,\kern0.5em 1\right). $$
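A minimal sketch of this asymptotic test (Python; the function names and the artificial residual series are illustrative, and the OLS residuals are assumed to be already in hand):

```python
import numpy as np

def rho_tilde(u):
    # regression of u_t on u_{t-1} with the constant term suppressed
    return (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])

def asymptotic_rho_test(u, z_crit=1.645):
    # H0: rho = 0 against rho > 0; under H0, sqrt(T)*rho_tilde is asymptotically N(0, 1)
    T = len(u)
    stat = np.sqrt(T) * rho_tilde(u)
    return stat, stat > z_crit          # (test statistic, reject H0 at the 5% level?)

# illustration with artificial white-noise "residuals"
rng = np.random.default_rng(0)
u_hat = rng.standard_normal(200)
print(asymptotic_rho_test(u_hat))
```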

1.2 Gaps in Data

It frequently happens with time series data that observations are noncontiguous. A typical example is the exclusion of a certain period from the sample as representing nonstandard behavior. Thus, in time series studies of consumption behavior one usually excludes observations for the period 1941–1944 or 1945; this is justified by noting the shortages due to price controls during the war years. In such a case we are presented with the following problem: if we have a model with autoregressive errors, what is the appropriate form of the autoregressive transformation and of the D.W. statistic when there are gaps in the sample?

We shall examine the following problem. Suppose in the GLM

$$ y= X\beta +u $$

observations are available for

$$ t=1,\kern0.5em 2,\dots, \kern0.5em r,\kern1em r+k+1,\kern0.5em \dots, \kern0.5em T, $$

so that at “time” r there is a gap of k observations. How do we obtain efficient estimators

  1. (a)

    when the error process obeys

$$ {u}_t=\rho {u}_{t-1}+{\varepsilon}_t? $$
  2. (b)

    when the error process obeys

$$ {u}_t={\rho}_1{u}_{t-1}+{\rho}_2{u}_{t-2}+{\varepsilon}_t? $$

Moreover, how in the case (a) do we test the hypothesis ρ = 0? To provide an answer to these problems, we recall that in the standard first-order auto-regression the estimation proceeds, conceptually, by determining a matrix M such that Mu consists of uncorrelated—and in the case of normality, of independent—elements, it being understood that u is the vector of errors of the GLM.

What is the analog of M for the process

$$ {u}_t=\rho {u}_{t-1}+{\varepsilon}_t,\kern1em t=1,2,\dots, T, $$

when observations for t = r + 1 ,  r + 2 ,  …, r + k are missing? The usual transformation, through the matrix M referred to above, yields

$$ {\displaystyle \begin{array}{l}\sqrt{1-{\rho}^2}{u}_1,\\ {}{u}_2-\rho {u}_1,\\ {}\vdots \\ {}{u}_T-\rho {u}_{T-1}.\end{array}} $$

This is not feasible in the present case since the observations u t for r + 1 ≤ t ≤ r + k are missing; in particular, the observation following u r is u r + k + 1. Remembering that the goal is to replace u by a vector of uncorrelated elements, we note that for 1 ≤ j ≤ k + 1

$$ {\displaystyle \begin{array}{l}{u}_{r+j}=\sum \limits_{s=0}^{\infty }{\rho}^s{\varepsilon}_{r+j-s}\\ {}\kern3em =\sum \limits_{s=0}^{j-1}{\rho}^s{\varepsilon}_{r+j-s}+\sum \limits_{s=j}^{\infty }{\rho}^s{\varepsilon}_{r+j-s}\\ {}\kern3em ={\rho}^j{u}_r+\sum \limits_{s=0}^{j-1}{\rho}^s{\varepsilon}_{r+j-s}.\end{array}} $$
(A.61)

Thus

$$ {u}_{r+j}-{\rho}^j{u}_r=\sum \limits_{s=0}^{j-1}{\rho}^s{\varepsilon}_{r+j-s}. $$

We observe that

$$ \mathrm{Var}\left(\sum \limits_{s=0}^{j-1}{\rho}^s{\varepsilon}_{r+j-s}\right)={\sigma}^2\left(\frac{1-{\rho}^{2j}}{1-{\rho}^2}\right)={\sigma}^2{\phi}^2, $$

and that

$$ {u}_t-\rho {u}_{t-1} $$

has variance σ 2 for t ≤ r and t ≥ r + k + 2. Thus, setting \( s^2=\left(1-{\rho}^{2\left(k+1\right)}\right)/\left(1-{\rho}^2\right) \), we conclude that

$$ \frac{1}{s}\left({u}_{r+k+1}-{\rho}^{k+1}{u}_r\right) $$

also has variance σ 2 and is independent of the other terms. Hence the matrix

$$ {M}_1=\left[\begin{array}{cccccccc}\sqrt{1-{\rho}^2}& 0& & & \cdots & & & 0\\ {}-\rho & 1& & & & & & \\ {}0& \ddots & \ddots & & & & & \\ {}& & -\rho & 1& \ddots & & & \vdots \\ {}& & & -\left({\rho}^{k+1}/s\right)& \left(1/s\right)& & & \\ {}\vdots & & & \ddots & -\rho & 1& & \\ {}& & & & & \ddots & \ddots & 0\\ {}0& & & \cdots & & 0& -\rho & 1\end{array}\right] $$

where \( 1/s={\left[\left(1-{\rho}^2\right)/\left(1-{\rho}^{2\left(k+1\right)}\right)\right]}^{1/2} \), implies that M 1 u has covariance matrix σ 2 I and mean zero.

Hence the autoregressive transformation is of the form

$$ {\displaystyle \begin{array}{l}\sqrt{1-{\rho}^2}{u}_1,\\ {}{u}_2-\rho {u}_1,\\ {}{u}_3-\rho {u}_2,\\ {}\kern0.5em \vdots \\ {}{u}_r-\rho {u}_{r-1},\\ {}\frac{1}{s}\left({u}_{r+k+1}-{\rho}^{k+1}{u}_r\right),\\ {}{u}_{r+k+2}-\rho {u}_{r+k+1},\\ {}\vdots \\ {}{u}_T-\rho {u}_{T-1},\end{array}} $$

with the corresponding transformation on the dependent and explanatory variables. Estimation by the search method proceeds as before, i.e., for given ρ we compute the transformed variables and carry out the OLS procedure. We do this for a set of ρ-values that is sufficiently dense in the interval (−1,  1). The estimator corresponds to the coefficients obtained in the regression exhibiting the smallest sum of squared residuals. Aside from this more complicated transformation, the situation is entirely analogous to the standard case where no observations are missing.
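The transformation and the search are easily coded. The sketch below (Python; function names are purely illustrative; it assumes y and X already contain only the T − k available rows, with row r, counted from zero, being the first post-gap observation) builds M 1 and minimizes the sum of squared residuals over a grid of ρ-values.

```python
import numpy as np

def M1(rho, n_obs, r, k):
    """Transformation matrix for AR(1) errors when observations r+1,...,r+k are
    missing; n_obs = T - k rows, and row r (0-based) is the first post-gap row."""
    M = np.eye(n_obs)
    M[0, 0] = np.sqrt(1.0 - rho ** 2)
    for j in range(1, n_obs):
        M[j, j - 1] = -rho
    s_inv = np.sqrt((1.0 - rho ** 2) / (1.0 - rho ** (2 * (k + 1))))
    M[r, r] = s_inv
    M[r, r - 1] = -(rho ** (k + 1)) * s_inv
    return M

def search_ar1_gap(y, X, r, k, grid=np.linspace(-0.99, 0.99, 199)):
    """Search estimator: for each rho on the grid, transform and run OLS;
    keep the rho giving the smallest sum of squared residuals."""
    best = None
    for rho in grid:
        M = M1(rho, len(y), r, k)
        ys, Xs = M @ y, M @ X
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        ssr = float(np.sum((ys - Xs @ beta) ** 2))
        if best is None or ssr < best[0]:
            best = (ssr, rho, beta)
    return best[1], best[2]              # (rho estimate, beta estimate)
```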

In their original paper Cochrane and Orcutt (C.O.) did not provide a procedure for handling the missing observations case (see [65]). Carrying on in their framework, however, one might suggest the following. Regress the dependent on the explanatory variables, neglecting the fact that some observations are missing. Obtain the residuals

$$ {\tilde{u}}_t,\kern1em t=1,2,\dots, r,r+k+1,r+k+2,\dots, T. $$

Obtain

$$ \tilde{\rho}=\frac{\sum_{t=2}^r{\tilde{u}}_t{\tilde{u}}_{t-1}+{\sum}_{i=2}^{T-r-k}{\tilde{u}}_{r+k+i}{\tilde{u}}_{r+k+i-1}}{{\sum \limits_{t=1}^{T-1}}^{\prime }{\tilde{u}}_t^2} $$

where

$$ \underset{t=1}{\overset{T-1}{\sum \limits^{\prime }}}{\tilde{u}}_t^2 $$

indicates that the terms for t = r + 1 , r + 2 , … , r + k have been omitted. Given this \( \tilde{\rho} \) compute

$$ {\displaystyle \begin{array}{ll}{y}_t-\tilde{\rho}{y}_{t-1},& \kern1em t=2,3,\dots, r,\\ {}{y}_t-\tilde{\rho}{y}_{t-1},& \kern1em t=r+k+2,\dots, T,\end{array}} $$

and similarly for the explanatory variables. Carry out the regression with the transformed variables. Obtain the residuals and, thus, another estimate of ρ. Continue in this fashion until convergence is obtained.
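A sketch of this iterative scheme, under the same indexing conventions as before (the helper name is hypothetical; the residuals used to update ρ are those of the untransformed equation evaluated at the current β, as in the usual C.O. practice):

```python
import numpy as np

def co_with_gap(y, X, r, k, max_iter=50, tol=1e-6):
    """Cochrane-Orcutt-style iteration when observations r+1,...,r+k are missing.
    y, X hold only the T - k available rows; row r (0-based) is the first
    post-gap observation."""
    n = len(y)
    # quasi-difference within each contiguous block, dropping the first row of each block
    idx = np.array(list(range(1, r)) + list(range(r + 1, n)))
    rho, beta = 0.0, None
    for _ in range(max_iter):
        ys = y[idx] - rho * y[idx - 1]
        Xs = X[idx] - rho * X[idx - 1]
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        u = y - X @ beta
        # update rho from the residuals, omitting the cross product that straddles the gap
        num = u[1:r] @ u[:r - 1] + u[r + 1:] @ u[r:-1]
        den = u[:r] @ u[:r] + u[r:-1] @ u[r:-1]
        rho_new = num / den
        if abs(rho_new - rho) < tol:
            rho = rho_new
            break
        rho = rho_new
    return rho, beta
```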

If one wishes to test for the presence of first-order autoregression in the residuals then, following the initial regression above, compute the “adjusted” D.W. statistic

$$ {d}^{\ast }=\frac{\sum_{t=2}^r{\left({\tilde{u}}_t-{\tilde{u}}_{t-1}\right)}^2+{\sum}_{t=r+k+2}^T{\left({\tilde{u}}_t-{\tilde{u}}_{t-1}\right)}^2}{{\sum \limits_{t=1}^{T-1}}^{\prime }{\tilde{u}}_t^2}. $$
(A.62)

Clearly, the procedures outlined above apply for a data gap at any point r ≥ 1. Obviously if r = 1 or if r = T − k then the “gap” occasions no problems whatever since all sample observations are contiguous.
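In code, the adjusted statistic of (A.62) might be computed as follows (a sketch; u holds the available OLS residuals and r is the number of pre-gap observations):

```python
import numpy as np

def dw_gap(u, r):
    """Adjusted D.W. statistic of (A.62): squared first differences are taken
    only within each contiguous block of residuals; the denominator follows
    (A.62), i.e., all available residuals except the last one."""
    num = np.sum(np.diff(u[:r]) ** 2) + np.sum(np.diff(u[r:]) ** 2)
    den = u[:r] @ u[:r] + u[r:-1] @ u[r:-1]
    return num / den
```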

Remark A.16

It should be pointed out that the statistic in (A.62) cannot be used in conjunction with the usual D.W. tables. This is so since

$$ {d}^{\ast }=\frac{{\tilde{u}}^{\prime }{A}^{\ast}\tilde{u}}{{\tilde{u}}^{\prime}\tilde{u}}, $$

just as in the standard D.W. case. However, the matrix A ∗ is not of the same form as the matrix A in the usual case; in fact, if we write A r and A T − k − r for two matrices of orders r and T − k − r respectively, each of the same form as in the standard case, then

$$ {A}^{\ast }=\operatorname{diag}\left({A}_r,\kern0.5em {A}_{T-k-r}\right). $$

This makes it clear, however, that A ∗ has two zero roots while the matrix A in the usual case has only one zero root. Thus, to test for first-order autoregression in the “gap” case it is better to rely on asymptotic theory if the sample is moderately large. The estimator \( \tilde{\rho} \), as given in the discussion of the C.O. procedure, may be easily obtained from the OLS residuals and, asymptotically, behaves as

$$ \sqrt{T}\left(\tilde{\rho}-\rho \right)\sim N\left(0,\kern0.5em 1-{\rho}^2\right). $$

To close this section we derive the appropriate autoregressive transformation when the error process is a second-order autoregression and there is a gap in the data. Such an error specification is often required for quarterly data.

As in the earlier discussion, what we wish is to represent the observation immediately following the gap in terms of the adjacent observation(s). To be concrete, suppose

$$ {u}_t={\rho}_1{u}_{t-1}+{\rho}_2{u}_{t-2}+{\varepsilon}_t, $$
(A.63)

where

$$ \left\{{\varepsilon}_t:\kern0.5em t=0,\kern0.5em \pm 1,\kern0.5em \pm 2,\dots \right\} $$

is a sequence of i.i.d. random variables with mean zero and variance σ 2. We require the process above to be stable, i.e., we require that for σ 2 < ∞ the variance of the u t ’s also be finite. Introduce the lag operator L such that, for any function x t ,

$$ {Lx}_t={x}_{t-1} $$

and, in general,

$$ {L}^s{x}_t={x}_{t-s},\kern1em s\ge 0. $$

For s = 0 we set L 0 = I, the identity operator. Polynomials in the lag operator L are isomorphic to polynomials in a real (or complex) indeterminate, i.e., ordinary polynomials like

$$ P(t)={a}_0+{a}_1t+{a}_2{t}^2+\cdots +{a}_n{t}^n, $$

where the a i  ,  i = 0 , 1 , …, n are real numbers and t is the real (or complex) indeterminate (i.e., the “unknown”). This isomorphism means that whatever operations may be performed on ordinary polynomials can also be performed in the same manner on polynomials in the lag operator L. The reader desiring greater detail is referred to Dhrymes [9, Chaps. 1 and 2]. Noting that

$$ {u}_{t-1}={Lu}_t,\kern1em {u}_{t-2}={L}^2{u}_t, $$

we can write

$$ \left(I\kern0.5em -{\rho}_1L-{\rho}_2{L}^2\right){u}_t={\varepsilon}_t. $$

Note also that

$$ \left(I\kern0.5em -{\rho}_1L-{\rho}_2{L}^2\right)=\left(I-{\lambda}_1L\right)\left(I-{\lambda}_2L\right) $$

for

$$ {\lambda}_1+{\lambda}_2={\rho}_1,\kern1em -{\lambda}_1{\lambda}_2={\rho}_2. $$
(A.64)

Thus, the process in (A.63) can also be represented as

$$ {u}_t=\frac{I}{\left(I-{\lambda}_1L\right)\left(I-{\lambda}_2L\right)}{\varepsilon}_t. $$

But I/(I − λ 1 L) behaves like

$$ \frac{1}{1-{\lambda}_1t}=\sum \limits_{i=0}^{\infty }{\lambda}_1^i{t}^i. $$

Hence

$$ {u}_t=\sum \limits_{j=0}^{\infty }{\lambda}_1^j\sum \limits_{i=0}^{\infty }{\lambda}_2^i{\varepsilon}_{t-i-j}. $$

Assuming that ∣λ 2∣  ≤  ∣λ 1∣  < 1, we can rewrite the double sum above as

$$ {u}_t=\sum \limits_{i=0}^{\infty }{c}_i{\varepsilon}_{t-i} $$
(A.65)

where

$$ {c}_i=\frac{\lambda_1^{i+1}-{\lambda}_2^{i+1}}{\lambda_1-{\lambda}_2}. $$
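As a small numerical check (illustrative parameter values only), the closed form above may be compared with the recursion c i  = ρ 1 c i − 1 + ρ 2 c i − 2, c 0 = 1, which is established later in this section via (A.69) and (A.70):

```python
import numpy as np

rho1, rho2 = 0.9, -0.14                     # illustrative values; roots 0.7 and 0.2
lam1, lam2 = np.roots([1.0, -rho1, -rho2])  # roots of z^2 - rho1*z - rho2 = 0

# c_i from the closed form in terms of lambda_1, lambda_2
c_closed = [(lam1 ** (i + 1) - lam2 ** (i + 1)) / (lam1 - lam2) for i in range(8)]

# the same coefficients from the recursion c_i = rho1*c_{i-1} + rho2*c_{i-2}
c_rec = [1.0, rho1]
for _ in range(6):
    c_rec.append(rho1 * c_rec[-1] + rho2 * c_rec[-2])

print(np.allclose(c_closed, c_rec))
```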

If in the sequence

$$ {u}_t,\kern1em t=1,2,\dots, T, $$

there is a gap of length k at t = r, it means that u r is followed by u r + k + 1 ,  u r + k + 2, and so on. For observations u t  ,  2 < t ≤ r, we know that the transformation

$$ {u}_t-{\rho}_1{u}_{t-1}-{\rho}_2{u}_{t-2} $$

yields i.i.d. random variables, viz., the ε’s. For t > r, however, we encounter a difficulty. The observation following u r is u r + k + 1. Thus, blindly applying the transformation yields

$$ {u}_{r+k+1}-{\rho}_1{u}_r-{\rho}_2{u}_{r-1}. $$

But the expression above is not ε r + k + 1. So the problem may be formulated as: what coefficients should we attach to u r and u r − 1 in order to render the difference a function only of {ε t  : t = r + 1,  r + 2, …, r + k + 1}?

It is for this purpose that the expression in (A.65) is required. We note

$$ {u}_{r\kern0.5em +\kern0.5em k\kern0.5em +\kern0.5em 1}=\sum \limits_{i=0}^{\infty }{c}_i{\varepsilon}_{r\kern0.5em +\kern0.5em k\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em i}=\sum \limits_{i=0}^{k+1}{c}_i{\varepsilon}_{r\kern0.5em +\kern0.5em k\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em i}+\sum \limits_{i\kern0.5em =\kern0.5em k\kern0.5em +\kern0.5em 2}^{\infty }{c}_i{\varepsilon}_{r\kern0.5em +\kern0.5em k\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em i}. $$

But, putting j = i − (k + 2) yields

$$ \sum \limits_{i=k+2}^{\infty }{c}_i{\varepsilon}_{r+k+1-i}=\sum \limits_{j=0}^{\infty }{c}_{k+2+j}{\varepsilon}_{r-1-j} $$

Similarly,

$$ {u}_r={\varepsilon}_r+\sum \limits_{j=0}^{\infty }{c}_{j+1}{\varepsilon}_{r-1-j},\kern1em {u}_{r-1}=\sum \limits_{j=0}^{\infty }{c}_j{\varepsilon}_{r-1-j}. $$

Thus,

$$ {\displaystyle \begin{array}{l}{u}_{r\kern0.5em +\kern0.5em k\kern0.5em +\kern0.5em 1}-\alpha {u}_r-\beta {u}_{r-1}=\sum \limits_{i=0}^{k+1}{c}_i{\varepsilon}_{r\kern0.5em +\kern0.5em k\kern0.5em +\kern0.5em 1\kern0.5em -\kern0.5em i}-{\alpha \varepsilon}_r\kern1em \\ {}\kern5.639997em +\sum \limits_{j=0}^{\infty}\left({c}_{k\kern0.5em +\kern0.5em 2\kern0.5em +\kern0.5em j}-\alpha {c}_{j\kern0.5em +\kern0.5em 1}-\beta {c}_j\right){\varepsilon}_{r-1-j}\end{array}} $$

and we require

$$ {c}_{k\kern0.5em +\kern0.5em 2\kern0.5em +\kern0.5em j}-\alpha {c}_{j+1}-\beta {c}_j=0. $$

This is satisfied by the choice

$$ \alpha =\frac{\lambda_1^{k+2}-{\lambda}_2^{k+2}}{\lambda_1-{\lambda}_2},\kern1em \beta =-{\lambda}_1{\lambda}_2\frac{\lambda_1^{k+1}-{\lambda}_2^{k+1}}{\lambda_1-{\lambda}_2}. $$
(A.66)

It is, of course, apparent that

$$ \alpha ={c}_{k+1},\kern1em \beta ={\rho}_2{c}_k\kern1em \left(\mathrm{since}\kern0.5em {\rho}_2=-{\lambda}_1{\lambda}_2\right), $$

and thus

$$ {u}_{r+k+1}-\alpha {u}_r-\beta {u}_{r-1}={u}_{r+k+1}-{c}_{k+1}{u}_r-{\rho}_2{c}_k{u}_{r-1}=\sum \limits_{i=0}^k{c}_i{\varepsilon}_{r+k+1-i}. $$

Similarly, if we wish to find quantities γ and δ such that

$$ {u}_{r+k+2}-\gamma {u}_r-\delta {u}_{r-1} $$

is a function of at most only {ε t  : t = r + 1, r + 2, …, r + k + 2}, we conclude, following the same procedure as above, that

$$ \gamma ={c}_{k+2},\kern1em \delta ={\rho}_2{c}_{k+1}. $$
(A.67)

Thus,

$$ {u}_{r+k+2}-\gamma {u}_r-\delta {u}_{r-1}={u}_{r+k+2}-{c}_{k+2}{u}_r-{\rho}_2{c}_{k+1}{u}_{r-1}=\sum \limits_{i=0}^{k+1}{c}_i{\varepsilon}_{r+k+2-i}. $$

Put

$$ {\displaystyle \begin{array}{ll}{v}_t={u}_t,& t=1,2,\dots, r\\ {}\kern0.6em ={u}_t-{c}_{k+1}{u}_r-{\rho}_2{c}_k{u}_{r-1},& t=r+k+1\\ {}\kern0.6em ={u}_t-{c}_{k+2}{u}_r-{\rho}_2{c}_{k+1}{u}_{r-1},& t=r+k+2\\ {}\kern0.6em ={u}_t,& t=r+k+3,\kern0.5em r+k+4,\kern0.5em \dots, \kern0.5em T.\end{array}} $$

To produce the transformation desired we need only derive an expression for the variances and covariances of v 1 , v 2, those of v r + k + 1 , v r + k + 2, as well as an expression for the coefficients c i , appearing immediately below (A.65), involving only ρ 1 and ρ 2. Now,

$$ \mathrm{Var}\left({u}_t\right)={\sigma}_{00}={\rho}_1^2{\sigma}_{00}+{\rho}_2^2{\sigma}_{00}+2{\rho}_1{\rho}_2{\sigma}_{01}+{\sigma}^2, $$

where

$$ {\sigma}_{01}=\mathrm{Cov}\left({u}_t,\kern1em {u}_{t-1}\right). $$

Moreover,

$$ {\sigma}_{01}={\rho}_1{\sigma}_{00}+{\rho}_2{\sigma}_{01}, $$

which yields

$$ {\sigma}_{00}=\left\{\frac{1-{\rho}_2}{\left(1-{\rho}_2\right)\left(1-{\rho}_2^2\right)-{\rho}_1^2\left(1+{\rho}_2\right)}\right\}{\sigma}^2,\kern1em {\sigma}_{01}=\left[\frac{\rho_1}{1-{\rho}_2}\right]{\sigma}_{00}. $$

Thus

$$ {\displaystyle \begin{array}{l}\kern1.2em \mathrm{Var}\left({v}_t\right)={\sigma}_{00},\kern1em t=1,2,\\ {}\mathrm{Cov}\left({v}_t,\kern0.5em {v}_{t-1}\right)={\sigma}_{01},\kern1em t=2.\end{array}} $$

In addition,

$$ {\displaystyle \begin{array}{l}\mathrm{Var}\left({v}_{r+k+1}\right)={\sigma}^2\left[\sum \limits_{i=0}^k{c}_i^2\right],\kern1em \mathrm{Var}\left({v}_{r+k+2}\right)={\sigma}^2\left[\sum \limits_{i=0}^{k+1}{c}_i^2\right],\\ {}\kern1.8em \mathrm{Cov}\left({v}_{r+k+2},{v}_{r+k+1}\right)={\sigma}^2\left[\sum \limits_{i=0}^k{c}_i{c}_{i+1}\right].\end{array}} $$
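A quick numerical check of σ 00 and σ 01 (illustrative parameter values; a sketch only): solve the two moment equations above directly and compare with the closed forms just derived.

```python
import numpy as np

rho1, rho2, sigma2 = 0.9, -0.14, 1.0        # illustrative stable values

# closed-form expressions derived above
s00 = (1 - rho2) / ((1 - rho2) * (1 - rho2 ** 2) - rho1 ** 2 * (1 + rho2)) * sigma2
s01 = rho1 / (1 - rho2) * s00

# the same two moments obtained directly from the two equations in the text:
#   (1 - rho1^2 - rho2^2) s00 - 2 rho1 rho2 s01 = sigma2
#   -rho1 s00 + (1 - rho2) s01 = 0
A = np.array([[1 - rho1 ** 2 - rho2 ** 2, -2 * rho1 * rho2],
              [-rho1, 1 - rho2]])
s00_chk, s01_chk = np.linalg.solve(A, np.array([sigma2, 0.0]))
print(np.isclose(s00, s00_chk), np.isclose(s01, s01_chk))
```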

Now, if x and y are two random variables, it is well known that

$$ -\left(\frac{\sigma_{xy}}{\sigma_{xx}}\right)x+y $$

is uncorrelated with x, where, obviously,

$$ {\sigma}_{xx}=\mathrm{Var}(x),\kern1em {\sigma}_{xy}=\mathrm{Cov}\left(x,\kern0.5em y\right). $$

Consequently,

$$ -\left(\frac{\rho_1}{1-{\rho}_2}\right){u}_1+{u}_2 $$

is uncorrelated with u 1. Similarly,

$$ -\left(\frac{\sum_{i=0}^k{c}_{i+1}{c}_i}{\sum_{i=0}^k{c}_i^2}\right){v}_{r+k+1}+{v}_{r+k+2} $$

is uncorrelated with v r + k + 1.

Define the lower triangular matrix

$$ S=\left[\begin{array}{cccccccc}{s}_{11}& 0& & & \cdots & & & 0\\ {}{s}_{21}& {s}_{22}& 0& & & & & \\ {}-{\rho}_2& -{\rho}_1& 1& 0& & & & \\ {}0& -{\rho}_2& -{\rho}_1& 1& & & & \\ {}\vdots & & & & & & & \vdots \\ {}0& \cdots & {s}_{r+1,r-1}& {s}_{r+1,r}& {s}_{r+1,r+1}& & & \\ {}0& \cdots & {s}_{r+2,r-1}& {s}_{r+2,r}& {s}_{r+2,r+1}& {s}_{r+2,r+2}& & \\ {}& & & & -{\rho}_2& -{\rho}_1& 1& 0\\ {}\vdots & & & & & & & 0\\ {}& & & & & & & 0\\ {}0& & & \cdots \kern0.48em & & -{\rho}_2& -{\rho}_1& 1\end{array}\right], $$
(A.68)

where

$$ {\displaystyle \begin{array}{c}{s}_{11}={\left\{\frac{1-{\rho}_2}{\left(1-{\rho}_2\right)\left(1-{\rho}_2^2\right)-\left(1+{\rho}_2\right){\rho}_1^2}\right\}}^{-1/2}\\ {}{s}_{22}=\left\{\frac{1-{\rho}_2}{{\left[{\left(1-{\rho}_2\right)}^2-{\rho}_1^2\right]}^{1/2}}\right\}{s}_{11}\\ {}{s}_{21}=-\left\{\frac{\rho_1}{{\left[{\left(1-{\rho}_2\right)}^2-{\rho}_1^2\right]}^{1/2}}\right\}{s}_{11}\\ {}{s}_{r+1,r-1}=-{\rho}_2{c}_k{s}_{r+1,r+1},\kern0.5em {s}_{r+1,r}=-{c}_{k+1}{s}_{r+1,r+1},\\ {}{s}_{r+1,r+1}={\left[\sum \limits_{i=0}^k{c}_i^2\right]}^{-1/2},\\ {}{s}_{r+2,r-1}=-{\rho}_2{c}_{k+1}{s}_{r+2,r+2}-{\rho}_2{c}_k{s}_{r+2,r+1},\\ {}{s}_{r+2,r}=-{c}_{k+2}{s}_{r+2,r+2}-{c}_{k+1}{s}_{r+2,r+1},\\ {}{s}_{r+2,r+1}=-\left(\frac{\sum_{i=0}^k{c}_{i+1}{c}_i}{\sum_{i=0}^k{c}_i^2}\right){s}_{r+2,r+2},\\ {}{s}_{r+2,r+2}={\left\{\frac{\left({\sum}_{i=0}^k{c}_i^2\right)\left({\sum}_{i=0}^{k+1}{c}_i^2\right)-{\left({\sum}_{i=0}^k{c}_{i+1}{c}_i\right)}^2}{\sum_{i=0}^k{c}_i^2}\right\}}^{-1/2}.\end{array}} $$

To express the coefficients c i in terms of ρ 1 , ρ 2 (instead of λ 1 , λ 2, as we did earlier) we proceed as follows. Consider the recursive relation (neglecting the ε’s)

$$ {\displaystyle \begin{array}{l}{u}_{r+1}={\rho}_1{u}_r+{\rho}_2{u}_{r-1},\\ {}{u}_{r+2}={\rho}_1{u}_{r+1}+{\rho}_2{u}_r=\left({\rho}_1^2+{\rho}_2\right){u}_r+{\rho}_1{\rho}_2{u}_{r-1}.\end{array}} $$

If we put, in general,

$$ {u}_{r+s}={c}_s^{\ast }{u}_r+{d}_s{u}_{r-1} $$

we obtain the recursions

$$ {c}_s^{\ast }={\rho}_1{c}_{s-1}^{\ast }+{\rho}_2{c}_{s-2}^{\ast },\kern1em {d}_s={\rho}_1{d}_{s-1}+{\rho}_2{d}_{s-2} $$

with the “initial conditions”

$$ {c}_0^{\ast }=1,\kern1em {c}_{-1}^{\ast }=0,\kern1em {d}_0=0,\kern1em {d}_{-1}=1. $$
(A.69)

But from (A.66) and (A.67) we easily see that

$$ {c}_s^{\ast }={c}_s,\kern1em {d}_s={\rho}_2{c}_{s-1}, $$
(A.70)

where the c s ’s are exactly the quantities defined just below (A.65). Computing, recursively, a few of the coefficients c s and taking the initial conditions (A.69) into account, we find

$$ {\displaystyle \begin{array}{ll}{c}_1& ={\rho}_1,\\ {}{c}_2& ={\rho}_1^2+{\rho}_2,\\ {}{c}_3& ={\rho}_1^3+2{\rho}_1{\rho}_2,\\ {}{c}_4& ={\rho}_1^4+3{\rho}_1^2{\rho}_2+{\rho}_2^2,\end{array}}\kern1em {\displaystyle \begin{array}{ll}{c}_5& ={\rho}_1^5+4{\rho}_1^3{\rho}_2+3{\rho}_1{\rho}_2^2,\\ {}{c}_6& ={\rho}_1^6+5{\rho}_1^4{\rho}_2+6{\rho}_1^2{\rho}_2^2+{\rho}_2^3,\\ {}{c}_7& ={\rho}_1^7+6{\rho}_1^5{\rho}_2+10{\rho}_1^3{\rho}_2^2+4{\rho}_1{\rho}_2^3,\\ {}{c}_8& ={\rho}_1^8+7{\rho}_1^6{\rho}_2+15{\rho}_1^4{\rho}_2^2+10{\rho}_1^2{\rho}_2^3+{\rho}_2^4,\end{array}} $$

or, in general,

$$ {c}_s=\sum \limits_{i=0}^{\left[s/2\right]}{a}_{si}{\rho}_1^{s-2i}{\rho}_2^i,\kern1em s\ge 1, $$
(A.71)

where [s/2] is the integral part of s/2;

$$ {a}_{si}={a}_{s-1,i}+{a}_{s-2,\kern0.5em i-1},\kern1em i\ge 1,s\ge 2, $$
(A.72)

and for all s

$$ {a}_{s0}=1,\kern1em {a}_{sj}=0,\kern1em j>\left[\frac{s}{2}\right],s\ge 1, $$

while for even s

$$ {a}_{s,\left[s/2\right]}=1; $$

and the “initial” conditions are

$$ {a}_{00}=0. $$

The recursion in (A.72) together with the conditions just enumerated completely describes the coefficients

$$ \left\{{c}_i:\kern0.62em i=0,1,2,\dots \right\} $$

and thus completely determines the elements of the matrix S in terms of the parameters ρ 1 ,  ρ 2. What this accomplishes is the following. Suppose that

$$ {u}_t={\rho}_1{u}_{t-1}+{\rho}_2{u}_{t-2}+{\varepsilon}_t $$

and that the sequence

$$ \left\{{\varepsilon}_t:\kern0.62em t=0,\pm 1,\pm 2,\dots \right\} $$

is one of i.i.d. random variables with mean zero and variance σ 2.

Let

$$ u={\left({u}_1,\kern0.5em {u}_2,\dots, {u}_r,\kern0.5em {u}_{r+k+1},{u}_{r+k+2},\dots, {u}_T\right)}^{\prime }. $$

Then Su is a vector of uncorrelated random elements whose mean is zero and whose (common) variance is σ 2. If we assert that the elements of the ε-sequence obey, in addition,

$$ {\varepsilon}_t\sim N\left(0,\kern0.5em {\sigma}^2\right)\kern1em \mathrm{for}\kern0.5em \mathrm{all}\kern0.5em t $$

then

$$ Su\sim N\left(0,\kern0.5em {\sigma}^2I\right), $$

where I is the identity matrix of order T − k.
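Before summarizing, the construction can be checked numerically. The sketch below (Python; illustrative parameter values; all names hypothetical) builds S from the c s recursion, forms the covariance matrix of the available u’s from the AR(2) autocovariances, and verifies that S (Cov u) S′ = σ 2 I.

```python
import numpy as np

rho1, rho2, sigma2 = 0.9, -0.14, 1.0   # illustrative stable AR(2) parameters
T, r, k = 20, 8, 3                     # gap of k = 3 observations after t = r = 8
n = T - k                              # number of available observations

# c_0, c_1, ... from the recursion c_s = rho1*c_{s-1} + rho2*c_{s-2}
c = [1.0, rho1]
for _ in range(k + 2):
    c.append(rho1 * c[-1] + rho2 * c[-2])
Sk  = sum(ci ** 2 for ci in c[:k + 1])            # sum_{i=0}^{k}   c_i^2
Sk1 = sum(ci ** 2 for ci in c[:k + 2])            # sum_{i=0}^{k+1} c_i^2
Scc = sum(c[i] * c[i + 1] for i in range(k + 1))  # sum_{i=0}^{k}   c_i c_{i+1}

# the matrix S of (A.68); 0-based rows/columns, row r is the first post-gap row
S = np.zeros((n, n))
s11 = np.sqrt(((1 - rho2) * (1 - rho2 ** 2) - (1 + rho2) * rho1 ** 2) / (1 - rho2))
dd = np.sqrt((1 - rho2) ** 2 - rho1 ** 2)
S[0, 0] = s11
S[1, 1], S[1, 0] = (1 - rho2) / dd * s11, -rho1 / dd * s11
for j in list(range(2, r)) + list(range(r + 2, n)):
    S[j, j], S[j, j - 1], S[j, j - 2] = 1.0, -rho1, -rho2
sa = 1.0 / np.sqrt(Sk)
S[r, r], S[r, r - 1], S[r, r - 2] = sa, -c[k + 1] * sa, -rho2 * c[k] * sa
sb = np.sqrt(Sk / (Sk * Sk1 - Scc ** 2))
S[r + 1, r + 1], S[r + 1, r] = sb, -(Scc / Sk) * sb
S[r + 1, r - 1] = -c[k + 2] * sb - c[k + 1] * S[r + 1, r]
S[r + 1, r - 2] = -rho2 * (c[k + 1] * sb + c[k] * S[r + 1, r])

# covariance matrix of the available u's from the AR(2) autocovariances
g = [(1 - rho2) / ((1 - rho2) * (1 - rho2 ** 2) - rho1 ** 2 * (1 + rho2)) * sigma2]
g.append(rho1 / (1 - rho2) * g[0])
for _ in range(T):
    g.append(rho1 * g[-1] + rho2 * g[-2])
times = list(range(1, r + 1)) + list(range(r + k + 1, T + 1))
Gamma = np.array([[g[abs(s - t)] for t in times] for s in times])

print(np.allclose(S @ Gamma @ S.T, sigma2 * np.eye(n)))   # should print True
```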

One could, clearly, estimate the parameters of a GLM exhibiting a gap of length k and whose errors form a second-order autoregression by a search procedure applied to the transformed data Sy , SX, where S is as defined in (A.68). We may summarize the discussion of this section in

Theorem A.3

Consider the GLM

$$ y= X\beta +u, $$

where y is (T − k) × 1, X is (T − k) × (n + 1), u is (T − k) × 1, and there is a gap of length k in the observations as follows:

$$ \left\{{y}_t,\kern0.5em {x}_{ti}:\kern0.62em t=1,2,\dots, r,\kern0.5em r+k+1,\kern0.5em r+k+2,\dots, T,i=0,1,2,\dots, n\right\}. $$

Provided

  1. (a)

    rank(X) = n + 1,

  2. (b)

    (p)lim T→∞(X′X/T) is positive definite,

  3. (c)

    E(u| X) = 0,

  • the following is true.

  1. (i)

    If, in addition, Cov(u| X) = σ 2 I, then the OLS estimator of β is consistent, unbiased, and efficient.

  2. (ii)

    If u t  = ρu t − 1 + ε t and {ε t  :  t = 0, ±1, ±2, …} is a sequence of i.i.d. random variables with mean zero and variance σ 2, the OLS estimator \( \tilde{\beta}={\left({X}^{\prime }X\right)}^{-1}{X}^{\prime }y \) is unbiased, consistent, but inefficient. The (feasible) efficient estimator is obtained as

$$ \widehat{\beta}={\left({X}^{\prime }{\tilde{M}}_1^{\prime }{\tilde{M}}_1X\right)}^{-1}{X}^{\prime }{\tilde{M}}_1^{\prime }{\tilde{M}}_1y, $$
  • where M 1 is a (T − k) × (T − k) matrix with elements that are all zero except:

$$ {m}_{jj}\left\{\begin{array}{ll}=\sqrt{1-{\rho}^2}& for\kern0.5em j=1\\ {}={\left[\frac{1-{\rho}^2}{1-{\rho}^{2\left(k+1\right)}}\right]}^{1/2}& for\kern0.5em j=r+1\\ {}=1& otherwise;\end{array}\right. $$
$$ {m}_{j+1,j}\left\{\begin{array}{ll}=-{\rho}^{k+1}{\left[\frac{1-{\rho}^2}{1-{\rho}^{2\left(k+1\right)}}\right]}^{1/2}& for\kern0.5em j=r\\ {}=-\rho & otherwise.\end{array}\right. $$
  • The matrix \( {\tilde{M}}_1 \) is obtained by substituting for ρ its estimator \( \tilde{\rho} \) . The latter may be obtained by the usual search method applied to the transformed model

$$ {M}_1y={M}_1 X\beta +{M}_1u $$
  • or by a suitable extension of the C.O. method. In the observation gap model above the OLS residuals, say \( {\tilde{u}}_t \) , may be used to compute a D.W.-like statistic

$$ \frac{\sum_{t=2}^r{\left({\tilde{u}}_t-{\tilde{u}}_{t-1}\right)}^2+{\sum}_{t=r+k+2}^T{\left({\tilde{u}}_t-{\tilde{u}}_{t-1}\right)}^2}{\sum_{t=1}^r{\tilde{u}}_t^2+{\sum}_{t=r+k+1}^T{\tilde{u}}_t^2}. $$
  • The usual tabulations, however, for the D.W. statistic are not appropriate in this instance and one should, if the sample is moderately large, apply asymptotic theory.

  1. (iii)

    If u t  = ρ 1 u t − 1 + ρ 2 u t − 2 + ε t , the sequence {ε t  : t = 0, ±1, ±2, …} being one of i.i.d. random variables with mean zero and variance σ 2, and if the process is stable, i.e., the roots of z 2 − ρ 1 z − ρ 2 = 0 are less than unity in absolute value, then the OLS estimator is unbiased and consistent but is inefficient. The (feasible) efficient estimator is obtained by the search method, which minimizes

$$ {\left( Sy- SX\beta \right)}^{\prime}\left( Sy- SX\beta \right) $$
  • over the range ρ 1 ∈ (−2,  2) , ρ 2 ∈ (−1, 1). The estimator is

$$ \widehat{\beta}={\left({X}^{\prime }{\tilde{S}}^{\prime}\tilde{S}X\right)}^{-1}{X}^{\prime }{\tilde{S}}^{\prime}\tilde{S}y, $$
  • where S is a (T − k) × (T − k) matrix all of whose elements are zero except:

$$ {s}_{jj}\left\{\begin{array}{ll}={\left\{\frac{1-{\rho}_2}{\left(1-{\rho}_2\right)\left(1-{\rho}_2^2\right)-\left(1+{\rho}_2\right){\rho}_1^2}\right\}}^{-1/2},& j=1\\ {}=\left\{\frac{1-{\rho}_2}{{\left[{\left(1-{\rho}_2\right)}^2-{\rho}_1^2\right]}^{1/2}}\right\}{s}_{11},& j=2\\ {}=1,& j=3,\dots, r\\ {}={\left[\sum \limits_{i=0}^k{c}_i^2\right]}^{-1/2},& j=r+1\\ {}={\left[\frac{\left({\sum}_{i=0}^k{c}_i^2\right)\left({\sum}_{i=0}^{k+1}{c}_i^2\right)-{\left({\sum}_{i=0}^k{c}_i{c}_{i+1}\right)}^2}{\sum_{i=0}^k{c}_i^2}\right]}^{-1/2},& j=r+2\\ {}=1,& j=r+3,\dots, T-k;\end{array}\right. $$
$$ {s}_{j+1,j}\left\{\begin{array}{ll}=\left\{-\frac{\rho_1}{{\left[{\left(1-{\rho}_2\right)}^2-{\rho}_1^2\right]}^{1/2}}\right\}{s}_{11},& j=1\\ {}=-{\rho}_1,& j=2,3,\dots, r-1\\ {}=-{c}_{k+1}{s}_{r+1,r+1},& j=r\\ {}=-\left(\frac{\sum_{i=0}^k{c}_i{c}_{i+1}}{\sum_{i=0}^k{c}_i^2}\right){s}_{r+2,r+2},& j=r+1\\ {}=-{\rho}_1,& j=r+2,r+3,\dots, T-k;\end{array}\right. $$
$$ {s}_{j+2,j}\left\{\begin{array}{ll}=-{\rho}_2,& j=1,2,\dots, r-2\\ {}=-{\rho}_2{c}_k{s}_{r+1,r+1},& j=r-1\\ {}=-{c}_{k+2}{s}_{r+2,r+2}-{c}_{k+1}{s}_{r+2,r+1},& j=r\\ {}=-{\rho}_2,& j=r+1,\kern0.5em r+2,\dots, T-k;\end{array}\right. $$
$$ {s}_{j+3,j}\left\{\begin{array}{ll}=0,& j=1,2,\dots, r-2\\ {}=-{\rho}_2\left({c}_{k+1}{s}_{r+2,r+2}+{c}_k{s}_{r+2,r+1}\right),& j=r-1\\ {}=0,& j=r,\kern0.5em r+1,\dots, \kern0.5em T-k.\end{array}\right. $$
  • The coefficients c s above are given by

$$ {c}_s=\sum \limits_{i=0}^{\left[s/2\right]}{a}_{si}{\rho}_1^{s-2i}{\rho}_2^i, $$
  • where [s/2] denotes the integral part of s/2, and

$$ {a}_{si}={a}_{s-1,\kern0.5em i}+{a}_{s-2,\kern0.5em i-1},\kern1em i\ge 1,\kern0.5em s\ge 2, $$
  • where

$$ {a}_{00}=0 $$
$$ {a}_{s0}=1,\kern1em {a}_{sj}=0,\kern1em j>\left[\frac{s}{2}\right],\kern1em s\ge 1, $$
  • and for even s

$$ {a}_{s,\left[s/2\right]}=1. $$

Tables for Testing Hypotheses on the Autoregressive Structure of the Errors in a GLM

These tables are reproduced with the kind permission of the publisher Marcel Dekker Inc., and the author H. D. Vinod. Tables 3.4, 3.5, and 3.6 first appeared in H. D. Vinod, “Generalization of the Durbin–Watson Statistic for Higher Order Autoregressive Processes,” Communications in Statistics, vol. 2, 1973, pp. 115–144.

Table 3.4 Second-order autoregression: level of significance 5%
Table 3.5 Third-order autoregression: level of significance 5%
Table 3.6 Fourth-order autoregression: level of significance 5%

These tables are meant to facilitate the test of hypotheses regarding the autoregressive properties of the error term in a general linear model containing a constant term but no lagged dependent variables. The order of the autoregression can be at most four.

Tables 3.1, 3.2, and 3.3 contain upper and lower significance points at the 1%, 2.5%, and 5% level respectively, for testing that the first-order auto-correlation is zero. Table 3.4 contains upper and lower significance points at the 5% level for testing that the second-order autocorrelation is zero.

Similarly, Tables 3.5 and 3.6 contain upper and lower significance points at the 5% level for tests on the third- and fourth-order autocorrelation.

Perhaps a word of explanation is in order regarding their use. The tables have been constructed on the basis of the properties of the statistics

$$ {d}_j=\sum \limits_{t=j+1}^T{\left({\widehat{u}}_t-{\widehat{u}}_{t-j}\right)}^2/\sum \limits_{t=1}^T{\widehat{u}}_t^2,\kern2em j=1,2,3,4, $$

where the \( {\widehat{u}}_t \) are the residuals of a GLM containing a constant term but not containing lagged dependent variables. If X is the data matrix of the GLM, then

$$ \widehat{u}=\left[I-X{\left({X}^{\prime }X\right)}^{-1}{X}^{\prime}\right]u, $$

where

$$ \widehat{u}={\left({\widehat{u}}_1,{\widehat{u}}_2,{\widehat{u}}_3,\dots, {\widehat{u}}_T\right)}^{\prime } $$

and it is assumed that

$$ u\sim N\left(0,{\sigma}^2I\right). $$
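A sketch of the computation of these statistics from a residual vector (the residual series shown is artificial and purely illustrative):

```python
import numpy as np

def d_stat(u_hat, j):
    """Generalized Durbin-Watson statistic of order j computed from OLS residuals."""
    return np.sum((u_hat[j:] - u_hat[:-j]) ** 2) / np.sum(u_hat ** 2)

# with white-noise residuals each d_j should be near 2
rng = np.random.default_rng(1)
u_hat = rng.standard_normal(100)
print([round(d_stat(u_hat, j), 3) for j in (1, 2, 3, 4)])
```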

Hence, if we wish to test the first-order hypothesis, i.e., that in

$$ {u}_t={\rho}_1{u}_{t-1}+{\varepsilon}_t $$

we have

$$ {\mathrm{H}}_0:\kern0.5em {\rho}_1=0, $$

as against

$$ {\mathrm{H}}_1:\kern0.5em {\rho}_1>0, $$

we can use Tables 3.1, 3.2, or 3.3 exactly as we use the standard Durbin–Watson tables—indeed, they are the same.

If we wish to test for a second-order autoregression of the special form

$$ {u}_t={\rho}_2{u}_{t-2}+{\varepsilon}_t $$

we can do so using the statistic d 2 and Table 3.4 in exactly the same fashion as one uses the standard Durbin–Watson tables.

Similarly, if we wish to test for a third-order autoregression of the special type

$$ {u}_t={\rho}_3{u}_{t-3}+{\varepsilon}_t $$

or for a fourth-order autoregression of the special type

$$ {u}_t={\rho}_4{u}_{t-4}+{\varepsilon}_t $$

we may do so using the statistics d 3 and d 4 and Tables 3.5 and 3.6 respectively.

Again, the tables are used in the same fashion as the standard Durbin–Watson tables, i.e., we accept the hypothesis that the (relevant) autocorrelation coefficient is zero if the statistic d 3 or d 4 exceeds the appropriate upper significance point, and we accept the hypothesis that the (relevant) autocorrelation coefficient is positive if the statistic d 3 or d 4 is less than the lower significance point.

Now, it may be shown that if we have two autoregressions of order m and m + 1 respectively and if it is known that these two autoregressions have the same autocorrelations of order 1 , 2 , … , m, then a certain relationship must exist between the coefficients describing these autoregressions. In particular, it may be shown that if for a fourth-order autoregression, say

$$ {u}_t={a}_{41}{u}_{t-1}+{a}_{42}{u}_{t-2}+{a}_{43}{u}_{t-3}+{a}_{44}{u}_{t-4}+{\varepsilon}_t, $$

the autocorrelations of order 1, 2, 3 are zero, then

$$ {a}_{41}={a}_{42}={a}_{43}=0 $$

and thus the process is of the special form

$$ {u}_t={a}_{44}{u}_{t-4}+{\varepsilon}_t\kern1em \mathrm{and}\kern1em {a}_{44}={\rho}_4, $$

i.e., a 44 is the autocorrelation of order 4. Similarly, if for the third-order autoregression

$$ {u}_t={a}_{31}{u}_{t-1}+{a}_{32}{u}_{t-2}+{a}_{33}{u}_{t-3}+{\varepsilon}_t $$

it is known that the first two autocorrelations are zero, then

$$ {a}_{31}={a}_{32}=0 $$

so that the process is of the special form

$$ {u}_t={a}_{33}{u}_{t-3}+{\varepsilon}_t\kern1em \mathrm{and}\kern1em {a}_{33}={\rho}_3, $$

i.e., a 33 is the autocorrelation of order 3. Finally, if for the second-order autoregression

$$ {u}_t={a}_{21}{u}_{t-1}+{a}_{22}{u}_{t-2}+{\varepsilon}_t $$

it is known that the first-order autocorrelation is zero then

$$ {a}_{21}=0 $$

so that the process is of the special form

$$ {u}_t={a}_{22}{u}_{t-2}+{\varepsilon}_t\kern1em \mathrm{and}\kern1em {a}_{22}={\rho}_2, $$

i.e., a 22 is the autocorrelation of order 2.

Vinod uses these relations to suggest a somewhat controversial test for the case where we wish to test for autoregression in the error term of the GLM and are willing to limit the alternatives to, at most, the fourth-order autoregression

$$ {u}_t=\sum \limits_{i=1}^4{a}_{4i}{u}_{t-i}+{\varepsilon}_t. $$

The proposed test is as follows. First test that the first-order autocorrelation is zero, i.e.,

$$ {\mathrm{H}}_{01}:\kern1em {\rho}_1=0, $$

as against

$$ {\mathrm{H}}_{11}:\kern1em {\rho}_1>0, $$

using Tables 3.1, 3.2, or 3.3. If H01 is accepted then test

$$ {\mathrm{H}}_{02}:\kern1em {\rho}_2=0, $$

as against

$$ {\mathrm{H}}_{12}:\kern1em {\rho}_2>0. $$

If H02 is also accepted then test

$$ {\mathrm{H}}_{03}:\kern1em {\rho}_3=0 $$

as against

$$ {\mathrm{H}}_{13}:\kern1em {\rho}_3>0. $$

If H03 is accepted then test

$$ {\mathrm{H}}_{04}:\kern1em {\rho}_4=0 $$

as against

$$ {\mathrm{H}}_{14}:\kern1em {\rho}_4>0. $$

There are a number of problems with this: first, the levels of significance of the second, third, and fourth tests cannot be the stated ones, since we proceed to the ith test only conditionally upon accepting the null hypothesis in the (i − 1)th test; second, if at any point we accept the alternative, it is not clear what we should conclude.

Presumably, if we accept H12 (at the second test) we should conclude that the process is at least second order, make allowance for this, in terms of search or Cochrane-Orcutt procedures, and then proceed to test using the residuals of the transformed equation.

An alternative to the tests suggested by Vinod would be simply to regress the residuals \( {\widehat{u}}_t \) on \( {\widehat{u}}_{t-1},{\widehat{u}}_{t-2},{\widehat{u}}_{t-3},{\widehat{u}}_{t-4} \), thus obtaining the estimates

$$ {\widehat{a}}_{4i},\kern1em i=1,2,\dots, 4. $$

Since we desire to test

$$ {\mathrm{H}}_0:\kern1em a=0, $$

as against

$$ {\mathrm{H}}_1:\kern1em a\ne 0, $$

where a = (a 41, a 42, a 43, a 44)′, we may use the (asymptotic) distribution of \( \widehat{a} \) under the null hypothesis as well as the multiple comparison test, as given in the appendix to Chapter ??. Thus, testing the null hypothesis of no autocorrelation in the errors, i.e.,

$$ {\mathrm{H}}_0:\kern1em a=0 $$

as against

$$ {\mathrm{H}}_1:\kern1em a\ne 0, $$

is best approached through the asymptotic distribution, given by

$$ \sqrt{T}\widehat{a}\sim N\left(0,\kern0.5em \mathrm{I}\right). $$

This implies the chi-square and associated multiple comparison tests: accept H0 if \( T{\widehat{a}}^{\prime}\widehat{a}\le {\chi}_{\alpha; 4}^2 \), where \( {\chi}_{\alpha; 4}^2 \) is the α significance point of a chi-square variable with four degrees of freedom; otherwise reject H0 and accept any of the hypotheses whose acceptance is implied by the multiple comparison intervals \( -{\left({\chi}_{\alpha; 4}^2{h}^{\prime }h\right)}^{1/2}\le \sqrt{T}{h}^{\prime}\widehat{a}\le {\left({\chi}_{\alpha; 4}^2{h}^{\prime }h\right)}^{1/2} \).
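A sketch of this alternative procedure (Python; 9.488 is the 5% point of a chi-square variable with four degrees of freedom; the function names and the artificial residual series are illustrative):

```python
import numpy as np

def chi_square_lag_test(u_hat, p=4, chi2_crit=9.488):
    """Regress u_t on u_{t-1},...,u_{t-p} and test H0: a = 0 via T*a'a,
    which is asymptotically chi-square with p degrees of freedom under H0."""
    T = len(u_hat)
    Z = np.column_stack([u_hat[p - i:-i] for i in range(1, p + 1)])
    a_hat, *_ = np.linalg.lstsq(Z, u_hat[p:], rcond=None)
    stat = T * float(a_hat @ a_hat)
    return a_hat, stat, stat > chi2_crit

def comparison_bound(h, chi2_crit=9.488):
    """Half-width of the multiple-comparison interval for sqrt(T) h'a_hat."""
    h = np.asarray(h, dtype=float)
    return np.sqrt(chi2_crit * (h @ h))

# illustration with white-noise "residuals": the test should usually accept H0
rng = np.random.default_rng(2)
a_hat, stat, reject = chi_square_lag_test(rng.standard_normal(200))
print(stat, reject)
```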

Finally, we illustrate the use of these tables by an example. Suppose in a GLM with five bona fide explanatory variables and thirty observations we have the Durbin-Watson statistic

$$ d=1.610. $$

From Table 3.1 we see that the upper significance point for the 1% level is 1.606; since d = 1.610 exceeds it, the hypothesis of no autocorrelation is accepted at the 1% level. For the 2.5% level (Table 3.2) the upper significance point is 1.727 and the lower significance point is 0.999; since d lies between the two, the test is indeterminate at this level. For the 5% level (Table 3.3) the upper significance point is 1.833 while the lower is 1.070; hence at the 5% level the test is indeterminate as well.
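The decision rule applied in this example is easily mechanized; the sketch below simply encodes the usual bounds comparison, using the significance points quoted above.

```python
def dw_decision(d, d_lower, d_upper):
    """One-sided D.W. decision rule against positive autocorrelation."""
    if d > d_upper:
        return "accept H0: no (positive) autocorrelation"
    if d < d_lower:
        return "reject H0"
    return "test inconclusive"

# the significance points quoted in the example above
print(dw_decision(1.610, 0.999, 1.727))   # 2.5% level: inconclusive
print(dw_decision(1.610, 1.070, 1.833))   # 5% level: inconclusive
```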
