# Locally stationary spatio-temporal processes


## Abstract

This paper proposes a locally stationary spatio-temporal process to analyze the motivating example of US precipitation data, a huge data set composed of monthly precipitation observations at thousands of monitoring points scattered irregularly across the continental US. Allowing the parameters of the continuous autoregressive and moving average (CARMA) random fields of Brockwell and Matsuda (J R Stat Soc Ser B 79(3):833–857, 2017) to depend on location, we generalize the locally stationary time series of Dahlhaus (Ann Stat 25:1–37, 1997) to spatio-temporal processes that are locally stationary in space. We develop Whittle likelihood estimation for the spatially dependent parameters and rigorously derive its asymptotic properties. We demonstrate that the spatio-temporal models account for the nonstationary spatial covariance structures in US precipitation data.

## Keywords

CARMA kernel, Compound Poisson, Locally stationary process, Seasonal AR model, Spatially dependent spectral density function, Spatial nonstationarity, Whittle likelihood estimation

## Introduction

Continuous autoregressive and moving average (CARMA) random fields, proposed by Brockwell and Matsuda (2017) as stationary spatial models on \({\mathbf {R}}^d,d\ge 2\), are extended here to handle the motivating example. To describe the spatially dependent behavior of US precipitation data, we seek spatio-temporal extensions with stationary temporal and nonstationary spatial covariances. The stationary temporal extension is straightforward with discrete ARMA time series models, whereas the nonstationary spatial extension requires more care.

Nonstationary spatial models have attracted great interest in spatial statistics, since environmental data often exhibit nonstationary features, with covariances that depend not only on lags but also on locations (Sampson 2010). Typical nonstationary spatial models include the kernel-based methods of Fuentes (2001), the basis function approach of Nychka et al. (2002), the convolution models of Higdon (1998), and the spatial deformation methods of Guttorp and Sampson (1994). Although all these approaches express nonstationary spatial covariances in theoretically sophisticated ways, they often make estimation and kriging difficult for huge spatial data sets, which have become common with the rapid progress of data collection technology such as satellite remote sensing. US precipitation is a typical example of a huge spatio-temporal data set that requires nonstationary spatial covariance models.

Locally stationary processes, proposed by Dahlhaus (1997), are nonstationary time series whose parameters are allowed to depend on time. Dahlhaus (1997) estimated the time dependence of the parameters by a frequency domain method and rigorously derived the asymptotic properties. His key idea, which makes it possible to establish the asymptotic theory, is to express the time dependence of the parameter \(\theta \) as \(\theta (t/T)\) for sample size *T*, so that a growing sample size observes a fixed curve \(\theta (\cdot )\) on [0, 1] ever more densely. Earlier studies expressed time-dependent parameters as \(\theta (t)\), for which asymptotic arguments were difficult to formulate (Priestley 1971).

Extending the locally stationary time series of Dahlhaus (1997) to random fields, we propose locally stationary spatio-temporal processes. CARMA random fields with spatially dependent parameters are special cases of locally stationary spatio-temporal processes, with separable covariances given by the product of stationary temporal and locally stationary spatial covariances. Following Dahlhaus (1997), we develop Whittle likelihood estimation for the spatially dependent parameters in locally stationary spatio-temporal processes. To establish the asymptotic theory of the estimation, we need to generalize the asymptotic scheme for time series to one for spatio-temporal data. Extending the so-called mixed asymptotics for spatial data (Stein 1999), in which the sample size and the sampling region diverge jointly, to spatio-temporal data, we rigorously derive the asymptotic properties.

The main features of locally stationary spatio-temporal CARMA random fields are as follows. First, the parameters are efficiently estimated by minimizing the Whittle likelihood, which requires no matrix operations. Second, the asymptotic theory for Whittle estimation is established under an asymptotic scheme that extends mixed asymptotics in the spatial statistics literature. Third, kriging and forecasting, which usually require inverting huge matrices for large spatial data sets, are carried out with a light computational burden: applying an approximation to the kriging procedure of Brockwell and Matsuda (2017), we conduct efficient kriging without matrix inversion. Finally, locally stationary CARMA models provide an easy way to simulate spatio-temporal data with spatially nonstationary and temporally stationary covariances. Simulating spatial data with nonstationary covariances is also possible as a by-product of simulating spatio-temporal data.

We use the following notation. For \(A=(A_1,A_2), s=(s_1,s_2)\), \([0,A]=[0,A_1]\times [0,A_2]\), \(|A|=A_1\times A_2\), \(s/A=\left( s_1/A_1,s_2/A_2\right) \).
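As a small illustration of this notation (our own sketch, not code from the paper), the helpers below compute \(|A|\) and the rescaled location \(s/A\). The function `theta_surface` is a hypothetical smooth parameter surface on \([0,1]^2\), used only to preview how a parameter indexed by the rescaled location \(s/A\), as in the local-stationarity construction later in the paper, stays fixed when the domain and the location are enlarged in proportion.

```python
import numpy as np

def area(A):
    """|A| = A1 * A2 for the rectangle [0, A] = [0, A1] x [0, A2]."""
    return float(np.prod(np.asarray(A, dtype=float)))

def rescale(s, A):
    """Rescaled location s/A = (s1/A1, s2/A2); lies in [0, 1]^2 when s is in [0, A]."""
    return np.asarray(s, dtype=float) / np.asarray(A, dtype=float)

def theta_surface(u):
    """Hypothetical smooth parameter surface theta(u) on [0, 1]^2 (illustration only)."""
    u = np.atleast_2d(u)
    return 0.5 + 0.3 * np.sin(np.pi * u[:, 0]) * u[:, 1]

# Doubling both the domain and the location leaves the local parameter unchanged.
A, s = (10.0, 4.0), (5.0, 2.0)
print(area(A), rescale(s, A))
print(np.allclose(theta_surface(rescale(s, A)),
                  theta_surface(rescale((10.0, 4.0), (20.0, 8.0)))))
```

Indexing by \(s/A\) rather than by \(s\) is what lets a larger region reveal more detail of the same surface instead of a new one.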

## Locally stationary random fields

### Extension of stationary CARMA random fields

*g*(*s*) to satisfy \(g(0)=1\). Let *n*(*dx*) be the number of knot points contained in the region \(dx\subset {\mathbf {R}}^2\). Then, we normalize them to satisfy \(E(n(dx))=\mathrm{var}(n(dx))=dx\). The two normalizations are necessary to guarantee the identifiability of \(\tau ^2\).

*B* is the backward shift operator defined by \(BZ_{jt}=Z_{j,t-1}\), we have temporally extended CARMA random fields expressed as follows:

*s*, we have the spatially nonstationary model denoted as follows:

*A*] for *s* in

*X*(*s*, *t*) tend to be large. Consistent estimation of the spatially dependent parameters requires increasingly dense sampling over the domain as the sample size grows. Following Dahlhaus (1997), we replace the spatial dependence of the parameters with the local dependence defined by \(\theta (s/A),\phi (s/A),\psi (s/A),\sigma (s/A)\), which leads to the expression:

### Locally stationary spatio-temporal processes

Here, we generalize the locally stationary spatio-temporal CARMA processes in (4) to locally stationary spatio-temporal processes. Dahlhaus (1997) proposed locally stationary processes to express nonstationarity with a valid asymptotic theory; we extend his framework from nonstationary time series to spatio-temporal data. We consider processes that are locally stationary in space but stationary in time, a class that includes (4) as a special case.

**Definition 1**

*K*, if there exists a representation:

**Example 1**

## Estimation of parameters

### Whittle likelihood

Suppose that we have observed spatio-temporal data \(X_A(s_p,t), p=1,\ldots ,N,t=1,\ldots ,T\) that follow the locally stationary model in (5) with the spatially dependent spectral density function \(f(u,\omega ,\lambda )=|K(u,\omega ,\lambda )|^2\), which is expressed as \(f(\theta _u,\omega ,\lambda )\) with spatially dependent parameters \(\theta _u\). Our aim is to estimate \(\theta _u\) for a fixed \(u\in [0,1]^2\) nonparametrically, without specifying any parametric form for the dependence of \(\theta \) on *u*. In other words, we assume a parametric form for the spectral density, with a parameter \(\theta \) that may depend on *u*, but impose no parametric form on the function \(\theta (u)\). We assume that all the observation points satisfy \(\{s_p\}\subset [0,A]\).

*h*(*x*) be a probability density function over \([0,1]^2\). We assume that the \(s_p\) are independently and identically distributed over [0, *A*] with density \(|A|^{-1}h(s/A)\). Under conditions clarified later, we find that \(I_B(u,\omega ,\lambda )\) is biased, unlike in the discrete time series case, and that

*D* is a compact and symmetric region in \({\mathbf {R}}^2\), such that \(-\omega \in D\) whenever \(\omega \in D\). Regarding \(C_u,\tilde{C}_u\) as nuisance parameters and concentrating \(C_u\) out of the function, we obtain the concentrated likelihood

*c* for a fixed *u*, we estimate \(\theta _u\) by \(\hat{\theta }\); doing this for each *u* estimates the dependence of \(\theta _u\) on *u* nonparametrically.

Notice that \(l_c(\theta )\) cannot identify the scale parameter \(\sigma _u^2\) when \(f(u,\omega ,\lambda )=\sigma _u^2f_0(u,\omega ,\lambda )\), since \(l_c(\theta )\) clearly does not depend on \(\sigma _u^2\). Hence, the Whittle estimation proposed here provides estimators only for the parameters in \(f_0(u,\omega ,\lambda )\). For the same reason, replacing the periodogram in \(l_c\) by the modified periodogram multiplied by any constant yields exactly the same estimates. In Example 1, all the parameters except \(\sigma (u)^2\) are identifiable from \(l_c\) and can be estimated by minimizing it.
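Equations (6) and (7) are not reproduced in this extract. As a hedged sketch, assume the non-concentrated likelihood has the common Whittle form \(M^{-1}\sum _j\{\log (Cf_j)+I_j/(Cf_j)\}\) with a single multiplicative nuisance constant \(C\); this form, which ignores \(\tilde{C}_u\), is our assumption and may differ from the paper's. Profiling out \(C\) gives an objective that reproduces both invariances stated above. The function name `concentrated_whittle` is ours.

```python
import numpy as np

def concentrated_whittle(periodogram, spectrum):
    """Assumed concentrated Whittle objective over M frequencies.

    Starts from (1/M) * sum_j [log(C * f_j) + I_j / (C * f_j)] with a scalar
    nuisance C > 0. Its minimizer is C = mean(I / f); substituting back gives
    log(mean(I / f)) + mean(log f) + 1, and the constant 1 is dropped here.
    """
    I = np.asarray(periodogram, dtype=float)
    f = np.asarray(spectrum, dtype=float)
    return float(np.log(np.mean(I / f)) + np.mean(np.log(f)))

rng = np.random.default_rng(0)
I = rng.exponential(size=200)   # toy periodogram ordinates
f = 1.0 + rng.random(200)       # toy spectral density values
base = concentrated_whittle(I, f)
# Scaling f by sigma^2 = 4 leaves the objective unchanged (scale not identified).
print(np.isclose(concentrated_whittle(I, 4.0 * f), base))
# Scaling the periodogram by 3 only adds log(3), so the minimizer is unchanged.
print(np.isclose(concentrated_whittle(3.0 * I, f) - base, np.log(3.0)))
```

Because \(\log \sigma ^2\) enters once with a plus sign and once with a minus sign, it cancels exactly, which is why only the parameters of \(f_0\) can be estimated.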

**Remark 1**

*j*th element in the set of Fourier frequencies:

*M* is the number of Fourier frequencies included in \(D\times [-\pi ,\pi ]\).

### Assumptions

- (A1) Suppose that \(X_A(s,t)\) follows the locally stationary process in (5) with the spatially dependent spectral density function \(f(u,\omega ,\lambda )=|K(u,\omega ,\lambda )|^2\), and is observed at \((s_p,t),p=1,\ldots ,N,t=1,\ldots ,T,s_p\in [0,A]\). The locations \(s_p,p=1,\ldots ,N\), are written as
  $$\begin{aligned} s_p=(A_1\varepsilon _{p1},A_2\varepsilon _{p2})', \end{aligned}$$
  where \(\varepsilon _p=(\varepsilon _{p1},\varepsilon _{p2})\) is a sequence of independently and identically distributed random vectors with probability density function *h*(*x*) over the compact region \([0,1]^2\).
- (A2) We assume that \(A_j,B_j,j=1,2\), *N*, and *T* are functions of *k* such that \(A_j=A_j(k),B_j=B_j(k)\rightarrow \infty \), \(N=N_k\rightarrow \infty \), and \(T=T_k\rightarrow \infty \) as \(k\rightarrow \infty \). Moreover, \(N_k^{-1}|A_k|\rightarrow 0\), \(B_j(k)/A_j(k)\rightarrow 0\), \(\sqrt{T_k|B_k|}B_j(k)^{-2}\rightarrow 0\), \(\sqrt{T_k^{-3}|B_k|}\rightarrow 0\), and \(\sqrt{T_k|B_k|}B_j(k)/A_j(k)\rightarrow 0\) for \(j=1,2\) as \(k\rightarrow \infty \).
- (A3) The spatially dependent spectral density function \(f(u,\omega ,\lambda )\) is integrable, bounded, and twice partially differentiable with respect to \(\omega \in {\mathbf {R}}^2\) and \(\lambda \in [-\pi ,\pi ]\), and partially differentiable with respect to \(u\in [0,1]^2\).
- (A4) The tapers \(w_{\mathrm{sp}}(x),x\in [-.5,.5]^2\) and \(w_{\mathrm{tmp}}(x),x\in [0,1]\) are twice partially differentiable when regarded as functions on \({\mathbf {R}}^2\) and \({\mathbf {R}}\), respectively.
- (A5) For a fixed \(u\in [0,1]^2\), we fit the parametric spectral density \(f(\theta _u,\omega ,\lambda )\), \(\theta _u\in \Theta \), where \(\Theta \) is a compact subset of \({\mathbf {R}}^d\). \(f(\theta _u,\omega ,\lambda )\) is positive on \(\Theta \times D\times [-\pi ,\pi ]\) and twice differentiable with respect to \(\theta _u\) for \((\omega ,\lambda )\in D\times [-\pi ,\pi ]\). \(\theta _1(u)\ne \theta _2(u)\) implies that \(f(\theta _1(u),\omega ,\lambda )\ne f(\theta _2(u),\omega ,\lambda )\) on a subset of \(D\times [-\pi ,\pi ]\) with positive Lebesgue measure. The true parameter \(\theta _0(u)\), defined by \(f(\theta _0(u),\omega ,\lambda )=f(u,\omega ,\lambda )\), lies in the interior of \(\Theta \).
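The sampling design in (A1), together with the rate condition \(N_k^{-1}|A_k|\rightarrow 0\) in (A2), can be simulated directly. The sketch below is ours: the densities `uniform_eps` and `beta_eps` and the growth rates of \(A_k\) and \(N_k\) are illustrative assumptions, not choices made in the paper. It shows the mixed scheme in which the region expands while the number of points per unit area also grows.

```python
import numpy as np

def sample_locations(N, A, rng, draw_eps):
    """Irregular design of (A1): s_p = (A1 * eps_p1, A2 * eps_p2), p = 1, ..., N.

    draw_eps(rng, N) must return an (N, 2) array of i.i.d. draws from a density
    h on [0, 1]^2, so that s_p has density |A|^{-1} h(s / A) on [0, A].
    """
    eps = np.asarray(draw_eps(rng, N), dtype=float)
    return eps * np.asarray(A, dtype=float)

def uniform_eps(rng, N):
    """h(x) = 1 on [0, 1]^2 (an illustrative choice)."""
    return rng.random((N, 2))

def beta_eps(rng, N):
    """Non-uniform h: independent Beta(2, 2) coordinates (an illustrative choice)."""
    return rng.beta(2.0, 2.0, size=(N, 2))

# Hypothetical growth rates: |A_k| grows like k^2 while N_k grows like k^3,
# so the density N_k / |A_k| grows like k and N_k^{-1} |A_k| -> 0 as in (A2).
rng = np.random.default_rng(1)
for k in (1, 2, 4):
    A = (10.0 * k, 5.0 * k)
    N = 50 * k ** 3
    s = sample_locations(N, A, rng, beta_eps)
    print(k, N, N / (A[0] * A[1]), bool((s >= 0).all() and (s <= A).all()))
```

Points concentrate where \(h\) is large, which is why Theorem 1 is stated only for \(u\) with \(h(u)>0\).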

### Asymptotic results

We now state the asymptotic results under the scheme in (A1) and (A2). For \(k=1,2,\ldots \), let \(\hat{\theta }_k(u)\) be the estimator minimizing \(l_c(\theta )\) in (7) for a fixed \(u\in [0,1]^2\).

**Theorem 1**

*Under Assumptions (A1)–(A5),*

- (a) *For a fixed* \(u\in [0,1]^2\) *such that* \(h(u)>0\), \(\hat{\theta }_k(u)\) *converges to* \(\theta _0(u)\) *in probability as* \(k\rightarrow \infty \).
- (b) *For a fixed* \(u\in [0,1]^2\) *such that* \(h(u)>0\),
  $$\begin{aligned} \sqrt{T_k|B_k|}\left( \hat{\theta }_k(u)-\theta _0(u)\right) \rightarrow N\left( 0,b_w\left( \Gamma _{0u}-\Phi _{0u}\right) ^{-1} (2\Gamma _{0u}+\Delta _{0u}) \left( \Gamma _{0u}-\Phi _{0u}\right) ^{-1} \right) , \end{aligned}$$
  *in distribution as* \(k\rightarrow \infty \), *where* \(\Gamma _{0u}=\Gamma (\theta _0(u)), \Phi _{0u}=\Phi (\theta _0(u)), \Delta _{0u}=\Delta (\theta _0(u))\) *with*
  $$\begin{aligned} b_w&= \left\{ \int \int |w_{\mathrm{sp}}(x)|^4|w_{\mathrm{tmp}}(y)|^4\mathrm{d}x\mathrm{d}y \right\} \left\{ \int \int |w_{\mathrm{sp}}(x)|^2|w_{\mathrm{tmp}}(y)|^2\mathrm{d}x\mathrm{d}y \right\} ^{-2},\\ \Gamma (\theta )&=(2\pi )^{-3}\int _D\int _{-\pi }^{\pi } \left( \frac{\partial \log f(\theta ,\omega ,\lambda )}{\partial \theta } \right) \left( \frac{\partial \log f(\theta ,\omega ,\lambda )}{\partial \theta } \right) ' \mathrm{d}\omega \mathrm{d}\lambda ,\\ \Phi (\theta )&=(2\pi )^{-3}(2\pi |D|)^{-1}\int _D\int _{-\pi }^{\pi } \left( \frac{\partial \log f(\theta ,\omega ,\lambda )}{\partial \theta } \right) \mathrm{d}\omega \mathrm{d}\lambda \int _D\int _{-\pi }^{\pi } \left( \frac{\partial \log f(\theta ,\omega ,\lambda )}{\partial \theta } \right) ' \mathrm{d}\omega \mathrm{d}\lambda ,\\ \Delta (\theta )&=(2\pi )^{-3}\int _D\int _{-\pi }^{\pi }\int _D\int _{-\pi }^{\pi } \left( \frac{\partial \log f(\theta ,\omega _1,\lambda _1)}{\partial \theta } \right) \left( \frac{\partial \log f(\theta ,\omega _2,\lambda _2)}{\partial \theta } \right) '\\&\quad\times a_4(\omega _1,-\omega _1,\omega _2)b_4(\lambda _1,-\lambda _1,\lambda _2) \mathrm{d}\omega _1 \mathrm{d}\lambda _1\mathrm{d}\omega _2 \mathrm{d}\lambda _2. \end{aligned}$$
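The taper constant \(b_w\) can be evaluated numerically for candidate tapers. The sketch below is ours; it assumes that the integrals run over the taper domains in (A4), \([-.5,.5]^2\) and \([0,1]\), and the cosine taper is an illustrative choice, not one used in the paper.

```python
import numpy as np

def taper_factor(w_sp, w_tmp, n=400):
    """Evaluate b_w of Theorem 1 by the midpoint rule.

    Assumes w_sp is supported on [-0.5, 0.5]^2 and w_tmp on [0, 1], as in (A4).
    Both domains have unit measure, so each integral equals a grid mean, and
    the double integrals factorize because the integrands are products.
    """
    g = (np.arange(n) + 0.5) / n                      # midpoints of [0, 1]
    x1, x2 = np.meshgrid(g - 0.5, g - 0.5, indexing="ij")
    ws = np.abs(w_sp(x1, x2))
    wt = np.abs(w_tmp(g))
    num = np.mean(ws ** 4) * np.mean(wt ** 4)
    den = (np.mean(ws ** 2) * np.mean(wt ** 2)) ** 2
    return num / den

def flat_sp(x1, x2):
    return np.ones_like(x1)

def flat_tmp(y):
    return np.ones_like(y)

def cos_sp(x1, x2):
    # Illustrative cosine-bell spatial taper, vanishing on the boundary.
    return np.cos(np.pi * x1) * np.cos(np.pi * x2)

print(taper_factor(flat_sp, flat_tmp))  # no tapering
print(taper_factor(cos_sp, flat_tmp))   # cosine spatial taper, flat in time
```

For the cosine taper, \(\int _{-1/2}^{1/2}\cos ^4(\pi x)\mathrm{d}x=3/8\) and \(\int _{-1/2}^{1/2}\cos ^2(\pi x)\mathrm{d}x=1/2\), so each spatial coordinate contributes \((3/8)/(1/2)^2=3/2\) and \(b_w=9/4\). By the Cauchy-Schwarz inequality, \(b_w\ge 1\) for any taper on these unit-measure domains, so \(b_w\) is the variance price paid for tapering.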

The asymptotic variance differs from the familiar one for discrete time series models (Dunsmuir 1979). Precisely, \(\Phi (\theta )\) vanishes for discrete time series models, since the integral of the log spectral density is determined by the innovation variance alone, so that its derivative with respect to the remaining parameters is zero (see Theorem 5.8.1 in Brockwell and Davis 1991). The asymptotic variance also differs from that in Matsuda and Yajima (2009), which employs the non-concentrated Whittle likelihood in (6); the non-concentrated likelihood estimator does not include \(\Phi (\theta )\) in its asymptotic variance. The difference arises because the Hessian matrices of (6) and (7) coincide for discrete time series but not in our setting.
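The role of \(\Phi \) can be seen from a short rearrangement, which we add here and which follows only from the definitions in Theorem 1. Write \(g(\theta ,\omega ,\lambda )=\partial \log f(\theta ,\omega ,\lambda )/\partial \theta \), let \(V=2\pi |D|\) be the measure of \(D\times [-\pi ,\pi ]\), and let \(\bar g(\theta )=V^{-1}\int _D\int _{-\pi }^{\pi }g(\theta ,\omega ,\lambda )\,\mathrm{d}\omega \mathrm{d}\lambda \) be the average score. Then \(\Phi (\theta )=(2\pi )^{-3}V\bar g(\theta )\bar g(\theta )'\), and expanding the product gives

$$\begin{aligned} \Gamma (\theta )-\Phi (\theta ) &=(2\pi )^{-3}\left\{ \int _D\int _{-\pi }^{\pi } g(\theta ,\omega ,\lambda )g(\theta ,\omega ,\lambda )'\,\mathrm{d}\omega \mathrm{d}\lambda -V\bar g(\theta )\bar g(\theta )'\right\} \\ &=(2\pi )^{-3}\int _D\int _{-\pi }^{\pi } \left\{ g(\theta ,\omega ,\lambda )-\bar g(\theta )\right\} \left\{ g(\theta ,\omega ,\lambda )-\bar g(\theta )\right\} '\,\mathrm{d}\omega \mathrm{d}\lambda . \end{aligned}$$

Hence \(\Gamma -\Phi \) is a centered Gram matrix of the score and is positive semidefinite. It reduces to \(\Gamma \) exactly when \(\bar g=0\), as in the discrete time series case, and it is singular in the direction of \(\log \sigma _u^2\) when \(f=\sigma _u^2f_0\), since that score component is identically 1 and vanishes after centering; this is consistent with the non-identifiability of \(\sigma _u^2\) under \(l_c\).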

## Empirical studies

We apply the locally stationary spatio-temporal CARMA models in (4) to US precipitation data, the motivating example for the temporal and nonstationary extensions of CARMA random fields, to examine the empirical properties of Whittle likelihood estimation and the forecasting performance of the identified model. The US precipitation data consist of monthly precipitation observed at weather stations across the US from 1895 through 1997 and are available from the web page of the Institute for Mathematics Applied to Geosciences (IMAG): http://www.image.ucar.edu/Data/US.monthly.met/USmonthlyMet.shtml.

*z* are transformed by

We fit the locally stationary spatio-temporal CARMA(2,1) model introduced in Example 1, in which the smoothness parameter \(\theta _3\) and the AR parameters \(\phi ,\psi \) are allowed to depend on location. The other two parameters, \(\theta _1\) and \(\theta _2\), were fixed at 3.63 and 0.53 to guarantee identifiability; these values were obtained by minimizing (7) with the periodogram computed using the spatial taper \(w_{\mathrm{sp}}=1\), that is, by the usual Whittle likelihood estimation. The 36 monthly samples from Jan. 1994 to Dec. 1996 were used to minimize the Whittle likelihood function in (7), with the tapers \(w_{\mathrm{sp}}(x)=\exp (-x^2/8^2)\) and \(w_{\mathrm{tmp}}(x)=1\).

Before presenting the estimation results, we explain why we focus on CARMA(2,1) kernels in the empirical study. First, for higher order CARMA kernels, the parameters that govern the smoothness of covariances are poorly identified from discretely observed data; in general, fitting continuous models to discrete data tends to yield weak identifiability. Second, CARMA(2,1) kernels are general enough to cover the covariance behaviors met in practice, including CAR(1), which reduces to the Ornstein-Uhlenbeck process in one dimension, and can even express negative covariances (see Example 2.1 of Brockwell and Matsuda 2017). Finally, model selection criteria such as AIC do not work for CARMA model selection, since Whittle estimators lack the standard asymptotic results that justify such criteria.

We find that the estimators capture the spatial dependence of the three parameters well in a nonparametric way. Figure 2, for the smoothness parameter, shows that the smoothness of covariances is lower over the Rocky Mountains than over the plains, which agrees with the intuitively spiky behavior of precipitation in mountainous areas. Figures 3 and 4 show that the seasonal coefficient varies spatially, gradually decreasing from west to east, while the autoregressive coefficient is nearly constant at 0.40. In the east coast area, the estimates of the seasonal parameter are even negative.

Finally, we compute 1-, 2-, and 3-month-ahead forecasts for Jan. through Dec. 1997 with the identified spatio-temporal CARMA(2,1) model, based on the samples up to Dec. 1996. Table 1 shows the MSEs of the forecasts of 1997 precipitation at 100 randomly selected stations, compared with two benchmarks given by the averages of precipitation in the previous month and in the same month over the past 3 years. For example, the two benchmark forecasts for March 1997 are the 3-year averages of precipitation in February 1995–1997 and in March 1994–1996, respectively.
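The two benchmarks can be written as index arithmetic on a monthly series. In the sketch below, the function names are ours, and we assume that "bmrk 1" and "bmrk 2" in Table 1 are the previous-month and same-month averages, in the order the text lists them; the toy series is synthetic, not precipitation data.

```python
import numpy as np

def benchmark_forecasts(x, t):
    """Benchmark forecasts for month t of a monthly series x (0-based index).

    Previous-month benchmark: mean of x at t-1, t-13, t-25
    (for March 1997: February 1997, 1996, 1995).
    Same-month benchmark: mean of x at t-12, t-24, t-36
    (for March 1997: March 1996, 1995, 1994).
    """
    x = np.asarray(x, dtype=float)
    if t < 36:
        raise ValueError("need three full years of history before month t")
    prev_month = [t - 1 - 12 * k for k in range(3)]
    same_month = [t - 12 * k for k in range(1, 4)]
    return float(x[prev_month].mean()), float(x[same_month].mean())

def mse(forecast, actual):
    """Mean squared error over stations."""
    f = np.asarray(forecast, dtype=float)
    a = np.asarray(actual, dtype=float)
    return float(np.mean((f - a) ** 2))

# With Jan. 1994 as index 0, March 1997 has index 3 * 12 + 2 = 38.
x = np.arange(48.0)  # toy series whose value equals its month index
print(benchmark_forecasts(x, 38))  # (25.0, 14.0)
```

Applying `mse` station by station to each benchmark gives entries of the same kind as the benchmark columns of Table 1.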

*w* by the samples \(X_A(s_j,k), j=1,\ldots ,N,k\le t\). By (4), we have, for \(I_t\) the information generated by \(X_A(s,k),s\in [0,A],k\le t\):

*w*, should be estimated with

*h*-step, say) ahead forecasts for \(X_A(w,t+h)\) are constructed recursively for \(h=2,3,\ldots \) by replacing the unobserved values with the predicted values in the previous step.
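The recursion above can be sketched for a purely temporal model. Equation (4) is not reproduced in this extract, so as a hypothetical stand-in we assume an AR model at a single location with a lag-1 coefficient \(\phi \) and a lag-12 seasonal coefficient \(\psi \), \(x_t=\phi x_{t-1}+\psi x_{t-12}+e_t\); the kriging step for an unobserved site *w* is omitted, and the function name `recursive_forecast` is ours.

```python
import numpy as np

def recursive_forecast(history, phi, psi, horizon):
    """Plug-in multi-step forecasts for x_t = phi*x_{t-1} + psi*x_{t-12} + e_t.

    The model form is a hypothetical stand-in for the temporal part of (4).
    Each step uses observed values when available and earlier forecasts
    otherwise, i.e., unobserved values are replaced by their predictions.
    """
    path = [float(v) for v in history]
    if len(path) < 12:
        raise ValueError("need at least 12 past observations")
    n_obs = len(path)
    for _ in range(horizon):
        # path[-1] and path[-12] are observations or previous forecasts.
        path.append(phi * path[-1] + psi * path[-12])
    return np.array(path[n_obs:])

hist = [0.0] * 11 + [1.0]  # a unit shock in the latest month
print(recursive_forecast(hist, phi=0.4, psi=0.3, horizon=3))
```

For horizons up to 12 the seasonal term still uses observed values, so only the lag-1 term is propagated from earlier forecasts.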

**Table 1** Comparisons of MSEs among the forecasts of US monthly precipitation in 1997 by CARMA(2,1), estimated from the samples from Jan. 1994 to Dec. 1996, and by the two benchmarks; 100 stations were randomly chosen for the precipitation to be predicted

| Month | bmrk 1 | bmrk 2 | Step 1 | Step 2 | Step 3 |
|---|---|---|---|---|---|
| Jan. | 0.228 | 0.321 | 0.331 | 0.592 | 0.598 |
| Feb. | 0.811 | 0.698 | 0.628 | 0.592 | 0.529 |
| Mar. | 0.615 | 0.559 | 0.558 | 0.557 | 0.546 |
| Apr. | 0.559 | 0.451 | 0.532 | 0.402 | 0.444 |
| May | 0.263 | 0.346 | 0.286 | 0.352 | 0.321 |
| Jun. | 0.296 | 0.291 | 0.327 | 0.391 | 0.455 |
| Jul. | 0.552 | 0.526 | 0.513 | 0.495 | 0.490 |
| Aug. | 0.260 | 0.449 | 0.327 | 0.362 | 0.361 |
| Sep. | 0.408 | 0.355 | 0.321 | 0.322 | 0.342 |
| Oct. | 0.461 | 0.366 | 0.350 | 0.363 | 0.357 |
| Nov. | 0.341 | 0.317 | 0.394 | 0.373 | 0.366 |
| Dec. | 0.468 | 0.625 | 0.397 | 0.471 | 0.481 |
| Average | 0.439 | 0.442 | 0.414 | 0.439 | 0.441 |

All entries are MSEs.

## Discussion

This paper has proposed locally stationary spatio-temporal processes to describe the empirical properties of US precipitation data, a huge spatio-temporal data set. Extending stationary CARMA random fields on \({\mathbf {R}}^2\) to spatio-temporal models with spatially nonstationary and temporally stationary covariances, we obtain locally stationary spatio-temporal CARMA processes, which we further generalize to locally stationary spatio-temporal processes. Following Dahlhaus (1997), we estimate the spatially dependent parameters by minimizing the Whittle likelihood and rigorously derive their asymptotic properties. Applications to US precipitation data demonstrate that the locally stationary CARMA(2,1) model accounts well for the nonstationary spatial behavior.

The critical restriction of spatio-temporal CARMA processes is that their covariances are confined to separable ones, given by products of spatial and temporal covariances. Nonseparable extensions that can express a richer class of covariance structures are our next target. Another interesting extension is to allow the Lévy sheets that drive CARMA random fields to have infinite variance, which would make it possible to express a variety of spiky behaviors in spatial data. This would require a new parameter estimation method, since Whittle estimation may not work in the infinite variance case. The asymptotic properties of such estimators are important issues of empirical as well as mathematical interest.

## Sketch of the proof

This section shows the outline of the proof for Theorem 1.

*Proof of Theorem 1(a)*

*Proof of Theorem 1(b)*

## Notes

### Acknowledgements

The research was supported by the Grants-in-Aid for Scientific Research, 17H01701, 17H02508.

## References

- Brockwell, P. J., & Davis, R. A. (1991). *Time series: theory and methods* (2nd ed.). New York: Springer.
- Brockwell, P. J., & Matsuda, Y. (2017). Continuous auto-regressive moving average random fields on \(R^n\). *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, *79*(3), 833–857.
- Dahlhaus, R. (1997). Fitting time series models to nonstationary processes. *The Annals of Statistics*, *25*, 1–37.
- Dunsmuir, W. (1979). A central limit theorem for parameter estimation in stationary vector time series and its application to models for a signal observed with noise. *The Annals of Statistics*, *7*, 490–506.
- Fuentes, M. (2001). A new high frequency kriging approach for nonstationary environmental processes. *Environmetrics*, *12*, 469–483.
- Guttorp, P., & Sampson, P. D. (1994). Methods for estimating heterogeneous spatial covariance functions with environmental applications. In G. P. Patil & C. R. Rao (Eds.), *Handbook of Statistics* (Vol. 12, pp. 661–689). New York: Elsevier Science.
- Higdon, D. (1998). A process-convolution approach to modelling temperatures in the North Atlantic ocean. *Environmental and Ecological Statistics*, *5*, 173–190.
- Matsuda, Y., & Yajima, Y. (2009). Fourier analysis of irregularly spaced data on \(R^d\). *Journal of the Royal Statistical Society: Series B (Statistical Methodology)*, *71*(1), 191–217.
- Nychka, D., Wikle, C. K., & Royle, J. A. (2002). Multiresolution models for nonstationary spatial covariance functions. *Statistical Modelling*, *2*, 315–332.
- Priestley, M. B. (1971). Time-dependent spectral analysis and its application in prediction and control. *Journal of Sound and Vibration*, *17*, 517–534.
- Sampson, P. D. (2010). Constructions for nonstationary spatial processes. In A. E. Gelfand, P. Diggle, M. Fuentes & P. Guttorp (Eds.), *Handbook of Spatial Statistics* (pp. 119–130). Boca Raton: CRC Press.
- Stein, M. L. (1999). *Interpolation of spatial data*. New York: Springer.
- Walker, A. M. (1964). Asymptotic properties of least-squares estimates of parameters of the spectrum of a stationary non-deterministic time-series. *Journal of the Australian Mathematical Society*, *4*, 363–384.