# Encyclopedia of Social Network Analysis and Mining

Living Edition
Editors: Reda Alhajj, Jon Rokne

# Spatial Statistics

Living reference work entry
DOI: https://doi.org/10.1007/978-1-4614-7163-9_167-1

## Glossary

Correlation/covariance

Measures of similarity between observations

Geostatistics

A branch of spatial statistics

Isotropy

Property of covariance and variogram functions that makes them invariant under rotations of locations

Kriging

Method for linear unbiased prediction

Random field

A collection of random variables indexed by location

Stationarity

Property of random fields in which their mean and covariance functions are invariant under translation of locations

Variogram/semivariogram

Measures of dissimilarity between observations

## Definition

Spatial statistics is a branch of statistics that studies methods for making inferences from data observed over spatial regions. In typical applications these regions are either 2- or 3-dimensional. The methodology is mostly aimed at accounting for and modeling aspects of the so-called First Law of Geography: attributes from locations that are closer together are more closely related than attributes from locations that are farther apart. This is accomplished through appropriate measures of spatial association. An overview of models and methods is given for the three main types of spatial data: geostatistical, lattice, and point pattern.

## Introduction

Spatial data refer to measurements of phenomena that vary over a region of space $D \subset \mathbb{R}^d$, $d \ge 1$, called the region of interest. Each datum is associated with a subset of D that indicates where it was collected, often called the datum's support. This may be a single point or a larger subset, depending on the context.

There are three basic types of spatial data: geostatistical (or point referenced), lattice (or areal), and point pattern. All three may be viewed as pairs $\{(s_i, z_i) : i = 1, \dots, n\}$, where the interpretation and characteristics of the components vary from type to type.

For geostatistical data, $z_1, \dots, z_n$ are measurements or observations of a phenomenon of interest taken at sampling locations $s_1, \dots, s_n \in D$, which are single points. In the models described later, the $z_i$ are random, while $n$ (the sample size) and the $s_i$ are known and fixed. For lattice data, $s_1, \dots, s_n$ are subregions that form a partition of D, such as counties or postal codes, and $z_1, \dots, z_n$ are averages or summaries of the phenomenon of interest over these subregions. Here too the $z_i$ are random, while $n$ and the $s_i$ are known and fixed. For point pattern data, $s_1, \dots, s_n$ are points where a certain event of interest occurs, such as the presence of a type of tree or the epicenter of an earthquake, and $z_1, \dots, z_n$ are features of those events, such as the diameter of the tree at breast height or the magnitude of the earthquake. In the models for point pattern data described later, all components ($n$, the $s_i$, and the $z_i$) are random. Often the $z_i$ are absent, when interest centers only on the pattern of occurrences.

For all three types of spatial data, additional variables may also be available to serve as explanatory variables. Comprehensive treatments of statistical models and methods for all three types of data appear in Cressie (1993), Schabenberger and Gotway (2005), and the recent edited volume by Gelfand et al. (2010). Table 1 summarizes the key concepts and gives an overview of models and examples.
Table 1

Summary and overview of concepts, models, and examples in the three types of spatial data

| | Geostatistical | Lattice | Point pattern |
|---|---|---|---|
| Domain D | Fixed, continuous | Fixed, discrete | Random, continuous |
| Observation sites {si : i = 1, …, n} | si fixed, n fixed | si fixed, n fixed | si random, n random |
| Inference for | Z(s) only | Z(s) only | Both Z(s) and D |
| Main models for Z(s) | Sum of regression trend and stationary random field | Simultaneous Autoregressive Model (SAR) | Poisson process (homogeneous and inhomogeneous) |
| Key aims, concepts | Kriging (minimum MSE prediction) | Spatial proximity matrix (W) | Assess tendency for clustering |
| Examples | Meteorological and geological variables | Geographic and demographic variables | Location and intensity of events |

## Key Points

### Random Fields

A random field {Z(s) : s ∈ D} on the region $D \subset \mathbb{R}^d$ is a collection of random variables indexed by the elements of D, where D can be finite or infinite. These random variables are often nonidentically distributed and dependent, so modeling these aspects is a key starting point. The simplest way to do this is through the mean and covariance functions of the random field, defined as
$$\mu(s) := \mathrm{E}\{Z(s)\} \quad \text{and} \quad C(s,u) := \operatorname{cov}\{Z(s), Z(u)\}, \quad s, u \in D.$$

The former determines the spatial trend, a measure of variation over large distances, while the latter determines the spatial association, a measure of variation over small distances.

Other features of a random field also related to spatial association are the correlation function and the semivariogram function, defined respectively as
$$\begin{aligned} K(s,u) &:= \operatorname{corr}\{Z(s), Z(u)\} = \frac{C(s,u)}{\sigma(s)\,\sigma(u)},\\ \gamma(s,u) &:= \tfrac{1}{2}\operatorname{var}\{Z(s) - Z(u)\} = \tfrac{1}{2}\left(\sigma^2(s) + \sigma^2(u) - 2C(s,u)\right), \end{aligned}$$
where $\sigma^2(s) = \operatorname{var}\{Z(s)\}$ is the variance function. The functions C(s, u) and γ(s, u) provide similar information about the spatial association of the random field, the former being a measure of similarity between Z(s) and Z(u) and the latter a measure of dissimilarity. When choosing these functions, it is important to note that any function can be used as a mean function, but not every function can be used as a covariance function. A covariance function must be positive semi-definite, meaning that for any $m \in \mathbb{N}$, $s_1, \dots, s_m \in D$, and $a_1, \dots, a_m \in \mathbb{R}$ it holds that
$$\sum_{i=1}^m\sum_{j=1}^m{a}_i{a}_j C\left({s}_i,{s}_j\right)\ge 0.$$

This is a difficult condition to verify, but fortunately the literature provides many functions known to be positive semi-definite; see Cressie (1993) and Chilès and Delfiner (1999) for examples. These references also provide an intermediate-level treatment of the theory and methods of random fields and their application to spatial statistics, while Matérn (1986), Yaglom (1987), and Stein (1999) provide more mathematical treatments.
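The positive semi-definiteness condition can at least be probed numerically. The following is a minimal sketch, assuming NumPy; the helper names `is_psd_on`, `exponential`, and `linear_decay` are illustrative and not from any library. Because the condition must hold for every finite set of locations, a check on one set of points can only disprove validity, never prove it.

```python
import numpy as np

def is_psd_on(points, cov_fn, tol=1e-10):
    """Check sum_ij a_i a_j C(s_i, s_j) >= 0 on ONE finite set of locations.

    A False result proves cov_fn is not a valid covariance function;
    a True result only says the condition holds for these points.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances ||s_i - s_j|| (isotropic models only)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    c = cov_fn(dist)
    # For a symmetric matrix the quadratic form is >= 0 for every a
    # exactly when its smallest eigenvalue is >= 0
    return bool(np.linalg.eigvalsh(c).min() >= -tol)

def exponential(t):
    return np.exp(-t)      # a known valid isotropic covariance

def linear_decay(t):
    return 1.0 - t         # plausible-looking, but not a valid covariance

rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 5.0, size=(30, 2))
print(is_psd_on(pts, exponential))    # True

two = np.array([[0.0, 0.0], [3.0, 0.0]])
print(is_psd_on(two, linear_decay))   # False: C = [[1, -2], [-2, 1]]
```

In the second case, taking a = (1, 1) gives 1 + 1 − 2·2 = −2 < 0, which violates the condition directly.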

Lattice data usually represent averages or summaries of a quantity of interest over subregions, so covariance and semivariogram functions are not the most suitable tools for quantifying spatial association among this type of data. Instead, neighborhood relations and weight matrices are used. In this case the collection of subregions {s1, …, sn} is endowed with a neighborhood system {Ni : i = 1, …, n}, where Ni denotes the subregions that are, in a precisely defined way, neighbors of subregion si. For regular rectangular lattices, where the subregions may be thought of as pixels, it is common to use first-order neighborhood systems, in which the neighbors of a pixel are the pixels adjacent to it to the north, south, east, and west; see Fig. 1a. Second-order neighborhood systems are also used, in which the neighbors of a pixel are its first-order neighbors and their first-order neighbors; see Fig. 1b. In these cases all pixels have the same number of neighbors, except for pixels at (or near) the boundary of D. For regions divided into irregularly shaped subregions (like counties in a state), a commonly used neighborhood system is defined in terms of geographic adjacency, Ni = {sj : subregions si and sj share a boundary}; other definitions not based on geographic adjacency are also possible. In these cases the number of neighbors usually differs from subregion to subregion.

Fig. 1 Examples of first-order (a) and second-order (b) neighborhood systems. Pixels in blue are the neighbors of the pixel marked with an “x”

In addition, a weight (or neighborhood) matrix W = (wij) is specified, where wij measures the strength of direct association between sites si and sj. It must satisfy wij ≥ 0 and wii = 0, with wij > 0 if and only if si and sj are neighbors (i.e., sj ∈ Ni). The most common choice is the binary matrix with wij = 1 if si and sj are neighbors and wij = 0 otherwise, but more refined specifications are also possible, e.g., based on distances between subregions' centroids. Anselin (1988), Cressie (1993), Rue and Held (2005), and LeSage and Pace (2009) provide ample treatments of models and methods for the analysis of lattice data.
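As an illustration of the binary weight matrix for a first-order neighborhood system on a regular grid of pixels, here is a minimal sketch assuming NumPy; the function name `rook_weights` and the row-by-row pixel numbering are choices made for this example only.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary weight matrix W for a first-order neighborhood on an nrows x ncols grid.

    Pixels are numbered row by row: pixel (r, c) has index r * ncols + c.
    """
    n = nrows * ncols
    w = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            # north, south, west, and east neighbors, if inside the grid
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[i, rr * ncols + cc] = 1.0   # w_ij = 1 for neighbors
    return w                                      # w_ii = 0 by construction

W = rook_weights(3, 3)
# Number of neighbors per pixel: corners 2, edge pixels 3, center 4
print(W.sum(axis=1).reshape(3, 3))
```

The row sums show the boundary effect mentioned above: only interior pixels have the full set of four neighbors.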

For geostatistical and lattice data, the sampling locations s1, …, sn are fixed and known, so these types of data are usually written as $\mathbf{z} = (z_1, \dots, z_n)^{\mathrm{T}}$ (T denotes the transpose of a vector or matrix). The stochastic approach to modeling and inference assumes the data are part of a realization of a random field Z(∙), so datum zi is the realized value of the random variable Z(si).

### Stationarity and Ergodicity

Spatial data typically contain no replicates, as usually a single observation is available at each location, so some assumptions on the random field are needed to make statistical inference feasible. To illustrate this point, consider the conceptual decomposition Z(si) = μ(si) + ε(si), with ε(∙) a random field with mean zero and covariance function C(s, u). Without extra assumptions it is not possible to identify both μ(si) and ε(si) from a single observation at si, because a term can be added to μ(si) and subtracted from ε(si) in infinitely many ways, none of which changes the datum Z(si) but all of which change the components to be identified.

The assumptions alluded to above are those of stationarity and ergodicity. A random field Z(s) is said to be (second-order or weakly) stationary if
$$\begin{array}{c}\mu (s)=\mu \left(\mathrm{constant}\right)\mathrm{and}\\ {} C\left( s, u\right)=\tilde{C}\left( s- u\right), s, u\in D,\end{array}$$
where $\tilde{C}(\cdot)$ is a function of a single spatial variable. That is, the mean and covariance functions are invariant under translations of the spatial locations. It follows that the variance, correlation, and semivariogram functions are also invariant under translations of the spatial locations: writing $K(s,u) = \tilde{K}(s - u)$, we have
$$\begin{array}{c}{\sigma}^2(s)={\sigma}^2, C\left( s, u\right)={\sigma}^2\tilde{K}\left( s- u\right),\\ {}\gamma \left( s, u\right)={\sigma}^2\left(1-\tilde{K}\left( s- u\right)\right).\end{array}$$

An important and commonly used special case of stationarity is isotropy, meaning that $C(s,u) = \overline{C}(\|s - u\|)$, where $\|h\| := (h_1^2 + \cdots + h_d^2)^{1/2}$ is the Euclidean norm of $h \in \mathbb{R}^d$ and $\overline{C}(\cdot)$ is a function of a single real variable. In this case the covariance function is also invariant under rotations of the spatial locations, so the nature of spatial association is the same in all directions; see Ripley (1981), Cressie (1993), and Schabenberger and Gotway (2005) for further discussion of stationarity.

A precise definition of ergodicity is somewhat technical (see Cressie 1993, pp. 53–58), but this assumption is key to making statistical inference based on spatial data feasible. This is because the meaning and interpretation of many features of a random field, such as the mean function, are based on ensemble (i.e., population) averages, namely, averages over the possible realizations of the random field. Ergodicity requires that spatial averages computed from a single realization converge to their respective ensemble averages as the sample size increases to infinity.

A complete description of a random field requires specifying its family of finite-dimensional distributions, namely, the family of joint distributions
$$F_{s_1, \dots, s_m}(x_1, \dots, x_m) = P\{Z(s_1) \le x_1, \dots, Z(s_m) \le x_m\},$$

for $m \in \mathbb{N}$ and $s_1, \dots, s_m \in D$. The simplest and most commonly used such specification is that of Gaussian random fields, meaning that all the aforementioned distributions are multivariate normal. Gaussian random fields are completely specified by their mean and covariance functions, and when they are stationary, a sufficient condition for them to be ergodic is that $\lim_{\|h\| \to \infty} \tilde{C}(h) = 0$. Gaussian random fields are the most commonly used models because of their convenient mathematical properties and wide applicability, as well as their use as “building blocks” for more complex random field models. Examples of the latter are hierarchical models used to describe discrete spatial data; see Banerjee et al. (2004) and Diggle and Ribeiro (2007).
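Because a Gaussian random field is fully specified by its mean and covariance functions, its values at a finite set of locations can be simulated directly from those two functions. The sketch below, assuming NumPy, uses the Cholesky factorization of the covariance matrix; the exponential covariance, the grid, the constant mean of 5, and all function names are assumptions made only for illustration.

```python
import numpy as np

def exp_cov(dist, sigma2=1.0, phi=0.2):
    # Exponential covariance, used here only as an example of a valid C
    return sigma2 * np.exp(-dist / phi)

def simulate_grf(points, mean_fn, cov_fn, n_sims=1, seed=None, jitter=1e-10):
    """Draw n_sims realizations of a Gaussian random field at the given points."""
    pts = np.asarray(points, dtype=float)
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    # A tiny diagonal jitter guards against round-off making the matrix
    # numerically indefinite; it does not change the model materially
    cov = cov_fn(dist) + jitter * np.eye(len(pts))
    chol = np.linalg.cholesky(cov)               # cov = L L^T
    rng = np.random.default_rng(seed)
    white = rng.standard_normal((len(pts), n_sims))
    # If w ~ N(0, I) then mu + L w ~ N(mu, L L^T)
    return mean_fn(pts)[:, None] + chol @ white

# 20 x 20 grid on the unit square, constant mean 5
g = np.linspace(0.0, 1.0, 20)
grid = np.array([(x, y) for x in g for y in g])
fields = simulate_grf(grid, lambda p: np.full(len(p), 5.0), exp_cov, n_sims=3, seed=42)
print(fields.shape)   # (400, 3)
```

Each column of `fields` is one realization. The O(n³) Cholesky step limits this direct approach to moderate numbers of locations.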

## Models and Inference

### Geostatistical Data Models

The basic geostatistical model is based on the conceptual decomposition of the random field of interest as
$$\begin{array}{cc}\hfill Z(s)=\mu (s)+\varepsilon (s),\hfill & \hfill s\in D,\hfill \end{array}$$
where μ(s) is the mean function (spatial trend) and ε(∙) is a zero-mean random field that describes the short-range variation, with the same covariance function as Z(∙). The usual model for the spatial trend is similar to that used in linear regression models
$$\mu (s)=\sum_{j=1}^p{f}_j(s){\beta}_j=\boldsymbol{f}{(s)}^{\mathbf{T}}\boldsymbol{\beta},$$
where β = (β1, …, βp)T are unknown regression parameters and f(s) = (f1(s), …, fp(s))T are known location-dependent covariates. The latter may include related spatially varying processes. For instance, if Z(s) is the rainfall amount that fell over a period of time at location s, then f(s) = altitude at location s may be a useful explanatory variable. More often a spatial trend is described in terms of a polynomial in the spatial coordinates. For the case when d = 2 and s = (x, y), this would be
$$\mu (s)=\sum_{0\le i+ j\le p}{\beta}_{i j}{x}^i{y}^j,\mathrm{for}\ \mathrm{some}\ p\ge 1\ \mathrm{known}.$$
Many examples of stationary covariance models have been proposed in the literature (see Cressie 1993; Chilès and Delfiner 1999). An example of a flexible family of isotropic covariance functions is the so-called Matérn family (Matérn 1986; Stein 1999)
$$\overline{C}(t) = \frac{2\sigma^2}{\Gamma(\nu)}\left(\frac{t}{2\phi}\right)^{\nu} K_{\nu}\!\left(\frac{t}{\phi}\right), \quad t \ge 0,$$

where Γ(∙) is the gamma function and $K_{\nu}(\cdot)$ is the modified Bessel function of the second kind of order ν, with $\overline{C}(0) = \sigma^2$ defined by continuity. In this model ϕ > 0 (mainly) controls how fast the correlation decreases with distance, and ν > 0 controls the smoothness of the realizations of the random field. The commonly used exponential and Gaussian covariance functions are special cases obtained, respectively, by setting ν = 1/2 and by letting ν → ∞ (the latter after a suitable rescaling of ϕ).
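The Matérn formula above can be evaluated with standard special-function routines. This is a minimal sketch assuming NumPy and SciPy (`scipy.special.kv` for the modified Bessel function of the second kind and `scipy.special.gamma`); the function name `matern` is hypothetical. It uses exactly the parameterization given above and sets the value at t = 0 to its limit σ².

```python
import numpy as np
from scipy.special import gamma, kv

def matern(t, sigma2=1.0, phi=1.0, nu=0.5):
    """Matern covariance in the parameterization above:
    C(t) = 2 sigma2 / Gamma(nu) * (t / (2 phi))**nu * K_nu(t / phi)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.full(t.shape, float(sigma2))   # limiting value at t = 0 is sigma2
    pos = t > 0                             # K_nu diverges at 0, so skip t = 0
    x = t[pos] / phi
    out[pos] = 2.0 * sigma2 / gamma(nu) * (x / 2.0) ** nu * kv(nu, x)
    return out

t = np.array([0.0, 0.1, 0.5, 1.0, 2.0])
# nu = 1/2 reduces to the exponential covariance sigma2 * exp(-t / phi)
print(np.allclose(matern(t, 2.0, 0.3, 0.5), 2.0 * np.exp(-t / 0.3)))   # True
```

In the same parameterization, ν = 3/2 gives the closed form σ²(1 + t/ϕ)exp(−t/ϕ), which is a convenient check on any implementation.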

The above description assumes the process of interest is measured exactly (or nearly so), but more often the data contain measurement error; see Le and Zidek (2006) for an extensive discussion. In this case the simplest model for the observed data is
$$Z_{i,\mathrm{obs}} = Z(s_i) + \epsilon_i, \quad i = 1, \dots, n,$$
where $\epsilon_1, \dots, \epsilon_n$ are assumed i.i.d. with mean 0 and variance τ² > 0, and independent of Z(∙). Under this model the data $\mathbf{Z}_{\mathrm{obs}} = (Z_{1,\mathrm{obs}}, \dots, Z_{n,\mathrm{obs}})^{\mathrm{T}}$ follow the general linear model
$${\mathbf{Z}}_{\mathrm{obs}}= X\boldsymbol{\beta} +\boldsymbol{\varepsilon},$$

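where X is the n × p matrix with entries $(X)_{ij} = f_j(s_i)$ and $\boldsymbol{\varepsilon}$ is a random vector with $\mathrm{E}\{\boldsymbol{\varepsilon}\} = \mathbf{0}$ and $\operatorname{var}\{\boldsymbol{\varepsilon}\} = \operatorname{var}\{\mathbf{Z}_{\mathrm{obs}}\} = \Sigma_{\boldsymbol{\theta}}$. When Z(∙) has a Matérn covariance function, the n × n matrix Σθ has entries
$$(\Sigma_{\boldsymbol{\theta}})_{ij} = \frac{2\sigma^2}{\Gamma(\nu)}\left(\frac{t_{ij}}{2\phi}\right)^{\nu} K_{\nu}\!\left(\frac{t_{ij}}{\phi}\right) + \tau^2\,\mathbf{1}(i = j),$$
where $t_{ij} = \|s_i - s_j\|$, $\mathbf{1}(A)$ denotes the indicator function of A, and the Matérn term equals σ² when $t_{ij} = 0$. This basic specification of a geostatistical model depends on unknown regression parameters β and covariance parameters θ = (σ², ϕ, ν, τ²).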

#### Parameter Estimation

The classical geostatistical method of estimation uses a distribution-free approach (Journel and Huijbregts 1978; Cressie 1993; Chilès and Delfiner 1999). First, the regression parameters are estimated by least squares, resulting in
$$\widehat{\boldsymbol{\beta}} = \left(X^{\mathrm{T}} Q X\right)^{-1} X^{\mathrm{T}} Q\, \mathbf{Z}_{\mathrm{obs}},$$

where Q = I_n (ordinary least squares) or $Q = \Sigma_{\boldsymbol{\theta}}^{-1}$ (generalized least squares); the latter requires an estimate of Σθ. In both cases X is assumed to have full rank. The second choice of Q results in a more efficient estimator, but in practice there is often little difference between them. The resulting trend surface estimate is $\widehat{\mu}(s) = \boldsymbol{f}(s)^{\mathrm{T}}\widehat{\boldsymbol{\beta}}$.
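The least squares trend estimator can be sketched with a polynomial trend in the coordinates, as in the d = 2 example above. This is a minimal illustration assuming NumPy; the function names, the degree-2 trend, and the simulated coordinates are assumptions for the example, and GLS here simply takes a user-supplied covariance matrix in place of an estimate of Σθ.

```python
import numpy as np

def poly_design(coords, degree):
    """Design matrix with columns x**i * y**j for all 0 <= i + j <= degree."""
    x, y = coords[:, 0], coords[:, 1]
    cols = []
    for total in range(degree + 1):
        for i in range(total + 1):
            cols.append(x**i * y**(total - i))
    return np.column_stack(cols)

def ls_trend(X, z, sigma=None):
    """beta_hat = (X^T Q X)^{-1} X^T Q z with Q = I (OLS) or Q = sigma^{-1} (GLS)."""
    if sigma is None:
        qx, qz = X, z
    else:
        # Solve linear systems instead of forming sigma^{-1} explicitly
        qx, qz = np.linalg.solve(sigma, X), np.linalg.solve(sigma, z)
    return np.linalg.solve(X.T @ qx, X.T @ qz)

rng = np.random.default_rng(1)
coords = rng.uniform(0.0, 1.0, size=(25, 2))
z = 1.0 + 2.0 * coords[:, 0] - coords[:, 1] + 0.5 * coords[:, 0] * coords[:, 1]
X = poly_design(coords, degree=2)          # 6 columns: 1, y, x, y^2, xy, x^2
beta_ols = ls_trend(X, z)
mu_hat = X @ beta_ols                      # fitted trend surface at the data sites
```

Since the illustrative z is an exact quadratic in the coordinates, the fitted trend reproduces it; with real data the residuals z − X β̂ would feed the semivariogram step that follows.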

Second, when the mean function is constant, the covariance parameters are estimated by the following two-stage approach: For selected distances t1 <…<tk, the (model-free) semivariogram estimates are first computed
$$\widehat{\gamma}\left({t}_j\right)=\frac{1}{2\left| N\left({t}_j\right)\right|}\sum_{N\left({t}_j\right)}{\left({z}_{i,\mathrm{obs}}-{z}_{j,\mathrm{obs}}\right)}^2,$$
where $N(t) = \{(i, j) : t - \Delta t < \|s_i - s_j\| < t + \Delta t\}$, with Δt > 0 fixed and |N(t)| the number of elements in N(t). A proposed semivariogram model, say γ(t; θ), is then fitted to the semivariogram estimates $\widehat{\gamma}(t_1), \dots, \widehat{\gamma}(t_k)$ by (nonlinear) least squares, so the covariance parameter estimates are
$$\widehat{\boldsymbol{\theta}}= \arg \min \sum_{j=1}^k{\left(\widehat{\gamma}\left({t}_j\right)-\gamma \left({t}_j;\boldsymbol{\theta} \right)\right)}^2.$$

The resulting semivariogram function estimate is $\gamma(\cdot\,; \widehat{\boldsymbol{\theta}})$. When μ(s) is not constant, a similar procedure is applied to the residuals $\mathbf{e} = \mathbf{z}_{\mathrm{obs}} - X\widehat{\boldsymbol{\beta}}$ rather than to the observed data. This estimation method is popular among practitioners, but the statistical properties of the resulting estimators are not well understood.
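The two-stage procedure can be sketched as follows, assuming NumPy and SciPy (`scipy.optimize.curve_fit` for the nonlinear least squares step). The function names are hypothetical, and the fitted model is an assumption chosen for the example: the exponential (Matérn ν = 1/2) semivariogram with nugget, γ(t; θ) = τ² + σ²(1 − exp(−t/ϕ)) for t > 0, which is what the measurement-error model above implies for that covariance.

```python
import numpy as np
from scipy.optimize import curve_fit

def empirical_semivariogram(coords, z, lags, half_width):
    """gamma_hat(t) = sum over pairs in N(t) of (z_i - z_j)^2 / (2 |N(t)|)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)           # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    gam = np.full(len(lags), np.nan)              # NaN marks empty bins
    counts = np.zeros(len(lags), dtype=int)
    for k, t in enumerate(lags):
        in_bin = (dist > t - half_width) & (dist < t + half_width)
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gam[k] = sq[in_bin].sum() / (2 * counts[k])
    return gam, counts

def exp_semivariogram(t, nugget, psill, phi):
    # tau^2 + sigma^2 (1 - exp(-t / phi)), valid for t > 0
    return nugget + psill * (1.0 - np.exp(-t / phi))

def fit_semivariogram(lags, gam, p0):
    """Ordinary (unweighted) nonlinear least squares fit of the model above."""
    lags, gam = np.asarray(lags, dtype=float), np.asarray(gam, dtype=float)
    ok = ~np.isnan(gam)
    params, _ = curve_fit(exp_semivariogram, lags[ok], gam[ok],
                          p0=p0, bounds=(0.0, np.inf))
    return params                                 # (tau^2, sigma^2, phi)

# Three sites on a line: pairs at distance 1 give (1 + 4) / 4, distance 2 gives 9 / 2
line = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
gam, counts = empirical_semivariogram(line, [0.0, 1.0, 3.0], [1.0, 2.0], 0.5)
# gam = [1.25, 4.5], counts = [2, 1]
```

Counting each unordered pair once gives the same γ̂(t) as counting ordered pairs, since both the sum and |N(t)| double.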

When the random field Z(∙) is Gaussian, all the parameters can be jointly estimated by maximum likelihood (Cressie 1993; Stein 1999), resulting in the estimators
$$(\widehat{\boldsymbol{\beta}}, \widehat{\boldsymbol{\theta}}) = \arg\max_{\boldsymbol{\beta}, \boldsymbol{\theta}} L(\boldsymbol{\beta}, \boldsymbol{\theta}; \mathbf{z}_{\mathrm{obs}}), \tag{1}$$
where
$$L(\boldsymbol{\beta}, \boldsymbol{\theta}; \mathbf{z}_{\mathrm{obs}}) = (2\pi)^{-n/2}\, |\Sigma_{\boldsymbol{\theta}}|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{z}_{\mathrm{obs}} - X\boldsymbol{\beta})^{\mathrm{T}} \Sigma_{\boldsymbol{\theta}}^{-1} (\mathbf{z}_{\mathrm{obs}} - X\boldsymbol{\beta})\right\}. \tag{2}$$

This method is statistically more satisfactory than the two-stage approach described above but is also more computationally demanding, to the point of being infeasible for very large datasets (n very large) because of the need to store and numerically invert the n × n matrix Σθ; see Cressie (1993), Schabenberger and Gotway (2005), and chapter 4 in Gelfand et al. (2010) for other methods of estimation.
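Numerically, the Gaussian log-likelihood from Eq. 2 is usually evaluated through a Cholesky factorization rather than an explicit inverse, which is also where the O(n³) cost comes from. A minimal sketch assuming NumPy; the exponential-plus-nugget covariance and the function names are illustrative assumptions, and maximizing over (β, θ) would still require a numerical optimizer, not shown here.

```python
import numpy as np

def exp_cov_matrix(coords, sigma2, phi, tau2):
    """Sigma_theta for an exponential (Matern nu = 1/2) covariance plus nugget tau2."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-dist / phi) + tau2 * np.eye(len(coords))

def gaussian_loglik(z, X, beta, cov):
    """log L = -n/2 log(2 pi) - 1/2 log|cov| - 1/2 r^T cov^{-1} r, with r = z - X beta."""
    r = z - X @ beta
    chol = np.linalg.cholesky(cov)              # cov = L L^T: O(n^3) time, O(n^2) memory
    u = np.linalg.solve(chol, r)                # u = L^{-1} r, so u^T u = r^T cov^{-1} r
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    n = len(z)
    return -0.5 * (n * np.log(2.0 * np.pi) + log_det + u @ u)
```

With n = 1 this reduces to the familiar univariate normal log-density, which is a simple sanity check.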

#### Spatial Prediction (Kriging)

The primary task in the analysis of geostatistical data is often spatial prediction, also known as kriging, which consists of making inference about Z(s0) where s0 ∈ D is an unsampled location. The classical approach uses optimal linear unbiased prediction and only requires knowledge of the mean and covariance (or semivariogram) functions. Specifically, the method seeks to minimize the mean squared prediction error
$$\mathrm{MSPE}\left(\widehat{Z}(s_0)\right) = \mathrm{E}\left\{\left(Z(s_0) - \widehat{Z}(s_0)\right)^2\right\},$$
over the class of linear unbiased predictors, that is, predictors of the form $\widehat{Z}(s_0) = \sum_{i=1}^n \lambda_i(s_0)\, z_{i,\mathrm{obs}}$ that satisfy $\mathrm{E}\{\widehat{Z}(s_0)\} = \mathrm{E}\{Z(s_0)\}$. Under the aforementioned linear model, the optimal coefficients are obtained as the solution of a linear system of equations, and the resulting optimal predictor is
$$\widehat{Z}^K(s_0) = \left(\boldsymbol{\sigma}_0 + X\left(X^{\mathrm{T}}\Sigma_{\boldsymbol{\theta}}^{-1}X\right)^{-1}\left(\boldsymbol{f}(s_0) - X^{\mathrm{T}}\Sigma_{\boldsymbol{\theta}}^{-1}\boldsymbol{\sigma}_0\right)\right)^{\mathrm{T}}\Sigma_{\boldsymbol{\theta}}^{-1}\mathbf{Z}_{\mathrm{obs}},$$
where $\boldsymbol{\sigma}_0 = \operatorname{cov}\{\mathbf{Z}_{\mathrm{obs}}, Z(s_0)\}$; this is called the best linear unbiased predictor (BLUP) or kriging predictor of Z(s0). The usual uncertainty measure associated with the kriging predictor is $\sigma^{2K}(s_0) := \mathrm{MSPE}(\widehat{Z}^K(s_0))$, which is given by
$$\sigma^{2K}(s_0) = C(0) - \boldsymbol{\sigma}_0^{\mathrm{T}}\Sigma_{\boldsymbol{\theta}}^{-1}\boldsymbol{\sigma}_0 + \left(\boldsymbol{f}(s_0) - X^{\mathrm{T}}\Sigma_{\boldsymbol{\theta}}^{-1}\boldsymbol{\sigma}_0\right)^{\mathrm{T}}\left(X^{\mathrm{T}}\Sigma_{\boldsymbol{\theta}}^{-1}X\right)^{-1}\left(\boldsymbol{f}(s_0) - X^{\mathrm{T}}\Sigma_{\boldsymbol{\theta}}^{-1}\boldsymbol{\sigma}_0\right).$$

When the random field Z(∙) is Gaussian, $\widehat{Z}^K(s_0)$ is also the best unbiased predictor (it minimizes MSPE(∙) over the class of all unbiased predictors), and a 95 % prediction interval for Z(s0) is $\widehat{Z}^K(s_0) \pm 1.96\,\sigma^K(s_0)$, where $\sigma^K(s_0) = \{\sigma^{2K}(s_0)\}^{1/2}$; see Cressie (1993), Chilès and Delfiner (1999), and Schabenberger and Gotway (2005) for methodological details and Stein (1999) for theoretical underpinnings.
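The kriging predictor and its MSPE can be computed with a few linear solves. The sketch below assumes NumPy and a known covariance matrix; the function name, the exponential covariance, the five locations, and the data values are made up for illustration (they are not the data of Example 1). With X a column of ones it reduces to ordinary kriging.

```python
import numpy as np

def universal_kriging(Sigma, X, z, sigma0, f0, c00):
    """Kriging predictor, MSPE, and weights for known covariances.

    Sigma : (n, n) covariance of the data, sigma0 : (n,) cov{Z_obs, Z(s0)},
    X : (n, p) rows f(s_i)^T, f0 : (p,) f(s0), c00 : var{Z(s0)} = C(0).
    """
    Si_X = np.linalg.solve(Sigma, X)            # Sigma^{-1} X
    Si_s0 = np.linalg.solve(Sigma, sigma0)      # Sigma^{-1} sigma0
    A = X.T @ Si_X                              # X^T Sigma^{-1} X
    d = f0 - X.T @ Si_s0                        # f(s0) - X^T Sigma^{-1} sigma0
    m = np.linalg.solve(A, d)
    weights = Si_s0 + Si_X @ m                  # lambda = Sigma^{-1}(sigma0 + X m)
    pred = weights @ z
    mspe = c00 - sigma0 @ Si_s0 + d @ m
    return pred, mspe, weights

def exp_cov(a, b, sigma2=1.0, phi=0.5):
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

coords = np.array([[0.1, 0.2], [0.4, 0.9], [0.8, 0.3], [0.5, 0.5], [0.9, 0.8]])
z = np.array([4.6, 5.1, 4.8, 5.0, 5.3])        # illustrative values only
X = np.ones((5, 1))                             # constant mean: ordinary kriging
Sigma = exp_cov(coords, coords)
s0 = np.array([[0.6, 0.6]])
pred, mspe, w = universal_kriging(Sigma, X, z, exp_cov(coords, s0)[:, 0], np.array([1.0]), 1.0)
lower, upper = pred - 1.96 * np.sqrt(mspe), pred + 1.96 * np.sqrt(mspe)
```

The unbiasedness constraint X^T λ = f(s0) shows up here as weights summing to one. Without a nugget, predicting at a sampled location returns the datum itself with zero MSPE. Plugging in estimated covariance parameters in `Sigma` yields the plug-in predictor discussed next.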

The computation of kriging predictors and the validity of their optimality properties require the covariance parameters θ to be known, which is certainly not the case in practice. The simplest and most commonly used practical solution is to use empirical or plug-in predictors and mean squared prediction errors, obtained by replacing the unknown covariance parameters in the above formulas with their estimates. But the properties of the resulting plug-in predictors and mean squared prediction errors differ from those of their known-parameter counterparts, since the former do not take into account the sampling variability of the parameter estimators. As a result, plug-in mean squared prediction errors tend to underestimate the true mean squared prediction errors of plug-in predictors, and the true coverage probability of plug-in prediction intervals tends to be smaller than nominal. Possible approaches to account for parameter uncertainty in predictive inference include the bootstrap (Sjöstedt-De Luna and Young 2003) and the Bayesian approach (Banerjee et al. 2004; Diggle and Ribeiro 2007), the latter appearing to be the more effective of the two.

### Lattice Data Models

The starting point in the construction of models for lattice data is to empirically assess the existence of spatial association, which as mentioned in a previous section is usually specified in terms of neighborhood systems and weight matrices. The two most common statistics to diagnose spatial association among lattice data are Moran’s I (an analogue of the lagged autocorrelation used in time series) and Geary’s c (an analogue of the Durbin-Watson statistic used in time series). For random fields with constant mean, these statistics are defined as
$$\begin{array}{l} I=\frac{n{\sum}_{i=1}^n{\sum}_{j=1}^n{w}_{i j}\left( Z\left({s}_i\right)-\overline{Z}\right)\left( Z\left({s}_j\right)-\overline{Z}\right)}{S_0{\sum}_{i=1}^n{\left( Z\left({s}_i\right)-\overline{Z}\right)}^2}\\ {} c=\frac{\left( n-1\right){\sum}_{i=1}^n{\sum}_{j=1}^n{w}_{i j}{\left( Z\left({s}_i\right)- Z\left({s}_j\right)\right)}^2}{2{S}_0{\sum}_{i=1}^n{\left( Z\left({s}_i\right)-\overline{Z}\right)}^2},\end{array}$$

where $\overline{Z} = \frac{1}{n}\sum_{i=1}^n Z(s_i)$ and $S_0 = \sum_{i=1}^n\sum_{j=1}^n w_{ij}$. For Gaussian processes with independent observations, $\mathrm{E}\{I\} = -(n-1)^{-1}$ and $\mathrm{E}\{c\} = 1$. Hence, observed values of I substantially below/above −(n − 1)^{−1} indicate negative/positive spatial association, while for Geary's c the interpretation is reversed: observed values of c substantially above/below 1 indicate negative/positive association. When the random field has a nonconstant mean, the above statistics are computed using residuals; see Cressie (1993) and Cliff and Ord (1981) for further details.
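Both statistics translate directly into a few matrix operations. A minimal sketch assuming NumPy; the function names and the four-site example are illustrative only. In the example the values change smoothly along a line of four adjacent sites, so neighboring values are similar and both statistics should signal positive association.

```python
import numpy as np

def morans_i(z, W):
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()
    n, s0 = len(z), W.sum()
    # dev @ W @ dev = sum_ij w_ij (z_i - zbar)(z_j - zbar)
    return n * (dev @ W @ dev) / (s0 * (dev @ dev))

def gearys_c(z, W):
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()
    n, s0 = len(z), W.sum()
    diff2 = (z[:, None] - z[None, :]) ** 2      # (z_i - z_j)^2 for all pairs
    return (n - 1) * np.sum(W * diff2) / (2 * s0 * (dev @ dev))

# four sites along a line, each adjacent to the next
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
z = [1.0, 2.0, 3.0, 4.0]
print(morans_i(z, W))   # approximately 0.333, well above E{I} = -1/3
print(gearys_c(z, W))   # approximately 0.3, below E{c} = 1
```

Judging significance would additionally require the variance of I or c under the null, or a permutation test; that step is not shown here.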

A large number of models for lattice data have been proposed in the literature (see Cressie 1993 and LeSage and Pace 2009), most of which involve the specification of a neighborhood system {Ni} and a weight matrix W. One of the most common models for lattice data is the Simultaneous Autoregressive (SAR) model, specified by a set of autoregressions
$$Z(s_i) = \boldsymbol{f}(s_i)^{\mathrm{T}}\boldsymbol{\beta} + \rho\sum_{j=1}^n w_{ij}\left(Z(s_j) - \boldsymbol{f}(s_j)^{\mathrm{T}}\boldsymbol{\beta}\right) + \epsilon_i, \quad i = 1, \dots, n,$$

where f(sj) and β have the same interpretation as in models for geostatistical data, and $\epsilon_i \sim N(0, \xi_i)$ are independent errors. This is a spatial analogue of autoregressive time series models, but unlike the latter, the errors are correlated with the neighboring responses that act as regressors. Provided $I_n - \rho W$ is nonsingular, it follows that $\mathbf{Z} \sim N_n\left(X\boldsymbol{\beta}, (I_n - \rho W)^{-1} M (I_n - \rho W^{\mathrm{T}})^{-1}\right)$, where $M = \operatorname{diag}(\xi_1, \dots, \xi_n)$. It is common to assume $\xi_i = \xi$ for all i, in which case the model parameters are β and θ = (ξ, ρ).

Another large class of models for lattice data is that of Markov random fields (Rue and Held 2005; Li 2009). These models construct the joint distribution of the data by specifying the set of all full conditional distributions, namely, the conditional distributions of Z(si) given $\mathbf{Z}_{(i)}$, i = 1, …, n, where $\mathbf{Z}_{(i)} = (Z(s_j) : j \neq i)$. In addition, these models assume a Markov property stating that the conditional distribution of each datum depends on the rest only through its neighbors. An example is the class of Conditional Autoregressive (CAR) models, with full conditional distributions
$$\left(Z(s_i) \mid Z(s_j), j \neq i\right) \sim N\!\left(\boldsymbol{f}(s_i)^{\mathrm{T}}\boldsymbol{\beta} + \rho\sum_{j=1}^n w_{ij}\left(Z(s_j) - \boldsymbol{f}(s_j)^{\mathrm{T}}\boldsymbol{\beta}\right),\ \sigma_i^2\right), \quad i = 1, \dots, n.$$

To guarantee that the above set of full conditional distributions determines a unique joint distribution, it is required that $\sigma_j^2 w_{ij} = \sigma_i^2 w_{ji}$ for all i, j, and that $M^{-1}(I_n - \rho W)$ be positive definite, with $M = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2)$, in which case $\mathbf{Z} \sim N_n\left(X\boldsymbol{\beta}, (I_n - \rho W)^{-1} M\right)$. It is common to assume that $\sigma_i^2 = \sigma^2$ for all i, in which case the model parameters are β and θ = (σ², ρ). An extensive comparison between the SAR and CAR models is given in Cressie (1993, chapter 6).

The most commonly used method for parameter estimation in these models is maximum likelihood. As for geostatistical models, the resulting estimators are given by Eq. 1, where in the likelihood Eq. 2 $\Sigma_{\boldsymbol{\theta}}^{-1} = (I_n - \rho W^{\mathrm{T}}) M^{-1} (I_n - \rho W)$ for SAR models and $\Sigma_{\boldsymbol{\theta}}^{-1} = M^{-1}(I_n - \rho W)$ for CAR models. For both models (as for geostatistical models), computing these estimators requires numerical iterative methods.
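The SAR and CAR precision matrices above can be formed directly from W. This is a minimal sketch assuming NumPy, for the common-variance case M = ξ I (SAR) or M = σ² I (CAR); the function names and the four-site path are illustrative. For CAR with equal variances the symmetry condition reduces to W = W^T, and the helper `car_rho_range` uses the standard fact that I − ρW is then positive definite exactly when 1/e_min < ρ < 1/e_max, where e_min < 0 < e_max are the extreme eigenvalues of W.

```python
import numpy as np

def sar_precision(W, rho, xi):
    """Sigma^{-1} = (I - rho W^T) M^{-1} (I - rho W) with M = xi I."""
    A = np.eye(W.shape[0]) - rho * W
    return A.T @ A / xi

def car_precision(W, rho, sigma2):
    """Sigma^{-1} = M^{-1} (I - rho W) with M = sigma2 I (W must be symmetric)."""
    return (np.eye(W.shape[0]) - rho * W) / sigma2

def car_rho_range(W):
    """Open interval of rho values for which I - rho W is positive definite
    (symmetric W with smallest eigenvalue < 0 < largest eigenvalue)."""
    e = np.linalg.eigvalsh(W)
    return 1.0 / e.min(), 1.0 / e.max()

W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(car_rho_range(W))            # about (-0.618, 0.618) for this 4-site path
Q_sar = sar_precision(W, rho=0.3, xi=1.0)
Q_car = car_precision(W, rho=0.3, sigma2=1.0)
Sigma_car = np.linalg.inv(Q_car)   # equals (I - rho W)^{-1} M, here with M = I
```

Either precision matrix could be inverted (or, better, factorized) and used as Σθ in the Gaussian likelihood of Eq. 2. Exploiting the sparsity of W is what makes these models practical for large lattices, which this dense sketch does not attempt.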

A point worth noting is that, unlike in geostatistical models, in SAR and CAR models the spatial association structure is specified in terms of the inverse covariance matrix, rather than the covariance matrix, so the interpretation of parameters controlling spatial association is less straightforward than that in geostatistical models.

### Point Process Models

A point process on $D \subset \mathbb{R}^d$ is a random field whose realizations are sets of points in D, called point patterns; the points themselves are called events. In the most general case, attributes may also be observed along with the locations of the events, resulting in a marked point process. For any A ⊂ D, let N(A) denote the number of events in A and $v(A) = \int_A ds$ the size (area or volume) of A. The intensity function of a point process is the function λ : D → [0, ∞) with the property that $\mathrm{E}\{N(A)\} = \int_A \lambda(s)\,ds$. Alternatively, using an “infinitesimal disc” ds centered at s, the intensity function can be defined as the limiting ratio of the expected number of points in ds to its size, that is,
$$\lambda(s) = \lim_{v(ds) \to 0} \frac{\mathrm{E}\{N(ds)\}}{v(ds)}.$$

The most fundamental point process model is the Poisson process with intensity function λ(s), which satisfies the following: for any n ∈ ℕ and disjoint subsets A1, …, An of D, (i) N(Ai) has a Poisson distribution with mean $\int_{A_i} \lambda(s)\,ds$, and (ii) N(A1), …, N(An) are independent random variables. When the intensity function is constant, λ(s) = λ, this is called a homogeneous Poisson process (HPP); otherwise it is called an inhomogeneous Poisson process (IPP). Point patterns from an HPP have the property of complete spatial randomness (CSR): given the number of events in a set A, these events are independently and uniformly distributed over A, so there is no “interaction” between events. Poisson processes are often used on their own for the analysis of point patterns, or as “building blocks” for more complex models; see Diggle (2003) and Illian et al. (2008) for introductory treatments and Cressie (1993) and Daley and Vere-Jones (2003, 2007) for more mathematical treatments.
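The two defining properties translate into a simple simulation recipe on a rectangle: draw the total count from a Poisson distribution, then place that many events uniformly. An inhomogeneous process can then be obtained by thinning, i.e., keeping each event independently with probability λ(s)/λ_max. A minimal sketch assuming NumPy; the function names, the rectangle, and the example intensity are illustrative assumptions, and thinning requires λ(s) ≤ λ_max on the whole region.

```python
import numpy as np

def simulate_hpp(lam, width, height, rng):
    """Homogeneous Poisson process on [0, width] x [0, height]."""
    n = rng.poisson(lam * width * height)          # N(D) ~ Poisson(lambda * v(D))
    # given N(D) = n, the events are i.i.d. uniform over D (CSR)
    return rng.uniform([0.0, 0.0], [width, height], size=(n, 2))

def simulate_ipp(intensity, lam_max, width, height, rng):
    """Inhomogeneous Poisson process by thinning an HPP with rate lam_max.

    Assumes intensity(s) <= lam_max everywhere on the rectangle.
    """
    pts = simulate_hpp(lam_max, width, height, rng)
    keep = rng.uniform(size=len(pts)) < intensity(pts) / lam_max
    return pts[keep]

rng = np.random.default_rng(7)
# intensity decaying from west to east; its maximum 200 is attained at x = 0
events = simulate_ipp(lambda s: 200.0 * np.exp(-2.0 * s[:, 0]), 200.0, 1.0, 1.0, rng)
```

Thinning preserves the Poisson property, so the retained events form a Poisson process with intensity λ(s).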

A basic question in the analysis of point patterns is whether the events have the CSR property. Departures from CSR comprise either clustering (events tend to aggregate) or regularity (events tend to be more evenly spaced than under CSR). The standard model against which to assess the CSR property is the HPP. Tests for CSR are based either on counts of events in subregions (quadrats) or on distance-based measures using the event locations. For count-based tests, the distributions of some test statistics are known (usually only asymptotically), which allows for closed-form tests. The default is the chi-square test, whereby the region D is bounded by a rectangle divided into r rows and c columns. If $n_{ij}$ denotes the number of events in the quadrat in the i-th row and j-th column, and $\overline{n}$ is the expected number of events in any quadrat, then under CSR the statistic
$${\chi}^2=\sum_{i=1}^r\sum_{j=1}^c\frac{{\left({n}_{i j}-\overline{n}\right)}^2}{\overline{n}},$$

asymptotically follows a $\chi^2_{rc-1}$ distribution. Tests based on more complex, nonstandard statistics can be carried out by Monte Carlo simulation.
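The quadrat test can be sketched as follows, assuming NumPy and SciPy (`scipy.stats.chi2.sf` for the upper-tail probability). The function name is hypothetical, and n̄ is taken as the total count divided by rc, i.e., conditional on the observed number of events. The returned p-value is the upper tail, which targets clustering; an unusually small statistic instead points toward regularity.

```python
import numpy as np
from scipy.stats import chi2

def quadrat_test(pts, width, height, nrows, ncols):
    """Chi-square quadrat test of CSR on the rectangle [0, width] x [0, height]."""
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=[ncols, nrows],
                                  range=[[0.0, width], [0.0, height]])
    n_bar = counts.sum() / (nrows * ncols)     # expected count per quadrat under CSR
    stat = np.sum((counts - n_bar) ** 2) / n_bar
    df = nrows * ncols - 1
    return stat, df, chi2.sf(stat, df)         # upper-tail p-value

# One event at the center of each of 3 x 4 quadrats: perfectly regular counts
xs, ys = np.meshgrid(np.arange(4) + 0.5, np.arange(3) + 0.5)
grid_pts = np.column_stack([xs.ravel(), ys.ravel()])
stat, df, p = quadrat_test(grid_pts, 4.0, 3.0, nrows=3, ncols=4)
```

Since the chi-square approximation is asymptotic, it is usually recommended only when the expected count per quadrat is not too small; otherwise the Monte Carlo approach mentioned above is a safer choice.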

Rejection of CSR may lead one to consider modeling a possibly nonconstant intensity function. This can be done either parametrically, by proposing a specific function for the intensity whose parameters are then estimated via maximum likelihood, or nonparametrically by means of kernel smoothing. For example, under an IPP λ(s) can be estimated as a function of coordinates or covariates by fitting a log-linear model of the form
$$\log \left(\lambda (s)\right)={\beta}_0+\sum_{j=1}^p{\beta}_j{f}_j(s),$$

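where f1, …, fp are known functions of the location (for example, the coordinates or covariates). This accommodates departures from CSR that arise from spatial variation in the mean structure, that is, in the intensity itself.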
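To make the maximum-likelihood route concrete, the sketch below fits the log-linear model with p = 1 and f1(s) = x (the first coordinate) on the unit square D = [0, 1]²; both the covariate and the window are illustrative assumptions, and the function name is hypothetical. It maximizes the Poisson-process log-likelihood ℓ(β) = Σi log λ(si) − ∫D λ(s) ds by Newton's method with step halving, approximating the integral by a midpoint rule.

```python
import math


def fit_loglinear_ipp(points, m=200, iters=50):
    """ML fit of log lambda(s) = b0 + b1 * x on the unit square D = [0, 1]^2.

    Poisson-process log-likelihood (up to a constant):
        l(b) = sum_i log lambda(s_i) - integral_D lambda(s) ds.
    Because lambda depends only on x here, the y-integral equals 1 and the
    x-integral is approximated by a midpoint rule with m nodes.
    """
    xs = [(k + 0.5) / m for k in range(m)]  # midpoint nodes, each with weight 1/m
    n = len(points)
    sum_x = sum(x for x, _ in points)

    def loglik(b0, b1):
        integral = sum(math.exp(b0 + b1 * x) for x in xs) / m
        return n * b0 + b1 * sum_x - integral

    b0, b1 = math.log(n), 0.0  # start at the HPP MLE: n events on unit area
    for _ in range(iters):
        w = [math.exp(b0 + b1 * x) / m for x in xs]  # lambda(node) * quadrature weight
        i00 = sum(w)
        i01 = sum(wk * x for wk, x in zip(w, xs))
        i11 = sum(wk * x * x for wk, x in zip(w, xs))
        u0, u1 = n - i00, sum_x - i01  # score vector
        det = i00 * i11 - i01 * i01  # determinant of the 2 x 2 information matrix
        d0 = (i11 * u0 - i01 * u1) / det  # Newton step: information^{-1} times score
        d1 = (i00 * u1 - i01 * u0) / det
        step, current = 1.0, loglik(b0, b1)
        while loglik(b0 + step * d0, b1 + step * d1) < current and step > 1e-8:
            step /= 2.0  # step halving guards against overshooting; l is concave
        b0, b1 = b0 + step * d0, b1 + step * d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1
```

At the maximum the fitted intensity integrates to the observed number of events, which is a quick check on any implementation.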

Alternatively, rejection of CSR may lead one to model interactions between events, that is, situations in which N(A) and N(B) are correlated for non-overlapping regions A and B. The second-order intensity function λ2(s, u) extends the definition of λ(s) to pairs of locations and describes the dependence between events at s and u; it is defined as
$$\lambda_2(s,u)=\lim_{v(ds),\,v(du)\to 0}\frac{\mathrm{E}\{N(ds)\,N(du)\}}{v(ds)\,v(du)},$$

where ds and du are small regions containing s and u, and v(·) denotes volume (area when d = 2).
For stationary and isotropic processes, for which λ(s) ≡ λ and λ2(s, u) = λ2(‖s − u‖) ≡ λ2(t), a more informative tool for assessing dependence is the K-function, defined for d = 2 as
$$K(t)=\frac{2\pi}{\lambda^2}\int_0^t x\,\lambda_2(x)\,dx.$$

Then λK(t) is the expected number of additional events within distance t of the origin, given that there is an event at the origin. For an HPP, K(t) = πt²; values larger (smaller) than this indicate clustering (regularity) at that distance scale. Plotting the estimated K(t) against t, or the closely related L-function $$L(t)=\sqrt{K(t)/\pi}$$, shows the degree of dependence relative to the HPP, for which L(t) = t; see Diggle (2003) and Illian et al. (2008) for further details.
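The K- and L-functions are usually estimated by counting close pairs of events. The sketch below uses the simple estimator K̂(t) = |A| / (n(n − 1)) × #{ordered pairs i ≠ j with ‖si − sj‖ ≤ t} for n events in a window of area |A|. It omits the edge corrections used in practice (see Diggle 2003; Illian et al. 2008), so it tends to underestimate K(t) at larger t; the function names are hypothetical.

```python
import math
import random


def k_hat(points, t, area):
    """Naive estimate of K(t): area / (n (n - 1)) times the number of ordered pairs
    (i, j), i != j, at distance <= t. No edge correction is applied."""
    n = len(points)
    close_pairs = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                      if i != j and math.dist(p, q) <= t)
    return area * close_pairs / (n * (n - 1))


def l_hat(points, t, area):
    """L(t) = sqrt(K(t) / pi); for an HPP, L(t) = t."""
    return math.sqrt(k_hat(points, t, area) / math.pi)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(200)]  # a CSR pattern on [0, 1]^2
    for t in (0.02, 0.05, 0.1):
        # Under CSR, L_hat(t) should be near t (somewhat below, since there is no edge correction).
        print(t, round(l_hat(pts, t, 1.0), 3))
```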

## Key Applications

### Example 1

As an illustration of a geostatistical dataset, Fig. 2a displays pH measurements of wet deposition (acid rain) at 39 rainfall stations taken in April 1987 over the Lower Saxony state in northwest Germany (Berke 1999). Each datum is associated with the sampling location where the pH measurement was taken. For instance, a pH value of 4.63 was observed at the sampling location s = (0.61, 0.1) (the southernmost station). For this dataset the coordinates of the sampling locations were provided without units and all lie between 0 and 1, which presumably means they were scaled by the maximum distance between stations. A key characteristic of this phenomenon is that a pH value exists at every location in the region, not only at the stations. A typical goal in the analysis of such datasets is the prediction of pH values over a dense grid of prediction locations, which together provide an estimated map of pH over the entire region.

Fig. 2 (a) pH measurements and sampling locations and (b) empirical and fitted semivariogram function of pH measurements

By plotting the pH values against the spatial coordinates, it can be seen that the pH values tend to decrease in the eastward direction and increase in the northward direction. We therefore use the mean function μ(s) = β1 + β2x + β3y, with s = (x, y), for which the OLS estimates are $$\left({\widehat{\beta}}_1,{\widehat{\beta}}_2,{\widehat{\beta}}_3\right)=\left(5.627,-1.440,0.761\right)$$. The second-order specification is completed by assuming that the covariance function of the true pH process is isotropic and exponential. Figure 2b shows empirical semivariogram estimates at a few selected distances (dots) based on the OLS residuals. It displays an apparent discontinuity at the origin, suggesting that the data contain measurement error, so the covariance function of the pH data is modeled as $$C(h)=\sigma^2\exp(-h/\phi)+\tau^2\,1\{h=0\}$$. The fitted semivariogram function, also displayed in Fig. 2b (line), uses the parameter estimates $$\left({\widehat{\sigma}}^2,\widehat{\phi},{\widehat{\tau}}^2\right)=\left(0.270,0.070,0.059\right)$$, obtained by least squares.
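A sketch of the two curves in Fig. 2b, assuming the OLS residuals and station coordinates are available as plain lists (the station data are not reproduced here): the classical method-of-moments (Matheron) estimator γ̂(h) = (1/(2|N(h)|)) Σ (ri − rj)² over station pairs whose distance falls in a bin around h, and the semivariogram implied by C(h) = σ² exp(−h/ϕ) + τ²1{h = 0}, namely γ(h) = τ² + σ²(1 − exp(−h/ϕ)) for h > 0 and γ(0) = 0. The binning scheme and function names are assumptions, and the least-squares fit of (σ², ϕ, τ²) is not shown.

```python
import math


def empirical_semivariogram(coords, values, bin_edges):
    """Method-of-moments estimator: for each distance bin (edges[k], edges[k+1]],
    half the average squared difference over station pairs in that bin.
    Returns a list of (estimate or None, number of pairs) per bin."""
    n_bins = len(bin_edges) - 1
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):  # each unordered pair once
            h = math.dist(coords[i], coords[j])
            for k in range(n_bins):
                if bin_edges[k] < h <= bin_edges[k + 1]:
                    sums[k] += (values[i] - values[j]) ** 2
                    counts[k] += 1
                    break
    return [(s / (2 * c) if c else None, c) for s, c in zip(sums, counts)]


def exponential_nugget_semivariogram(h, sigma2, phi, tau2):
    """gamma(h) = C(0) - C(h) for C(h) = sigma2 * exp(-h / phi) + tau2 * 1{h = 0}."""
    if h == 0:
        return 0.0
    return tau2 + sigma2 * (1.0 - math.exp(-h / phi))
```

With the reported estimates, `exponential_nugget_semivariogram(h, 0.270, 0.070, 0.059)` jumps to the nugget 0.059 just above h = 0, rises toward the sill τ² + σ² = 0.329, and reaches about 95% of the partial sill σ² by h ≈ 3ϕ ≈ 0.21 in the scaled coordinate units.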

Figure 3a shows a map of estimated pH values obtained by computing the kriging predictor, with estimated parameters, at about 4,200 prediction locations inside the convex hull of the sampling locations. Except for the northwest corner of the prediction region, which corresponds to a group of islands, the pH values are high in the northwest of the state and decrease toward the south and east. Figure 3b shows a map of the square root of the kriging variance at the prediction locations. It displays the typical behavior: small values at prediction locations close to a sampling location and larger values away from sampling locations.

Fig. 3 (a) Map of kriging predictor of pH and (b) map of square root of kriging variance of pH
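The entry does not state which kriging variant produced Fig. 3. The sketch below uses simple kriging, plugging in the estimated trend μ(s) and covariance parameters as if they were known. It predicts the noise-free pH process at s0 from data with covariance σ² exp(−h/ϕ) + τ²1{h = 0}, returning the predictor μ(s0) + c0ᵀΣ⁻¹(z − μ) and the kriging variance σ² − c0ᵀΣ⁻¹c0, where c0 holds the covariances between the process at s0 and the observations. Unlike universal kriging, this variance ignores the uncertainty from estimating the trend. The solver and function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def simple_kriging(coords, z, s0, mean_fn, sigma2, phi, tau2):
    """Simple kriging of the noise-free process at s0, treating mean_fn and the
    covariance parameters as known. Returns (predictor, kriging variance)."""
    n = len(coords)
    # Data covariance: sigma2 * exp(-h / phi), plus the nugget tau2 on the diagonal.
    cov = [[sigma2 * math.exp(-math.dist(coords[i], coords[j]) / phi) + (tau2 if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    # Covariance between the noise-free process at s0 and each observation (no nugget).
    c0 = [sigma2 * math.exp(-math.dist(s0, s) / phi) for s in coords]
    w = solve(cov, c0)  # kriging weights Sigma^{-1} c0 (Sigma is symmetric)
    resid = [zi - mean_fn(s) for zi, s in zip(z, coords)]
    pred = mean_fn(s0) + sum(wi * ri for wi, ri in zip(w, resid))
    var = sigma2 - sum(wi * ci for wi, ci in zip(w, c0))
    return pred, var
```

For Example 1, `mean_fn = lambda s: 5.627 - 1.440 * s[0] + 0.761 * s[1]` could be passed together with (σ², ϕ, τ²) = (0.270, 0.070, 0.059) and the station data. Evaluated on a grid, the variance is smallest near stations and approaches σ² far from them, the qualitative pattern of Fig. 3b.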

### Example 2

As an illustration of a lattice dataset, we study the relation between poverty level (POV) and total population (POP) at the county level in 2009 in the US state of Texas, using data obtained from the US Census Bureau. Figure 4 displays the state of Texas, composed of 254 counties, color-coded by the logarithm of the 2009 poverty level. Plotting the data shows that the logarithm of poverty level is closely linearly related to the logarithm of total population, with least squares fit $$\widehat{\mathrm{E}}\{\log(\mathrm{POV})\mid\mathrm{POP}\}=-1.741+0.992\cdot\log(\mathrm{POP})$$. For the residuals from this fit, Moran’s and Geary’s statistics are I = 0.391 and c = 0.568, respectively; both are highly significant against the hypothesis of no spatial association (p-values < 10⁻¹⁵). Hence, there is substantial spatial association among county log poverty levels, even after accounting for log total population.

Fig. 4 Choropleth map of county log poverty level in the US state of Texas in 2009
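Both statistics can be computed directly from the regression residuals and a neighbor list. The sketch below uses the usual definitions with binary weights (wij = 1 for neighbors, as in this example): I = (n/S0) Σi Σj wij (zi − z̄)(zj − z̄) / Σi (zi − z̄)² and c = ((n − 1)/(2S0)) Σi Σj wij (zi − zj)² / Σi (zi − z̄)², where S0 = Σi Σj wij. Values of I above its null expectation −1/(n − 1) and of c below 1 indicate positive spatial association. The neighbor-list format and function name are assumptions, and the significance tests behind the reported p-values are not included.

```python
def morans_i_gearys_c(values, neighbors):
    """Moran's I and Geary's c with binary weights: w_ij = 1 if j is in neighbors[i].

    `neighbors[i]` lists the indices of the neighbors of region i (symmetric adjacency).
    Returns (I, c)."""
    n = len(values)
    zbar = sum(values) / n
    dev = [z - zbar for z in values]
    ss = sum(d * d for d in dev)  # sum of squared deviations from the mean
    s0 = sum(len(nb) for nb in neighbors)  # S0: total of all weights
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbors[i])
    sq_diff = sum((values[i] - values[j]) ** 2 for i in range(n) for j in neighbors[i])
    moran = (n / s0) * cross / ss
    geary = ((n - 1) / (2 * s0)) * sq_diff / ss
    return moran, geary


if __name__ == "__main__":
    # Four regions in a row (0-1-2-3) with a smooth trend: positive association.
    print(morans_i_gearys_c([1.0, 2.0, 3.0, 4.0], [[1], [0, 2], [1, 3], [2]]))
```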

We fitted both CAR and SAR models with log poverty level as the response, log total population as the explanatory variable, and a neighborhood system based on geographic adjacency: two counties are neighbors if and only if their boundaries intersect. For the weights, we set wij = 1 for any two neighbors si and sj. The SAR model is fit by maximum likelihood, resulting in the estimates $$\widehat{\mathrm{E}}\{\log(\mathrm{POV})\mid\mathrm{POP}\}=-2.123+1.034\cdot\log(\mathrm{POP})$$ and $$\left({\widehat{\sigma}}^2,\widehat{\rho}\right)=\left(0.067,0.116\right)$$. The estimated mean for the CAR model is similar, but its fit is slightly inferior.

### Example 3

As an illustration of point pattern data, we consider earthquakes (with magnitude 1.0 or more on the Richter scale) that occurred worldwide in 2011 over the 8 consecutive days beginning at 00:00 h UTC on May 20. Figure 5a displays the locations of the 981 events as a “bubble map” with respect to magnitude (the size of each bubble is proportional to the square root of the earthquake’s magnitude), which provides a fair visual comparison of the relative magnitudes of the events. The color-coding scheme renders earlier events in lighter shades of orange and later events in darker shades of red. Since magnitude is an attribute recorded along with each event’s location, this is a marked point pattern.

Fig. 5 (a) Worldwide earthquakes, May 20–27, 2011, and (b) Corresponding estimated intensity function

We focus only on assessing the tendency toward clustering and disregard magnitude. Clustering is expected in this context, since geology tells us that earthquakes tend to occur at the junctions of tectonic plates and along fault zones. The Aleutian Islands/Bering Strait and southern Alaska are prominent “hot spots.” Indeed, a chi-square test strongly rejects CSR (p-value ≈ 10⁻¹⁶). Assuming (for the sake of illustration) stationarity and isotropy, the estimated L-function shows a pronounced upward bow that falls well outside the 95% simulation envelopes for an HPP, further confirming the strong tendency toward clustering at this spatial scale.
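Envelopes for an HPP are typically obtained by Monte Carlo simulation. A minimal sketch, assuming a rectangular window and simulating CSR conditional on the observed number of events n: compute L̂(t) for n_sim simulated patterns and take the pointwise minimum and maximum. With n_sim = 39, an observed L̂(t) outside [lower, upper] at a prespecified t corresponds to a two-sided pointwise test at level 2/40 = 0.05. The L estimator here has no edge correction, the planar rectangle is a simplification for worldwide data, and the names are hypothetical.

```python
import math
import random


def l_hat(points, t, area):
    """L(t) = sqrt(K(t) / pi), with K(t) = area / (n (n - 1)) * #{ordered pairs i != j, dist <= t}.
    No edge correction."""
    n = len(points)
    pairs = sum(1 for i, p in enumerate(points) for j, q in enumerate(points)
                if i != j and math.dist(p, q) <= t)
    return math.sqrt(area * pairs / (n * (n - 1)) / math.pi)


def csr_envelope(n, ts, width, height, n_sim=39, seed=0):
    """Pointwise min and max of L(t), for t in ts, over n_sim patterns of n independent
    uniform points on [0, width] x [0, height] (CSR conditional on n)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sim):
        pts = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        sims.append([l_hat(pts, t, width * height) for t in ts])
    lower = [min(col) for col in zip(*sims)]  # pointwise minimum over simulations
    upper = [max(col) for col in zip(*sims)]  # pointwise maximum over simulations
    return lower, upper
```

The pure-Python pair count costs O(n²) per pattern, so for the 981 events here a vectorized or tree-based implementation would be preferable.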

Since a constant intensity function is clearly inadequate, we continue the analysis by estimating the intensity function in the context of an IPP. The result is displayed in Fig. 5b, which shows a kernel smoothing estimate (with bandwidth selected by cross-validation; see Diggle 2003). Since intensity is the expected number of (random) points per unit area, its units are “earthquakes per unit area.” The two Alaskan hot spots mentioned earlier are clearly visible. Interestingly, the central Caribbean emerges as a third hot spot.
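A minimal sketch of a kernel intensity estimate, assuming planar coordinates (treating longitude and latitude as planar is only an approximation for worldwide data), an isotropic Gaussian kernel, a fixed bandwidth h, and no edge correction: λ̂(s) = Σi kh(s − si) with kh(u) = exp(−‖u‖²/(2h²))/(2πh²). Each kernel integrates to 1, so λ̂ integrates to about n over a region covering all events, consistent with intensity being measured in events per unit area. The cross-validation bandwidth selection used for Fig. 5b is not implemented, and the function name is hypothetical.

```python
import math


def kernel_intensity(points, s, h):
    """Gaussian-kernel intensity estimate at location s (events per unit area):
    sum_i k_h(s - s_i), with k_h(u) = exp(-|u|^2 / (2 h^2)) / (2 pi h^2).
    Fixed bandwidth h; no edge correction."""
    norm = 1.0 / (2.0 * math.pi * h * h)  # makes each kernel integrate to 1
    return norm * sum(math.exp(-((s[0] - x) ** 2 + (s[1] - y) ** 2) / (2.0 * h * h))
                      for x, y in points)
```

Evaluating `kernel_intensity` over a grid of locations and mapping the values gives a surface like Fig. 5b; a larger h produces a smoother surface.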

## Historical Background and Final Remarks

Early pioneers of statistical inference (e.g., Fisher, Gosset, Pearson) alluded to issues arising from the correlation of observations due to spatial proximity in designed experiments and proposed methods to account for it. Some of the history and early developments of spatial statistics are reviewed in chapter 1 of Gelfand et al. (2010). Since some areas of spatial statistics have not been covered in this brief overview, we end with some additional pointers to the literature. A review of nonstationary spatial processes is given in chapter 9 of Gelfand et al. (2010). Spatial sampling and design (how and where to collect the data) are treated in Cressie (1993), Le and Zidek (2006), Müller (2007), and chapter 10 of Gelfand et al. (2010). Multivariate methods in spatial statistics are treated in Banerjee et al. (2004), Le and Zidek (2006), Wackernagel (2010), and chapter 21 of Gelfand et al. (2010). Hierarchical models for non-Gaussian spatial data, especially discrete spatial data, are discussed in Banerjee et al. (2004) and Diggle and Ribeiro (2007), where the Bayesian approach is featured prominently. Models for more complex types of spatial random objects are treated in Matheron (1975), Cressie (1993), and Nguyen (2006). Finally, an extensive discussion of software written in R that implements the methods described here for all three types of spatial data appears in Bivand et al. (2008).

## References

1. Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht
2. Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC, Boca Raton
3. Berke O (1999) Estimation and prediction in the spatial linear model. Water Air Soil Pollut 110:215–237
4. Bivand RS, Pebesma EJ, Gómez-Rubio V (2008) Applied spatial data analysis with R. Springer, New York
5. Chilès J-P, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley, New York
6. Cliff AD, Ord JK (1981) Spatial processes: models and applications. Pion, London
7. Cressie NAC (1993) Statistics for spatial data. Wiley, New York
8. Daley DJ, Vere-Jones D (2003) Introduction to the theory of point processes, volume I: elementary theory and methods, 2nd edn. Springer, New York
9. Daley DJ, Vere-Jones D (2007) Introduction to the theory of point processes, volume II: general theory and structure, 2nd edn. Springer, New York
10. Diggle PJ (2003) Statistical analysis of spatial point patterns, 2nd edn. Arnold, New York
11. Diggle PJ, Ribeiro PJ (2007) Model-based geostatistics. Springer, New York
12. Gelfand AE, Diggle PJ, Guttorp P, Fuentes M (eds) (2010) Handbook of spatial statistics. Chapman & Hall/CRC, Boca Raton
13. Illian J, Penttinen A, Stoyan H, Stoyan D (2008) Statistical analysis and modelling of spatial point patterns. Wiley, Chichester
14. Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic, London
15. Le ND, Zidek JV (2006) Statistical analysis of environmental space-time processes. Springer, New York
16. LeSage JP, Pace RK (2009) Introduction to spatial econometrics. Chapman & Hall/CRC, Boca Raton
17. Li SZ (2009) Markov random field modeling in image analysis, 3rd edn. Springer, London
18. Matérn B (1986) Spatial variation. Lecture notes in statistics, 2nd edn. Springer, Berlin
19. Matheron G (1975) Random sets and integral geometry. Wiley, New York
20. Müller WG (2007) Collecting spatial data: optimum design of experiments for random fields, 3rd edn. Springer, Heidelberg
21. Nguyen HT (2006) An introduction to random sets. Chapman & Hall/CRC, Boca Raton
22. Ripley BD (1981) Spatial statistics. Wiley, New York
23. Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Chapman & Hall/CRC, Boca Raton
24. Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis. Chapman & Hall/CRC, Boca Raton
25. Sjöstedt-De Luna S, Young A (2003) The bootstrap and kriging prediction intervals. Scand J Stat 30:175–192
26. Stein ML (1999) Interpolation of spatial data: some theory for kriging. Springer, New York
27. Wackernagel H (2010) Multivariate geostatistics: an introduction with applications, 3rd edn. Springer, Berlin
28. Yaglom AM (1987) Correlation theory of stationary and related random functions I: basic results. Springer, New York

## Copyright information

© Springer Science+Business Media LLC 2016

## Authors and Affiliations

1. Department of Management Science and Statistics, The University of Texas at San Antonio, San Antonio, USA
2. Department of Mathematics & Statistics, Texas Tech University, Lubbock, USA

## Section editors and affiliations

• Suheil Khoury, American University of Sharjah, Sharjah, United Arab Emirates