# Spatial Statistics

**DOI:** https://doi.org/10.1007/978-1-4614-7163-9_167-1

## Glossary

- **Correlation/covariance**: measures of similarity between observations
- **Geostatistics**: a branch of spatial statistics
- **Isotropy**: property of covariance and variogram functions that makes them invariant under rotation of locations
- **Kriging**: method for linear unbiased prediction
- **Random field**: a collection of random variables indexed by location
- **Stationarity**: property of random fields in which their mean and covariance functions are invariant under translation of locations
- **Variogram/semivariogram**: measures of dissimilarity between observations

## Definition

Spatial statistics is a branch of statistics that studies methods to make inference based on data observed over spatial regions. In typical applications these regions are either 2- or 3-dimensional. The methodology is mostly aimed at accounting for and modeling aspects of the so-called First Law of Geography: attributes from locations that are closer together are more closely related than attributes from locations that are farther apart. This is accomplished through appropriate measures of spatial association. An overview of models and methods is given for the three main types of spatial data: geostatistical, lattice, and point pattern.

## Introduction

Spatial data refer to measurements of phenomena that vary over a region of space *D* ⊂ ℝ^{ d }, *d* ≥ 1, called the region of interest. Each datum is associated with a subset of D that indicates where it was collected, often called the datum's support. This may be a single point or a larger subset, depending on the context.

A generic format for a spatial dataset is {(s_{i}, z_{i}) : i = 1, …, n}, where the interpretation and characteristics of the data components vary from type to type. For geostatistical data, z_{1},…,z_{n} are measurements or observations of a phenomenon of interest taken at sampling locations s_{1},…,s_{n} ∈ D, which are single points. In the models to be described later, the z_{i}'s are random, while *n* (the sample size) and the s_{i}'s are known and fixed. For lattice data, s_{1},…,s_{n} are subregions that form a partition of D, such as counties or postal codes, and z_{1},…,z_{n} are averages or summaries of the phenomenon of interest over these subregions. For this type of data, it also holds that the z_{i}'s are random, while n and the s_{i}'s are known and fixed. For point pattern data, s_{1},…,s_{n} are points where a certain event of interest occurs, such as the presence of a type of tree or the epicenter of an earthquake, and z_{1},…,z_{n} are features of the aforementioned events, such as the diameter of the tree at breast height or the magnitude of the earthquake. In the models for point pattern data to be described later, all components n, s_{i}, and z_{i} are random. Often the z_{i}'s are absent, when interest centers only on the pattern of occurrences. For all three types of spatial data, additional variables that serve as explanatory variables could also be available. Comprehensive treatments of statistical models and methods for all three types of data appear in Cressie (1993), Schabenberger and Gotway (2005), and the edited volume by Gelfand et al. (2010). Table 1 summarizes the key concepts and gives an overview of models and examples.

**Table 1** Summary and overview of concepts, models, and examples in the three types of spatial data

| | Geostatistical | Lattice | Point pattern |
|---|---|---|---|
| Domain D | Fixed, continuous | Fixed, discrete | Random, continuous |
| Observation sites {s_{i}} | Fixed points | Fixed subregions | Random points |
| Inference for | Z(s) only | Z(s) only | Both Z(s) and D |
| Main models for Z(s) | Sum of regression trend and stationary random field | Simultaneous autoregressive (SAR) model | Poisson process (homogeneous and inhomogeneous) |
| Key aims, concepts | Kriging (minimum MSE prediction) | Spatial proximity matrix (W) | Assess tendency for clustering |
| Examples | Meteorological and geological variables | Geographic and demographic variables | Location and intensity of events |

## Key Points

### Random Fields

A random field {Z(s) : s ∈ *D*}, with *D* ⊂ ℝ^{ d }, is a collection of random variables indexed by the elements of D, where D can be finite or infinite. These random variables are often nonidentically distributed and dependent, so modeling these aspects is a key starting point. The simplest way to do this is through the mean and covariance functions of the random field, defined as

\( \mu (s)=\mathrm{E}\left\{ Z(s)\right\}\quad \mathrm{and}\quad C\left( s, u\right)=\operatorname{cov}\left\{ Z(s), Z(u)\right\},\qquad s, u\in D. \)

The former determines the spatial trend, a measure of variation over large distances, while the latter determines the spatial association, a measure of variation over small distances.

Two closely related functions are the semivariogram \( \gamma \left( s, u\right)=\frac{1}{2}\operatorname{var}\left\{ Z(s)- Z(u)\right\} \) and the variance function σ^{2}(s) = var{Z(s)}. The functions C(s, u) and γ(s, u) provide similar information about the spatial association of the random field, with the former being a measure of similarity between Z(s) and Z(u), while the latter is a measure of dissimilarity. When choosing the aforementioned functions, it is important to note that any function can be used as a mean function, but not any function can be used as a covariance function. The latter needs to be positive semi-definite, meaning that for any *m* ∈ ℕ, *s* _{1}, … , *s* _{ m } ∈ *D*, and *a* _{1}, … , *a* _{ m } ∈ ℝ it holds that

\( \sum_{i=1}^{m}\sum_{j=1}^{m}{a}_i{a}_j C\left({s}_i,{s}_j\right)\ge 0. \)

This is a difficult condition to verify, but fortunately the literature provides many functions known to be positive semi-definite; see Cressie (1993) and Chilès and Delfiner (1999) for examples. These references also provide an intermediate treatment on the theory and methods of random fields and their application to spatial statistics, while Matérn (1986), Yaglom (1987), and Stein (1999) provide more mathematical treatments.
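The positive semi-definiteness condition can also be probed numerically: evaluate a candidate covariance function at an arbitrary finite set of locations and inspect the eigenvalues of the resulting matrix. A minimal sketch in Python, using the exponential covariance (a classical positive semi-definite example; all numerical values are illustrative assumptions):

```python
import numpy as np

def cov_matrix(cov_fn, sites):
    """Evaluate a covariance function on all pairs of locations."""
    n = len(sites)
    return np.array([[cov_fn(sites[i], sites[j]) for j in range(n)]
                     for i in range(n)])

def exp_cov(s, u, sigma2=1.0, phi=0.5):
    """Exponential covariance: sigma2 * exp(-||s - u|| / phi)."""
    return sigma2 * np.exp(-np.linalg.norm(np.subtract(s, u)) / phi)

rng = np.random.default_rng(0)
sites = rng.uniform(size=(20, 2))       # 20 arbitrary points in the unit square
C = cov_matrix(exp_cov, sites)

# sum_i sum_j a_i a_j C(s_i, s_j) >= 0 for all a is equivalent to
# all eigenvalues of C being nonnegative.
eigvals = np.linalg.eigvalsh(C)
print(eigvals.min() >= -1e-10)          # True for a valid covariance function
```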

For lattice data, the set of subregions {s_{1},…,s_{n}} is endowed with a neighborhood system {N_{i} : i = 1, … , n}, where N_{i} denotes the subregions that are, in a precisely defined way, neighbors of subregion s_{i}. For rectangular regular lattices where the subregions may be thought of as pixels, it is common to use first-order neighborhood systems, where the neighbors of a pixel are the pixels adjacent to the north, south, east, and west; see Fig. 1a. Second-order neighborhood systems are also used, where the neighbors of a pixel are its first-order neighbors and their first-order neighbors; see Fig. 1b. In these cases all pixels have the same number of neighbors, except for pixels at (or near) the boundary of D. For regions divided into unequally shaped subregions (like counties in a state), a commonly used neighborhood system is defined in terms of geographic adjacency, N_{i} = {s_{j} : subregions s_{i} and s_{j} share a boundary}; other examples not based on geographic adjacency are also possible. In these cases the number of neighbors usually differs among subregions.

In addition a weight (or neighborhood) matrix W = (w_{ij}) is specified, where w_{ij} measures the strength of direct association between sites s_{i} and s_{j}. It must satisfy that w_{ij} ≥ 0, w_{ii} = 0 and w_{ij} > 0 if and only if s_{i} and s_{j} are neighbors (i.e., s_{j} ∈ N_{i}). The most common example of weight matrix is w_{ij} = 1 if s_{i} and s_{j} are neighbors and w_{ij} = 0 otherwise, but other more refined specifications are also possible, e.g., based on distance between subregions’ centroids. Anselin (1988), Cressie (1993), Rue and Held (2005), and LeSage and Pace (2009) provide ample treatments of models and methods for the analysis of lattice data.
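As a concrete illustration, the binary adjacency specification can be assembled directly from neighbor lists. The tiny 2 × 2 lattice below is an assumed toy example, not data from the text:

```python
import numpy as np

# Neighborhood system for a 2x2 regular lattice (first-order neighbors):
# site layout:  0 1
#               2 3
neighbors = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}

n = len(neighbors)
W = np.zeros((n, n))
for i, N_i in neighbors.items():
    for j in N_i:
        W[i, j] = 1.0               # w_ij = 1 iff s_j is a neighbor of s_i

assert np.all(np.diag(W) == 0)      # w_ii = 0
assert np.allclose(W, W.T)          # symmetric for a mutual neighbor relation
print(W.sum(axis=1))                # here each site has exactly two neighbors
```

More refined weights (e.g., inverse distance between centroids) replace the 1.0 entries while keeping the same zero pattern.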

For geostatistical and lattice data, the sampling locations s_{1},…,s_{n} are fixed and known, so these types of data are usually written as z = (z_{1},…,z_{n})^{T} (^{T} denotes transpose of a vector or matrix). The stochastic approach for modeling and inference assumes the data are part of a realization of a random field Z(∙), so datum z_{i} is the realized value of the random variable Z(s_{i}).

### Stationarity and Ergodicity

Spatial data typically contain no replicates, as usually a single observation is available at each location, so some assumptions on the random field are needed to make statistical inference feasible. To illustrate this point, consider the conceptual decomposition Z(s_{i}) = μ(s_{i}) + ε(s_{i}), with ε(∙) a random field with mean zero and covariance function C(s, u). Without some extra assumptions it is not possible to identify both μ(s_{i}) and ε(s_{i}) with a single observation at s_{i}: a term can be added to μ(s_{i}) and subtracted from ε(s_{i}) in infinitely many ways, none of which changes the datum Z(s_{i}) but all of which change the components to be identified. The most common such assumption is (weak) stationarity, requiring the mean function to be constant, μ(s) ≡ μ, and the covariance function to depend on the locations only through their difference, \( C\left( s, u\right)=\overline{C}\left( s- u\right) \).

An important and commonly used special case of stationarity is called isotropy, meaning that \( C\left( s, u\right)=\overline{C}\left(|| s- u||\right), \) where \( \left|\right| h\left|\right|:={\left({h}_1^2+\cdots +{h}_d^2\right)}^{1/2} \) is the Euclidean norm of *h* ∈ ℝ^{ d } and \( \overline{\mathrm{C}}\ \left(\cdot \right) \) is a function of a single real variable. In this case the covariance function is also invariant under rotations of the spatial locations, so the nature of spatial association is the same in all directions; see Ripley (1981), Cressie (1993), and Schabenberger and Gotway (2005) for further discussion on stationarity.

A precise definition of ergodicity is somewhat technical (see Cressie 1993, pp. 53–58), but this assumption is key to make statistical inference based on spatial data feasible. This is so because the meaning and interpretation of many features of a random field, such as the mean function, are based on ensemble (i.e., population) averages, namely, averages over the possible realizations of the random field. Ergodicity requires that spatial averages computed from a single realization converge to their respective ensemble averages as the sample size increases to infinity.

A full specification of a random field requires the joint distribution of (Z(*s* _{1}), … , Z(*s* _{ m })) for all *m* ∈ ℕ and *s* _{1} , … , *s* _{ m } ∈ *D*. The simplest and most commonly used such specification is that of Gaussian random fields, meaning that all the aforementioned distributions are multivariate normal. Gaussian random fields are completely specified by their mean and covariance functions, and when they are stationary, a sufficient condition for them to be ergodic is that \( { \lim}_{\left\Vert h\right\Vert \to \infty}\overline{C}(h)=0 \). Gaussian random fields are the most commonly used models because of their convenient mathematical properties and wide applicability, as well as their use as “building blocks” for more complex random field models. Examples of the latter are hierarchical models used to describe discrete spatial data; see Banerjee et al. (2004) and Diggle and Ribeiro (2007).

## Models and Inference

### Geostatistical Data Models

The mean (or trend) function is typically modeled as μ(s) = f(s)^{T}β, where β = (β_{1},…,β_{p})^{T} are unknown regression parameters and f(s) = (f_{1}(s),…,f_{p}(s))^{T} are known location-dependent covariates. The latter may include related spatially varying processes. For instance, if Z(s) = rainfall amount that fell over a period of time at location s, then f(s) = altitude at location s may be a useful explanatory variable. More often a spatial trend is described in terms of a polynomial in the spatial coordinates. For the case when d = 2 and s = (x, y), a polynomial trend of degree k is

\( \mu (s)=\sum_{0\le i+ j\le k}{\beta}_{ij}\,{x}^{i}{y}^{j}. \)

The covariance function is usually assumed isotropic and modeled parametrically. A flexible and commonly used family is the Matérn covariance function,

\( \overline{C}(h)={\sigma}^2\frac{2^{1-\nu}}{\Gamma \left(\nu \right)}{\left(\frac{h}{\phi}\right)}^{\nu}{\mathrm{K}}_{\nu}\left(\frac{h}{\phi}\right),\qquad h\ge 0, \)

where Γ(∙) is the gamma function and \( {\mathrm{K}}_{\nu}\left(\cdot \right) \) is the modified Bessel function of the second kind of order ν. For this model ϕ > 0 (mainly) controls how fast the correlation decreases with distance, and ν > 0 controls the smoothness of the realizations of the random field. The commonly used exponential and Gaussian covariance functions are special cases obtained, respectively, by setting ν = 1/2 and letting ν → ∞.
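This Matérn-type covariance can be written directly with `scipy`; as a numerical check, with ν = 1/2 it reduces to the exponential covariance σ² exp(−h/ϕ). All parameter values below are illustrative assumptions:

```python
import numpy as np
from scipy.special import gamma, kv   # kv = modified Bessel function K_nu

def matern(h, sigma2=1.0, phi=1.0, nu=0.5):
    """Matern covariance: sigma2 * 2^(1-nu)/Gamma(nu) * (h/phi)^nu * K_nu(h/phi)."""
    h = np.asarray(h, dtype=float)
    out = np.full_like(h, sigma2)     # C(0) = sigma2, by continuity
    pos = h > 0
    x = h[pos] / phi
    out[pos] = sigma2 * 2.0 ** (1 - nu) / gamma(nu) * x ** nu * kv(nu, x)
    return out

h = np.linspace(0.01, 3.0, 50)
# nu = 1/2 recovers the exponential covariance exp(-h/phi):
assert np.allclose(matern(h, nu=0.5), np.exp(-h), atol=1e-10)
```

Larger ν (e.g., ν = 3/2, 5/2) produce smoother realizations; the Gaussian covariance arises in the limit ν → ∞.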

In addition, the observations are often contaminated with measurement error, so the observed data are modeled as Z_{i , obs} = Z(s_{i}) + ε_{i}, where the measurement errors ε_{1},…,ε_{n} are assumed i.i.d. with mean 0 and variance τ^{2} > 0, and independent of Z(∙). Under the above model the data Z_{obs} = (Z_{1 , obs}, … , Z_{n , obs})^{T} follow the general linear model

\( {\mathbf{Z}}_{\mathrm{obs}}= X\boldsymbol{\beta} +\boldsymbol{\epsilon}, \)

where X is the n by p matrix with entries (X)_{ij} = f_{j}(s_{i}) and ε is a random vector with $\mathrm{E}\left\{\epsilon \right\}=0$ and var{*ε*} = var{*Z* _{obs}} = Σ_{θ}, the n by n matrix with entries \( {\left({\Sigma}_{\boldsymbol{\theta}}\right)}_{ij}={\sigma}^2\frac{2^{1-\nu}}{\Gamma \left(\nu \right)}{\left(\frac{t_{ij}}{\phi}\right)}^{\nu}{\mathrm{K}}_{\nu}\left(\frac{t_{ij}}{\phi}\right)+{\tau}^2\mathbf{1}\left\{{t}_{ij}=0\right\} \), where *t* _{ij} = ∥s_{i} − s_{j}∥ and 1(A) denotes the indicator function of A. This basic specification of a geostatistical model depends on unknown regression parameters β and covariance parameters θ = (σ^{2}, ϕ, ν, τ^{2}).

#### Parameter Estimation

Estimation of the regression parameters is commonly done by least squares,

\( \widehat{\boldsymbol{\beta}}={\left({X}^{T} Q X\right)}^{-1}{X}^{T} Q\,{\mathbf{z}}_{\mathrm{obs}}, \)

where Q = I_{n} (ordinary least squares) or \( Q={\Sigma}_{\boldsymbol{\theta}}^{-1} \) (generalized least squares); the latter requires an estimate of Σ_{θ}. In both cases X is assumed to have full rank. The second choice of Q results in a more efficient estimator, but in practice there is often little difference between them. The resulting trend surface estimate is \( \widehat{\mu}(s)=\boldsymbol{f}{(s)}^{\mathbf{T}}\widehat{\boldsymbol{\beta}} \).
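Both choices of Q can be computed in a few lines. The design matrix, covariance (an AR(1)-style stand-in for Σ_θ), and parameter values below are assumed purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.uniform(size=n)])  # intercept + one covariate
beta_true = np.array([2.0, -1.0])

# AR(1)-style correlation as a stand-in for Sigma_theta (illustrative assumption).
idx = np.arange(n)
Sigma = 0.8 ** np.abs(idx[:, None] - idx[None, :])
L = np.linalg.cholesky(Sigma)
z = X @ beta_true + L @ rng.standard_normal(n)           # correlated errors

def wls(X, z, Q):
    """beta_hat = (X^T Q X)^{-1} X^T Q z."""
    return np.linalg.solve(X.T @ Q @ X, X.T @ Q @ z)

beta_ols = wls(X, z, np.eye(n))                # Q = I_n
beta_gls = wls(X, z, np.linalg.inv(Sigma))     # Q = Sigma^{-1}, more efficient
print(beta_ols, beta_gls)                      # both estimate (2, -1)
```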

For estimation of the covariance parameters, a classical two-stage approach is often used. For a set of distances t_{1} < … < t_{k}, the (model-free) empirical semivariogram estimates are first computed,

\( \widehat{\gamma}(t)=\frac{1}{2\left| N(t)\right|}\sum_{\left( i, j\right)\in N(t)}{\left({z}_i-{z}_j\right)}^2,\qquad t={t}_1,\dots, {t}_k, \)

where N(t) = {(i, j) : t ≤ ∥s_{i} − s_{j}∥ < t + Δt}, with Δt > 0 fixed and |N(t)| the number of elements in N(t). A proposed semivariogram model, say γ(t; θ), is then fitted to the above semivariogram estimates \( \widehat{\gamma}\left({t}_1\right),\dots, \widehat{\gamma}\left({t}_k\right) \) using (nonlinear) least squares, so the covariance parameter estimates are

\( \widehat{\boldsymbol{\theta}}=\underset{\boldsymbol{\theta}}{\arg\ \min}\sum_{l=1}^{k}{\left(\widehat{\gamma}\left({t}_l\right)-\gamma \left({t}_l;\boldsymbol{\theta}\right)\right)}^2. \)

The resulting semivariogram function estimate is \( \gamma \left(\cdot;\widehat{\boldsymbol{\theta}}\right) \). When μ(s) is not constant, a similar procedure is applied to the residuals \( \mathbf{e}={\mathbf{z}}_{\mathrm{obs}}- X\widehat{\boldsymbol{\beta}} \) rather than to the observed data. This estimation method is popular among practitioners, but the statistical properties of the resulting estimators are not well understood.
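The first-stage empirical semivariogram can be sketched as follows; the simulated white-noise data (for which the semivariogram is flat, at roughly the variance) are an assumption for illustration:

```python
import numpy as np

def empirical_semivariogram(sites, z, bins):
    """gamma_hat(t): average of (z_i - z_j)^2 / 2 over pairs whose
    distance falls in [bins[l], bins[l+1])."""
    d = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
    sq = (z[:, None] - z[None, :]) ** 2
    iu = np.triu_indices(len(z), k=1)          # count each pair once
    d, sq = d[iu], sq[iu]
    gam = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (d >= lo) & (d < hi)
        gam.append(sq[mask].mean() / 2 if mask.any() else np.nan)
    return np.array(gam)

rng = np.random.default_rng(2)
sites = rng.uniform(size=(100, 2))
z = rng.standard_normal(100)                   # spatially independent data
gam = empirical_semivariogram(sites, z, np.linspace(0.05, 0.5, 6))
```

A parametric model γ(t; θ) would then be fitted to `gam` by (nonlinear) least squares.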

An alternative is maximum likelihood estimation, in which β and θ are estimated jointly by maximizing the (Gaussian) likelihood of the observed data. This method is more statistically satisfactory than the two-stage approach described above, but is also more computationally demanding, to the point of not being feasible for very large datasets (n very large) due to the need to store and numerically invert the n by n matrix Σ_{θ}; see Cressie (1993), Schabenberger and Gotway (2005), and chapter 4 in Gelfand et al. (2010) for other methods of estimation.

#### Spatial Prediction (Kriging)

A common goal of geostatistical analyses is prediction of Z(s_{0}), where s_{0} ∈ D is an unsampled location. The classical approach uses optimal linear unbiased prediction and only requires knowledge of the mean and covariance (or semivariogram) functions. Specifically, the method seeks to minimize the mean squared prediction error

\( \mathrm{MSPE}\left(\widehat{Z}\left({s}_0\right)\right)=\mathrm{E}\left\{{\left(\widehat{Z}\left({s}_0\right)- Z\left({s}_0\right)\right)}^2\right\} \)

among all linear predictors \( \widehat{Z}\left({s}_0\right)={\boldsymbol{\lambda}}^{T}{\mathbf{Z}}_{\mathrm{obs}} \) satisfying the unbiasedness condition \( \mathrm{E}\{\widehat{Z}\left({s}_0\right)\}=\mathrm{E}\{ Z\left({s}_0\right)\} \). The solution is

\( {\widehat{Z}}^K\left({s}_0\right)=\boldsymbol{f}{\left({s}_0\right)}^{T}{\widehat{\boldsymbol{\beta}}}_{\mathrm{gls}}+{\mathbf{c}}_0^{T}{\Sigma}_{\boldsymbol{\theta}}^{-1}\left({\mathbf{Z}}_{\mathrm{obs}}- X{\widehat{\boldsymbol{\beta}}}_{\mathrm{gls}}\right), \)

where \( {\widehat{\boldsymbol{\beta}}}_{\mathrm{gls}} \) is the generalized least squares estimator of β and c_{0} = cov{Z_{obs}, Z(s_{0})}; this is called the best linear unbiased predictor (BLUP) or kriging predictor of Z(s_{0}). The usual uncertainty measure associated with the kriging predictor is \( \mathrm{MSPE}\left({\widehat{Z}}^K\left({s}_0\right)\right)={\left({\sigma}^K\left({s}_0\right)\right)}^2 \), which is given by

\( {\left({\sigma}^K\left({s}_0\right)\right)}^2= C\left({s}_0,{s}_0\right)-{\mathbf{c}}_0^{T}{\Sigma}_{\boldsymbol{\theta}}^{-1}{\mathbf{c}}_0+{\mathbf{d}}^{T}{\left({X}^{T}{\Sigma}_{\boldsymbol{\theta}}^{-1} X\right)}^{-1}\mathbf{d},\qquad \mathbf{d}=\boldsymbol{f}\left({s}_0\right)-{X}^{T}{\Sigma}_{\boldsymbol{\theta}}^{-1}{\mathbf{c}}_0. \)

When the random field Z(∙) is Gaussian, then \( {\widehat{Z}}^K\left({s}_0\right) \) is also the best unbiased predictor (it minimizes MSPE(∙) over the class of all unbiased predictors), and a 95 % prediction interval for Z(s_{0}) is \( {\widehat{Z}}^K\left({s}_0\right)\pm 1.96{\boldsymbol{\sigma}}^K\left({s}_0\right) \); see Cressie (1993), Chilès and Delfiner (1999), and Schabenberger and Gotway (2005) for methodological details and Stein (1999) for theoretical underpinnings.
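A stripped-down kriging computation shows the two central quantities, predictor and kriging variance. For brevity this sketch uses a known constant mean (simple kriging) and an exponential covariance; the locations and parameter values are illustrative assumptions:

```python
import numpy as np

def exp_cov(h, sigma2=1.0, phi=0.3):
    """Exponential covariance, evaluated elementwise on distances h."""
    return sigma2 * np.exp(-h / phi)

rng = np.random.default_rng(3)
sites = rng.uniform(size=(30, 2))
s0 = np.array([0.5, 0.5])               # unsampled prediction location
mu = 1.0                                # known constant mean (simple kriging)

D = np.linalg.norm(sites[:, None] - sites[None, :], axis=-1)
Sigma = exp_cov(D)
z = mu + np.linalg.cholesky(Sigma) @ rng.standard_normal(30)  # one realization

c0 = exp_cov(np.linalg.norm(sites - s0, axis=1))  # cov{Z_obs, Z(s0)}
w = np.linalg.solve(Sigma, c0)                    # Sigma^{-1} c0
z_hat = mu + w @ (z - mu)                         # kriging predictor
mspe = exp_cov(0.0) - w @ c0                      # kriging variance
interval = (z_hat - 1.96 * np.sqrt(mspe),
            z_hat + 1.96 * np.sqrt(mspe))         # 95% prediction interval
```

With an unknown mean, the extra trend-estimation term in the MSPE formula above would be added.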

The computation of kriging predictors and the validity of their optimality properties require the covariance parameters θ to be known, which is certainly not the case in practice. The simplest and most commonly used practical solution is to use empirical or plug-in predictors and mean squared prediction errors, obtained by replacing the unknown covariance parameters in the above formulas with their estimates. But the properties of the resulting plug-in predictors and mean squared prediction errors differ from those of their known-covariance-parameter counterparts, since the former do not take into account the sampling variability of the parameter estimators. As a result, plug-in mean squared prediction errors tend to underestimate the true mean squared prediction errors of plug-in predictors, and the true coverage probability of plug-in prediction intervals tends to be smaller than nominal. Possible approaches to account for parameter uncertainty when performing predictive inference include the bootstrap (Sjöstedt-De Luna and Young 2003) and the Bayesian approach (Banerjee et al. 2004; Diggle and Ribeiro 2007), with the latter appearing to be the most effective.

### Lattice Data Models

Exploratory assessment of spatial association in lattice data is commonly based on Moran's I and Geary's c statistics,

\( I=\frac{n}{S_0}\ \frac{\sum_{i=1}^n\sum_{j=1}^n{w}_{ij}\left( Z\left({s}_i\right)-\overline{Z}\right)\left( Z\left({s}_j\right)-\overline{Z}\right)}{\sum_{i=1}^n{\left( Z\left({s}_i\right)-\overline{Z}\right)}^2},\qquad c=\frac{n-1}{2{S}_0}\ \frac{\sum_{i=1}^n\sum_{j=1}^n{w}_{ij}{\left( Z\left({s}_i\right)- Z\left({s}_j\right)\right)}^2}{\sum_{i=1}^n{\left( Z\left({s}_i\right)-\overline{Z}\right)}^2}, \)

where \( \overline{Z}=\frac{1}{n}{\sum}_{i=1}^n Z\left({s}_i\right) \) and \( {S}_0={\sum}_{i=1}^n{\sum}_{j=1}^n{w}_{i j} \). For Gaussian processes with independent observations, $\mathrm{E}\left\{I\right\}=-{\left(n-1\right)}^{-1}$ and $\mathrm{E}\left\{c\right\}=1$. Hence, observed values of I substantially below/above −(n − 1)^{−1} indicate negative/positive spatial association, while for Geary's c the interpretation is reversed, with observed values of c substantially above/below 1 indicating negative/positive association. When the random field has a nonconstant mean, the above statistics are computed using residuals; see Cressie (1993) and Cliff and Ord (1981) for further details.
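Both statistics follow directly from the weight matrix. The four-site example below, with values increasing smoothly along a line (hence positively associated), is assumed for illustration:

```python
import numpy as np

def moran_geary(z, W):
    """Moran's I and Geary's c for data z and weight matrix W."""
    n = len(z)
    zc = z - z.mean()
    S0 = W.sum()
    I = (n / S0) * (zc @ W @ zc) / (zc @ zc)
    diff2 = (z[:, None] - z[None, :]) ** 2
    c = ((n - 1) / (2 * S0)) * (W * diff2).sum() / (zc @ zc)
    return I, c

# Four sites on a line, binary adjacency weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
z = np.array([1.0, 2.0, 3.0, 4.0])     # smoothly varying values
I, c = moran_geary(z, W)
print(I > -1 / (len(z) - 1), c < 1)    # both signal positive association
```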

Models for lattice data are specified with respect to a given neighborhood system {N_{i}} and weight matrix W. One of the most common models for lattice data is the Simultaneous Autoregressive (SAR) model, specified by the set of autoregressions

\( Z\left({s}_i\right)=\boldsymbol{f}{\left({s}_i\right)}^{T}\boldsymbol{\beta} +\rho \sum_{j=1}^n{w}_{ij}\left( Z\left({s}_j\right)-\boldsymbol{f}{\left({s}_j\right)}^{T}\boldsymbol{\beta}\right)+{\epsilon}_i,\qquad i=1,\dots, n, \)

where f(s_{j}) and β have the same interpretation as in models for geostatistical data and ε_{ i } ~ N(0, ξ_{i}) are independent errors. This is a spatial analogue of autoregressive time series models, but unlike the latter the response and error vectors are correlated. Provided I_{n} − ρW is nonsingular, it follows that Z ~ N_{n}(Xβ, (I_{n} − ρW)^{−1}M(I_{n} − ρW^{T})^{−1}), where M = diag(*ξ* _{1}, … , *ξ* _{ n }). It is common to assume *ξ* _{ i } = *ξ* for all i, in which case the model parameters are β and θ = (ξ, ρ).

An alternative class of models is specified in terms of the full conditional distributions of Z(s_{i}) given Z_{(i)}, i = 1,…, n, where Z_{(i)} = (Z(s_{j}) : j ≠ i). In addition, these models assume a Markov property stating that the distribution of each datum depends on the rest only through its neighbors. An example of this is the class of Conditional Autoregressive (CAR) models with full conditional distributions

\( Z\left({s}_i\right)\mid {\mathbf{Z}}_{(i)}\sim N\left(\boldsymbol{f}{\left({s}_i\right)}^{T}\boldsymbol{\beta} +\rho \sum_{j=1}^n{w}_{ij}\left( Z\left({s}_j\right)-\boldsymbol{f}{\left({s}_j\right)}^{T}\boldsymbol{\beta}\right),\ {\sigma}_i^2\right),\qquad i=1,\dots, n. \)

To guarantee the above set of full conditional distributions determines a unique joint distribution, it is required that \( {\sigma}_j^2{w}_{i j}={\sigma}_i^2{w}_{j i} \) for all i,j, and M^{−1}(I_{n} − ρW) be positive definite, with \( M=\mathrm{diag}\left({\sigma}_1^2,\dots, {\sigma}_n^2\right) \), in which case Z ~ N_{n}(Xβ, (I − ρW)^{−1}M). It is common to assume that \( {\sigma}_i^2={\sigma}^2 \) for all i, in which case the model parameters are β and θ = (σ^{2}, ρ). An extensive comparison between the SAR and CAR models is given in Cressie (1993, chapter 6).
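The implied joint covariance matrices of the two models can be formed explicitly and checked for validity; W (a four-site path) and the parameter values are illustrative assumptions:

```python
import numpy as np

W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = W.shape[0]
rho, sigma2 = 0.3, 1.0
M = sigma2 * np.eye(n)                  # common conditional/error variance
A = np.eye(n) - rho * W                 # I_n - rho W, nonsingular for this rho

Sigma_sar = np.linalg.inv(A) @ M @ np.linalg.inv(A.T)   # SAR: (I-rW)^-1 M (I-rW^T)^-1
Sigma_car = np.linalg.inv(A) @ M                        # CAR: (I-rW)^-1 M

# Both must be valid (symmetric, positive definite) covariance matrices.
for S in (Sigma_sar, Sigma_car):
    assert np.allclose(S, S.T)
    assert np.linalg.eigvalsh(S).min() > 0
```

Note that symmetry of the CAR covariance here relies on W being symmetric and the σ_i² being equal, matching the compatibility condition in the text.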

The most commonly used method for parameter estimation in these models is maximum likelihood. As for geostatistical models, the resulting estimators are given by Eq. 1 where in the likelihood Eq. 2 \( {\Sigma}_{\boldsymbol{\theta}}^{-1}=\left({I}_n-\rho {W}^T\right){M}^{-1}\left({I}_n-\rho W\right) \) for SAR models and \( {\Sigma}_{\boldsymbol{\theta}}^{-1}={M}^{-1}\left( I-\rho W\right) \) for CAR models. For both models (as for geostatistical models) the computation of these estimators requires the use of numerical iterative methods.

A point worth noting is that, unlike in geostatistical models, in SAR and CAR models the spatial association structure is specified in terms of the inverse covariance matrix, rather than the covariance matrix, so the interpretation of parameters controlling spatial association is less straightforward than that in geostatistical models.

### Point Process Models

A point process on *D* ⊂ ℝ^{ d } is a random field whose realizations are sets of points in D, called point patterns; the points themselves are called events. In the most general case attributes may also be observed along with the locations of the events, resulting in a marked point process. For any A ⊂ D, let N(A) denote the number of events in A and ν(A) = ∫_{A} ds the size of A. The intensity function of a point process is the function λ : D → [0, ∞) with the property that $\mathrm{E}\left\{N\left(A\right)\right\}={\int}_{A}\lambda \left(s\right)\mathit{ds}$. Alternatively, using an “infinitesimal disc” ds centered at s, the intensity function can be defined as the ratio of the expected number of points in ds to its size, that is,

\( \lambda (s)=\underset{\nu \left(\mathit{ds}\right)\to 0}{\lim}\ \frac{\mathrm{E}\left\{ N\left(\mathit{ds}\right)\right\}}{\nu \left(\mathit{ds}\right)}. \)

The most fundamental point process model is the Poisson process with intensity function λ(s), which satisfies the following: For any *n* ∈ ℕ and A_{1} , … , A_{n} disjoint subsets of D, it holds that (i) N(A_{i}) has Poisson distribution with mean \( {\int}_{A_i}\lambda (s) ds \), and (ii) N(A_{1}),…,N(A_{n}) are independent random variables. When the intensity function is constant, λ(s) = λ, the above is called a homogeneous Poisson process (HPP), and otherwise it is called an inhomogeneous Poisson process (IPP). Point patterns from HPP have the property of complete spatial randomness (CSR): given the number of events in a set A, these events are independently and identically distributed over A, so there is no “interaction” between events. Poisson processes are often used on their own for the analysis of point patterns, or as “building blocks” for more complex models; see Diggle (2003) and Illian et al. (2008) for introductory treatments and Cressie (1993) and Daley and Vere-Jones (2003, 2007) for more mathematical treatments.
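Complete spatial randomness is easy to simulate: draw N ~ Poisson(λν(A)), then scatter that many points independently and uniformly over A. A sketch on the unit square (the value of λ is an assumption):

```python
import numpy as np

def simulate_hpp(lam, rng):
    """Homogeneous Poisson process on the unit square [0, 1]^2."""
    n = rng.poisson(lam * 1.0)          # E{N(A)} = lambda * nu(A), nu(A) = 1 here
    return rng.uniform(size=(n, 2))     # given N, events are i.i.d. uniform (CSR)

rng = np.random.default_rng(4)
pts = simulate_hpp(lam=100.0, rng=rng)
print(pts.shape)                        # about (100, 2); the count itself is random
```

An inhomogeneous Poisson process can be obtained from this by thinning: simulate at rate max λ(s) and keep each point s with probability λ(s)/max λ.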

A classical test of CSR is based on quadrat counts: the region is partitioned into an r × c grid of equally sized quadrats. If n_{ij} denotes the number of events in the quadrat in the i-th row and j-th column, and \( \overline{\mathrm{n}} \) is the expected number of events in any quadrat under CSR, then under CSR the statistic

\( {X}^2=\sum_{i=1}^{r}\sum_{j=1}^{c}\frac{{\left({n}_{ij}-\overline{n}\right)}^2}{\overline{n}} \)

follows a \( {\chi}_{rc-1}^2 \) distribution, asymptotically. Tests based on more complex nonstandard statistics can be carried out by resorting to Monte Carlo simulation.
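The quadrat-count statistic can be sketched as follows; the grid size and the simulated CSR pattern are assumptions for illustration:

```python
import numpy as np
from scipy.stats import chi2

def quadrat_test(pts, r, c):
    """Chi-square test of CSR from an r x c grid of quadrats on the unit square."""
    counts, _, _ = np.histogram2d(pts[:, 0], pts[:, 1], bins=(r, c),
                                  range=[[0, 1], [0, 1]])
    nbar = len(pts) / (r * c)                 # expected count per quadrat under CSR
    X2 = ((counts - nbar) ** 2 / nbar).sum()
    pval = chi2.sf(X2, df=r * c - 1)          # asymptotic chi^2_{rc-1} reference
    return X2, pval

rng = np.random.default_rng(5)
pts = rng.uniform(size=(200, 2))              # a CSR pattern
X2, pval = quadrat_test(pts, r=4, c=4)        # CSR should rarely be rejected here
```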

For inhomogeneous Poisson processes, the intensity function is often modeled log-linearly in terms of location-dependent covariates, λ(s) = exp{f(s)^{T}β}, which provides a way to accommodate departures from CSR based on changes in the mean structure.

The second-order intensity, λ_{2}(s, u), extends the definition of λ(s) to measure the dependence between events at s and u; it is defined as

\( {\lambda}_2\left( s, u\right)=\underset{\nu \left(\mathit{ds}\right),\nu \left(\mathit{du}\right)\to 0}{\lim}\ \frac{\mathrm{E}\left\{ N\left(\mathit{ds}\right) N\left(\mathit{du}\right)\right\}}{\nu \left(\mathit{ds}\right)\nu \left(\mathit{du}\right)}. \)

For stationary and isotropic processes, where λ(s) = λ and λ_{2}(s, u) = λ_{2}(‖s − u‖) ≡ λ_{2}(t), the K-function is a more informative tool for assessing dependence, defined, when d = 2, as

\( K(t)=\frac{2\pi}{\lambda^2}{\int}_0^t{\lambda}_2(u)\, u\, du. \)

Then λK(t) represents the expected number of extra events within distance t of the origin, given that there is an event at the origin. For a HPP one has K(t) = πt^{2}, with values larger (smaller) than this being indicative of clustering (regularity) on that distance scale. Plotting the estimated K(t) versus t, or the closely related L-function \( L(t)=\sqrt{K(t)/\pi} \), enables one to gauge the degree of dependence with reference to the HPP, for which L(t) = t; see Diggle (2003) and Illian et al. (2008) for further details.
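A naive estimate of K (ignoring edge effects, unlike production implementations) counts pairs within distance t and rescales; the simulated CSR pattern on the unit square is an assumption:

```python
import numpy as np

def k_hat(pts, ts, area=1.0):
    """Naive Ripley K estimate: K(t) = (area / n^2) * #{ordered pairs with d <= t}."""
    n = len(pts)
    d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # exclude self-pairs
    return np.array([area * (d <= t).sum() / n ** 2 for t in ts])

rng = np.random.default_rng(6)
pts = rng.uniform(size=(300, 2))              # CSR pattern: K(t) should track pi t^2
ts = np.linspace(0.02, 0.1, 5)
K = k_hat(pts, ts)
L_fn = np.sqrt(K / np.pi)                     # L(t) = sqrt(K(t)/pi), close to t under CSR
```

For real analyses, edge-corrected estimators (e.g., Ripley's isotropic correction) should be used near the boundary of D.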

## Key Applications

### Example 1

By plotting the pH values against the spatial coordinates, it can be seen that the pH values tend to decrease in the eastward direction and increase in the northward direction. We use a model with μ(s) = β_{1} + β_{2} x + β_{3} y, with s = (x, y), for which the OLS estimates are \( \left({\widehat{\beta}}_1,{\widehat{\beta}}_2,{\widehat{\beta}}_3\right)=\left(5.627,-1.440,0.761\right) \). The second-order specification is completed by assuming the covariance function of the true pH process is isotropic and exponential. Figure 2b shows empirical semivariogram estimates at a few selected distances (dots) based on the OLS residuals. It displays an apparent discontinuity at the origin, suggesting the data contain measurement error, so the covariance function of the pH data is taken as C(h) = σ^{2} exp(−h/ϕ) + τ^{2}1{h = 0}. The estimated semivariogram function is also displayed in Fig. 2b (line), obtained using the parameters \( \left({\widehat{\sigma}}^2,\widehat{\phi},{\widehat{\tau}}^2\right)=\left(0.270,0.070,0.059\right) \), estimated by least squares.

### Example 2

Based on the residuals from a regression of log poverty level, log(POV), on log total population, log(POP), Moran's and Geary's statistics are I = 0.391 and c = 0.568, respectively, both highly significant for the hypothesis of no spatial association (p-values < 10^{−15}). Hence, there is substantial spatial association among county log poverty levels, even after accounting for log total population.

We fitted both CAR and SAR models using log poverty level as the response and log total population as the explanatory variable, and the neighborhood system based on geographic adjacency: two counties are neighbors if and only if their boundaries intersect. As for the weights, we assume w_{ij} = 1 for any two neighbors s_{i} and s_{j}. The SAR model is fit by maximum likelihood, resulting in the estimated mean Ê{log(POV) | POP} = −2.123 + 1.034 · log(POP) and \( \left({\widehat{\sigma}}^2,\widehat{\rho}\right)=\left(0.067,0.116\right) \). The estimated mean for the CAR model is similar, but the fit is slightly inferior.

### Example 3

We focus on assessing the tendency for clustering and disregard magnitude. In the current context clustering is expected, as geology informs us that earthquakes tend to occur at the junctions of tectonic plates and along fault zones. The Aleutian Islands/Bering Strait and southern Alaska are prominent “hot spots.” In fact, a chi-square test strongly rejects CSR (p-value ≈ 10^{−16}). Assuming (for the sake of illustration) stationarity and isotropy, the estimated L-function reveals a pronounced upward bow that falls well outside the 95% confidence envelopes for a HPP, further confirming the strong tendency for clustering on this spatial scale.

Since a constant intensity function is an inadequate hypothesis, we continue the analysis by producing an estimate of the intensity function in the context of an IPP. The result is displayed in Fig. 5b which shows a kernel smoothing estimate (with bandwidth selected by cross-validation; see Diggle 2003). Since intensity is the expected number of (random) points per unit area, the units are “earthquakes per unit area.” The two Alaskan hot spots alluded to earlier are clearly visible. Interestingly, the central Caribbean emerges as a third hot spot.

## Historical Background and Final Remarks

Early pioneers of statistical inference (e.g., Fisher, Gossett, Pearson) alluded to issues arising from the correlation of observations due to spatial proximity in designed experiments and proposed methods to account for it. Some of the history and early developments in spatial statistics is reviewed in chapter 1 of Gelfand et al. (2010). Since some areas of spatial statistics have not been included in this brief overview, we end with some additional pointers to the literature. A review of non-stationary spatial processes is given in chapter 9 of Gelfand et al. (2010). The problems of spatial sampling and design (how and where to collect the data) are treated in Cressie (1993), Le and Zidek (2006), Müller (2007), and chapter 10 of Gelfand et al. (2010). Multivariate methods in spatial statistics are treated in Banerjee et al. (2004), Le and Zidek (2006), Wackernagel (2010), and chapter 21 of Gelfand et al. (2010). Hierarchical models for the modeling of non-Gaussian spatial data, specially models for discrete spatial data, are discussed in Banerjee et al. (2004) and Diggle and Ribeiro (2007), where the Bayesian approach is featured prominently. Models for more complex types of spatial random objects are treated in Matheron (1975), Cressie (1993), and Nguyen (2006). Finally, an extensive discussion of available software written in R that implements the methods described here for the statistical analysis of the three types of spatial data appears in Bivand et al. (2008).

## Notes

### Acknowledgments

The authors thank Edgar Muñoz for producing Fig. 4. The first author was partially supported by National Science Foundation Grant HRD-0932339.

## References

- Anselin L (1988) Spatial econometrics: methods and models. Kluwer, Dordrecht
- Banerjee S, Carlin BP, Gelfand AE (2004) Hierarchical modeling and analysis for spatial data. Chapman & Hall/CRC, Boca Raton
- Berke O (1999) Estimation and prediction in the spatial linear model. Water Air Soil Pollut 110:215–237
- Bivand RS, Pebesma EJ, Gómez-Rubio V (2008) Applied spatial data analysis with R. Springer, New York
- Chilès J-P, Delfiner P (1999) Geostatistics: modeling spatial uncertainty. Wiley, New York
- Cliff AD, Ord JK (1981) Spatial processes: models and applications. Pion, London
- Cressie NAC (1993) Statistics for spatial data. Wiley, New York
- Daley D, Vere-Jones DJ (2003) Introduction to the theory of point processes, volume I: elementary theory and methods, 2nd edn. Springer, New York
- Daley D, Vere-Jones DJ (2007) Introduction to the theory of point processes, volume II: general theory and structure, 2nd edn. Springer, New York
- Diggle PJ (2003) Statistical analysis of spatial point patterns, 2nd edn. Arnold, New York
- Diggle PJ, Ribeiro PJ (2007) Model-based geostatistics. Springer, New York
- Gelfand AE, Diggle PJ, Guttorp P, Fuentes M (eds) (2010) Handbook of spatial statistics. Chapman & Hall/CRC, Boca Raton
- Illian J, Penttinen A, Stoyan H, Stoyan D (2008) Statistical analysis and modelling of spatial point patterns. Wiley, Chichester
- Journel AG, Huijbregts CJ (1978) Mining geostatistics. Academic, London
- Le ND, Zidek JV (2006) Statistical analysis of environmental space-time processes. Springer, New York
- LeSage JP, Pace RK (2009) Introduction to spatial econometrics. Chapman & Hall/CRC, Boca Raton
- Li SZ (2009) Markov random field modeling in image analysis, 3rd edn. Springer, London
- Matérn B (1986) Spatial variation. Lecture notes in statistics, 2nd edn. Springer, Berlin
- Matheron G (1975) Random sets and integral geometry. Wiley, New York
- Müller WG (2007) Collecting spatial data: optimum design of experiments for random fields, 3rd edn. Springer, Heidelberg
- Nguyen HT (2006) An introduction to random sets. Chapman & Hall/CRC, Boca Raton
- Ripley BD (1981) Spatial statistics. Wiley, New York
- Rue H, Held L (2005) Gaussian Markov random fields: theory and applications. Chapman & Hall/CRC, Boca Raton
- Schabenberger O, Gotway CA (2005) Statistical methods for spatial data analysis. Chapman & Hall/CRC, Boca Raton
- Sjöstedt-De Luna S, Young A (2003) The bootstrap and kriging prediction intervals. Scand J Stat 30:175–192
- Stein ML (1999) Interpolation of spatial data: some theory for kriging. Springer, New York
- Wackernagel H (2010) Multivariate geostatistics: an introduction with applications, 3rd edn. Springer, Berlin
- Yaglom AM (1987) Correlation theory of stationary and related random functions I: basic results. Springer, New York