1 Introduction

Fifty years ago, when geostatistics was pioneered by Matheron (1971), its main applications were Kriging and the change of support in mining. At the time, geostatistics was presented as a new discipline, with little reference to its relationships with other mathematical interpolation and modeling techniques. This has now changed, as the relationships between geostatistics and techniques such as splines, regularization, Bayesian inversion, or ensemble Kalman filtering have become clearer. This convergence is fascinating and has led to many significant developments allowing the integration of multi-disciplinary data into 3-D geostatistical earth models.

This chapter discusses approaches for generating 2-D or 3-D subsurface models constrained by geological (well), seismic or dynamic data. In spite of the wealth of data available, the uncertainty in the 3-D earth model remains high in most cases. Approaches designed to generate one unique “deterministic” model often pick the smoothest one. This is not realistic in situations where the earth model is used for flow simulation, as the results are biased if the model heterogeneities are not representative of those of the actual reservoir. More generally, non-linear operations, such as the application of cut-offs, may give biased results when applied to deterministic smooth models such as those produced by Kriging.

The multi-realization approach is now routinely applied to subsurface parameter inversion. Looking at the mean provides much less information than looking at a movie of realizations. By construction, each of the realizations captures the essential random fluctuations of the actual field from which the data were extracted (Tarantola 2005). This is a fundamental change. The traditional inversion approach could be formulated as “How to find an estimate of the spatial parameters which is as close as possible to the first-guess values of these parameters and which provides, through forward modeling, an output which is as close as possible to the available data” (modified from Evensen 2007). These first-guess values are usually a smooth (Kriging-like) spatial model of these parameters. Now the question has changed to “Find the probability density function (pdf) of 3-D models constrained by all the existing data, and provide techniques for sampling realizations from this pdf”.

This chapter, written from a geostatistical perspective, discusses the convergence between the existing techniques.

Deterministic approaches such as Kriging, splines, and regularization- or energy-based methods generate a single model of the subsurface, which usually minimizes or maximizes an optimization criterion. These approaches are closely related, and their formal relationships are discussed.

Geostatistical simulation is then revisited, and two key simulation algorithms are discussed. The first is sequential Gaussian simulation; the second is the “rough plus smooth” combination of an unconditional simulation with a smooth correction term. These two algorithms have helped bridge the gap between geostatistics and inversion.

Two successful approaches are then discussed for integrating seismic and dynamic data into the earth model. Rather than using an approach merely based on statistical correlations between data and model parameters, it is assumed that there exists a deterministic relationship (or forward model) between model parameters and data, possibly including a random error.

The first approach, geostatistical inversion, produces reservoir-scale models of acoustic or elastic parameters constrained by single- or multi-offset seismic amplitude data. The value of using sequential Gaussian simulation to calculate seismically-constrained realizations is discussed. In situations where the forward model is linear, geostatistical inversion can be formulated as a particular case of Bayesian seismic inversion.

The second approach, ensemble Kalman filtering, consists of sequentially updating an “ensemble” of geostatistical realizations using dynamic data as they are acquired in time. The key idea here is to statistically derive the covariance terms of the equation used in Bayesian inversion from an ensemble of realizations rather than from a theoretical covariance model. The formal relationship between ensemble Kalman filtering and co-Kriging is discussed.

Most of the above techniques can be shown to use the same kind of formalism, where the mismatch between newly arrived data and the current model is interpolated and used to update this model.

One of the conclusions of this chapter is that the equations of Bayesian inversion, geostatistics and ensemble Kalman filtering are closely related. However, this relationship is mostly formal, as the three techniques differ in their approach to the covariances used in the equations. Geostatisticians first fit a model to the data, whilst Bayesians start from a model based on general “prior” information and only later in the process introduce the well data. Ensemble Kalman filtering directly uses the experimental covariances calculated from the realizations of the ensemble.

The topic of joint inversion of seismic and dynamic data is not discussed here, in spite of the interesting on-going developments in 4-D seismic data inversion. This is because the objective of this chapter is to address formal relationships between the different formalisms rather than discuss specific applications.

2 Deterministic Aspects of Geostatistics

2.1 Simple Stationary Kriging

The basic model used by geostatistics is that of stationary random functions of order 2: a spatial property \( z\left( {\mathbf{x}} \right) \) at location \( {\mathbf{x}} \) is represented by a random function \( Z\left( {\mathbf{x}} \right) \), which is assumed to have a trend \( m\left( {\mathbf{x}} \right) \) and a stationary covariance \( C\left( {\mathbf{h}} \right) \)

$$ m({\mathbf{x}}) = E\left( {Z\left( {\mathbf{x}} \right)} \right) $$
(1.1a)
$$ C\left( {\mathbf{h}} \right) = E\left( {Z\left( {\mathbf{x}} \right)Z\left( {{\mathbf{x}} + {\mathbf{h}}} \right)} \right) - E\left( {Z\left( {\mathbf{x}} \right)} \right)E\left( {Z\left( {{\mathbf{x}} + {\mathbf{h}}} \right)} \right) $$
(1.1b)

At each unsampled location \( {\mathbf{x}} \), the value of \( Z\left( {\mathbf{x}} \right) \) is estimated by a linear combination \( Z_{k} \left( {\mathbf{x}} \right) \) of the values \( Z_{i} = Z\left( {{\mathbf{x}}_{i} } \right) \) at the \( n \) data points \( \left( {{\mathbf{x}}_{i} } \right)_{i = 1, \ldots ,n} \). Kriging is the best linear unbiased estimator, in the sense that it is unbiased and minimizes the estimation variance. If the trend \( m\left( {\mathbf{x}} \right) \) is known at each location \( {\mathbf{x}} \), the simple Kriging (Chilès and Delfiner 2012, p. 151) system of equations is obtained

$$ Z_{k} \left( {\mathbf{x}} \right) - m\left( {\mathbf{x}} \right) = \sum\limits_{i = 1}^{n} {\lambda_{i} } \left( {Z_{i} - m\left( {{\mathbf{x}}_{i} } \right)} \right) $$
(1.2a)
$$ {\text{with }}\sum\limits_{i = 1}^{n} {\lambda_{i} } C\left( {{\mathbf{x}}_{{\mathbf{i}}} - {\mathbf{x}}_{{\mathbf{j}}} } \right) = C\left( {{\mathbf{x}} - {\mathbf{x}}_{{\mathbf{j}}} } \right){\text{ for }}j \in \left( {1, \ldots ,n} \right) $$
(1.2b)
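
As an illustration, the following Python sketch solves the simple Kriging system (Eqs. 1.2a and 1.2b) at one target location, assuming a known constant mean and an isotropic exponential covariance model; the data values and locations are made up for the example.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=10.0):
    """Assumed isotropic exponential covariance C(h) = sill * exp(-|h|/rng)."""
    return sill * np.exp(-np.abs(h) / rng)

def simple_kriging(x_data, z_data, x_target, mean):
    """Simple Kriging with known constant mean (Eqs. 1.2a-1.2b)."""
    d_ij = np.linalg.norm(x_data[:, None, :] - x_data[None, :, :], axis=-1)
    d_i0 = np.linalg.norm(x_data - x_target, axis=-1)
    lam = np.linalg.solve(exp_cov(d_ij), exp_cov(d_i0))   # Eq. 1.2b
    return mean + lam @ (z_data - mean)                   # Eq. 1.2a

# toy 2-D example
x_data = np.array([[0.0, 0.0], [5.0, 2.0], [3.0, 8.0]])
z_data = np.array([1.2, 0.7, 1.5])
print(simple_kriging(x_data, z_data, np.array([2.0, 3.0]), mean=1.0))
```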

2.2 Kriging with Intrinsic Random Functions of Order k

Matheron (1973) generalized the above model to that of Intrinsic Random Functions of Order k (IRF-k), where the definition of the variogram as a generalized covariance of order zero and of generalized covariances of order k leads to a model based on the stationarity of generalized increments of order k.

With IRF-k, the model only considers linear combinations of \( Z\left( {\mathbf{x}} \right) \) that filter polynomials of order k (such polynomials being likely to represent a trend). Simple Kriging is no longer applicable. For instance, if k = 1 in two dimensions, and if \( K\left( {\mathbf{h}} \right) \) designates the generalized covariance of order k (GC-k), the Kriging system becomes

$$ Z_{k} ({\mathbf{x}}) = \sum\limits_{i = 1}^{n} {\lambda_{i} } Z_{i} $$
(1.3a)
$$ \begin{aligned} & {\text{with }}\sum\limits_{i = 1}^{n} {\lambda_{i} } K\left( {{\mathbf{x}}_{{\mathbf{i}}} - {\mathbf{x}}_{{\mathbf{j}}} } \right) + \mu_{0} + \mu_{1} x_{j1} + \mu_{2} x_{j2} = K\left( {{\mathbf{x}} - {\mathbf{x}}_{{\mathbf{j}}} } \right)\quad {\text{for }}j \in \left( {1, \ldots ,n} \right) \\ & {\text{and }}\sum\limits_{i = 1}^{n} {\lambda_{i} } = 1\quad \sum\limits_{i = 1}^{n} {\lambda_{i} } x_{i1} = x_{1} \quad \sum\limits_{i = 1}^{n} {\lambda_{i} } x_{i2} = x_{2} \\ \end{aligned} $$
(1.3b)

where the coordinates of each point \( {\mathbf{x}} \) of the plane are written as \( {\mathbf{x}} = \left( {x_{1} ,x_{2} } \right) \).

2.3 Kriging Extensions

The goal here is not to discuss the details of Kriging, as there are plenty of excellent textbooks for this (Chilès and Delfiner 2012, p. 150). However, two features of Kriging deserve to be discussed, as they facilitate the understanding of the relationship between Kriging, splines and Bayesian approaches.

2.3.1 Generalization of Kriging to the Interpolation of Average Values

Kriging is a linear interpolator. The data used by Kriging do not have to be point values; they can be any linear function of the parameters of interest, which Hansen et al. (2006) call “volume support data”. In particular, Kriging can be used to estimate the average value of a parameter \( Z(v_{{\mathbf{x}}} ) \) at a location x by a linear combination of volume support data \( Z(v_{{{\mathbf{x}}_{i} }} ) \) (Chilès and Delfiner 2012, p. 198)

$$ Z_{k} \left( {v_{{\mathbf{x}}} } \right) = \sum\limits_{i = 1}^{n} {\lambda_{i} } Z(v_{{{\mathbf{x}}_{i} }} ) $$
(1.4)

This property of Kriging, extensively used in mining applications, is of significant interest in the context of linear inversion of volume support data (Hansen et al. 2006). The Kriging equations associated with Eq. 1.4 are not given here, as they are notationally heavy, but they are conceptually simple thanks to the linearity of Kriging.
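
To illustrate the idea, the sketch below approximates the average covariances between 1-D interval supports by discretization, then applies simple Kriging to the interval averages. The exponential covariance, the interval supports and the data values are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def exp_cov(h, sill=1.0, rng=10.0):
    return sill * np.exp(-np.abs(h) / rng)

def avg_cov(seg_a, seg_b, m=20):
    """Average covariance between two 1-D supports, by discretization."""
    pa = np.linspace(seg_a[0], seg_a[1], m)
    pb = np.linspace(seg_b[0], seg_b[1], m)
    return exp_cov(pa[:, None] - pb[None, :]).mean()

# three interval supports with known averages, one target interval
data_segs = [(0, 2), (5, 7), (9, 12)]
z_avg = np.array([1.1, 0.8, 1.4])
target = (3, 5)
mean = 1.0

C = np.array([[avg_cov(a, b) for b in data_segs] for a in data_segs])
c0 = np.array([avg_cov(a, target) for a in data_segs])
lam = np.linalg.solve(C, c0)           # simple Kriging system on averages
print(mean + lam @ (z_avg - mean))     # estimate of the block average, Eq. 1.4
```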

2.3.2 Error CoKriging

Error coKriging (Dubrule 2003) is a generalization of Kriging to the situation where measurements \( Y_{i} \) of the parameter \( Z_{i} \) at data points \( {\mathbf{x}}_{{\mathbf{i}}} \) are affected by an unbiased random error

$$ Y_{i} = Z_{i} + \varepsilon_{i} \,{\text{with}}\,E\left( {\varepsilon_{i} } \right) = 0 \,{\text{and}}\,Var\left( {\varepsilon_{i} } \right) = C_{{\varepsilon_{i} }}$$
(1.5)

In this situation, error coKriging allows the estimation of \( Z\left( {\mathbf{x}} \right) \) at any unsampled location \( {\mathbf{x}} \) from a linear combination of the values \( Y_{i} \) (the random measurement error attached to each data point may or may not be zero) (Dubrule 2003; Hansen et al. 2006; Chilès and Delfiner 2012, p. 216)

$$ Z_{k} \left( {\mathbf{x}} \right) = \sum\limits_{i = 1}^{n} {\lambda_{i} Y_{i} } $$
(1.6)

2.3.3 Dual Kriging

If a global neighborhood is used, that is if all the available data are used to estimate \( Z\left( {\mathbf{x}} \right) \) at every single location \( {\mathbf{x}} \), the Kriging equations (Eq. 1.3) can be inverted to obtain the dual Kriging system (for interpolation in the case of Kriging and smoothing in the case of error coKriging). For example, in two dimensions for an IRF-k with k = 1

$$ z_{k} \left( {\mathbf{x}} \right) = z_{k} \left( {x_{1} ,x_{2} } \right) = a_{0} + a_{1} x_{1} + a_{2} x_{2} + \sum\limits_{i = 1}^{n} {b_{i} } K\left( {{\mathbf{x}} - {\mathbf{x}}_{{\mathbf{i}}} } \right) $$
(1.7)

where the conditions on the coefficients \( (a_{0} ,a_{1} ,a_{2} ,b_{1} , \ldots ,b_{n} ) \) are different for Kriging and error coKriging (Dubrule 1983)

$$ {\text{Kriging:}}\quad \sum\limits_{i = 1}^{n} {b_{i} } = \sum\limits_{i = 1}^{n} {b_{i} } x_{i1} = \sum\limits_{i = 1}^{n} {b_{i} } x_{i2} = 0\quad {\text{and}}\quad z_{k} \left( {x_{i1} ,x_{i2} } \right) = z_{i} $$
(1.8)
$$ {\text{Error coKriging: }}\sum\limits_{i = 1}^{n} {b_{i} } = \sum\limits_{i = 1}^{n} {b_{i} } x_{i1} = \sum\limits_{i = 1}^{n} {b_{i} } x_{i2} = 0\quad {\text{and}}\quad z_{k} \left( {x_{i1} ,x_{i2} } \right) + b_{i} C_{{\varepsilon_{i} }} = y_{i} $$
(1.9)

2.4 Kriging and Splines

2.4.1 Interpolating Splines

Splines are a popular method for deterministic interpolation and approximation (Micula and Micula 1999). In 2-D, interpolating splines calculate a function honouring the data and minimizing an energy functional. Harmonic splines minimize the stretching energy of a membrane, while biharmonic splines minimize the bending energy of an elastic plate. The biharmonic spline function can be written using an expression similar to Eq. 1.7 (Duchon 1975), but with a specific model for the generalized covariance function

$$ K\left( {{\mathbf{x}} - {\mathbf{x}}_{{\mathbf{i}}} } \right) = \left( {\left( {x_{1} - x_{i1} } \right)^{2} + \left( {x_{2} - x_{i2} } \right)^{2} } \right)Log\left( {\sqrt {\left( {x_{1} - x_{i1} } \right)^{2} + \left( {x_{2} - x_{i2} } \right)^{2} } } \right) $$
(1.10)

Splines and Kriging are particular cases of a more general class of interpolators, called radial basis functions (Billings et al. 2002a, b). With splines, the polynomial in Eq. 1.7 belongs to the kernel of the operator \( T \) that is minimized by the spline function (\( T \) is the gradient for harmonic splines and the Laplacian for biharmonic splines), whilst the function \( K\left( {\mathbf{h}} \right) \) is the Green function associated with the operator \( T'T \), where \( T' \) is the transpose of \( T \) (Matheron 1981a)

$$ T'TK\left( {\mathbf{h}} \right) = \delta \, $$
(1.11)

where \( \delta \) is the Dirac function. Choosing the energy functional minimized by splines is equivalent to fixing the degree of the trend function and the generalized covariance model for Kriging. For harmonic splines, these are respectively a constant and the De Wijs variogram in \( Log\,h \) (Chilès and Delfiner 2012, p. 94).
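
As a minimal sketch of this equivalence, the code below fits the dual form of Eq. 1.7 with the biharmonic kernel of Eq. 1.10 and the side conditions of Eq. 1.8; adding \( C_{\varepsilon_{i}}/\theta \) on the diagonal, as in Eq. 1.13 further below, would turn it into a smoothing spline. The data values are made up.

```python
import numpy as np

def tps_kernel(r):
    """Biharmonic (thin-plate) generalized covariance of Eq. 1.10: r^2 log r."""
    with np.errstate(divide="ignore", invalid="ignore"):
        k = r**2 * np.log(r)
    return np.where(r > 0, k, 0.0)

def fit_dual(x, z, c_eps_over_theta=0.0):
    """Solve for (b, a) of Eq. 1.7 under the conditions of Eqs. 1.8/1.13."""
    n = len(z)
    K = tps_kernel(np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1))
    K += c_eps_over_theta * np.eye(n)       # optional smoothing term (Eq. 1.13)
    P = np.hstack([np.ones((n, 1)), x])     # trend monomials 1, x1, x2
    A = np.block([[K, P], [P.T, np.zeros((3, 3))]])
    sol = np.linalg.solve(A, np.concatenate([z, np.zeros(3)]))
    return sol[:n], sol[n:]                 # b_i and (a0, a1, a2)

def evaluate(x_new, x, b, a):
    K = tps_kernel(np.linalg.norm(x_new[:, None, :] - x[None, :, :], axis=-1))
    return a[0] + x_new @ a[1:] + K @ b     # Eq. 1.7

x = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.3]])
z = np.array([0.0, 1.0, 1.0, 2.0, 0.9])
b, a = fit_dual(x, z)
print(evaluate(np.array([[0.5, 0.5]]), x, b, a))
```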

The consequence of Eq. 1.11 on the spectral density of the generalized covariance \( K\left( {\mathbf{h}} \right) \) is straightforward. For example, the spectral densities associated with the harmonic and biharmonic splines are power laws, representing fractal models. Szeliski and Terzopoulos (1989) and Micula and Micula (1999) discuss this relationship between splines and fractals.

2.4.2 Smoothing Splines

Smoothing splines are used in situations where measurements at data points are affected by a random error (Eq. 1.5). In two dimensions, they compute a function \( f\left( {x_{1} ,x_{2} } \right) \) minimizing the sum of a spline energy functional plus a weighted distance to the \( n \) data

$$ \left\| {Tf} \right\|^{2} + \theta \sum\limits_{i = 1}^{n} {\frac{{\left( {f\left( {x_{i1} ,x_{i2} } \right) - y_{i} } \right)^{2} }}{{C_{{\varepsilon_{i} }} }}} $$
(1.12)

The smoothing biharmonic spline function has the same expression as that of Kriging and error coKriging (Eq. 1.7) but with the following relationships

$$ \sum\limits_{i = 1}^{n} {b_{i} } = \sum\limits_{i = 1}^{n} {b_{i} } x_{i1} = \sum\limits_{i = 1}^{n} {b_{i} } x_{i2} = 0\quad {\text{and}}\quad f\left( {x_{i1} ,x_{i2} } \right) + b_{i} \frac{{C_{{\varepsilon_{i} }} }}{\theta } = y_{i} $$
(1.13)

Smoothing biharmonic splines are identical to error coKriging as long as the generalized covariance used by error coKriging is the function \( \theta K\left( {{\mathbf{x}} - {\mathbf{x}}_{{\mathbf{i}}} } \right) \), where \( K\left( {{\mathbf{x}} - {\mathbf{x}}_{{\mathbf{i}}} } \right) \) is given by Eq. 1.10 (Matheron 1981a; Dubrule 2003). This is a general relationship between smoothing splines and coKriging: the two are formally equivalent if the generalized covariance \( K\left( {\mathbf{h}} \right) \) is the one satisfying Eq. 1.11, with the coefficient of \( K\left( {\mathbf{h}} \right) \) equal to the smoothing parameter \( \theta \) of Eq. 1.12.

2.4.3 Kriging and Regularization—The Discrete Case

The discrete case is the situation where interpolation is performed at the nodes of a regular grid and each data point is located at one of the nodes of this grid. If \( p \) is the total number of grid nodes, the number \( n \) of data points is such that \( n < p \).

In the discrete case, Matheron (1981b) also demonstrated the equivalence between splines and Kriging, and between smoothing splines and error coKriging. Both the Kriged and spline values \( z_{u} \) minimize

$$ \sum\limits_{u,v = 1}^{p} {z_{u} B_{uv} z_{v} } + \sum\limits_{i = 1}^{n} {\frac{{\left( {z_{i} - y_{i} } \right)^{2} }}{{C_{{\varepsilon_{i} }} }}} $$
(1.14)

where the u and v indices designate all the \( p \) grid points where the interpolation takes place, whilst i indices designate the \( n \) data points. The minimization of Eq. 1.14 is performed according to the unknown values \( z_{u} \) at all grid nodes (including those unknown values \( z_{i} \) where a data point with measured value \( y_{i} \) is present). The first term of Eq. 1.14 can be interpreted as a quadratic energy function traditionally used in inverse problems. In the regularization context, the choice of this quadratic form is driven by smoothing considerations, often using Briggs’ finite difference Laplacian (or spline) “roughening” operator (Briggs 1974; Bolondi et al. 1976). Seen from the geostatistical perspective, \( B_{uv} \) is the inverse of the covariance matrix in the stationary case and a pseudo-inverse of the generalized covariance matrix in the IRF-k case (Matheron 1981b). Equation 1.14 confirms the clear relationship between the inverse of the (generalized) covariance and the spline differential operator.

Kriging can thus be formalized within the framework of energy-based estimation techniques such as splines. This comes from the relationship between the inverse of the covariance function and the roughening filter implicit in the quadratic regularization term. It will be shown below that the regularization term can also be regarded, in the Bayesian inversion context, as an expression of the prior knowledge about the variable under study.
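
A minimal sketch of the discrete minimization of Eq. 1.14, assuming a 1-D grid and a second-difference (Briggs-style) roughening operator \( T \) with \( B = T'T \); the minimizer is obtained by solving the corresponding normal equations.

```python
import numpy as np

p, c_eps = 50, 1e-4
idx = np.array([5, 20, 34, 47])        # grid nodes carrying data
y = np.array([1.0, 0.2, 0.9, 1.5])     # measured values at those nodes

# roughening operator T (second differences) and quadratic form B = T'T of Eq. 1.14
T = np.diff(np.eye(p), n=2, axis=0)
B = T.T @ T

# selection matrix S picks the n data nodes among the p grid nodes
S = np.zeros((len(idx), p))
S[np.arange(len(idx)), idx] = 1.0

# minimizing z'Bz + |Sz - y|^2 / c_eps leads to (B + S'S/c_eps) z = S'y/c_eps
z = np.linalg.solve(B + S.T @ S / c_eps, S.T @ y / c_eps)
print(z[idx])                          # close to y for small c_eps
```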

2.5 Kriging and Bayesian Inversion

2.5.1 Bayesian Linear Inversion

Here it may be useful to recall the general expression of the posterior mean and covariance in the case of Bayesian linear inversion under a multigaussian model. A very good reference for this is Tarantola (2005).

In the discrete case, consider a stationary multigaussian random vector \( z \) of dimension \( p \) containing the grid values \( z_{u} \) over a two- or three-dimensional regular grid of size \( p \). Assume also that a vector \( y \) contains the \( n \) data \( y_{i} \). It is assumed again that the data are affected by an error vector \( \varepsilon \) of dimension \( n \), and that these data are a linear function of the \( p \) values of \( z \) over the grid

$$ y = Fz + \varepsilon $$
(1.15)

where the vector \( \varepsilon \) has mean zero and covariance matrix \( C_{\varepsilon } \), and \( F \) is a matrix of dimension \( n \times p \). In the multigaussian case, thanks to Bayes’ formula relating the posterior pdf \( f_{post} \left( z \right) \) to the prior pdf \( f_{prio} \left( z \right) \) and the likelihood function \( g\left( {y\left| z \right.} \right) \), the prior mean vector m (dimension \( p \)) and covariance matrix C (dimension \( p \times p \)) of \( z \) are updated using the information brought by the data vector \( y \)

$$ f_{post} \left( z \right) \propto f_{prio} \left( z \right)g\left( {y\left| z \right.} \right) \propto \exp \left[ { - \frac{1}{2}\left( {z - m} \right)^{{\prime }} C^{ - 1} \left( {z - m} \right)} \right] \times \exp \left[ { - \frac{1}{2}\left( {y - Fz} \right)^{{\prime }} C_{\varepsilon }^{ - 1} \left( {y - Fz} \right)} \right] $$
(1.16)

\( f_{post} \left( z \right) \) is a multigaussian function with the mean vector

$$ m_{post} = m + C\,F^{{\prime }} \left( {FCF^{{\prime }} + C_{\varepsilon } } \right)^{ - 1} \left( {y - F\,m} \right) $$
(1.17)

and the covariance matrix

$$ C_{post} = C - CF^{{\prime }} \left( {FCF^{{\prime }} + C_{\varepsilon } } \right)^{ - 1} FC $$
(1.18)
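
Equations 1.17, 1.18 and 1.20 translate directly into a few lines of linear algebra. The sketch below assumes an exponential prior covariance on a small 1-D grid and two point data; all numbers are illustrative.

```python
import numpy as np

def bayes_linear_update(m, C, F, y, C_eps):
    """Posterior mean and covariance of Eqs. 1.17-1.18."""
    G = C @ F.T @ np.linalg.inv(F @ C @ F.T + C_eps)   # Kriging weights, Eq. 1.20
    m_post = m + G @ (y - F @ m)                       # Eq. 1.17
    C_post = C - G @ F @ C                             # Eq. 1.18
    return m_post, C_post

# toy prior: p = 4 grid values with exponential covariance, two point data
p = 4
x = np.arange(p, dtype=float)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 2.0)
m = np.zeros(p)
F = np.array([[1., 0., 0., 0.], [0., 0., 1., 0.]])   # data = values at nodes 0 and 2
y = np.array([1.0, -0.5])
C_eps = 0.01 * np.eye(2)
m_post, C_post = bayes_linear_update(m, C, F, y, C_eps)
print(m_post, np.diag(C_post))
```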

2.5.2 Kriging and Bayesian Inversion

Equation 1.17 can also be written

$$ m_{post} = m +\Lambda \left( {y - Fm} \right) = (I -\Lambda F)m +\Lambda y $$
(1.19)

with

$$ \Lambda = C\,F^{{\prime }} \left( {FCF^{{\prime }} + C_{\varepsilon } } \right)^{ - 1} $$
(1.20)

It can be checked that \( \Lambda \) is also the \( p \times n \) matrix giving at each row \( u \) the \( n \) simple Kriging (or error coKriging) weights associated with the Kriging of the value \( z_{u} \) at node \( u \). Comparing the first part of Eq. 1.19 with Eq. 1.2 shows that, in the multigaussian case, \( m_{post} \) is equal to the simple Kriging estimate and that the matrix \( C_{post} \) contains the simple Kriging variances and covariances at each node u of the regular grid.

2.6 Energy-Based Versus Probabilistic Estimates

The minimization of Eq. 1.14 leads to either Kriging or splines, depending on whether the quadratic form is chosen as the inverse of the covariance (Kriging) or as the spline differential operator (splines). Minimizing the expression in Eq. 1.14 is equivalent to maximizing

$$ \exp \left( { - \left( {\sum\limits_{u,v = 1}^{p} {z_{u} B_{uv} z_{v} } + \sum\limits_{i = 1}^{n} {\frac{{\left( {z_{i} - y_{i} } \right)^{2} }}{{C_{{\varepsilon_{i} }} }}} } \right)} \right) $$
(1.21)

This is also the expression (up to a multiplicative constant) of the conditional multivariate distribution in the multigaussian case, as given by Bayes’ theorem (Eq. 1.16), in the case where \( m = 0 \), where the matrix \( C_{\varepsilon } \) is diagonal and where the data are point values. The first term represents the prior pdf and the second the likelihood function. Kriging, which is equal to the mean of the posterior pdf, also maximizes this pdf in the multigaussian case.

Expression (1.21) relates the world of energy functionals (such as splines) with that of probability functions (such as Kriging). More generally, regularization and maximum a posteriori Bayesian estimates are identical if the prior covariance used in Bayesian inversion is properly chosen. The equivalence between an energy function and a probability distribution is also used in statistical mechanics, where the probability of a particular configuration is inversely related to its energy. Suppose that the vector \( z \) minimizes an energy functional \( E\left( z \right) \). Using the results of Geman and Geman (1984), Szeliski and Terzopoulos (1989) associate a probability with this energy through the Boltzmann (or Gibbs) distribution \( p\left( z \right) \) defined as

$$ p\left( z \right) = \frac{1}{Z}\exp \left( { - \frac{E\left( z \right)}{T}} \right) \, $$
(1.22)

where \( Z \) and \( T \) are positive constants. If Bayes’ theorem is applied to the above prior pdf \( p\left( z \right) \) and the posterior pdf is maximized, the formalism of splines is obtained.

2.7 Conclusion on Kriging

Three different ways of calculating a Kriging interpolator have been discussed

  • using the basic approach where Kriging is calculated at each location as a linear combination of the data (Eq. 1.2)

  • using Eq. 1.7, where the expression of dual Kriging, or more generally of radial basis functions, is used

  • minimizing Eq. 1.14 in the discrete case, where Kriging values are calculated on a discrete grid by a minimization incorporating a regularization term and a distance-to-the-data term.

Kriging, although derived using a probabilistic formalism, is still a deterministic technique, in the sense that one unique or “best” model is produced. In most cases, Kriging provides a representation that is very smooth. As a result, the application of non-linear operators to Kriged models will provide biased results (Dubrule 2003). This is one of the reasons for the success of conditional simulation.

3 Stochastic Aspects of Geostatistics: Conditional Simulation

With conditional simulation, the approach is stochastic. A large number of realizations are generated, which match the data (if the simulation is conditional) and share the first (mean) and second order (stationary covariance or generalized covariance) moments of the modeled random function. The main benefit of conditional simulation is that it produces realizations that behave away from the well data the same way as the well data themselves (Dubrule 2003). This is not true with Kriging, which produces a model that is smoother away from the wells than it is at the wells.

Conditional simulation can also be regarded as a technique for generating realizations of the conditional multigaussian pdf fully characterized by Eqs. 1.17 and 1.18. In other words, the realizations “vibrate” around their Kriging mean with a variance at each location equal to the Kriging variance.

A number of conditional simulation algorithms have been developed (Chilès and Delfiner 2012, p. 478). Among them, two are routinely used in the petroleum industry and are particularly interesting in relation to the inversion of seismic and production data.

3.1 Method 1: “Smooth Plus Rough” or “Rough Plus Smooth” Algorithm

\( Z\left( {\mathbf{x}} \right) \) can be simply written as the sum of Kriging plus the Kriging error

$$ Z\left( {\mathbf{x}} \right) = Z_{k} \left( {\mathbf{x}} \right) + \left( {Z\left( {\mathbf{x}} \right) - Z_{k} \left( {\mathbf{x}} \right)} \right) $$
(1.23)

The “smooth plus rough” (Oliver 1996) simulation method writes a conditional simulation \( Z_{cs} \left( {\mathbf{x}} \right) \) as the sum of Kriging plus a simulation of the Kriging error. A non-conditional simulation \( Z_{ncs} \left( {\mathbf{x}} \right) \) of \( Z\left( {\mathbf{x}} \right) \) is generated first, which honors the mean and the covariance of \( Z\left( {\mathbf{x}} \right) \), then the conditional simulation \( Z_{cs} \left( {\mathbf{x}} \right) \) is calculated as

$$ Z_{cs} \left( {\mathbf{x}} \right) = Z_{k} \left( {\mathbf{x}} \right) + \left( {Z_{ncs} \left( {\mathbf{x}} \right) - Z_{ncsk} \left( {\mathbf{x}} \right)} \right) $$
(1.24)

where \( Z_{ncsk} \left( {\mathbf{x}} \right) \) designates Kriging of \( Z_{ncs} \left( {\mathbf{x}} \right) \) using as data the values \( Z_{ncs} \left( {{\mathbf{x}}_{i} } \right) \) of the non-conditional simulation at the conditioning data locations. Thus to the smooth term \( Z_{k} \left( {\mathbf{x}} \right) \) is added the rough term \( \left( {Z_{ncs} \left( {\mathbf{x}} \right) - Z_{ncsk} \left( {\mathbf{x}} \right)} \right) \). Chilès and Delfiner (2012, p. 495) show that \( Z_{cs} \left( {\mathbf{x}} \right) \) honors the data and has the same (generalized) covariance as \( Z_{ncs} \left( {\mathbf{x}} \right) \) (and hence as \( Z\left( {\mathbf{x}} \right) \)).

Equation 1.24 can also be expressed in the form of a “rough plus smooth” equation

$$ Z_{cs} \left( {\mathbf{x}} \right) = Z_{ncs} \left( {\mathbf{x}} \right) + \left( {Z_{k} \left( {\mathbf{x}} \right) - Z_{ncsk} \left( {\mathbf{x}} \right)} \right) $$
(1.25)

Using Eq. 1.17, Eq. 1.25 can be written in the discrete case, assuming that the data are average values of the gridded values and are affected by a measurement error. At location \( u \) of the discrete grid

$$ z_{ucs} = z_{uncs} + CF^{{\prime }} \left( {FCF^{{\prime }} + C_{\varepsilon } } \right)^{ - 1} \left( {y - Fz_{uncs} } \right) $$
(1.26)

Equation 1.26 shows that conditional simulation is obtained by adding to a non-conditional simulation a Kriging of the mismatch \( \left( {y\; - \;Fz_{uncs} } \right) \) between the data and the unconditional simulation at the data location. This formalism will appear to be quite general and will facilitate the understanding of the relationship between conditional simulation and Kalman filtering.
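
A sketch of Eq. 1.26 on a 1-D grid, assuming an exponential covariance, three nearly exact point data, and a Cholesky factor of the covariance to draw the unconditional simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 100
x = np.arange(p, dtype=float)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 10.0)    # prior covariance
L = np.linalg.cholesky(C + 1e-10 * np.eye(p))          # a square root of C

F = np.zeros((3, p)); F[[0, 1, 2], [10, 50, 80]] = 1.0 # point data at three nodes
y = np.array([1.0, -1.0, 0.5])
C_eps = 1e-6 * np.eye(3)                               # nearly exact data

z_ncs = L @ rng.standard_normal(p)                     # unconditional simulation
W = C @ F.T @ np.linalg.inv(F @ C @ F.T + C_eps)       # Kriging weight matrix
z_cs = z_ncs + W @ (y - F @ z_ncs)                     # Eq. 1.26
print(z_cs[[10, 50, 80]])                              # approximately y
```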

3.2 Method 2: Sequential Gaussian Simulation (SGS)

SGS (Deutsch and Journel 1998) is probably the most popular and flexible conditional simulation technique used in applications. SGS works under the multigaussian assumption and sequentially visits random locations within the simulated grid. At each new location, the value is first Kriged from the previously simulated values and the well data. A random value is then sampled from the Gaussian pdf with mean equal to the Kriged value and variance equal to the Kriging variance (SGS uses the property that, in the multivariate normal case, univariate conditional distributions are also Gaussian). The sampled value is then merged with the rest of the dataset, and a new random location is chosen within the simulated grid. The grid points where a data point is present are treated the same way as grid points with no data if the error \( \varepsilon \) affecting the data is different from zero. If all the data are exact, the grid nodes with data points are left unchanged. The result is a Gaussian realization constrained by the data values and satisfying the input statistics (mean and covariance function).

The main difference between “rough plus smooth” and SGS is that SGS works sequentially, grid point by grid point. The sequential nature of SGS is well suited to the geostatistical inversion of seismic data. Indeed, at each grid node, the sequential approach can make sure that the sampled value is compatible with both the previously generated points and the seismic data at the same location, thus combining the advantage of single trace inversion with that of spatial coupling. This will be discussed in Sect. 1.4.
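
A bare-bones SGS loop is sketched below, assuming a 1-D grid, a zero mean, exact well data and a global neighborhood; these are illustrative simplifications of production implementations, which use local neighborhoods and normal-score transforms.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 60
x = np.arange(p, dtype=float)

def cov(a, b):
    """Assumed exponential covariance (sill 1) between 1-D locations."""
    return np.exp(-np.abs(a[:, None] - b[None, :]) / 8.0)

z = np.full(p, np.nan)
z[[5, 40]] = [1.0, -0.8]                    # exact well data, kept unchanged

path = rng.permutation([u for u in range(p) if u not in (5, 40)])
for u in path:
    c = np.where(~np.isnan(z))[0]           # conditioning set: data + simulated
    c0 = cov(x[c], x[[u]])[:, 0]
    lam = np.linalg.solve(cov(x[c], x[c]), c0)
    mu = lam @ z[c]                                   # simple Kriging mean (m = 0)
    var = max(1.0 - lam @ c0, 0.0)                    # simple Kriging variance
    z[u] = mu + np.sqrt(var) * rng.standard_normal()  # draw from N(mu, var)
print(z[[5, 40]])                           # data honored: [1.0, -0.8]
```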

3.3 Spectrum and Conditional Simulation

Since the frequency spectrum is the Fourier transform of the covariance (Chilès and Delfiner 2012, p. 66), the spectrum of a conditional simulation is the same as that of the data. Conditional simulation addresses the following statement from Claerbout (2002) about seismic data interpolation: “Of all the assumptions we could make to fill empty bins, one that people usually find easiest to agree with is that the spectrum should be the same in the empty-bin regions as where bins are filled.”

Claerbout (2002) also defines the Prediction Error Filter (PEF) as the linear operator \( T \) that transforms the data into a white noise. In other words, \( T'T \) is the inverse of the covariance. Based on Eq. 1.11, this also means that \( T \) is the spline operator associated with the covariance of the data. Claerbout (2002) shows that unconditional simulations can be generated by applying \( T^{ - 1} \) to a white noise. This is the same technique as that used by Oliver (1988, 1995), who applies what he calls the square root of the covariance function to a white noise.
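
Oliver's square-root construction can be sketched with a Cholesky factor: if \( C = LL' \), then \( Lw \) has covariance \( C \) for a white noise \( w \), and applying \( L^{-1} \) whitens the realization, which is what a PEF does. The exponential covariance below is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 200
x = np.arange(p, dtype=float)
C = np.exp(-np.abs(x[:, None] - x[None, :]) / 15.0)  # assumed exponential covariance
L = np.linalg.cholesky(C)                            # a square root of C (C = L L')

w = rng.standard_normal(p)        # white noise
z_ncs = L @ w                     # unconditional simulation with covariance C
# conversely, applying inv(L) whitens the realization, acting like a PEF
print(np.allclose(np.linalg.solve(L, z_ncs), w))     # True
```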

4 Geostatistical Inversion of Seismic Data

4.1 Deterministic Seismic Inversion

Until the mid-nineties or so, most seismic inversion studies were deterministic, in the sense that they generated a single “best” model, usually at the same resolution as the seismic data. Often, regularization-based or Bayesian methods were used, which led to the generation of one “maximum posterior” or “optimal for a given norm (often L2)” 3-D acoustic impedance model (Tarantola 2005).

If the seismic inversion problem is linearized as with Fatti et al.’s (1994) model, the reflection coefficient \( r\left( \theta \right) \) at seismic time \( t \) for a seismic block of offset \( \theta \) can be written

$$ r\left( \theta \right) = a_{1} \left( \theta \right)\frac{{\partial LogI_{p} \left( t \right)}}{\partial t} + a_{2} \left( \theta \right)\frac{{\partial LogI_{s} \left( t \right)}}{\partial t} $$
(1.27a)
$$ {\text{and}}\quad y\left( \theta \right) = w\left( \theta \right)*r\left( \theta \right) + \varepsilon \left( \theta \right) $$
(1.27b)

where \( I_{p} \left( t \right) \) and \( I_{s} \left( t \right) \) are the compressional and shear impedances at time \( t \), \( a_{1} \left( \theta \right) \) and \( a_{2} \left( \theta \right) \) are offset-related parameters, \( w\left( \theta \right) \) is the seismic wavelet for offset \( \theta \) and \( \varepsilon \left( \theta \right) \) is noise. This model is linear in the logarithms of \( I_{p} \left( t \right) \) and \( I_{s} \left( t \right) \). Thus, as long as the logarithms of impedances are inverted, the seismic amplitudes can be written, as in Eq. 1.15, as a linear function of the logarithms of impedances, and the posterior mean obtained by multigaussian Bayesian seismic inversion (Eqs. 1.17 and 1.18) is identical to Kriging. The solution can also be regarded as a regularization-based solution, where the norm controlling the smoothness is derived from the inverse of the covariance.
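
To make this linearity explicit, the sketch below assembles the acoustic, single-offset part of Eq. 1.27 (i.e. with \( a_{2} = 0 \)): a difference operator applied to a toy \( LogI_{p} \) trace, followed by convolution with a Ricker wavelet. The trace, the wavelet frequency and the coefficient \( a_{1} \) are all made up.

```python
import numpy as np

rng = np.random.default_rng(3)
nt = 80
log_ip = np.cumsum(0.02 * rng.standard_normal(nt)) + np.log(6000.0)  # toy Log Ip(t)

# difference operator D gives the reflectivity r = a1 * d(Log Ip)/dt (Eq. 1.27a)
D = np.diff(np.eye(nt), axis=0)
a1 = 0.5                                     # offset-dependent coefficient a1(theta)

t = np.arange(-10, 11)
f = 0.08                                     # peak frequency in cycles per sample
wavelet = (1 - 2 * (np.pi * f * t) ** 2) * np.exp(-(np.pi * f * t) ** 2)  # Ricker

r = a1 * (D @ log_ip)                        # acoustic term of Eq. 1.27a
y = np.convolve(r, wavelet, mode="same")     # Eq. 1.27b, noise-free
# stacking the convolution matrix W with D into F = W D makes y = F log_ip
# linear in log_ip, exactly as in Eq. 1.15
print(y[:5])
```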

At the time when only deterministic inversion was used, geostatisticians often treated seismic data as “soft” information, making use only of statistical correlations between seismic and reservoir parameters in order to constrain the earth models. This “soft” approach to seismic data allowed the development of some interesting interpolation techniques such as external drift or collocated coKriging (Dubrule 2003). However, it also led to reservoir models not fully compatible with the seismic data: if a seismic forward model such as that of Eq. 1.27 was applied to them, the actual seismic data were not recovered.

The above approaches proved sufficient until the late eighties or so, as seismic data were used at a rather large scale. Thanks to the development of 3-D earth modeling at the reservoir scale in the early nineties, it became necessary to work with models at a higher resolution than the seismic data, and hence to quantify the uncertainty attached to these models. Then the availability of 4-D seismic data also called for new technology to better constrain the earth models. Geostatistical inversion, described below, was developed with these issues in mind.

4.2 Geostatistical Inversion (GI)

The original GI algorithm (Bortoli et al. 1992; Haas and Dubrule 1994) used SGS to simulate high-resolution acoustic impedance traces constrained by seismic data. SGS starts by picking a random cell within a regular two-dimensional grid. At this cell, a large number of possible acoustic impedance vertical traces are generated by SGS, and the trace that best matches the actual seismic trace at this location is selected. SGS then moves to another random location of the two-dimensional grid, and so on until the whole model is filled with high-resolution impedance traces. Initially SGS appeared to be well suited to this application, as it allowed the use of any kind of forward model—linear or not—relating the acoustic impedance trace generated by SGS to the seismic amplitude trace at the same location. The acoustic impedance vertical traces simulated by SGS typically have a higher frequency content than the seismic amplitudes, which makes them non-unique. This uncertainty can be quantified by generating multiple conditional simulations. Unfortunately, the use of SGS proved too computationally expensive for large seismic datasets.

By revisiting the above GI algorithm in a Bayesian framework and in the linear context of Fatti’s model (Eq. 1.27), authors such as Buland and Omre (2003) or Escobar et al. (2006) not only clarified the GI formalism but also provided a straightforward conditional simulation algorithm based on Eqs. 1.17 and 1.18, which was more efficient than SGS for sampling acoustic impedance traces compatible with seismic amplitudes. Whilst Bayesian inversion provided an expression of the posterior mean and covariance of the multigaussian pdf of the impedances, GI allowed the sampling of reservoir-scale impedance realizations from this pdf.

As convincingly shown by Francis (2006a, b) or Escobar et al. (2006), cut-off operations such as those used to translate acoustic impedance into facies can be applied to GI realizations, thus avoiding the statistical bias that would arise if these cut-offs were applied to Kriging.

5 Kalman Filtering and Ensemble Kalman Filtering

5.1 Kalman Filtering (KF)

Suppose that a Gaussian random vector \( Z_{t - 1} \) has evolved until time \( \left( {t - 1} \right) \) and that \( Z_{t - 1} \) is an unbiased estimate of the unknown true state vector \( z_{t - 1} \) at time \( \left( {t - 1} \right) \)

$$ Z_{t - 1} = z_{t - 1} + R_{t - 1} \,{\text{with}} \,E\left( {R_{t - 1} } \right) = 0\,{\text{and}}\,Var\left( {Z_{t - 1} } \right) = Var\left( {R_{t - 1} } \right) = C_{t - 1} $$
(1.28)

If the model error is neglected, the forward model relating the true state vector at time \( \left( {t - 1} \right) \) with the state vector at time \( t \) is assumed to be a linear function \( L_{t} \)

$$ z_{t} = L_{t} z_{t - 1} $$
(1.29)

At time step \( t \), the unknown true state of the system has evolved according to Eq. 1.29 and a vector \( d_{t} \) of \( n \) new data may also be available. Assume that these data are linear functions of the state vector \( z_{t} \), and can be expressed as in Eq. 1.15

$$ d_{t} = F_{t} z_{t} + \varepsilon_{t} $$
(1.30)

where the error vector \( \varepsilon_{t} \) has mean zero and covariance matrix \( C_{{\varepsilon_{t} }} \).

KF (Kalman 1960) aims to combine the information provided about \( z_{t} \) by the forward model \( L_{t} \) applied to the estimate \( Z_{t - 1} \) (Eq. 1.29) with the information provided by the data \( d_{t} \) (Eq. 1.30). Bayes’ rule can be used for this, \( L_{t} Z_{t - 1} \) playing the role of the prior distribution. It is easy to verify that the covariance of the random vector \( L_{t} Z_{t - 1} \) is \( C_{t} = L_{t} C_{t - 1} L_{t}^{'} \). Hence, under Gaussian assumptions, the best estimate is (from Eq. 1.19)

$$ Z_{t} = L_{t} Z_{t - 1} +\Lambda _{t} \left( {d_{t} - F_{t} L_{t} Z_{t - 1} } \right) $$
(1.31)

where the kriging weights matrix \( \Lambda _{t} \) (as in Eq. 1.20) is now called the Kalman gain

$$ \Lambda _{t} = C_{t} F_{t}^{{\prime }} \left( {F_{t} C_{t} F_{t}^{{\prime }} + C_{{\varepsilon_{t} }} } \right)^{ - 1} $$
(1.32)

\( Z_{t - 1} \) as defined in Eq. 1.28 can represent any kind of unbiased estimate based on all the information available at time \( \left( {t - 1} \right) \). Kriging and conditional simulation are both unbiased estimates of \( z_{t - 1} \); only their variance differs, and it is of course minimum if \( Z_{t - 1} \) is Kriging and larger if \( Z_{t - 1} \) is a conditional simulation. Chilès and Delfiner (2012, p. 497) show that the variance of the difference between a random function and its conditional simulation is twice the Kriging variance. If \( Z_{t - 1} \) is a simulation, Eq. 1.31 looks like the “rough plus smooth” method (Eq. 1.26), with \( L_{t} Z_{t - 1} \) playing the role of the non-conditional simulation. Equation 1.31 makes the estimate \( L_{t} Z_{t - 1} \) conditional to the new data \( d_{t} \) by adding an interpolation of the mismatch between \( F_{t} L_{t} Z_{t - 1} \) and the data.
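
One forecast-plus-analysis cycle of KF (Eqs. 1.29, 1.31 and 1.32) is sketched below on a made-up two-variable state observed through its first component.

```python
import numpy as np

def kf_step(Z_prev, C_prev, L, F, d, C_eps):
    """One Kalman filter update (Eqs. 1.29, 1.31, 1.32), model error neglected."""
    Z_fwd = L @ Z_prev                     # forecast, Eq. 1.29
    C_fwd = L @ C_prev @ L.T               # forecast covariance
    G = C_fwd @ F.T @ np.linalg.inv(F @ C_fwd @ F.T + C_eps)  # Kalman gain, Eq. 1.32
    Z_new = Z_fwd + G @ (d - F @ Z_fwd)    # analysis, Eq. 1.31
    C_new = C_fwd - G @ F @ C_fwd          # posterior covariance (as Eq. 1.18)
    return Z_new, C_new

# toy 2-state system observed through its first component
L = np.array([[1.0, 0.1], [0.0, 1.0]])
F = np.array([[1.0, 0.0]])
Z, C = np.zeros(2), np.eye(2)
Z, C = kf_step(Z, C, L, F, d=np.array([0.8]), C_eps=np.array([[0.05]]))
print(Z, np.diag(C))
```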

In standard geostatistical applications, the observations are often spatial and hence assimilated simultaneously, while KF processes information sequentially, time step after time step. Tarantola (2005) (in Appendix 6.18) shows that, if in a linear least-squares problem the dataset can be divided into subsets with zero covariance between them, then solving one global inverse problem is equivalent to solving a series of smaller problems using the posterior state and covariance matrix of each partial problem as prior information for the next. Oliver et al. (2008) also show (in Chap. 11) that, under the same assumptions as Tarantola (2005), the step by step computation of KF provides (in the multigaussian case) the same result as would be obtained by integrating all the data in one single step. In the case where \( L_{t} \) is the identity function, these two results also imply that simple Kriging would provide the same result if data were incorporated sequentially into the Kriging system, or in one single batch (under the assumptions that each batch of data has zero covariance with the others).

5.2 Constraining Reservoir Models by Production Data

Fluid flow models are strongly non-linear, and linear approximations such as those already discussed for seismic modeling or KF cannot be used.

A distinction must be made between “history-matching”, where a single reservoir model is modified until the flow simulation matches the production data, and “constraining reservoir models by production data”, where reservoir model realizations compatible with production data are generated. Here the discussion focuses on the second objective rather than the first one. Some techniques to address this objective are based on rigorous approaches such as Markov-Chain Monte Carlo (MCMC) or Genetic Algorithms (GA) (Oliver et al. 2008). But these are very time-consuming and often impractical. Ensemble Kalman filtering appears to be a more practical approach for incorporating production data into the reservoir model.

5.3 Ensemble Kalman Filtering (EnKF) Versus Conditional Simulation

EnKF (Evensen 2007; Oliver et al. 2008) starts with an ensemble of initial realizations that are not constrained by production data. Typically the state vector \( z_{t} \) at time \( t \) contains permeabilities, porosities, saturations, pressures, and thermodynamic variables at the simulator grid nodes, followed by a vector of predicted production data at each well \( i \) at time \( t \).

The following notation is used for a given state vector \( z_{t} \)

$$ z_{t} = \left( {z_{ut}^{1} ,z_{ut}^{2} , \ldots ,z_{ut}^{k} ,q_{it}^{1*} , \ldots ,q_{it}^{l*} } \right) $$
(1.33)

It is assumed that there are \( k \) gridded variables in \( z_{t} \), that the simulator grid is composed of \( p \) cells \( u \), and that there are \( n \) wells \( i \) each with \( l \) new production data at time \( t \). The total size of the state vector \( z_{t} \) is \( kp + nl \). The predicted data vector \( d_{t}^{*} \) is the vector of size \( nl \)

$$ d_{t}^{*} = \left( {q_{1t}^{1*} , \ldots ,q_{nt}^{1*} ,q_{1t}^{2*} , \ldots ,q_{nt}^{2*} , \ldots ,q_{1t}^{l*} , \ldots ,q_{nt}^{l*} } \right) $$
(1.34)

The relation between state vector and predicted data is

$$ d_{t}^{*} = Pz_{t} \,{\text{with}}\,P = (O_{nl \times kp} ,I_{nl \times nl} ) $$
(1.35)

\( P \) is an \( nl \times \left( {kp + nl} \right) \) matrix. The function \( f_{t} \), which represents the flow simulator, is non-linear. If the model errors are neglected,

$$ z_{t} = f_{t} \left( {z_{t - 1} } \right) $$
(1.36)

One time step of the simulator \( f_{t} \) (Eq. 1.36) does not modify the rock properties (unless they are affected by changes in pressure and saturation), but replaces the pressures, saturations, and simulated data with new values at time t.

The problem is now to calculate the best estimate of the state vector \( z_{t} \) combining the information provided by the flow simulation forward model \( f_{t} \left( {z_{t - 1} } \right) \) and that provided by the new data \( d_{t} \).

If \( f_{t} \) is a linear function, this is the standard domain of application of KF and Eq. 1.31 applies, with \( f_{t} \) playing the role of \( L_{t} \). But now \( f_{t} \) is non-linear. It would still be convenient to update the state vector through a generalization of Eq. 1.31

$$ z_{t} = f_{t} \left( {z_{t - 1} } \right) +\Lambda _{t} \left( {d_{t} - Pf_{t} \left( {z_{t - 1} } \right)} \right) $$
(1.37)

where the Kalman gain \( \Lambda _{t} \) is obtained using Eq. 1.32. Assuming that there is no error associated with the data, Eq. 1.32 can be simplified into

$$ \Lambda _{t} = C_{t} P^{{\prime }} \left( {PC_{t} P^{{\prime }} } \right)^{ - 1} $$
(1.38)

Equation 1.38 requires the knowledge of the covariance \( C_{t} \) of \( f_{t} \left( {z_{t - 1} } \right) \), in other words the covariance of the image of the state vector after application of the flow simulation model \( f_{t} \). Since \( f_{t} \) is non-linear, this covariance cannot be simply calculated—as in the linear case—from the covariance at the previous step. EnKF addresses this issue by deriving this covariance statistically from the multiple realizations, typically about a hundred of them. This is the key idea behind EnKF.
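
The sketch below illustrates one EnKF analysis step: the covariance \( C_{t} \) is estimated from the ensemble itself and plugged into Eqs. 1.37 and 1.32 (a small data-error covariance is retained rather than the simplification of Eq. 1.38). The ensemble and data are toy values, and the per-member observation perturbation used in stochastic EnKF implementations is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)
N, p_rock, n_d = 100, 30, 3      # ensemble size, rock cells, number of data

# toy ensemble: rock properties, plus predicted data loosely tied to three cells
Z = rng.standard_normal((p_rock, N))                   # initial realizations
D_pred = Z[[2, 15, 27], :] + 0.1 * rng.standard_normal((n_d, N))
E = np.vstack([Z, D_pred])       # augmented state: (rock properties, predicted data)

P = np.hstack([np.zeros((n_d, p_rock)), np.eye(n_d)])  # Eq. 1.35
d_obs = np.array([1.0, -0.5, 0.3])
C_eps = 0.01 * np.eye(n_d)

# the key EnKF idea: C_t is estimated from the ensemble itself
A = E - E.mean(axis=1, keepdims=True)
C_t = A @ A.T / (N - 1)

G = C_t @ P.T @ np.linalg.inv(P @ C_t @ P.T + C_eps)   # Kalman gain, Eqs. 1.32/1.38
E_upd = E + G @ (d_obs[:, None] - P @ E)               # Eq. 1.37, all members at once
print(E_upd.mean(axis=1)[:5])    # updated rock properties (first five cells)
```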

There are of course a number of issues resulting from the fact that the covariances are calculated from a finite number of realizations of the ensemble. The first one is spurious correlation, because the ensemble members are not independent except in the starting ensemble. The second one is that if the number of realizations in the ensemble is not large enough, then the covariances are poorly estimated. Standard geostatistics addresses this by fitting mathematical models to the experimental covariances, in order to smooth the spurious correlations.

5.4 Ensemble Kalman Filtering and Its Relationship with CoKriging

In Eq. 1.37, focus now on the rock properties in the state vector. \( f_{t} \left( {z_{t - 1} } \right) \) leaves the rock properties unchanged, as only the time-dependent state variables in the simulator grid are updated by one time step of the flow simulator, whilst \( \Lambda _{t} \left( {d_{t} - Pf_{t} \left( {z_{t - 1} } \right)} \right) \) is a linear combination of the differences between observed and predicted production data at each well. Thus EnKF interpolates between the wells by calculating a linear combination of these differences across the field, then adds these interpolated differences to the rock properties model. Is it possible to reformulate EnKF as a well-by-well geostatistical approach?

The term \( \Lambda _{t} \) in Eq. 1.37 is the Kalman gain as given by Eq. 1.38. In the case where there is no error affecting the data, Eqs. 1.37 and 1.38 can be written

$$ z_{t} - f_{t} \left( {z_{t - 1} } \right) = C_{t} P^{{\prime }} \left( {PC_{t} P^{{\prime }} } \right)^{ - 1} \left( {d_{t} - Pf_{t} \left( {z_{t - 1} } \right)} \right) $$
(1.39)

The left-hand side is the update calculated by EnKF for the property of interest as the time step evolves from \( \left( {t - 1} \right) \) to \( t \). The Kalman gain coefficients on the right-hand side are nothing other than the simple coKriging weights (see for instance Chilès and Delfiner 2012, p. 303).

Thus, each estimate of a 3-D spatial parameter such as porosity or permeability at time \( \left( {t - 1} \right) \) is updated at time \( t \) by a linear combination of all the inconsistencies generated by this parameter at the data points. Since, in the case of flow simulations, many parameters are involved in the prediction of the production profiles, all the individual parameters’ 3-D models must be corrected in a consistent way, which is why multivariate coKriging, and not univariate Kriging, applies here.

6 Beyond the Formal Relationship Between Geostatistics and Bayes

6.1 Two Identical Formalisms but Different Assumptions

The above developments show that techniques such as conditional simulation, Bayesian inversion, geostatistical inversion and ensemble Kalman filtering follow a similar mathematical formalism.

However, their philosophy of application differs in the way the covariance is approached. This can be understood by looking again at Bayes’ rule as presented in Eq. 1.16

$$ f_{post} \left( z \right) \propto f_{prio} \left( z \right)g\left( {y\left| z \right.} \right) $$
(1.40)

With geostatistics, the experimental (generalized) covariance calculated on the data \( y \) is fitted by a model which becomes the covariance of the unconditional distribution \( f_{prio} \left( z \right) \). Then the data \( y \) are used a second time through the simulation conditioning process of Eq. 1.26.

With Bayes, the covariance model associated with \( f_{prio} \left( z \right) \) is a prior based on local or analog knowledge, but not on the data themselves (Tarantola 2005). This prior is transformed into a posterior covariance through the conditioning process of Eq. 1.40.

With geostatistics, the aim of conditional simulation is to generate realizations that match the data and satisfy the input covariance; the SGS and rough plus smooth algorithms work only if the data themselves satisfy this input covariance. But the random function \( Z_{cs} \left( {\mathbf{x}} \right) \) of Eqs. 1.24 and 1.25 is not an ergodic or even a stationary random function; its variance at each location \( {\mathbf{x}} \) is equal to the Kriging variance and changes with \( {\mathbf{x}} \), as it is zero at the data points. In other words, the covariance of the random function \( Z_{{}} \left( {\mathbf{x}} \right) \) is different from that of \( Z_{cs} \left( {\mathbf{x}} \right) \) conditionally to the data (Chilès and Delfiner 2012, p. 497). But the covariance calculated on a single conditional realization does not “see” any difference between the grid cells associated with data points and those not associated with data points. It is only as the realizations change, leaving the data unchanged, that the covariance across realizations appears non-stationary and hence non-ergodic.

On the other hand, Bayes combines a prior covariance—usually different from that of the data—with a data-based likelihood, resulting in a posterior pdf that sits somewhere between the prior and the likelihood. Bayes updates prior covariances based on new data, whilst conditional simulation anchors the realizations to the hard data (Escobar, personal communication).

6.2 Model Falsifiability

Tarantola (2006) challenges the geostatistical and Bayesian formalisms if models are to be falsifiable or have a scientific meaning: “I suggest that the setting, in principle, for an inverse problem should be as follows: use all available prior information to sequentially create models of the system, potentially an infinite number of them. For each model, solve the forward modeling problem, compare the predictions to the actual observations and use some criterion to decide if the fit is acceptable or unacceptable, given the uncertainties in the observations and, perhaps, in the physical theory being used. The unacceptable models have been falsified, and must be dropped. The collection of all the models that have not been falsified represent the solution of the inverse problem.” Tarantola (2006) thus proposes to keep all the prior realizations that are compatible with the data: the data are used to validate or reject the prior realizations, rather than to update the prior pdf into the posterior.

6.3 Looking Ahead: Machine Learning and Falsifiability

The fast growth of machine learning algorithms (Goodfellow et al. 2016) is challenging the geostatistical and Bayesian formalisms in situations where data are plentiful. Thanks to this abundance of data, the approach used to falsify a (for instance) convolutional neural network model relating input parameters to data is often to test whether the model works as well on a training (or calibration) dataset as on a test dataset not used for training. The prior model itself is completely data-driven, which contradicts Tarantola (2006), but the validation step is along the lines of his above recommendations! This topic is likely to generate interesting discussions in the future.

7 Conclusion

The objective of this chapter was to discuss the convergence observed over the last fifty years between geostatistics and other modelling and inversion techniques.

A formal convergence exists between the main techniques used to constrain reservoir models by multi-disciplinary data. Kriging, splines, conditional simulation, geostatistical inversion and ensemble Kalman filtering can be interpreted using either the geostatistical formalism or Bayes.

Most of these techniques amount to the same approach where an initial model is updated by using a linear combination of the mismatches between the new data and their prediction from the initial model (Eqs. 1.19, 1.26, 1.31 and 1.39).

However, the methods above have a different philosophy towards the inference of the covariances used in these calculations. Bayes uses the data to update a prior pdf which is independent of the data. Geostatistics generates realizations of conditional simulations that reproduce the modeled covariance—or the spectrum—of the data. EnKF does not model a covariance but directly uses the empirical covariances derived from the ensemble realizations and their flow simulations.