1 Introduction

Data Assimilation is the process by which imperfect numerical forecasts are adjusted according to real observations [1]. In sequential methods, a numerical forecast \(\mathbf{x}^b \in \mathbb {R}^{n\times 1}\) is adjusted according to an array of observations \(\mathbf{y}\in \mathbb {R}^{m\times 1}\), where \(n\) and \(m\) are the number of model components and the number of observations, respectively. When Gaussian assumptions are made on the prior and observational errors, the posterior mode \(\mathbf{x}^a \in \mathbb {R}^{n\times 1}\) can be estimated via the minimization of the Three Dimensional Variational (3D-Var) cost function:

$$\begin{aligned} \mathcal {J}(\mathbf{x}) = \frac{1}{2} \cdot \left\| \mathbf{x}-\mathbf{x}^b \right\| _{\mathbf{B}^{-1}}^2 + \frac{1}{2} \cdot \left\| \mathbf{y}-\mathcal {H}\left( \mathbf{x}\right) \right\| _{\mathbf{R}^{-1}}^2, \end{aligned}$$
(1)

where \(\mathbf{B}\in \mathbb {R}^{n\times n}\) and \(\mathbf{R}\in \mathbb {R}^{m\times m}\) are the background error and the data error covariance matrices, respectively. Likewise, \(\mathcal {H}(\mathbf{x}):\mathbb {R}^{n\times 1} \rightarrow \mathbb {R}^{m\times 1}\) is a (non-) linear observation operator which maps model states onto the observation space. The solution to the optimization problem

$$\begin{aligned} \mathbf{x}^a = \arg \,\underset{\mathbf{x}}{\min } \,\mathcal {J}(\mathbf{x}), \end{aligned}$$
(2)

is immediate when \(\mathcal {H}(\mathbf{x})\) is linear (i.e., closed-form expressions can be obtained to compute \(\mathbf{x}^a\)), but for non-linear observation operators, numerical optimization methods such as Newton's method must be employed [2]. However, since Newton's step is derived from a second-order Taylor polynomial, it can overshoot the actual minimizer. Thus, line-search methods can be employed to estimate suitable step lengths across Newton iterations. A DA method based on this idea is the Maximum-Likelihood-Ensemble-Filter (MLEF), which performs the assimilation step onto the ensemble space. However, the convergence of this method is not guaranteed (the mismatch between the full and the ensemble-space gradients cannot be bounded), and, even more, analysis increments can be impacted by sampling noise. We think that there is an opportunity to enhance line-search methods in the non-linear DA context by employing random descent directions onto which analysis increments can be estimated. Moreover, the analysis increments can be computed onto the model space to ensure global convergence.

This paper is organized as follows: in Sect. 2, we discuss topics related to linear and non-linear data assimilation as well as line-search optimization methods. Section 3 proposes an ensemble Kalman filter implementation via random descent directions. In Sect. 4, experimental tests are performed to assess the accuracy of our proposed filter implementation by using the Lorenz 96 model. Conclusions of this research are stated in Sect. 5.

2 Preliminaries

2.1 The Ensemble Kalman Filter

The Ensemble Kalman Filter (EnKF) is a sequential Monte-Carlo method for parameter and state estimation in highly non-linear models [3]. The popularity of the EnKF owes to its simple formulation and relatively easy implementation. In the EnKF, an ensemble of model realizations is employed to estimate moments of the background error distribution [4]:

$$\begin{aligned} \mathbf{X}_k^b = \left[ \mathbf{x}^{b[1]},\, \mathbf{x}^{b[2]},\, \ldots ,\, \mathbf{x}^{b[N]} \right] \in \mathbb {R}^{n\times N} \end{aligned}$$
(3)

where \(\mathbf{x}^{b[e]} \in \mathbb {R}^{n\times 1}\) stands for the e-th ensemble member, for \(1 \le e \le N\), at time k, for \(0 \le k \le M\). Then, the ensemble mean:

$$\begin{aligned} \overline{\mathbf{x}}^b = \frac{1}{N} \cdot \sum _{e=1}^{N} \mathbf{x}^{b[e]} \in \mathbb {R}^{n\times 1}, \end{aligned}$$
(4)

and the ensemble covariance matrix:

$$\begin{aligned} \mathbf{P}^b = \frac{1}{N-1} \cdot \varvec{\varDelta }{} \mathbf{X}^b \cdot \left[ \varvec{\varDelta }{} \mathbf{X}^b \right] ^T \in \mathbb {R}^{n\times n}, \end{aligned}$$
(5)

act as estimates of the background state \(\mathbf{x}^b\) and the background error covariance matrix \(\mathbf{B}\), respectively, where the matrix of member deviations reads:

$$\begin{aligned} \varvec{\varDelta }{} \mathbf{X}^b = \mathbf{X}^b-\overline{\mathbf{x}}^b \cdot \mathbf{1}^T \in \mathbb {R}^{n\times N}. \end{aligned}$$
(6)

Posterior members can be computed via the use of synthetic observations:

$$\begin{aligned} \mathbf{X}^a = \mathbf{X}^b + \varvec{\varDelta }{} \mathbf{X}^a, \end{aligned}$$
(7)

where the analysis increments can be obtained via the solution of the following linear system:

$$\begin{aligned} \left[ \left[ \mathbf{P}^b \right] ^{-1} + \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H}\right] \cdot \varvec{\varDelta }{} \mathbf{X}^a = \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{D}^s \in \mathbb {R}^{n\times N}, \end{aligned}$$
(8)

and \(\mathbf{D}^s \in \mathbb {R}^{m\times N}\) is the innovation matrix on the synthetic observations whose e-th column reads \(\mathbf{y}-\mathbf{H}\cdot \mathbf{x}^{b[e]} + \varvec{\varepsilon }^{[e]} \in \mathbb {R}^{m\times 1}\) with \(\varvec{\varepsilon }^{[e]} \sim \mathcal {N}\left( \mathbf{0}_{m},\, \mathbf{R}\right) \). In practice, model dimensions range in the order of millions while ensemble sizes are constrained to the hundreds; as a direct consequence, sampling errors impact the quality of analysis increments. To counteract the effects of sampling noise, localization methods are commonly employed [5]. In the EnKF based on a modified Cholesky decomposition (EnKF-MC) [6], the following estimator is employed to approximate the precision covariance matrix of the background error distribution [7]:

$$\begin{aligned} \widehat{\mathbf{B}}^{-1} = \widehat{\mathbf{L}}^T \cdot \widehat{\mathbf{D}}^{-1} \cdot \widehat{\mathbf{L}}\in \mathbb {R}^{n\times n}, \end{aligned}$$
(9)

where the Cholesky factor \(\widehat{\mathbf{L}}\in \mathbb {R}^{n\times n}\) is a lower triangular matrix,

$$\begin{aligned} \left\{ \widehat{\mathbf{L}}\right\} _{i,v} = {\left\{ \begin{array}{ll} -\beta _{i,v} &{} \,,\, v \in P(i,r) \\ 1 &{} \,,\, i=v \\ 0 &{}\,,\, otherwise \end{array}\right. } \,, \end{aligned}$$
(10)

whose non-zero sub-diagonal elements \(\beta _{i,v}\) are obtained by fitting models of the form,

$$\begin{aligned} {\mathbf{x}_{[i]}^T} = \sum _{v \in P(i,\,r)} \beta _{i,v} \cdot {\mathbf{x}_{[v]}^T} + {\varvec{\gamma }_i} \in \mathbb {R}^{N\times 1}, 1 \le i \le n, \end{aligned}$$
(11)

where \({\mathbf{x}_{[i]}^T} \in \mathbb {R}^{N\times 1}\) denotes the i-th row (model component) of the ensemble (3), components of vector \({\varvec{\gamma }_i} \in \mathbb {R}^{N\times 1}\) are samples from a zero-mean Normal distribution with unknown variance \(\sigma ^2\), and \(\mathbf{D}\in \mathbb {R}^{n\times n}\) is a diagonal matrix whose diagonal elements read,

$$\begin{aligned} \left\{ \mathbf{D}\right\} _{i,i}= & {} \widehat{\mathbf{var}}\left( {\mathbf{x}_{[i]}^T} -\sum _{v \in P(i,\,r)} \beta _{i,v} \cdot {\mathbf{x}_{[v]}^T} \right) \end{aligned}$$
(12)
$$\begin{aligned}\approx & {} \mathbf{var}\left( {\varvec{\gamma }_i} \right) = \sigma ^2 >0, \text { with } \left\{ \mathbf{D}\right\} _{1,1} = \widehat{\mathbf{var}}\left( {\mathbf{x}_{[1]}^T} \right) , \end{aligned}$$
(13)
(13)

where \(\mathbf{var}(\bullet )\) and \(\widehat{\mathbf{var}}(\bullet )\) denote the actual and the empirical variances, respectively. The analysis equations can then be written as follows:

$$\begin{aligned} \mathbf{X}^a = \mathbf{X}^b + \left[ \widetilde{\mathbf{D}}^{-1/2} \cdot \widetilde{\mathbf{L}}\right] ^{-1} \cdot \mathbf{E}\in \mathbb {R}^{n\times N}, \end{aligned}$$
(14)

where

$$\begin{aligned} \widehat{\mathbf{A}}^{-1}= & {} \widetilde{\mathbf{L}}^T \cdot \widetilde{\mathbf{D}}^{-1} \cdot \widetilde{\mathbf{L}}= \widehat{\mathbf{B}}^{-1} + \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H}\\= & {} \widehat{\mathbf{L}}^T \cdot \widehat{\mathbf{D}}^{-1} \cdot \widehat{\mathbf{L}}+ \mathbf{H}^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H}\in \mathbb {R}^{n\times n},\nonumber \end{aligned}$$
(15)

is an estimate of the posterior precision covariance matrix, the columns of matrix \(\mathbf{E}\in \mathbb {R}^{n\times N}\) are formed by samples from a standard Normal distribution, \(\widetilde{\mathbf{L}}\in \mathbb {R}^{n\times n}\) is a lower triangular matrix (with the same structure as \(\widehat{\mathbf{L}}\)), and \(\widetilde{\mathbf{D}}^{-1} \in \mathbb {R}^{n\times n}\) is a diagonal matrix. Note that \(\left[ \widetilde{\mathbf{D}}^{-1/2} \cdot \widetilde{\mathbf{L}}\right] ^{-1} \cdot \left[ \widetilde{\mathbf{D}}^{-1/2} \cdot \widetilde{\mathbf{L}}\right] ^{-T} = \widehat{\mathbf{A}}\), so the second term in (14) carries the posterior covariance. Given the triangular structure of \(\widetilde{\mathbf{D}}^{-1/2} \cdot \widetilde{\mathbf{L}}\in \mathbb {R}^{n\times n}\), its direct inversion can be avoided [8, Algorithm 1].
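For concreteness, the following is a minimal sketch (our own illustration, not the implementation of [6, 8]) of the estimator (9)–(13) on a one-dimensional grid, where \(P(i,\,r)\) collects the up-to-r predecessors of component i. The function name, the plain least-squares fit, and the assumption that \(N\) exceeds the number of predecessors are ours; in practice a regularized regression may be preferred.

```python
import numpy as np

def modified_cholesky_precision(X_b, r):
    """Estimate B^{-1} ~ L^T D^{-1} L as in Eqs. (9)-(13) via local regressions."""
    n, N = X_b.shape
    dX = X_b - X_b.mean(axis=1, keepdims=True)           # member deviations, Eq. (6)
    L = np.eye(n)                                         # unit lower-triangular factor, Eq. (10)
    d_inv = np.zeros(n)                                   # diagonal of D^{-1}, Eqs. (12)-(13)
    d_inv[0] = 1.0 / np.var(dX[0], ddof=1)
    for i in range(1, n):
        pred = np.arange(max(0, i - r), i)                # predecessors P(i, r) on a 1D grid
        Z = dX[pred].T                                    # (N x |pred|) regressors
        beta, *_ = np.linalg.lstsq(Z, dX[i], rcond=None)  # fit the model (11)
        resid = dX[i] - Z @ beta                          # empirical residuals gamma_i
        L[i, pred] = -beta
        d_inv[i] = 1.0 / np.var(resid, ddof=1)
    return L.T @ np.diag(d_inv) @ L                       # precision estimate, Eq. (9)
```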

2.2 Maximum Likelihood Ensemble Filter (MLEF)

To handle non-linear observation operators during assimilation steps, optimization-based methods can be employed to estimate analysis increments. A well-known method in this context is the Maximum-Likelihood-Ensemble-Filter (MLEF) [9, 10]. This square-root filter employs the ensemble space to compute analysis increments, that is:

$$\begin{aligned} \overline{\mathbf{x}}^a-\overline{\mathbf{x}}^b \in \mathbf{range} \left\{ \varvec{\varDelta }{} \mathbf{X}\right\} , \end{aligned}$$

where \(\varvec{\varDelta }{} \mathbf{X}\) acts as a pseudo square-root approximation of \(\mathbf{B}^{1/2}\). Thus, vector states can be written as follows:

$$\begin{aligned} \mathbf{x}= \overline{\mathbf{x}}^b+\varvec{\varDelta }{} \mathbf{X}\cdot \mathbf{w}, \end{aligned}$$
(16)

where \(\mathbf{w}\in \mathbb {R}^{N\times 1}\) is a vector of weights in the (redundant) ensemble coordinates to be determined. By replacing (16) in (1) one obtains:

$$\begin{aligned} \mathcal {J}(\mathbf{x}) = \mathcal {J}\left( \overline{\mathbf{x}}^b+\varvec{\varDelta }{} \mathbf{X}\cdot \mathbf{w}\right) = \frac{N-1}{2} \cdot \left\| \mathbf{w} \right\| ^2 + \frac{1}{2} \cdot \left\| \mathbf{y}-\mathcal {H}\left( \overline{\mathbf{x}}^b+\varvec{\varDelta }{} \mathbf{X}\cdot \mathbf{w}\right) \right\| ^2_{\mathbf{R}^{-1}}. \end{aligned}$$
(17)

The optimization problem to solve reads:

$$\begin{aligned} \mathbf{w}^{*} = \arg \, \underset{\mathbf{w}}{\min } \, \mathcal {J}\left( \overline{\mathbf{x}}^b+\varvec{\varDelta }{} \mathbf{X}\cdot \mathbf{w}\right) . \end{aligned}$$
(18)

This problem can be numerically solved via Line-Search (LS) and/or Trust-Region methods. However, convergence is not ensured since gradient approximations are performed onto a reduced space whose dimension is much smaller than that of the model.
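As an illustration of how the ensemble-space problem (18) can be attacked, the sketch below (our simplification, not the MLEF reference implementation of [9, 10]) minimizes the cost (17) with steepest descent and a crude halving line search; the finite-difference gradient and every name in the snippet are our assumptions, made so that a non-linear operator \(\mathcal {H}\) can be passed as a black box.

```python
import numpy as np

def ensemble_space_analysis(x_b_mean, dX, y, R_inv, H, iters=20, fd_eps=1e-6):
    """Minimize the ensemble-space cost (17) for the weights w of Eq. (16)."""
    N = dX.shape[1]

    def cost(w):
        innov = y - H(x_b_mean + dX @ w)
        return 0.5 * (N - 1) * w @ w + 0.5 * innov @ (R_inv @ innov)

    def grad(w):                                   # forward-difference gradient
        g, f0 = np.zeros(N), cost(w)
        for e in range(N):
            w_p = w.copy()
            w_p[e] += fd_eps
            g[e] = (cost(w_p) - f0) / fd_eps
        return g

    w = np.zeros(N)
    for _ in range(iters):
        g, f0, step = grad(w), cost(w), 1.0
        while cost(w - step * g) > f0 and step > 1e-12:   # halving rule
            step *= 0.5
        w = w - step * g
    return x_b_mean + dX @ w                       # analysis mean via Eq. (16)
```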

2.3 Line Search Optimization Methods

The solution of optimization problems of the form (2) can be approximated via Numerical Optimization. In this context, solutions are obtained via iterations:

$$\begin{aligned} \mathbf{x}_{k+1} = \mathbf{x}_k + \varvec{\varDelta }{} \mathbf{s}_k , \end{aligned}$$
(19)

where k denotes the iteration index and \(\varvec{\varDelta }{} \mathbf{s}_k \in \mathbb {R}^{n\times 1}\) is a descent direction, for instance, the gradient descent direction [11]

$$\begin{aligned} \varvec{\varDelta }{} \mathbf{s}_k = -\nabla \mathcal {J}\left( \mathbf{x}_k \right) , \end{aligned}$$
(20a)

Newton's step [12],

$$\begin{aligned} \nabla ^2 \mathcal {J}\left( \mathbf{x}_k \right) \cdot \varvec{\varDelta }{} \mathbf{s}_k = -\nabla \mathcal {J}\left( \mathbf{x}_k \right) , \end{aligned}$$
(20b)

or a quasi-Newton based method [13],

$$\begin{aligned} \mathbf{P}_k \cdot \varvec{\varDelta }{} \mathbf{s}_k = -\nabla \mathcal {J}\left( \mathbf{x}_k \right) , \end{aligned}$$
(20c)

where \(\mathbf{P}_k \in \mathbb {R}^{n\times n}\) is a positive definite matrix. A concise survey of Newton-based methods can be found in [14]. Since the directions in (20) are derived from first- or second-order Taylor polynomials, the step size can be chosen via line-search [15] and/or trust-region [16] methods. In this manner, global convergence of the optimization method to stationary points of the cost function (1) can be ensured, as long as some assumptions on the function, its gradient, and (potentially) its Hessian hold [17]. In the context of line search, the following assumptions are commonly made:

  1. C1

    A lower bound of \(\mathcal {J}(\mathbf{x})\) exists on \(\varOmega _0 = \{\mathbf{x}\in \mathbb {R}^{n \times 1},\, \mathcal {J}(\mathbf{x}) \le \mathcal {J}\left( \mathbf{x}^{\dag } \right) \}\), where \(\mathbf{x}^{\dag } \in \mathbb {R}^{n \times 1}\) is available.

  2. C2

There is a constant \(L > 0\) such that:

    $$\begin{aligned} \left\| \nabla \mathcal {J}(\mathbf{x})-\nabla \mathcal {J}(\mathbf{z}) \right\| \le L \cdot \left\| \mathbf{x}-\mathbf{z} \right\| ,\, \text { for } \mathbf{x},\,\mathbf{z}\in B, \text { and } L > 0, \end{aligned}$$

    where B is an open convex set which contains \(\varOmega _0\). These conditions together with iterates of the form,

    $$\begin{aligned} \mathbf{x}_{k+1} = \mathbf{x}_k + \alpha \cdot \varvec{\varDelta }{} \mathbf{s}_k , \end{aligned}$$
    (21)

    ensure global convergence [18] as long as \(\alpha \) is chosen as an (approximated) minimizer of

    $$\begin{aligned} \alpha ^{*} = \arg \,\underset{\alpha \ge 0}{\min } \,\mathcal {J}\left( \mathbf{x}_k + \alpha \cdot \varvec{\varDelta }{} \mathbf{s}_k\right) . \end{aligned}$$
    (22)

In practice, step-size rules such as the Goldstein rule [19], the strong Wolfe rule [20], and the halving method [21] are employed to approximately solve (22). Moreover, soft computing methods can be employed for solving (22) [22].
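The snippet below is a small sketch of the iteration (21) with an Armijo-type backtracking rule as an inexpensive surrogate for the exact minimization (22); `cost`, `grad`, and `direction` are hypothetical callables supplied by the user (e.g., one of the choices in (20a)–(20c)), and the constants are conventional defaults rather than values taken from the references.

```python
import numpy as np

def line_search_update(x, cost, grad, direction, c1=1e-4, alpha=1.0, shrink=0.5):
    """One update of the form (21) with backtracking on the step size."""
    d = direction(x)                       # assumed to be a descent direction
    g, f0 = grad(x), cost(x)
    # Backtrack until a sufficient-decrease (Armijo) condition holds.
    while cost(x + alpha * d) > f0 + c1 * alpha * (g @ d):
        alpha *= shrink
        if alpha < 1e-12:                  # give up on pathological directions
            break
    return x + alpha * d
```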

3 Proposed Method: An Ensemble Kalman Filter Implementation via Line-Search Optimization and Random Descent Directions

In this section, we propose an iterative method to estimate the solution of the optimization problem (2). We detail our filter derivation, and subsequently, we theoretically prove the convergence of our method.

3.1 Filter Derivation

Starting with the forecast ensemble (3), we compute an estimate \(\widehat{\mathbf{B}}^{-1}\) of the precision covariance \(\mathbf{B}^{-1}\) via modified Cholesky decomposition. Then, we perform an iterative process as follows: let \(\mathbf{x}_0 = \overline{\mathbf{x}}^b\), at iteration k, for \(0 \le k \le K\), where K is the maximum number of iterations, we build a quadratic approximation of \(\mathcal {J}(\mathbf{x})\) about \(\mathbf{x}_k\)

$$\begin{aligned} \mathcal {J}_k(\mathbf{x}) = \frac{1}{2} \cdot \left\| \mathbf{x}-\mathbf{x}_k \right\| ^2_{\widehat{\mathbf{B}}^{-1}} + \frac{1}{2} \cdot \left\| \mathbf{y}-\widehat{\mathcal {H}}_k(\mathbf{x}) \right\| ^2_{\mathbf{R}^{-1}} , \end{aligned}$$
(23a)

where

$$\begin{aligned} \widehat{\mathcal {H}}_k(\mathbf{x}) = \mathcal {H}\left( \mathbf{x}_k \right) + \mathbf{H}_k \cdot \left[ \mathbf{x}-\mathbf{x}_k \right] , \end{aligned}$$

and \(\mathbf{H}_k\) is the Jacobian of \(\mathcal {H}(\mathbf{x})\) at \(\mathbf{x}_k\). The gradient of (23a) reads:

$$\begin{aligned} \nabla \mathcal {J}_k(\mathbf{x})= & {} \widehat{\mathbf{B}}^{-1} \cdot \left[ \mathbf{x}-\mathbf{x}_k \right] - \mathbf{H}_k^T \cdot \mathbf{R}^{-1} \cdot \left[ \mathbf{d}_k - \mathbf{H}_k \cdot \mathbf{x}\right] \, \\= & {} \left[ \widehat{\mathbf{B}}^{-1} + \mathbf{H}_k^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H}_k \right] \cdot \mathbf{x}- \widehat{\mathbf{B}}^{-1} \cdot \mathbf{x}_k - \mathbf{H}_k^T \cdot \mathbf{R}^{-1} \cdot \mathbf{d}_k \in \mathbb {R}^{n\times 1} , \end{aligned}$$

where \(\mathbf{d}_k = \mathbf{y}-\mathcal {H}(\mathbf{x}_k) +\mathbf{H}_k \cdot \mathbf{x}_k \in \mathbb {R}^{m\times 1}\). Readily, the Hessian of (23a) is

$$\begin{aligned} \nabla ^2 \mathcal {J}_k(\mathbf{x}) = \widehat{\mathbf{B}}^{-1} + \mathbf{H}_k^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H}_k \in \mathbb {R}^{n\times n} , \end{aligned}$$
(23b)

and therefore, Newton's step can be written as follows:

$$\begin{aligned} \mathbf{p}_k \left( \mathbf{x}_k \right) = - \left[ \nabla ^2 \mathcal {J}_k\left( \mathbf{x}_k \right) \right] ^{-1} \cdot \nabla \mathcal {J}_k\left( \mathbf{x}_k \right) = \left[ \widehat{\mathbf{B}}^{-1} + \mathbf{H}_k^T \cdot \mathbf{R}^{-1} \cdot \mathbf{H}_k \right] ^{-1} \cdot \mathbf{H}_k^T \cdot \mathbf{R}^{-1} \cdot \left[ \mathbf{y}-\mathcal {H}\left( \mathbf{x}_k \right) \right] \in \mathbb {R}^{n\times 1} . \end{aligned}$$
(23c)

As we mentioned before, the direction (23c) is based on a quadratic approximation of \(\mathcal {J}(\mathbf{x})\) and, depending on how non-linear \(\mathcal {H}(\mathbf{x})\) is, (23c) can poorly estimate the analysis increments. Thus, we compute U random directions based on the Newton one as follows:

$$\begin{aligned} \mathbf{q}_{u,k} = \varPi _u \cdot \mathbf{p}_k \left( \mathbf{x}_k \right) \in \mathbb {R}^{n\times 1} , \text { for } 1 \le u \le U , \end{aligned}$$
(23d)

where the matrices \(\varPi _u \in \mathbb {R}^{n\times n}\) are randomly formed symmetric positive definite matrices with \(\left\| \varPi _u \right\| = 1\). We constrain the analysis increments to the space spanned by the vectors (23d), that is,

$$\begin{aligned} \mathbf{x}_{k+1} - \mathbf{x}_{k} \in \mathbf{range}\left\{ \mathbf{Q}_k \right\} , \end{aligned}$$

where the u-th column of \(\mathbf{Q}_k \in \mathbb {R}^{n\times U}\) reads \(\mathbf{q}_{u,k}\). Thus,

$$\begin{aligned} \mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{Q}_k \cdot \varvec{\gamma }^{*} , \end{aligned}$$
(23e)

where \(\varvec{\gamma }^{*} \in \mathbb {R}^{U \times 1}\) is estimated by solving the following optimization problem

$$\begin{aligned} \varvec{\gamma }^{*} = \arg \, \underset{\varvec{\gamma }}{\min } \, \mathcal {J}\left( \mathbf{x}_k+\mathbf{Q}_k \cdot \varvec{\gamma }\right) . \end{aligned}$$
(23f)

To solve (23f), we proceed as follows: we generate Z random vectors \(\varvec{\gamma }_z \in \mathbb {R}^{U \times 1}\), for \(1 \le z \le Z\), with \(\left\| \varvec{\gamma }_z \right\| = 1\). Then, for each direction \(\mathbf{Q}_k \cdot \varvec{\gamma }_z \in \mathbb {R}^{n\times 1}\), we solve the following one-dimensional optimization problem

$$\begin{aligned} \alpha ^{*}_z = \arg \, \underset{\alpha _z}{\min } \, \mathcal {J}\left( \mathbf{x}_k+\alpha _z \cdot \left[ \mathbf{Q}_k \cdot \varvec{\gamma }_z \right] \right) , \end{aligned}$$
(23g)

and therefore, an estimate of the next iterate (23e) reads:

$$\begin{aligned} \mathbf{x}_{k+1} = \mathbf{x}_k + \mathbf{Q}_k \cdot \left[ \alpha _k^{*} \cdot \varvec{\gamma }_k \right] , \end{aligned}$$
(23h)

where the pair \((\alpha ^{*}_k,\, \varvec{\gamma }_k )\) is chosen as the duple \((\alpha ^{*}_z,\, \varvec{\gamma }_z )\) which provides the smallest value in (23g), for \(1 \le z \le Z\). The overall process detailed in Eqs. (23) is repeated until some stopping criterion is satisfied (e.g., a maximum number of iterations K is reached).
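The following sketch summarizes our reading of one outer iteration (23a)–(23h); it is illustrative rather than the authors' code. The SPD construction of the matrices \(\varPi _u\), the use of `scipy.optimize.minimize_scalar` as the one-dimensional "exact" line search in (23g), and all function and parameter names are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ran_enkf_iteration(x_k, x_b, B_inv, R_inv, y, H, H_jac, U=10, Z=30, rng=None):
    """One outer iteration of Eqs. (23a)-(23h); H and H_jac are callables."""
    rng = np.random.default_rng() if rng is None else rng
    n = x_k.size
    Hk = H_jac(x_k)                                   # Jacobian of H at x_k
    d_k = y - H(x_k) + Hk @ x_k
    hess = B_inv + Hk.T @ R_inv @ Hk                  # Hessian, Eq. (23b)
    grad_k = hess @ x_k - B_inv @ x_k - Hk.T @ R_inv @ d_k
    p_k = -np.linalg.solve(hess, grad_k)              # Newton step, Eq. (23c)

    Q = np.empty((n, U))                              # random directions, Eq. (23d)
    for u in range(U):
        A = rng.standard_normal((n, n))
        Pi = A @ A.T + n * np.eye(n)                  # symmetric positive definite
        Q[:, u] = (Pi / np.linalg.norm(Pi)) @ p_k

    def J(x):                                         # 3D-Var cost, Eq. (1)
        r1, r2 = x - x_b, y - H(x)
        return 0.5 * r1 @ (B_inv @ r1) + 0.5 * r2 @ (R_inv @ r2)

    best_val, best_inc = J(x_k), np.zeros(n)
    for _ in range(Z):                                # sampled directions, Eq. (23g)
        gamma = rng.standard_normal(U)
        gamma /= np.linalg.norm(gamma)
        direction = Q @ gamma
        res = minimize_scalar(lambda a: J(x_k + a * direction))
        if res.fun < best_val:
            best_val, best_inc = res.fun, res.x * direction
    return x_k + best_inc                             # update (23h)
```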

Based on the iterations (23h), we estimate the analysis state as follows:

$$\begin{aligned} \overline{\mathbf{x}}^a = \overline{\mathbf{x}}^b + \sum _{k=1}^K \mathbf{Q}_k \cdot \left[ \alpha ^{*}_k \cdot \varvec{\gamma }_k\right] = \mathbf{x}_{K}. \end{aligned}$$
(24)

The inverse of the Hessian (23b) provides an estimate of the posterior error covariance matrix. Thus, posterior members (analysis ensemble) can be sampled as follows:

$$\begin{aligned} \mathbf{x}^{a[e]} \sim \mathcal {N}\left( \overline{\mathbf{x}}^a,\, \left[ \nabla ^2 \mathcal {J}_K \left( \overline{\mathbf{x}}^a \right) \right] ^{-1}\right) . \end{aligned}$$
(25)

For an efficient implementation of the sampling process (25), the reader can consult [23]. Afterwards, the analysis members are propagated in time until a new observation is available. We name this formulation the Random Ensemble Kalman Filter (RAN-EnKF).
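One standard way to realize the sampling step (25) without explicitly inverting the Hessian is to factorize the precision matrix and solve triangular systems, as in the sketch below (our illustration; [23] discusses more efficient alternatives).

```python
import numpy as np

def sample_posterior_members(x_a_mean, hess, N, rng=None):
    """Draw N samples from N(x_a_mean, hess^{-1}) as in Eq. (25)."""
    rng = np.random.default_rng() if rng is None else rng
    n = x_a_mean.size
    C = np.linalg.cholesky(hess)            # hess = C @ C.T with C lower triangular
    E = rng.standard_normal((n, N))
    Z = np.linalg.solve(C.T, E)             # Cov(Z) = (C @ C.T)^{-1} = hess^{-1}
    return x_a_mean[:, None] + Z            # columns are the analysis members
```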

3.2 Convergence of the Analysis Step in the RAN-EnKF

For proving the convergence of our method, we consider the assumptions C1, C2, and

$$\begin{aligned} \nabla \mathcal {J}\left( \mathbf{x}_k\right) ^T \cdot \mathbf{q}_{u,k} < 0,\, \text { for } 1 \le u \le U . \end{aligned}$$
(26)

The next theorem states conditions that ensure global convergence of the analysis step in the RAN-EnKF.

Theorem 1

If conditions C1, C2, and (26) hold, and the analysis step of the RAN-EnKF with exact line search generates an infinite sequence \(\left\{ \mathbf{x}_k\right\} _{k=0}^{\infty }\), then

$$\begin{aligned} \underset{k \rightarrow \infty }{\lim } \left[ \frac{ - \nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} }{ \left\| \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right\| } \right] ^2 = 0 \end{aligned}$$
(27)

holds.

Proof

By Taylor's theorem with integral remainder we know that,

$$\begin{aligned} \mathcal {J}\left( \mathbf{x}_{k+1} \right) = \mathcal {J}\left( \mathbf{x}_k \right) + \nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \left[ \mathbf{x}_{k+1}-\mathbf{x}_k \right] + \int _0^1 \left[ \nabla \mathcal {J}\left( \mathbf{x}_k + t \cdot \left[ \mathbf{x}_{k+1}-\mathbf{x}_k \right] \right) - \nabla \mathcal {J}\left( \mathbf{x}_k \right) \right] ^T \cdot \left[ \mathbf{x}_{k+1}-\mathbf{x}_k \right] \, dt , \end{aligned}$$

and therefore, for any \(\mathbf{x}_{k+1}\) on the ray \(\mathbf{x}_k+ \alpha \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*}\), with \(\alpha \ge 0\), we have

$$\begin{aligned} \mathcal {J}\left( \mathbf{x}_{k+1} \right) - \mathcal {J}\left( \mathbf{x}_k \right) = \alpha \cdot \nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} + \alpha \cdot \int _0^1 \left[ \nabla \mathcal {J}\left( \mathbf{x}_k + t \cdot \alpha \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right) - \nabla \mathcal {J}\left( \mathbf{x}_k \right) \right] ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \, dt ; \end{aligned}$$

hence, by the Cauchy-Schwarz inequality and the Lipschitz condition C2, we have

$$\begin{aligned} \mathcal {J}\left( \mathbf{x}_{k+1} \right) - \mathcal {J}\left( \mathbf{x}_k \right) \le \alpha \cdot \nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} + \frac{L}{2} \cdot \alpha ^2 \cdot \left\| \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right\| ^2 ;
\end{aligned}$$

choose

$$\begin{aligned} \alpha ^{*} = -\frac{\nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*}}{L \cdot \left\| \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right\| ^2} , \end{aligned}$$

therefore,

$$\begin{aligned} \mathcal {J}\left( \mathbf{x}_k \right)- & {} \mathcal {J}\left( \mathbf{x}_{k+1} \right) \ge \frac{\left[ \nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right] ^2}{L \cdot \left\| \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right\| ^2} \\- & {} \frac{1}{2} \cdot \frac{\left[ - \nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right] ^2}{L \cdot \left\| \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right\| ^2} \\= & {} \frac{1}{2 \cdot L} \cdot \left[ - \frac{ \nabla \mathcal {J}\left( \mathbf{x}_k \right) ^T \cdot \mathbf{Q}_k \cdot \varvec{\gamma }^{*} }{ \left\| \mathbf{Q}_k \cdot \varvec{\gamma }^{*} \right\| } \right] ^2 . \end{aligned}$$

By conditions C1, C2, and (26), it follows that \(\left\{ \mathcal {J}\left( \mathbf{x}_k\right) \right\} _{k = 0}^{\infty }\) is a monotonically decreasing sequence which is bounded below; therefore, \(\left\{ \mathcal {J}\left( \mathbf{x}_k\right) \right\} _{k = 0}^{\infty }\) has a limit, and consequently (27) holds.

We are now ready to test our proposed method numerically.

4 Experimental Results

For the experiments, we consider non-linear observation operators, a current challenge in the context of DA [6, 24]. We make use of the Lorenz-96 model [25] as our surrogate model during the experiments. The Lorenz-96 model is described by the following set of ordinary differential equations [26]:

$$\begin{aligned} \frac{dx_j}{dt} = {\left\{ \begin{array}{ll} \left( x_2-x_{n-1}\right) \cdot x_{n} -x_1 +F &{} \text { for j=1}, \\ \left( x_{j+1}-x_{j-2}\right) \cdot x_{j-1} -x_j +F &{} \text { for } 2 \le j \le n-1, \\ \left( x_1-x_{n-2}\right) \cdot x_{n-1} -x_{n} +F &{} \text { for } j=n, \end{array}\right. } \end{aligned}$$
(28)

where F is the external forcing and \(n=40\) is the number of model components. Periodic boundary conditions are assumed. When \(F=8\) units the model exhibits chaotic behavior, which makes it a relevant surrogate problem for atmospheric dynamics [27, 28]. A time unit in the Lorenz-96 represents 7 days in the atmosphere. We create an initial pool \(\widehat{\mathbf{X}^b}_0\) of \(\widehat{N}=10^4\) members. The error statistics of observations are as follows:

$$\begin{aligned} \mathbf{y}_k \sim \mathcal {N}\left( \mathcal {H}_k \left( \mathbf{x}^{*}_k \right) , \left[ {\epsilon ^{o}} \right] ^2 \cdot \mathbf{I}\right) , \text { for } 0 \le k \le M, \end{aligned}$$

where the standard deviation of the observational errors is \(\epsilon ^{o} = 10^{-2}\). The observed components are randomly chosen at the different assimilation cycles. We use the non-smooth and non-linear observation operator [29]:

$$\begin{aligned} \left\{ \mathcal {H}\left( \mathbf{x}\right) \right\} _j = \frac{\left\{ \mathbf{x}\right\} _j }{2} \cdot \left[ \left( \frac{\left| \left\{ \mathbf{x}\right\} _j \right| }{2} \right) ^{\beta -1} +1 \right] , \end{aligned}$$
(29)

where j denotes the j-th observed component of the model state. Likewise, \(\beta \in \{1,\,3,\,5,\,7,\,9 \}\). Since the observation operator (29) is non-smooth, gradients of (1) are approximated by using the \(\ell _2\)-norm. A full observational network is available at assimilation steps. The ensemble size for the benchmarks is \(N= 20\). These members are randomly chosen from the pool \(\widehat{\mathbf{X}^b}_0\) for the different experiments in order to form the initial ensemble \(\mathbf{X}^b_0\) for the assimilation window; evidently, \(\mathbf{X}^b_0 \subset \widehat{\mathbf{X}^b}_0\). The \(\ell _2\)-norm of errors is utilized as a measure of accuracy at the assimilation step k,

$$\begin{aligned} \mathcal {E} \left( \mathbf{x}_k,\,\mathbf{x}^{*} \right) = \sqrt{\left[ \mathbf{x}^{*} - \mathbf{x}_k \right] ^T \cdot \left[ \mathbf{x}^{*} - \mathbf{x}_k \right] } , \end{aligned}$$
(30)

where \(\mathbf{x}^{*}\) and \(\mathbf{x}_k\) are the reference and the current solution at iteration k, respectively. The initial background error, on average, reads \(\epsilon ^b \approx 31.73\); for convenience, this value is expressed in the log scale: \(\log (\epsilon ^b) = 3.45\). We consider a single assimilation cycle for the experiments. We try sub-spaces of dimensions \(U \in \{10,\,20,\,30\}\) and numbers of samples from those spaces of \(Z \in \{10,\,30,\,50\}\). We set a maximum number of 40 iterations. We compare our results with those obtained by the MLEF with \(N= 40\); note that the ensemble size employed by the MLEF doubles that of our method.
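For reference, the following sketch collects the two experimental ingredients, the Lorenz-96 right-hand side (28) and the non-smooth observation operator (29); the fourth-order Runge-Kutta integrator and the time step are our own choices and are not specified in the text.

```python
import numpy as np

def lorenz96_rhs(x, F=8.0):
    """Right-hand side of Eq. (28) with periodic boundary conditions."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def observation_operator(x, beta):
    """Component-wise non-smooth operator of Eq. (29)."""
    return 0.5 * x * ((np.abs(x) / 2.0) ** (beta - 1) + 1.0)

def rk4_step(x, dt=0.01, F=8.0):
    """One classical fourth-order Runge-Kutta step (our choice of integrator)."""
    k1 = lorenz96_rhs(x, F)
    k2 = lorenz96_rhs(x + 0.5 * dt * k1, F)
    k3 = lorenz96_rhs(x + 0.5 * dt * k2, F)
    k4 = lorenz96_rhs(x + dt * k3, F)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
```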

Fig. 1. \(\ell _2\)-norm of errors in the log-scale for the 3D-Var optimization problem with different degrees \(\beta \) of the observation operator and dimensions of sub-spaces U. For the largest \(\beta \) value, the MLEF diverges and therefore, its results are not reported.

Fig. 2. \(\ell _2\)-norm of errors in the log-scale for the 3D-Var optimization problem with different degrees \(\beta \) of the observation operator and numbers of samples Z. For the largest \(\beta \) value, the MLEF diverges and therefore, its results are not reported.

We group the results in Figs. 1 and 2 by sub-space dimension and by sample size, respectively. As can be seen, the RAN-EnKF outperforms the MLEF in terms of the \(\ell _2\)-norm of errors in all cases; note that the errors of the compared filter implementations differ by orders of magnitude. This can be explained as follows: the MLEF method performs the assimilation step onto a space given by the ensemble size, which is equivalent to performing the assimilation by using the sample covariance matrix (5), whose quality is impacted by sampling errors. By contrast, in our formulation, we employ sub-spaces whose basis vectors rely on the precision covariance (9) and, therefore, the impact of sampling errors is mitigated during optimization steps. As the degree \(\beta \) of the observation operator increases, the accuracy of the MLEF degrades, and consequently, this method diverges for the largest \(\beta \) value. On the other hand, convergence is always achieved by the RAN-EnKF method; this should be expected based on the theoretical results of Theorem 1. It should be noted that, as the \(\beta \) value increases, the 3D-Var cost function becomes highly non-linear, and as a consequence, more iterations are needed to decrease the errors (as in any iterative optimization method). In general, it can be seen that as the number of samples Z increases, the results improve regardless of the sub-space dimension U. However, it is clear that, for highly non-linear observation operators, it is better to have small sub-spaces and a large number of samples.

5 Conclusions

In this paper, we propose an ensemble Kalman filter implementation via line-search optimization, which we name the Random Ensemble Kalman Filter (RAN-EnKF). The proposed method proceeds as follows: an ensemble of model realizations is employed to estimate background moments, and then quadratic approximations of the 3D-Var cost function are built across iterations via the linearization of the observation operator about the current solution. These approximations serve to estimate descent directions of the 3D-Var cost function, which are perturbed to obtain additional directions along which analysis increments can be computed. We theoretically prove the global convergence of our optimization method. Experimental tests are performed by using the Lorenz-96 model, and our method is compared against the Maximum-Likelihood-Ensemble-Filter. The results reveal that the RAN-EnKF outperforms the MLEF in terms of the \(\ell _2\)-norm of errors and, even more, it achieves convergence in cases wherein the MLEF diverges.