Part of the book series: Communications and Control Engineering ((CCE))

Abstract

Various algorithmic properties are treated in this chapter. Variable projection algorithms, which are used for several identification methods, are analyzed. Alternative ways of handling overdetermined systems of equations are also described. The chapter also describes how many of the identification methods in the book can be implemented using time-recursive estimation schemes.


Appendices

13.A Algorithmic Aspects of the GIVE Estimate

This appendix describes some properties of GIVE algorithms. Some general aspects are treated in Sect. 13.A.1, while specifics for variable projection algorithms follow in Sect. 13.A.2.

Part of the analysis was carried out in detail for the BELS case in Söderström et al. (2005).

13.A.1 General Aspects

As described in Sect. 7.4, the general set of equations for determining \(\varvec{\theta }\) and \(\varvec{\rho }\) is given by (7.96). Ideally one should solve the possibly overdetermined system of equations

$$\begin{aligned} \mathbf f(\varvec{\vartheta }) = \hat{\mathbf r}_{z \varepsilon } (\varvec{\theta }) - {\mathbf r}_{z \varepsilon } (\varvec{\theta }, \varvec{\rho }) \approx \mathbf 0 \end{aligned}$$
(13.87)

with respect to \(\varvec{\theta }\) and \(\varvec{\rho }\).

Consider the variant with a fixed weighting matrix \(\mathbf W\) that does not depend on \(\varvec{\theta }\). Thus the Frisch variant, which compares the covariance function of the residuals, is not included in the present discussion. It turns out that the function \(\mathbf f(\varvec{\theta }, \varvec{\rho })\) is bilinear: it is linear in \(\varvec{\theta }\) and linear in \(\varvec{\rho }\), but not linear simultaneously in both vectors. In early BELS algorithms, Zheng and Feng (1989), this property was exploited as follows.

Algorithm 13.3

Assume that the system of equations is not overdetermined. Write Eq. (13.87) symbolically as

$$\begin{aligned} \mathbf A_1(\varvec{\rho }) \varvec{\theta }= & {} \mathbf b_1 (\varvec{\rho }) \; , \end{aligned}$$
(13.88)
$$\begin{aligned} \mathbf A_2(\varvec{\theta }) \varvec{\rho }= & {} \mathbf b_2 (\varvec{\theta }) \; , \end{aligned}$$
(13.89)

where \( \mathbf A_1(\varvec{\rho })\) is a square matrix of dimension dim\((\varvec{\theta })\), and \( \mathbf A_2(\varvec{\theta })\) is another square matrix of dimension dim\((\varvec{\rho })\). Iterating successively between solving the two linear systems of equations above then proceeds as follows (a numerical sketch is given after the algorithm).

  1. Start with an initial guess, \(\varvec{\theta }^{(0)}\).

  2. For \(i = 1, 2, \dots \), repeat until convergence:

     2a. Solve \( \mathbf A_2(\varvec{\theta }^ { (i-1) })\varvec{\rho }= \mathbf b_2(\varvec{\theta }^ { (i-1) })\) to get \(\varvec{\rho }^ {(i)}\).

     2b. Solve \( \mathbf A_1(\varvec{\rho }^ { (i) })\varvec{\theta }= \mathbf b_1(\varvec{\rho }^ { (i) })\) to get \(\varvec{\theta }^ {(i)}\).

   \(\blacksquare \)
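A minimal numerical sketch of these alternating iterations follows. The routines build_A1_b1 and build_A2_b2, which assemble the matrices of (13.88) and (13.89) from the sample covariances, are hypothetical placeholders; their construction depends on the particular identification method.

```python
import numpy as np

def algorithm_13_3(theta0, build_A1_b1, build_A2_b2, max_iter=200, tol=1e-10):
    """Alternate between the linear systems (13.88) and (13.89).

    build_A1_b1(rho)   -> (A1, b1) such that A1 @ theta = b1   (13.88)
    build_A2_b2(theta) -> (A2, b2) such that A2 @ rho   = b2   (13.89)
    Both routines are user-supplied and method dependent.
    """
    theta = np.asarray(theta0, dtype=float)
    rho = None
    for _ in range(max_iter):
        A2, b2 = build_A2_b2(theta)
        rho = np.linalg.solve(A2, b2)            # step 2a
        A1, b1 = build_A1_b1(rho)
        theta_new = np.linalg.solve(A1, b1)      # step 2b
        if np.linalg.norm(theta_new - theta) <= tol * (1.0 + np.linalg.norm(theta)):
            return theta_new, rho
        theta = theta_new
    return theta, rho  # may not have converged; see the analysis of G below
```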

After linearizing these iterations around the true parameter vector, one may write

$$\begin{aligned} \varvec{\theta }^ {(i)} - \varvec{\theta }_0 = \mathbf G\left[ \varvec{\theta }^ {(i-1)} - \varvec{\theta }_0 \right] \; . \end{aligned}$$
(13.90)

The (local) convergence properties of the iterations are determined by the eigenvalues of the matrix \(\mathbf G\). Such a convergence analysis is undertaken in Söderström et al. (2005). It turns out that:

  1. Local convergence always takes place (\(\mathbf G\) has all eigenvalues inside the unit circle) if the signal-to-noise ratio, SNR, on the input and output sides is large. There, it is also proved that the matrix \(\mathbf G\) always has one eigenvalue equal to zero. When \( \mathsf{E} \left\{ u_0^{2}(t) \right\} \) becomes large, the eigenvalues of \(\mathbf G\) all satisfy

     $$\begin{aligned} \lambda _{j}(\mathbf G) = O(1/ \mathsf{E} \left\{ u_0^{2}(t)\right\} ) \; . \end{aligned}$$
     (13.91)

     Unfortunately, this is of somewhat limited practical value, as for large SNR the bias introduced by the least squares method may be insignificant anyway.

  2. There are indeed cases, with low SNR, where the matrix \(\mathbf G\) has eigenvalues outside the unit circle. Then Algorithm 13.3 will not produce the desired solution to Eq. (13.87) (although the solution does exist); a numerical check of the convergence condition is sketched below.
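For a given problem, the local convergence condition can be checked numerically by estimating the matrix \(\mathbf G\) in (13.90) with finite differences and inspecting its spectral radius. The sketch below assumes a user-supplied function one_step(theta) that performs steps 2a–2b of Algorithm 13.3 once; the function name and step size are illustrative only.

```python
import numpy as np

def iteration_jacobian(one_step, theta_ref, eps=1e-6):
    """Finite-difference estimate of G in (13.90) around theta_ref (e.g. the true theta)."""
    theta_ref = np.asarray(theta_ref, dtype=float)
    n = theta_ref.size
    f0 = one_step(theta_ref)
    G = np.empty((n, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        G[:, j] = (one_step(theta_ref + step) - f0) / eps
    return G

# Local convergence of Algorithm 13.3 requires all eigenvalues of G inside the unit circle:
# spectral_radius = np.max(np.abs(np.linalg.eigvals(iteration_jacobian(one_step, theta_true))))
```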

In Söderström et al. (2005) it was instead proposed to solve Eq. (13.87) using a variable projection algorithm; cf. Sect. A.1.4. This principle is outlined below.

Algorithm 13.4

Write Eq. (13.87) as

$$\begin{aligned} \mathbf 0= \mathbf f(\varvec{\theta },\varvec{\rho })&= \hat{\mathbf r}_{\mathbf zy} - \hat{\mathbf R}_{\mathbf z\varvec{\varphi }} \varvec{\theta }- {\mathbf r}_{\tilde{\mathbf z} \tilde{y} } (\varvec{\rho }) + \mathbf R_{ \tilde{\mathbf z} \tilde{\varvec{\varphi }} } (\varvec{\rho }) \varvec{\theta }\nonumber \\&{\mathop {=}\limits ^{\varDelta }}\,\mathbf g(\varvec{\rho }) - \mathbf F(\varvec{\rho }) \varvec{\theta }\; . \end{aligned}$$
(13.92)

Then the loss function to be minimized can be written as

$$\begin{aligned} \parallel \mathbf f(\varvec{\theta }, \varvec{\rho }) \parallel _{\mathbf W}^2 = \left[ \mathbf g(\varvec{\rho }) - \mathbf F(\varvec{\rho }) \varvec{\theta }\right] ^ {T} \mathbf W\left[ \mathbf g(\varvec{\rho }) - \mathbf F(\varvec{\rho }) \varvec{\theta }\right] \; . \end{aligned}$$
(13.93)

To minimize the criterion with respect to \(\varvec{\theta }\) for any fixed \(\varvec{\rho }\) is easy:

$$\begin{aligned} \varvec{\theta }= \varvec{\theta }(\varvec{\rho }) = \left( \mathbf F^{T}(\varvec{\rho }) \mathbf W\mathbf F(\varvec{\rho }) \right) ^{-1} \mathbf F^{T}(\varvec{\rho }) \mathbf W\mathbf g(\varvec{\rho }) \; . \end{aligned}$$
(13.94)

Inserting (13.94) into (13.93) gives the concentrated loss function

$$\begin{aligned} V(\varvec{\rho })= & {} \min _{\varvec{\theta }} \parallel \mathbf f(\varvec{\theta }, \varvec{\rho }) \parallel _{\mathbf W} ^2 \nonumber \\= & {} \mathbf g^{T}(\varvec{\rho }) \mathbf W\mathbf g(\varvec{\rho }) - \mathbf g^{T}(\varvec{\rho }) \mathbf W\mathbf F(\varvec{\rho }) \left( \mathbf F^{T}(\varvec{\rho }) \mathbf W\mathbf F(\varvec{\rho }) \right) ^{-1} \mathbf F^{T}(\varvec{\rho }) \mathbf W\mathbf g(\varvec{\rho }) \; . \nonumber \\ \end{aligned}$$
(13.95)

Minimization of \(V(\varvec{\rho })\) in (13.95) has to be carried out using some numerical search algorithm.

One advantage of minimizing \(V(\varvec{\rho })\) numerically instead of \(\parallel \mathbf f(\varvec{\theta }, \varvec{\rho }) \parallel _\mathbf W^2 \) is, as reported in Söderström et al. (2005), that this algorithm is much more robust than Algorithm 13.3; no particular numerical problems have been reported.   \(\blacksquare \)
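A sketch of Algorithm 13.4 in Python follows. It assumes user-supplied routines g_fun(rho) and F_fun(rho) that build \(\mathbf g(\varvec{\rho })\) and \(\mathbf F(\varvec{\rho })\) of (13.92) from the sample covariances; the derivative-free Nelder–Mead search is just one possible choice of numerical minimizer.

```python
import numpy as np
from scipy.optimize import minimize

def theta_of_rho(rho, g_fun, F_fun, W):
    """Minimizing theta for fixed rho, Eq. (13.94)."""
    g, F = g_fun(rho), F_fun(rho)
    FtW = F.T @ W
    return np.linalg.solve(FtW @ F, FtW @ g)

def concentrated_loss(rho, g_fun, F_fun, W):
    """Concentrated loss V(rho), Eq. (13.95)."""
    g, F = g_fun(rho), F_fun(rho)
    r = g - F @ theta_of_rho(rho, g_fun, F_fun, W)
    return r @ W @ r

def algorithm_13_4(rho0, g_fun, F_fun, W):
    res = minimize(concentrated_loss, rho0, args=(g_fun, F_fun, W),
                   method="Nelder-Mead")
    rho_hat = res.x
    return theta_of_rho(rho_hat, g_fun, F_fun, W), rho_hat
```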

When using Algorithm 13.4 to solve the equations, at least two questions of theoretical and practical interest arise.

  1. Does the loss function \(V(\varvec{\rho })\) have a unique global minimum point? Every solution to Eq. (13.87) would correspond to a global minimum point, at which V attains its theoretical minimum value of zero.

  2. Does the loss function \(V(\varvec{\rho })\) have any ‘false’ local minimum points (by ‘false’ is here meant points such that \(\varvec{\rho }\ne \varvec{\rho }_0\))? If this happens to be the case, there is a risk that the numerical search procedure gets stuck in a local minimum point not corresponding to the global minimum.

The two questions above are largely open and still unanswered. As mentioned before, the reported experience indicates that in practice one should not expect convergence problems. Still, the optimization problem is certainly nonlinear and has a good deal of structure, so it would not be a surprise if false minima sometimes exist. A further aspect is that in practice one would prefer not to consider arbitrary vectors \(\varvec{\rho }\), but only those that correspond to positive noise variances (and to a positive definite covariance sequence \(r_{\tilde{y}}(\tau )\) when \(\tilde{y}(t)\) is treated as correlated noise).

To exemplify the reasoning around these two questions, consider the following example.

Example 13.2

Consider the very simple case (the detailed calculations will easily become messy with more advanced examples!)

$$\begin{aligned} y_0(t) = b u_0(t) \; , \end{aligned}$$
(13.96)

and set

$$\begin{aligned}&\mathbf z(t) = \left( \begin{array}{c} y(t) \\ u(t) \\ u(t-1) \end{array}\right) , \ \ \varphi (t) = u(t) \; , \ \ \end{aligned}$$
(13.97)
$$\begin{aligned}&r_0 = \mathsf{E} \left\{ u_0^2(t) \right\} , \ \ r_1 = \mathsf{E} \left\{ u_0(t) u_0(t-1) \right\} \; . \end{aligned}$$
(13.98)

Let the true noise variances be denoted as

$$\begin{aligned} \lambda _u^0 = \mathsf{E} \left\{ \tilde{u}^2(t) \right\} , \ \ \lambda _y^0 = \mathsf{E} \left\{ \tilde{y}^2(t) \right\} \; , \end{aligned}$$
(13.99)

and set for future calculations

$$\begin{aligned} \xi = \lambda _y^0 - \lambda _y , \ \ \eta = \lambda _u^0 - \lambda _u \; . \end{aligned}$$
(13.100)

In this case it holds (when \(N \rightarrow \infty \))

$$\begin{aligned} \mathbf g= & {} \mathbf g(\xi ) = \mathbf r_{\mathbf zy} - \mathbf r_{\tilde{\mathbf z} \tilde{y}} (\varvec{\rho }) = \left( \begin{array}{c} b^2 r_0 + \xi \\ b r_0 \\ b r_1 \end{array}\right) \; , \end{aligned}$$
(13.101)
$$\begin{aligned} \mathbf F= & {} \mathbf F(\eta ) = \mathbf r_{\mathbf z\varvec{\varphi }} - \mathbf r_{\tilde{\mathbf z} \tilde{\varvec{\varphi }}} (\varvec{\rho }) = \left( \begin{array}{c} b r_0 \\ r_0 + \eta \\ r_1 \end{array}\right) \; . \end{aligned}$$
(13.102)

First consider global minima, or equivalently, all solutions to

$$\begin{aligned} \mathbf g(\xi ) - \mathbf F(\eta ) \hat{b} = 0 \; , \end{aligned}$$
(13.103)

where the unknowns are \(\hat{b}, \xi \), and \(\eta \). In order to have identifiability one must require that \(r_1 \ne 0\) and \(b \ne 0\). The last component of (13.103) then gives \(\hat{b} = b\). The first and second components then directly give \( \xi = 0\) and \( \eta = 0\). Hence, in this specific example, there is only one solution to the GIVE equation.

Next consider the local minima of V, (13.95). For this examination choose for convenience the weighting (scaling)

$$\begin{aligned} \mathbf W= \left( \begin{array}{ccc} 1 &{} 0 &{} 0 \\ 0 &{} b^2 &{} 0 \\ 0 &{} 0 &{} b^2 \end{array}\right) \; . \end{aligned}$$
(13.104)

Use (13.101), (13.102) and express the loss function as

$$\begin{aligned} V(\xi ,\eta ) = \mathbf g^ {T}(\xi ) \mathbf W\mathbf g(\xi ) - \frac{ \left[ \mathbf g^{T}(\xi ) \mathbf W\mathbf F(\eta ) \right] ^2 }{ \mathbf F^{T}(\eta )\mathbf W\mathbf F(\eta )} \; . \end{aligned}$$
(13.105)

The stationary points of \(V(\xi , \eta )\) are the solutions to

$$\begin{aligned} V'_{\xi } = 0\Rightarrow & {} \mathbf g^{T}(\xi )\mathbf W\mathbf g_{\xi }(\xi ) - \frac{ \mathbf g^{T}(\xi )\mathbf W\mathbf F(\eta )}{\mathbf F^{T}(\eta )\mathbf W\mathbf F(\eta )} \mathbf F^{T}(\eta )\mathbf W\mathbf g_{\xi }(\xi ) = 0 \; , \end{aligned}$$
(13.106)
$$\begin{aligned} V'_{\eta } = 0\Rightarrow & {} - \frac{ \mathbf g^{T}(\xi )\mathbf W\mathbf F(\eta )}{\mathbf F^{T}(\eta )\mathbf W\mathbf F(\eta )} \mathbf g^{T}(\xi )\mathbf W\mathbf F_{\eta }(\eta )\nonumber \\&+ \frac{ \left[ \mathbf g^{T}(\xi ) \mathbf W\mathbf F(\eta ) \right] ^2 }{\left[ \mathbf F^{T}(\eta )\mathbf W\mathbf F(\eta ) \right] ^2 } \mathbf F^{T}(\eta )\mathbf W\mathbf F_{\eta }(\eta ) = 0 \; . \end{aligned}$$
(13.107)

Furthermore,

$$\begin{aligned} \mathbf g_{\xi }(\xi ) = \frac{ \partial \mathbf g(\xi )}{\partial \xi } = \left( \begin{array}{c} 1 \\ 0 \\ 0 \end{array}\right) , \ \ \mathbf F_{\eta }(\eta ) = \frac{ \partial \mathbf F(\eta )}{\partial \eta } = \left( \begin{array}{c} 0 \\ 1 \\ 0 \end{array}\right) \; . \end{aligned}$$
(13.108)

Next, (13.106) is simplified:

$$\begin{aligned} V'_{\xi } = 0\Rightarrow & {} \left[ b^2 r_0^2 + b^2 (r_0 + \eta )^2 + b^2 r_1^2 \right] (b^2 r_0 + \xi ) \nonumber \\&- \left[ b r_0 (b^2 r_0 + \xi ) + b^3 r_0 (r_0 + \eta ) + b^3 r_1^2 \right] b r_0 = 0 \nonumber \\\Rightarrow & {} \left[ (r_0 + \eta )^2 + r_1^2 \right] \xi - b^2 r_0 \eta (r_0 + \eta ) = 0 \nonumber \\\Rightarrow & {} \xi = \frac{ b^2 r_0 (r_0 + \eta ) }{ (r_0 + \eta )^2 + r_1^2 } \eta \; . \end{aligned}$$
(13.109)

Simplification of (13.107) leads to

$$\begin{aligned} V'_{\eta } = 0\Rightarrow & {} - \left[ b^2 r_0^2 + b^2 (r_0 + \eta )^2 + b^2 r_1^2 \right] b^3 r_0 \nonumber \\&+ \left[ b r_0 (b^2 r_0 + \xi ) + b^3 r_0 (r_0 + \eta ) + b^3 r_1^2 \right] b^2 (r_0 + \eta ) = 0 \nonumber \\\Rightarrow & {} -b^2 r_0 (r_0^2 + r_1^2) + (r_0 + \eta ) \left[ b^2(r_0 ^2 + r_1^2) + r_0 \xi \right] = 0 \nonumber \\\Rightarrow & {} \eta b^2 (r_0 ^2 + r_1^2) + r_0 \xi (r_0 + \eta ) = 0 \; . \end{aligned}$$
(13.110)

Inserting (13.109) into (13.110) leads now to an equation in only \(\eta \):

$$\begin{aligned} \eta b^2 (r_0^2 + r_1^2) \left[ (r_0 + \eta )^2 + r_1^2 \right] + r_0 (r_0 + \eta ) b^2 r_0 \eta (r_0 + \eta ) = 0 \; . \end{aligned}$$
(13.111)

Obviously \(\eta = 0\) is a solution (which corresponds to the true parameter values, \(\hat{\varvec{\vartheta }} = \varvec{\vartheta }_0\)). Cancelling a factor \(b^2 \eta \) in (13.111) leads to

$$\begin{aligned} (r_0^2 + r_1^2) \left[ (r_0 + \eta )^2 + r_1^2 \right] + r_0^2 (r_0 + \eta )^2 = 0 \; . \end{aligned}$$
(13.112)

However, all terms in this equation must be positive, and hence \(\eta = 0\) is the only solution to (13.111). This fact means that the loss function \(V(\xi , \eta ) \) has a unique stationary point, namely \(\xi = 0,\ \eta = 0\).   \(\blacksquare \)
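The stationarity of the true values in Example 13.2 can be checked symbolically. The sketch below uses sympy and only verifies that both partial derivatives of (13.105) vanish at \(\xi = 0, \eta = 0\); it is an editorial check of the calculations, not part of the original derivation.

```python
import sympy as sp

b, r0, r1 = sp.symbols('b r0 r1', positive=True)   # r1 != 0, b != 0 assumed
xi, eta = sp.symbols('xi eta', real=True)

g = sp.Matrix([b**2*r0 + xi, b*r0, b*r1])           # (13.101)
F = sp.Matrix([b*r0, r0 + eta, r1])                 # (13.102)
W = sp.diag(1, b**2, b**2)                          # (13.104)
V = (g.T*W*g)[0] - (g.T*W*F)[0]**2 / (F.T*W*F)[0]   # (13.105)

# both partial derivatives vanish at xi = 0, eta = 0
print(sp.simplify(sp.diff(V, xi).subs({xi: 0, eta: 0})),
      sp.simplify(sp.diff(V, eta).subs({xi: 0, eta: 0})))   # -> 0 0
```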

13.A.2 Use of a Variable Projection Algorithm for MIMO Systems

In this section some details are presented for a variable projection algorithm applied to GIVE in the multivariable case. For general aspects of such algorithms, see Sect. A.1.4.

Noting that the matrix \(\mathbf f\), (7.125), is an affine transformation of the system parameter matrix \(\varvec{\varTheta }\), one can, with simplified notation, write it as

$$\begin{aligned} \mathbf f(\varvec{\vartheta })&= \hat{\mathbf r}_{\mathbf z\mathbf y} - \hat{\mathbf R}_{\mathbf z\varvec{\varphi }} \varvec{\varTheta }- \mathbf r_{ \tilde{\mathbf z} \tilde{\mathbf y} } (\varvec{\rho }) + \mathbf R_{ \tilde{\mathbf z} \tilde{\varvec{\varphi }} } (\varvec{\rho }) \varvec{\varTheta }\nonumber \\&{\mathop { =}\limits ^{\varDelta }}\,\mathbf G- \mathbf H\varvec{\varTheta }\; , \end{aligned}$$
(13.113)

where

$$\begin{aligned} \mathbf G= \mathbf G(\varvec{\rho })= & {} \hat{\mathbf r}_{\mathbf z\mathbf y} - \mathbf r_{ \tilde{\mathbf z} \tilde{\mathbf y} } (\varvec{\rho }) \; , \end{aligned}$$
(13.114)
$$\begin{aligned} \mathbf H= \mathbf H(\varvec{\rho })= & {} \hat{\mathbf R}_{\mathbf z\varvec{\varphi }} - \mathbf R_{ \tilde{\mathbf z} \tilde{\varvec{\varphi }} } (\varvec{\rho }) \; . \end{aligned}$$
(13.115)

Here \(\varvec{\rho }\) is the noise parameter vector; see (5.66) or (5.67). Note that \(\mathbf f(\varvec{\vartheta })\) is linear in the parameter matrix \(\varvec{\varTheta }\). This can be exploited to simplify the minimization problem. The parameter estimate is defined as

$$\begin{aligned} ( \hat{\varvec{\varTheta }}, \hat{\varvec{\rho }} )\,{\mathop {=}\limits ^{\varDelta }}\,\mathrm{arg} \min _{\varvec{\vartheta }} V(\varvec{\varTheta }, \varvec{\rho }) \; , \end{aligned}$$
(13.116)

and can be arranged by first minimizing V with respect to \(\varvec{\varTheta }\). Set

$$\begin{aligned} \hat{\varvec{\varTheta }} (\varvec{\rho })= & {} \mathrm{arg} \min _{\varvec{\varTheta }} \ V( \varvec{\varTheta }, \varvec{\rho }) \; , \end{aligned}$$
(13.117)
$$\begin{aligned} \hat{\varvec{\rho }}= & {} \mathrm{arg} \min _{\varvec{\rho }} V_2 (\varvec{\rho })\,{\mathop { =}\limits ^{\varDelta }}\,\mathrm{arg} \min _{\varvec{\rho }} V( \hat{\varvec{\varTheta }} (\varvec{\rho }), \varvec{\rho }) \; . \end{aligned}$$
(13.118)

This means that \(V_2(\varvec{\rho })\) is a concentrated loss function. The minimization in (13.117) is simple, as V depends quadratically on \(\varvec{\varTheta }\). The minimization problem in (13.118) is simpler than that in the original formulation (13.116), as the number of unknown variables is significantly reduced.

In what follows some details for the optimization problem (13.116) are given. Write the criterion \(V(\varvec{\varTheta }, \varvec{\rho })\) as, see (5.71)

$$\begin{aligned} V(\varvec{\varTheta }, \varvec{\rho }) = \mathrm{tr} \left[ \mathbf W( \mathbf G- \mathbf H\varvec{\varTheta }) \mathbf Z( \mathbf G- \mathbf H\varvec{\varTheta })^{T} \right] \; . \end{aligned}$$
(13.119)

Here let, using general notation, \(\mathbf H\) be a \(k \times m\) matrix, \(\varvec{\varTheta }\) an \(m \times n\) matrix, \(\mathbf G\) a \(k \times n\) matrix, \(\mathbf Z\) an \(n \times n\) matrix, and \(\mathbf W\) a \(k \times k\) matrix. Further, introduce \(\mathbf e_i\) as an m-dimensional unit vector and \(\mathbf f_j\) as an n-dimensional unit vector. Then one can write

$$\begin{aligned} \frac{\partial }{\partial \varvec{\varTheta }_{i, j} } \varvec{\varTheta }= \mathbf e_i \mathbf f_j^{T} \; . \end{aligned}$$
(13.120)

Now find the minimal value of \(V(\varvec{\varTheta }, \varvec{\rho })\) and the minimizing argument when the matrix \(\varvec{\varTheta }\) is varied. Direct differentiation gives

$$\begin{aligned} 0 = \frac{\partial V}{\partial \varvec{\varTheta }_{i, j} }= & {} - 2 \mathrm{tr} \left[ \mathbf W\mathbf H\mathbf e_i \mathbf f_j^{T} \mathbf Z( \mathbf G- \mathbf H\varvec{\varTheta })^{T} \right] \nonumber \\= & {} - 2 \mathbf f_j^{T} \mathbf Z( - \varvec{\varTheta }^{T} \mathbf H^{T} + \mathbf G^{T} ) \mathbf W\mathbf H\mathbf e_i \ \ \ \forall i, j \; . \end{aligned}$$
(13.121)

Hence one can conclude

$$\begin{aligned}\mathbf Z(\varvec{\varTheta }^{T} \mathbf H^{T} - \mathbf G^{T} ) \mathbf W\mathbf H= \mathbf 0\; , \end{aligned}$$

and therefore

$$\begin{aligned} \hat{\varvec{\varTheta }} = \left( \mathbf H^{T} \mathbf W\mathbf H\right) ^{-1} \mathbf H^{T} \mathbf W\mathbf G\; . \end{aligned}$$
(13.122)

Note that \(\hat{\varvec{\varTheta }}\) does not depend on the weighting matrix \(\mathbf Z\). The minimal value \(V_2(\varvec{\rho })\) of the criterion is easily found to be

$$\begin{aligned} V_2(\varvec{\rho }) = \mathrm{tr} \left[ \mathbf W\mathbf G\mathbf Z\mathbf G^{T} \right] - \mathrm{tr} \left[ \mathbf W\mathbf H\left( \mathbf H^{T} \mathbf W\mathbf H\right) ^{-1} \mathbf H^{T} \mathbf W\mathbf G\mathbf Z\mathbf G^{T} \right] \; . \end{aligned}$$
(13.123)

It is illustrative to derive the results (13.122), (13.123) using completion of squares as an alternative technique. To this end, first set

$$\begin{aligned} \varvec{\varTheta }^* = \left( \mathbf H^{T} \mathbf W\mathbf H\right) ^{-1} \mathbf H^{T} \mathbf W\mathbf G\; . \end{aligned}$$
(13.124)

Next the criterion (13.119) can be rewritten as

$$\begin{aligned} V(\varvec{\varTheta }, \varvec{\rho })= & {} \mathrm{tr} \left[ \mathbf W\left( \mathbf G- \mathbf H\varvec{\varTheta }+ \mathbf H\varvec{\varTheta }^* - \mathbf H\varvec{\varTheta }^* \right) \mathbf Z\left( \mathbf G- \mathbf H\varvec{\varTheta }+ \mathbf H\varvec{\varTheta }^* - \mathbf H\varvec{\varTheta }^* \right) ^{T} \right] \nonumber \\= & {} \mathrm{tr} \left[ \mathbf W\mathbf H( \varvec{\varTheta }- \varvec{\varTheta }^*) \mathbf Z(\varvec{\varTheta }- \varvec{\varTheta }^*) ^{T} \mathbf H^{T} \right] \nonumber \\&+ \mathrm{tr} \left[ \mathbf W( \mathbf H\varvec{\varTheta }^* - \mathbf G) \mathbf Z(\mathbf H\varvec{\varTheta }^* - \mathbf G)^{T} \right] \nonumber \\&+ 2 \mathrm{tr} \left[ \mathbf W\mathbf H( \varvec{\varTheta }- \varvec{\varTheta }^* ) \mathbf Z(\mathbf H\varvec{\varTheta }^* - \mathbf G)^{T} \right] \; . \end{aligned}$$
(13.125)

Here, the last term can be evaluated as

$$\begin{aligned}&2 \mathrm{tr} \left[ \mathbf W\mathbf H( \varvec{\varTheta }- \varvec{\varTheta }^* ) \mathbf Z(\mathbf H\varvec{\varTheta }^* - \mathbf G)^{T} \right] \nonumber \\= & {} 2 \mathrm{tr} \left[ \left\{ \mathbf G^{T} \mathbf W\mathbf H\left( \mathbf H^{T} \mathbf W\mathbf H\right) ^{-1} \mathbf H^{T} - \mathbf G^{T} \right\} \mathbf W\mathbf H( \varvec{\varTheta }- \varvec{\varTheta }^* ) \mathbf Z\right] \nonumber \\= & {} 2 \mathrm{tr} \left[ \mathbf G^{T} \left\{ \mathbf W\mathbf H\left( \mathbf H^{T} \mathbf W\mathbf H\right) ^{-1} \mathbf H^{T} \mathbf W\mathbf H- \mathbf W\mathbf H\right\} ( \varvec{\varTheta }- \varvec{\varTheta }^* ) \mathbf Z\right] \nonumber \\= & {} 0 \; . \end{aligned}$$

From this and (13.125) it follows directly that \(V(\varvec{\varTheta }, \varvec{\rho })\) is minimized with respect to \(\varvec{\varTheta }\) for \(\varvec{\varTheta }= \varvec{\varTheta }^*\).
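For fixed \(\varvec{\rho }\), the inner minimization (13.122) and the concentrated loss value (13.123) are straightforward to compute numerically. A minimal sketch, with \(\mathbf G\), \(\mathbf H\), \(\mathbf W\), and \(\mathbf Z\) passed in as already-built numerical matrices:

```python
import numpy as np

def inner_minimization(G, H, W, Z):
    """Return Theta_hat of (13.122) and the concentrated loss V2 of (13.123).

    Dimensions (general notation): H is k x m, Theta is m x n,
    G is k x n, Z is n x n, W is k x k.
    """
    HtW = H.T @ W
    Theta_hat = np.linalg.solve(HtW @ H, HtW @ G)      # (13.122)
    E = G - H @ Theta_hat                              # residual of (13.113)
    V2 = np.trace(W @ E @ Z @ E.T)                     # criterion (13.119) at the minimum
    return Theta_hat, V2
```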

13.B Handling Overdetermined Systems of Equations

This section treats the situation where some of the generalized IV equations are required to hold exactly and others only approximately for the parameter estimates. Algorithms 13.1 and 13.2 will be compared.

For this aim set

$$\begin{aligned} \mathbf f(\varvec{\vartheta })= & {} \left( \begin{array}{c} \mathbf f_1(\varvec{\vartheta }) \\ \mathbf f_2(\varvec{\vartheta }) \end{array}\right) \; , \end{aligned}$$
(13.126)
$$\begin{aligned} \mathbf S= \frac{\partial \mathbf f}{\partial \varvec{\vartheta }}= & {} \left( \begin{array}{c} \frac{\partial \mathbf f_1}{\partial \varvec{\vartheta }} \\ \\ \frac{\partial \mathbf f_2}{\partial \varvec{\vartheta }} \end{array}\right) \,{\mathop {=}\limits ^{\varDelta }}\,\left( \begin{array}{c} \mathbf F_1 \\ \mathbf F_2 \end{array}\right) \; , \end{aligned}$$
(13.127)

where \(\mathbf F_1\) is \(n_1 \times n\) and of rank \(n_1\), and \(\mathbf S\) has rank n. The matrix \(\mathbf S\) is a form of sensitivity matrix.

The equations governing the parameter estimates are for Algorithm 13.1

$$\begin{aligned} \begin{array}{l} \mathbf f_1(\varvec{\vartheta }) = \mathbf 0\; , \\ \mathbf f_2^ {T} (\varvec{\vartheta }) \mathbf W_2 \mathbf F_2 (\varvec{\vartheta }) + \varvec{\lambda }^{T} \mathbf F_1 (\varvec{\vartheta }) = \mathbf 0\; , \end{array} \end{aligned}$$
(13.128)

while Algorithm 13.2 leads to

$$\begin{aligned} \mathbf 0= \alpha \mathbf f_1 ^{T}(\varvec{\vartheta }) \mathbf W_1 \mathbf F_1 (\varvec{\vartheta }) + \mathbf f_2 ^{T}(\varvec{\vartheta }) \mathbf W_2 \mathbf F_2 (\varvec{\vartheta }) \; , \end{aligned}$$
(13.129)

where \(\alpha \) should be chosen large.

The sensitivity matrix \( \mathbf S= \frac{\partial \mathbf f}{\partial \varvec{\vartheta }} \) also appears in (14.90). Next it will be shown that the factor that matters for the covariance matrix of the parameter estimates is

$$\begin{aligned} \mathbf G\,{\mathop {=}\limits ^{\varDelta }}\,\left( \mathbf S^{T} \mathbf W\mathbf S\right) ^{-1} \mathbf S^{T} \mathbf W\; . \end{aligned}$$
(13.130)

Equations (13.128) and (13.129) are compatible if one sets

$$\begin{aligned} \varvec{\lambda }^{T} = \alpha \mathbf f_1 ^{T}(\varvec{\vartheta }) \mathbf W_1 \; . \end{aligned}$$
(13.131)

It then makes sense that

$$\begin{aligned} \lim _{\alpha \rightarrow \infty } \mathbf f_1(\varvec{\vartheta }) = \mathbf 0\; , \end{aligned}$$
(13.132)

as \(\varvec{\lambda }\) does not depend on \(\alpha \).

Using (13.129) and the associated weighting

$$\begin{aligned} \mathbf W= \left( \begin{array}{cc} \alpha \mathbf W_1 &{} \mathbf 0\\ \mathbf 0&{} \mathbf W_2 \end{array}\right) \end{aligned}$$
(13.133)

in the expression (13.130) gives

$$\begin{aligned} \mathbf G= \mathbf G(\alpha ) = \left( \alpha \mathbf F_1^{T} \mathbf W_1 \mathbf F_1 + \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \right) ^ {-1} \left( \begin{array}{cc} \alpha \mathbf F_1^{T} \mathbf W_1&\mathbf F_2^{T} \mathbf W_2 \end{array}\right) \; . \end{aligned}$$
(13.134)

It is of interest to examine the limit of \(\mathbf G(\alpha )\) when \(\alpha \) tends to infinity. Due to the nature of the problem, one expects that the limit exists and that it is independent of \(\mathbf W_1\). Note that both terms in the inverse appearing in (13.134) can be singular, and thus the matrix inversion lemma cannot be applied in a standard fashion to examine the convergence.

Before examining the limit using an algebraic approach, consider the stationary points obtained in (13.128) for Algorithm 13.1. After linearization, these equations may be written as

$$\begin{aligned} \begin{array}{l} \mathbf F_1 \tilde{\varvec{\vartheta }} + \mathbf y_1 = \mathbf 0\; , \\ \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \tilde{\varvec{\vartheta }} + \mathbf F_1 ^{T} \tilde{\varvec{\lambda }} + \mathbf F_2^{T} \mathbf W_2 \mathbf y_2 = \mathbf 0\; . \end{array} \end{aligned}$$
(13.135)

From this equation one can find the parameter error \(\tilde{\varvec{\vartheta }}\) as

$$\begin{aligned} \tilde{\varvec{\vartheta }} = - \left( \begin{array}{cc} \mathbf I&\mathbf 0\end{array}\right) \left( \begin{array}{cc} \mathbf F_1 &{} \mathbf 0\\ \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 &{} {\quad } \mathbf F_1 ^{T} \end{array}\right) ^{-1} \left( \begin{array}{cc} \mathbf I&{} \mathbf 0\\ \mathbf 0&{} {\quad } \mathbf F_2^{T} \mathbf W_2 \end{array}\right) \left( \begin{array}{c} \mathbf y_1 \\ \mathbf y_2 \end{array}\right) \; , \end{aligned}$$
(13.136)

and one would therefore expect that the matrix in front of the \(\mathbf y\)’s relates to \(\lim _{\alpha \rightarrow \infty } \mathbf G(\alpha )\). It will now be shown that this is indeed the case.

Lemma 13.1

Under the dimension and rank assumptions of \(\mathbf F_1\) and \(\mathbf F_2\) it holds for any fixed \(\alpha \) that

$$\begin{aligned} \mathbf G(\alpha )= & {} \left( \alpha \mathbf F_1^{T} \mathbf W_1 \mathbf F_1 + \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \right) ^ {-1} \left( \begin{array}{cc} \alpha \mathbf F_1^{T} \mathbf W_1&\mathbf F_2^{T} \mathbf W_2 \end{array}\right) \nonumber \\&= \left( \begin{array}{cc} \mathbf I&\mathbf 0\end{array}\right) \left( \begin{array}{cc} \mathbf F_1 &{} - \mathbf W_1^{-1}/\alpha \\ \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 &{} \mathbf F_1^{T} \end{array}\right) ^{-1} \left( \begin{array}{cc} \mathbf I&{} \mathbf 0\\ \mathbf 0&{} {\quad } \mathbf F_2^{T} \mathbf W_2 \end{array}\right) \; . \end{aligned}$$
(13.137)

Proof

The right-hand side of (13.137) can be evaluated as

   \(\blacksquare \)

Remark 13.3

When \(\alpha \rightarrow \infty \) it holds

$$\begin{aligned} \lim _{\alpha \rightarrow \infty } \mathbf G(\alpha ) = \left( \begin{array}{cc} \mathbf I&\mathbf 0\end{array}\right) \left( \begin{array}{cc} \mathbf F_1 &{} \mathbf 0\\ \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 &{} \quad \mathbf F_1^{T} \end{array}\right) ^{-1} \left( \begin{array}{cc} \mathbf I&{} \mathbf 0\\ \mathbf 0&{} \quad \mathbf F_2^{T} \mathbf W_2 \end{array}\right) \; , \end{aligned}$$
(13.138)

which obviously does not depend on \(\mathbf W_1\). Further, the right-hand side of (13.138) is precisely the matrix appearing in (13.136).   \(\blacksquare \)

Expressing the inverse in (13.138) explicitly is relatively complicated in the general case. First rewrite \(\mathbf G\) by making some block permutations,

$$\begin{aligned} \mathbf G= & {} \left( \begin{array}{cc} \mathbf I&\mathbf 0\end{array}\right) \left[ \left( \begin{array}{cc} \mathbf 0&{} \mathbf I\\ \mathbf I&{} \mathbf 0\end{array}\right) \left( \begin{array}{cc} \mathbf 0&{} \mathbf I\\ \mathbf I&{} \mathbf 0\end{array}\right) \left( \begin{array}{cc} \mathbf F_1 &{} \mathbf 0\\ \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 &{} {\quad } \mathbf F_1^{T} \end{array}\right) \right. \nonumber \\&\times \left. \left( \begin{array}{cc} \mathbf 0&{} \mathbf I\\ \mathbf I&{} \mathbf 0\end{array}\right) \left( \begin{array}{cc} \mathbf 0&{} \mathbf I\\ \mathbf I&{} \mathbf 0\end{array}\right) \right] ^{-1} \left( \begin{array}{cc} \mathbf I&{} \mathbf 0\\ \mathbf 0&{} {\quad } \mathbf F_2^{T} \mathbf W_2 \end{array}\right) \nonumber \\= & {} \left( \begin{array}{cc} \mathbf 0&\mathbf I\end{array}\right) \left( \begin{array}{cc} \mathbf F_1^{T} &{} {\quad }\mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \\ \mathbf 0&{} \mathbf F_1 \end{array}\right) ^{-1} \left( \begin{array}{cc} \mathbf 0&{} {\quad }\mathbf F_2^{T} \mathbf W_2 \\ \mathbf I&{} \mathbf 0\end{array}\right) \; . \end{aligned}$$
(13.139)

To proceed, the inverse in (13.139) needs to be rewritten. For that purpose, apply Lemma A.6 to the matrix in (13.139). One needs to verify that the matrix

$$\begin{aligned} \mathbf D_0\,{\mathop {=}\limits ^{\varDelta }}\,\mathbf P+ \mathbf P_{\perp } \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \mathbf P_{\perp } \; , \end{aligned}$$
(13.140)

is non-singular, where

$$\begin{aligned} \mathbf P= \mathbf F_1^{\dagger } \mathbf F_1 , \ \ \ \mathbf P_{\perp } = \mathbf I- \mathbf P\; . \end{aligned}$$
(13.141)

Clearly, by construction \(\mathbf D_0\) is symmetric and nonnegative definite. Further,

$$\begin{aligned} \mathbf x^{T} \mathbf D_0 \mathbf x= 0\Rightarrow & {} \mathbf x^{T} \left[ \mathbf P+ \mathbf P_{\perp } \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \mathbf P_{\perp } \right] \mathbf x= 0 \; , \nonumber \\\Rightarrow & {} \mathbf x^{T} \mathbf P\mathbf x= 0, \ \ \mathbf F_2 \mathbf P_{\perp } \mathbf x= \mathbf 0\; , \nonumber \\\Rightarrow & {} \mathbf F_1 \mathbf x= \mathbf 0, \ \ \mathbf F_2 \left[ \mathbf I- \mathbf F_1 ^{T} \left( \mathbf F_1 \mathbf F_1^{T} \right) ^{-1} \mathbf F_1 \right] \mathbf x= \mathbf 0\; , \nonumber \\\Rightarrow & {} \mathbf F_1 \mathbf x= \mathbf 0, \ \ \mathbf F_2 \mathbf x= \mathbf 0, \ \ \Rightarrow \mathbf S\mathbf x= \mathbf 0\; , \ \ \Rightarrow \mathbf x= \mathbf 0\; . \end{aligned}$$

Thus \(\mathbf D_0\) is non-singular. Then according to Lemma A.6, (13.139) can be expressed as

$$\begin{aligned} \mathbf G= \left( \begin{array}{cc} \mathbf 0&\mathbf I\end{array}\right) \left( \begin{array}{cc} \mathbf H_{11} &{} \mathbf H_{12} \\ \mathbf H_{21} &{} \mathbf H_{22} \end{array}\right) \left( \begin{array}{cc} \mathbf 0&{} {\quad }\mathbf F_2^{T} \mathbf W_2 \\ \mathbf I&{} \mathbf 0\end{array}\right) \; , \end{aligned}$$
(13.142)

where

$$\begin{aligned} \mathbf H_{11}= & {} \left( \mathbf F_1 \mathbf F_1 ^{T} \right) ^{-1} \mathbf F_1 ( \mathbf I- \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \mathbf D) \; , \nonumber \\ \mathbf H_{12}= & {} - \left( \mathbf F_1 \mathbf F_1 ^{T} \right) ^{-1} \mathbf F_1 \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 ( \mathbf I- \mathbf D\mathbf F_2^{T} \mathbf W_2 \mathbf F_2 ) \mathbf F_1^{T} \left( \mathbf F_1 \mathbf F_1 ^{T} \right) ^{-1} \; , \nonumber \\ \mathbf H_{21}= & {} \mathbf D\; , \nonumber \\ \mathbf H_{22}= & {} ( \mathbf I- \mathbf D\mathbf F_2^{T} \mathbf W_2 \mathbf F_2 ) \mathbf F_1 ^{T} \left( \mathbf F_1 \mathbf F_1 ^{T} \right) ^{-1} \; , \nonumber \\ \mathbf D= & {} \mathbf D_0^{-1} \mathbf P_{\perp } \; . \nonumber \end{aligned}$$

Straightforward multiplications in (13.142) then lead to

$$\begin{aligned} \mathbf G= \left( \begin{array}{ccc} ( \mathbf I- \mathbf D\mathbf F_2^{T} \mathbf W_2 \mathbf F_2 )\mathbf F_1 ^{T}\left( \mathbf F_1 \mathbf F_1 ^{T}\right) ^{-1}&\,&\mathbf D\mathbf F_2^{T} \mathbf W_2 \end{array}\right) \; . \end{aligned}$$
(13.143)

For the particular case when rank \(\mathbf F_2 = n\) (which requires dim \(\mathbf f_2 \ge n\)) a simpler expression is possible:

Lemma 13.2

Consider the expression

$$\begin{aligned} \mathbf G(\alpha ) = \left[ \alpha \mathbf F_1^{T} \mathbf W_1 \mathbf F_1 + \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \right] ^ {-1} \left( \begin{array}{cc} \alpha \mathbf F_1^{T} \mathbf W_1&\mathbf F_2^{T} \mathbf W_2 \end{array}\right) \; , \end{aligned}$$
(13.144)

where \(\mathbf F_1\) is an \(n_1 \times n\) matrix of rank \(n_1\), \(\mathbf F_2\) an \(n_2 \times n\) matrix of rank n, \(\mathbf W_1\) an \(n_1 \times n_1\) matrix, \(\mathbf W_2\) an \(n_2 \times n_2\) matrix, and \(\mathbf W_1\) and \(\mathbf W_2\) are positive definite. Set

$$\begin{aligned} \mathbf H= \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \; . \end{aligned}$$
(13.145)

Then it holds that the following limit exists and

$$\begin{aligned} \lim _{\alpha \rightarrow \infty } \mathbf G(\alpha )&{\mathop {=}\limits ^{\varDelta }}\,\left( \begin{array}{cc} \mathbf G_1&\mathbf G_2 \end{array}\right) \; , \end{aligned}$$
(13.146)
$$\begin{aligned} \mathbf G_1&= \mathbf H^{-1}\mathbf F_1^{T} \left( \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \; , \end{aligned}$$
(13.147)
$$\begin{aligned} \mathbf G_2&= \left[ \mathbf I- \mathbf H^{-1}\mathbf F_1^{T} \left( \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \mathbf F_1 \right] \mathbf H^ {-1} \mathbf F_2^{T} \mathbf W_2 \; . \end{aligned}$$
(13.148)

Proof

Note that the matrix \(\mathbf H\) is invertible by construction. Using the matrix inversion lemma,

$$\begin{aligned} \left[ \alpha \mathbf F_1^{T} \mathbf W_1 \mathbf F_1 + \mathbf F_2^{T} \mathbf W_2 \mathbf F_2 \right] ^ {-1} = \mathbf H^{-1} - \mathbf H^{-1}\mathbf F_1^{T} \left( \frac{1}{\alpha } \mathbf W_1^ {-1} + \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \mathbf F_1 \mathbf H^{-1} \; . \end{aligned}$$
(13.149)

Thus

$$\begin{aligned} \mathbf G_2= & {} \lim _{\alpha \rightarrow \infty } \left[ \mathbf H^{-1} - \mathbf H^{-1}\mathbf F_1^{T} \left( \frac{1}{\alpha } \mathbf W_1^ {-1} + \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \mathbf F_1 \mathbf H^{-1} \right] \mathbf F_2^{T} \mathbf W_2 \nonumber \\= & {} \mathbf H^{-1} \mathbf F_2^{T} \mathbf W_2 - \mathbf H^{-1}\mathbf F_1^{T} \left( \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \mathbf F_1 \mathbf H^{-1} \mathbf F_2^{T} \mathbf W_2 \nonumber \\= & {} \left[ \mathbf I- \mathbf H^{-1}\mathbf F_1^{T} \left( \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \mathbf F_1 \right] \mathbf H^{-1} \mathbf F_2^{T} \mathbf W_2 \; . \end{aligned}$$
(13.150)

Similarly,

$$\begin{aligned} \mathbf G_1= & {} \lim _{\alpha \rightarrow \infty } \left[ \alpha \mathbf H^{-1} \mathbf F_1^{T} \mathbf W_1 - \alpha \mathbf H^{-1} \mathbf F_1^{T} \left( \frac{1}{\alpha } \mathbf W_1^ {-1} + \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \mathbf F_1 \mathbf H^{-1} \mathbf F_1^{T} \mathbf W_1 \right] \nonumber \\= & {} \lim _{\alpha \rightarrow \infty } \left[ \alpha \mathbf H^{-1} \mathbf F_1^{T} \left( \frac{1}{\alpha } \mathbf W_1^ {-1} + \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \right. \nonumber \\&\times \left. \left( \frac{1}{\alpha } \mathbf W_1^ {-1} + \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} - \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) \mathbf W_1 \right] \nonumber \\= & {} \lim _{\alpha \rightarrow \infty } \mathbf H^{-1} \mathbf F_1^{T} \left( \frac{1}{\alpha } \mathbf W_1^ {-1} + \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \nonumber \\= & {} \mathbf H^{-1} \mathbf F_1^{T} \left( \mathbf F_1 \mathbf H^ {-1} \mathbf F_1^{T} \right) ^{-1} \; . \end{aligned}$$
(13.151)

   \(\blacksquare \)
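Lemma 13.2 can be illustrated numerically by evaluating \(\mathbf G(\alpha )\) in (13.144) for a large \(\alpha \) and comparing it with the limit (13.146)–(13.148). The sketch below uses randomly generated matrices of generic full rank; the particular sizes, seed, and value of \(\alpha \) are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n1, n2 = 5, 2, 7
F1 = rng.standard_normal((n1, n))                 # n1 x n, rank n1 (generically)
F2 = rng.standard_normal((n2, n))                 # n2 x n, rank n  (generically)
W1 = np.eye(n1)                                   # any positive definite W1
M = rng.standard_normal((n2, n2))
W2 = M @ M.T + np.eye(n2)                         # positive definite W2

def G_of_alpha(alpha):
    A = alpha * F1.T @ W1 @ F1 + F2.T @ W2 @ F2   # bracket in (13.144)
    return np.linalg.solve(A, np.hstack([alpha * F1.T @ W1, F2.T @ W2]))

H = F2.T @ W2 @ F2                                 # (13.145)
Hinv = np.linalg.inv(H)
G1 = Hinv @ F1.T @ np.linalg.inv(F1 @ Hinv @ F1.T)          # (13.147)
G2 = (np.eye(n) - G1 @ F1) @ Hinv @ F2.T @ W2               # (13.148)

print(np.allclose(G_of_alpha(1e8), np.hstack([G1, G2]), atol=1e-4))   # expect True
```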

13.C Algorithmic Aspects of CFA-Based Estimators

The loss function \(V_2(\varvec{\vartheta })\), (2.49), turns out to depend quadratically on some of the parameters. When applied to dynamic models as in Sect. 8.5, this can be exploited to treat the minimization problem as a separable nonlinear least squares problem.

It is convenient to split the parameter vector \(\varvec{\vartheta }\) into two parts:

$$\begin{aligned} \varvec{\vartheta }= \left( \begin{array}{c} \varvec{\theta }\\ \varvec{\rho }\end{array}\right) , \ \ \varvec{\theta }= \left( \begin{array}{c} a_1 \\ \vdots \\ a_{n_a} \\ b_1 \\ \vdots \\ b_{n_b} \end{array}\right) , \ \ \varvec{\rho }= \left( \begin{array}{c} \lambda _y \\ \lambda _u \\ r_z(0) \\ \vdots \\ r_z(k) \end{array}\right) \; . \end{aligned}$$
(13.152)

Next exploit that \(V_2(\varvec{\vartheta })\) depends quadratically on \(\varvec{\rho }\). For this purpose, write the covariance matrix \(\mathbf R(\varvec{\vartheta })\) in the form

$$\begin{aligned} \mathbf R= \sum _{j=1}^{k+3} \varvec{\rho }_{j} \mathbf J_j \; . \end{aligned}$$
(13.153)

Specifically, the coefficient matrices \(\mathbf J_j\) become:

$$\begin{aligned} \mathbf J_1= & {} \left( \begin{array}{cc} \mathbf I_{n_a+p_y+1} &{} \mathbf 0\\ \mathbf 0&{} \mathbf 0\end{array}\right) \; , \end{aligned}$$
(13.154)
$$\begin{aligned} \mathbf J_2= & {} \left( \begin{array}{cc} \mathbf 0&{} \mathbf 0\\ \mathbf 0&{} \mathbf I_{n_b+p_u} \end{array}\right) \; , \end{aligned}$$
(13.155)
$$\begin{aligned} \mathbf J_3= & {} \varvec{\varGamma }\mathbf I_{k+1} \varvec{\varGamma }^T \; , \end{aligned}$$
(13.156)
$$\begin{aligned} \mathbf J_j= & {} \varvec{\varGamma }\left( \begin{array}{cccc} \mathbf 0&{} 1 \\ 1 &{} &{} \ddots \\ &{} \ddots &{} &{} 1 \\ &{} &{} 1 \end{array}\right) \varvec{\varGamma }^T, \ \ \ j = 4, \dots , k+3 \; . \end{aligned}$$
(13.157)

Note that \(\varvec{\varGamma }\) depends on \(\varvec{\theta }\), and hence the coefficient matrices \(\mathbf J_j, \ j = 3, \dots , k+3\), are also functions of \(\varvec{\theta }\).

The loss function can now be written as follows, where the dependence on \(\varvec{\rho }\) is emphasized:

$$\begin{aligned} V_2(\varvec{\vartheta })= & {} \mathrm{tr} \left[ \mathbf Q_1 \left( \hat{\mathbf R} - \sum _j \varvec{\rho }_{j} \mathbf J_j \right) \mathbf Q_2 \left( \hat{\mathbf R} - \sum _{\ell } \varvec{\rho }_{ \ell } \mathbf J_{\ell } \right) \right] \nonumber \\= & {} \sum _j \sum _{\ell } \varvec{\rho }_j \varvec{\rho }_{ \ell } \mathrm{tr} \left[ \mathbf Q_1 \mathbf J_j \mathbf Q_2 \mathbf J_{\ell } \right] - 2 \sum _j \varvec{\rho }_{j} \mathrm{tr} \left[ \mathbf Q_1 \mathbf J_j \mathbf Q_2 \hat{\mathbf R} \right] + \mathrm{tr} \left[ \mathbf Q_1 \hat{\mathbf R} \mathbf Q_2 \hat{\mathbf R} \right] \; . \nonumber \\ \end{aligned}$$
(13.158)

It is straightforward to minimize (13.158) with respect to \(\varvec{\rho }_{ j}\):

$$\begin{aligned} 0 = \sum _{\ell } \varvec{\rho }_{ \ell } 2 \mathrm{tr} \left[ \mathbf Q_1 \mathbf J_j \mathbf Q_2 \mathbf J_{\ell } \right] - 2 \mathrm{tr} \left[ \mathbf Q_1 \mathbf J_j \mathbf Q_2 \hat{\mathbf R} \right] , \ \ \ j = 1, \dots , m = k+3 \; . \end{aligned}$$
(13.159)

This is easily expressed as a system of linear equations:

$$\begin{aligned} \left( \begin{array}{ccc} \mathrm{tr} \left( \mathbf Q_1 \mathbf J_1 \mathbf Q_2 \mathbf J_1 \right) &{} \dots &{} \mathrm{tr} \left( \mathbf Q_1 \mathbf J_1 \mathbf Q_2 \mathbf J_m \right) \\ \vdots &{} &{} \vdots \\ \mathrm{tr} \left( \mathbf Q_1 \mathbf J_m \mathbf Q_2 \mathbf J_1 \right) &{} \dots &{} \mathrm{tr} \left( \mathbf Q_1 \mathbf J_m \mathbf Q_2 \mathbf J_m \right) \end{array}\right) \left( \begin{array}{c} \varvec{\rho }_{1} \\ \vdots \\ \varvec{\rho }_{m} \end{array}\right) = \left( \begin{array}{c} \mathrm{tr} \left( \mathbf Q_1 \mathbf J_1 \mathbf Q_2 \hat{\mathbf R} \right) \\ \vdots \\ \mathrm{tr} \left( \mathbf Q_1 \mathbf J_m \mathbf Q_2 \hat{\mathbf R} \right) \end{array}\right) \; , \end{aligned}$$
(13.160)

which can be compactly written as

$$\begin{aligned} \mathbf A(\varvec{\theta }) \varvec{\rho }= \mathbf b(\varvec{\theta }) \; . \end{aligned}$$
(13.161)

As the matrices \(\left\{ \mathbf J_j \right\} \) depend on \(\varvec{\theta }\), so do \(\mathbf A\) and \(\mathbf b\).

The loss function (13.158) can now be written as

$$\begin{aligned} V_2(\varvec{\vartheta }) = \varvec{\rho }^T \mathbf A\varvec{\rho }- 2 \varvec{\rho }^T \mathbf b+ \mathrm{tr} \left[ \mathbf Q_1 \hat{\mathbf R} \mathbf Q_2 \hat{\mathbf R} \right] \; . \end{aligned}$$
(13.162)

By minimizing over \(\varvec{\rho }\) one gets the concentrated loss function, which depends on \(\varvec{\theta }\) only, as

$$\begin{aligned} \bar{V}_2 (\varvec{\theta })= & {} \min _{\varvec{\rho }} V_2(\varvec{\rho }) \nonumber \\= & {} \mathrm{tr} \left[ \mathbf Q_1 \hat{\mathbf R} \mathbf Q_2 \hat{\mathbf R} \right] - \mathbf b^T(\varvec{\theta }) \mathbf A^{-1}(\varvec{\theta }) \mathbf b(\varvec{\theta }) \; . \end{aligned}$$
(13.163)

To minimize the criterion \(\bar{V}_2(\varvec{\theta })\), a numerical search method has to be applied.
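The separable structure can be exploited as follows: for each candidate \(\varvec{\theta }\), the inner problem (13.160)–(13.161) is a linear system in \(\varvec{\rho }\), and the concentrated loss (13.163) is then evaluated and passed to the outer numerical search. A minimal sketch, assuming a user-supplied routine build_J_matrices(theta) that returns the list \(\mathbf J_1, \dots , \mathbf J_m\) of (13.154)–(13.157):

```python
import numpy as np

def concentrated_cfa_loss(theta, R_hat, Q1, Q2, build_J_matrices):
    """Concentrated loss bar{V}_2(theta) of (13.163), with rho solved from (13.161)."""
    J = build_J_matrices(theta)                   # J_1, ..., J_m, with m = k + 3
    m = len(J)
    A = np.array([[np.trace(Q1 @ J[i] @ Q2 @ J[l]) for l in range(m)]
                  for i in range(m)])             # coefficient matrix in (13.160)
    b = np.array([np.trace(Q1 @ J[i] @ Q2 @ R_hat) for i in range(m)])
    rho = np.linalg.solve(A, b)                   # (13.161)
    V2_bar = np.trace(Q1 @ R_hat @ Q2 @ R_hat) - b @ rho     # (13.163)
    return V2_bar, rho

# The outer minimization over theta can then be carried out by any numerical search,
# e.g. scipy.optimize.minimize applied to theta -> concentrated_cfa_loss(theta, ...)[0].
```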


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

Cite this chapter

Söderström, T. (2018). Algorithmic Properties. In: Errors-in-Variables Methods in System Identification. Communications and Control Engineering. Springer, Cham. https://doi.org/10.1007/978-3-319-75001-9_13