
Abstract

This chapter presents the EIV problem for static systems. Considerable detail is given for a simple line-fitting problem, which nevertheless captures the key properties of many more advanced EIV problems. It is shown that without specific assumptions there is no unique solution to the EIV problem, and the system is thus not identifiable. Further, it is discussed under what additional conditions identifiability prevails. The methodologies provided by factor analysis modeling and by the Frisch scheme are also described.




Appendix

2.A Further Details

2.A.1 Further Results for Line Fitting

This appendix presents a closer analysis of the line-fitting example introduced in this chapter. Various situations are considered in a series of examples.

Example 2.3

Consider the case of a scalar x and a scalar y, and assume that

$$\begin{aligned} y_0 = a_0 x_0 \end{aligned}$$
(2.76)

holds.

Now examine the identifiability properties from first- and second-order moments of the observed data. Let \(\tilde{x}_i\) and \(\tilde{y}_i\) have zero means and variances \(\lambda _x\) and \(\lambda _y\), respectively. Let the noise-free values \(x_{0i}\) all have mean m and variance \(\sigma \).

The first-order moments of the observations will be

$$\begin{aligned} m_y = \mathsf{E} \left\{ y_i \right\}= & {} a_0 m \;, \end{aligned}$$
(2.77)
$$\begin{aligned} m_x = \mathsf{E} \left\{ x_i \right\}= & {} m \;, \end{aligned}$$
(2.78)

while the second-order moments are

$$\begin{aligned} r_y = \mathrm{var} (y_i)= & {} a_0^2 \sigma + \lambda _y \;, \end{aligned}$$
(2.79)
$$\begin{aligned} r_x = \mathrm{var} (x_i)= & {} \sigma + \lambda _x \;, \end{aligned}$$
(2.80)
$$\begin{aligned} r_{yx} = \mathrm{cov} (y_i, x_i)= & {} a_0 \sigma \;. \end{aligned}$$
(2.81)

The left-hand sides of equations (2.77)–(2.81) can be estimated from the data in a straightforward manner. Therefore, to analyze identifiability, regard the left-hand sides of (2.77)–(2.81) as known, and these equations as the information available to determine the sought parameter \(a_0\). Apparently there are 5 equations for determining the 5 unknowns \(a_0,\ m,\ \sigma ,\ \lambda _y\), and \(\lambda _x\). A somewhat closer look reveals that if \(m \ne 0,\ a_0 \ne 0\), these equations have a unique solution, and hence the system is identifiable. On the other hand, if \(m=0\), then (2.77) no longer carries any information about \(a_0\), and the system becomes unidentifiable.

How would the estimates be determined in the case of \(m \ne 0\)? The slope a is estimated as

$$\begin{aligned} \hat{a} = \frac{ \frac{1}{N} \sum _i y_i }{\frac{1}{N} \sum _i x_i } \;. \end{aligned}$$
(2.82)

Based on the data model one then finds that

$$\begin{aligned} \hat{a} - a_0 = \frac{\frac{1}{N} \sum _i \left( y_i - a_0 x_i \right) }{\frac{1}{N} \sum _i x_i } = \frac{ \frac{1}{N} \sum _i \left( \tilde{y}_i - a_0 \tilde{x}_i \right) }{ \frac{1}{N} \sum _i \left( x_{0i} + \tilde{x}_i \right) } \;. \end{aligned}$$
(2.83)

If the number of points, N, becomes large, noise terms such as \( (1/N) \sum _i \tilde{y}_i\) and \( (1/N) \sum _i \tilde{x}_i\) converge to zero. If, in addition, \( (1/N) \sum _i x_{0i} \) has a nonzero limit, \(\hat{a}\) is therefore a consistent estimate of the parameter a. \(\blacksquare \)
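
A small numerical sketch of the ratio estimate (2.82), assuming Python with numpy; the constants \(a_0 = 2,\ m = 1\) and the noise variances are illustrative choices, not taken from the text:

```python
import numpy as np

a0, m, sigma = 2.0, 1.0, 1.0     # illustrative true slope, mean and variance of x0
lam_x, lam_y = 0.5, 0.3          # illustrative noise variances
rng = np.random.default_rng(0)

for N in (100, 10_000, 1_000_000):
    x0 = m + np.sqrt(sigma) * rng.standard_normal(N)
    x = x0 + np.sqrt(lam_x) * rng.standard_normal(N)        # noisy regressor
    y = a0 * x0 + np.sqrt(lam_y) * rng.standard_normal(N)   # noisy response, model (2.76)
    print(N, y.mean() / x.mean())                           # the estimate (2.82)
# The printed values approach a0 = 2 as N grows, in line with (2.83): the sample
# means of the noise terms vanish while the sample mean of x tends to m != 0.
```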

If the data are not Gaussian, higher-order moments can be used to improve the identifiability properties. This is illustrated with an example.

Example 2.4

Reconsider the setup of Example 2.3 and assume that \(m=0\). Then the system is not identifiable from (first- and) second-order moments. Assume that \(\tilde{x}_i, \tilde{y}_i\) are Gaussian, with zero mean, so

$$\begin{aligned} \mathsf{E} \left\{ \tilde{x}_i^2 \right\} = \lambda _x, {\quad } \mathsf{E} \left\{ \tilde{x}_i^4\right\} = 3 \lambda _x^2, {\quad } \mathsf{E} \left\{ \tilde{y}_i^2\right\} = \lambda _y, {\quad } \mathsf{E} \left\{ \tilde{y}_i^4\right\} = 3 \lambda _y^2 \;.\quad \end{aligned}$$
(2.84)

Assume further that \(x_{0i}\) has a symmetric distribution around \(x=0\) and that it is not Gaussian. Then

$$\begin{aligned} \mathsf{E} \left\{ x_{0i}\right\} = 0, {\quad } \mathsf{E} \left\{ x_{0i}^2\right\} = \sigma , {\quad } \mathsf{E} \left\{ x_{0i}^4\right\} = 3 \sigma ^2 + \gamma \;, \end{aligned}$$
(2.85)

where \(\gamma \ne 0\). Now express the second- and fourth-order moments of the data:

$$\begin{aligned} r_y = \mathsf{E} \left\{ y_i^2 \right\}= & {} a_0^2 \sigma + \lambda _y \;, \end{aligned}$$
(2.86)
$$\begin{aligned} r_x = \mathsf{E} \left\{ x_i^2 \right\}= & {} \sigma + \lambda _x \; , \end{aligned}$$
(2.87)
$$\begin{aligned} r_{yx} = \mathsf{E} \left\{ y_i x_i\right\}= & {} a_0 \sigma \;, \end{aligned}$$
(2.88)
$$\begin{aligned} v_y = \mathsf{E} \left\{ y_i^4 \right\}= & {} \mathsf{E} \left\{ \left[ a_0 x_{0i} + \tilde{y}_i \right] ^4 \right\} \nonumber \\= & {} a_0^4 (3 \sigma ^2 + \gamma ) + 6 a_0^2 \sigma \lambda _y + 3 \lambda _y^2 \nonumber \\= & {} a_0^4 \gamma + 3 (a_0^2 \sigma + \lambda _y)^2 \;, \end{aligned}$$
(2.89)
$$\begin{aligned} v_x = \mathsf{E} \left\{ x_i^4 \right\}= & {} \mathsf{E} \left\{ \left[ x_{0i} + \tilde{x}_i \right] ^4 \right\} \nonumber \\= & {} 3 \sigma ^2 + \gamma + 6 \sigma \lambda _x + 3 \lambda _x^2 \nonumber \\= & {} \gamma + 3 (\sigma + \lambda _x)^2 \;, \end{aligned}$$
(2.90)
$$\begin{aligned} v_{yx} = \mathsf{E} \left\{ y_i^2 x_i^2 \right\}= & {} \mathsf{E} \left\{ \left[ a_0 x_{0i} + \tilde{y}_i \right] ^2 \left[ x_{0i} + \tilde{x}_i \right] ^2 \right\} \nonumber \\= & {} a_0^2 (3 \sigma ^2 + \gamma ) + \lambda _y \sigma + \lambda _x a_0^2 \sigma + \lambda _x \lambda _y \;. \end{aligned}$$
(2.91)

The left-hand sides of (2.86)–(2.91) can be determined from data with arbitrarily good accuracy when \(N \rightarrow \infty \).

Keep \(a_0\) as an unknown for the moment, and use (2.88) to solve for \(\sigma \), (2.86) to solve for \(\lambda _y\), (2.87) to solve for \(\lambda _x\), and (2.90) to solve for \(\gamma \). This gives

$$\begin{aligned} \begin{array}{rcl} \sigma &{} = &{} r_{yx}/a_0 \;, \\ \lambda _y &{} = &{} r_y - a_0 r_{yx} \;, \\ \lambda _x &{} = &{} r_x - r_{yx}/a_0 \;, \\ \gamma &{} = &{} v_x - 3 r_x^2 \;. \end{array} \end{aligned}$$
(2.92)

Now substitute (2.92) into (2.89) and (2.91) to determine the remaining unknown, namely \(a_0\), which after some simplification gives

$$\begin{aligned} v_y= & {} a_0^4 (v_x - 3 r_x^2) + 3 r_y^2 \;, \end{aligned}$$
(2.93)
$$\begin{aligned} v_{yx}= & {} a_0^2 (v_x - 3 r_x^2) + (2 r_{yx}^2 + r_x r_y) \;. \end{aligned}$$
(2.94)

As (2.93) and (2.94) are functions of even powers of \(a_0\), it is found that \( a_0^2\) can be determined uniquely, but not the sign of \(a_0\). \(\blacksquare \)
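
The moment relations (2.93) and (2.94) can be checked numerically. The following sketch (numpy assumed; the uniformly distributed \(x_{0i}\) and all constants are illustrative choices) recovers \(a_0^2\) from sample moments in both ways:

```python
import numpy as np

a0, lam_x, lam_y, N = 1.5, 0.4, 0.2, 2_000_000
rng = np.random.default_rng(1)
x0 = rng.uniform(-2.0, 2.0, N)   # zero mean, symmetric, non-Gaussian (gamma != 0)
x = x0 + np.sqrt(lam_x) * rng.standard_normal(N)
y = a0 * x0 + np.sqrt(lam_y) * rng.standard_normal(N)

# Sample counterparts of the second- and fourth-order moments (2.86)-(2.91)
r_x, r_y, r_yx = np.mean(x**2), np.mean(y**2), np.mean(y * x)
v_x, v_y, v_yx = np.mean(x**4), np.mean(y**4), np.mean(y**2 * x**2)

a0_sq_from_293 = np.sqrt((v_y - 3 * r_y**2) / (v_x - 3 * r_x**2))       # from (2.93)
a0_sq_from_294 = (v_yx - 2 * r_yx**2 - r_x * r_y) / (v_x - 3 * r_x**2)  # from (2.94)
print(a0_sq_from_293, a0_sq_from_294, a0**2)
# Both values are close to a0^2 = 2.25; the sign of a0 remains undetermined.
```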

Example 2.5

For the problem treated in Example 2.3, it holds

$$\begin{aligned} | \hat{a}_\mathrm{LS} | \le | \hat{a}_\mathrm{TLS} | \le | \hat{a}_\mathrm{DLS} | \;. \end{aligned}$$
(2.95)

See also the numerical example of Sect. 2.1. Introduce the notations

$$\begin{aligned} \hat{r}_x = \frac{1}{N} \sum _{i=1}^N x_i^2, \ \ \hat{r}_y = \frac{1}{N} \sum _{i=1}^N y_i^2, \ \ \hat{r}_{yx} = \frac{1}{N} \sum _{i=1}^N y_i x_i \;. \end{aligned}$$
(2.96)

Now prove (2.95) for the case \(\hat{r}_{yx} > 0\). Then all estimates in (2.96) take positive values. It holds

$$\begin{aligned} \hat{a}_\mathrm{LS} = \frac{ \hat{r}_{yx} }{ \hat{r}_x }, \ \ \hat{a}_\mathrm{DLS} = \frac{ \hat{r}_{y} }{ \hat{r}_{yx} } \;. \end{aligned}$$
(2.97)

From the Cauchy–Schwarz inequality it holds

$$\begin{aligned} \hat{r}_{yx}^2 \le \hat{r}_x \hat{r}_{y} \;, \end{aligned}$$
(2.98)

and the relation

$$\begin{aligned} \hat{a}_\mathrm{LS} \le \hat{a}_\mathrm{DLS} \end{aligned}$$
(2.99)

follows directly.

The estimate \(\hat{a}_\mathrm{TLS}\) follows from (2.15). To avoid dealing with messy expressions including square roots, it is more convenient to utilize (2.14). One needs to prove that the criterion

$$\begin{aligned} V(a) = \frac{ \hat{r}_y + a^2 \hat{r}_x - 2 a \hat{r}_{yx} }{ 1 + a^2} \end{aligned}$$
(2.100)

which has a minimum for \(a = \hat{a}_\mathrm{TLS}\), behaves as in Fig. 2.5.

Fig. 2.5 The criterion V(a) versus a

To verify (2.95), it is then enough to show that

$$\begin{aligned} V'(\hat{a}_\mathrm{LS}) \le 0, \ \ V'(\hat{a}_\mathrm{DLS}) \ge 0 \;. \end{aligned}$$
(2.101)

However, straightforward differentiation gives

$$\begin{aligned} V'(a)= & {} \frac{ (2 a \hat{r}_x - 2 \hat{r}_{yx}) (1 + a^2) - 2 a (\hat{r}_y + a^2 \hat{r}_x - 2 a \hat{r}_{yx}) }{ (1 + a^2)^2 } \nonumber \\= & {} \frac{ 2 }{ (1+ a^2)^2 } \left( a^2 \hat{r}_{yx} + a (\hat{r}_x - \hat{r}_y) - \hat{r}_{yx} \right) \;, \end{aligned}$$
(2.102)
$$\begin{aligned} V'(\hat{a}_\mathrm{LS})= & {} \frac{ 2 }{ (1+ \hat{a}^2 _\mathrm{LS})^2 } \left[ \frac{ \hat{r}_{yx}^3 }{ \hat{r}_x^2 } + \frac{ \hat{r}_{yx} }{ \hat{r}_x } (\hat{r}_x - \hat{r}_y) - \hat{r}_{yx} \right] \nonumber \\= & {} \frac{ 2 }{ (1+ \hat{a}^2 _\mathrm{LS})^2 } \frac{ \hat{r}_{yx} }{ \hat{r}_x^2 } \left( \hat{r}_{yx}^2 - \hat{r}_x \hat{r}_y \right) \le 0 \;, \end{aligned}$$
(2.103)
$$\begin{aligned} V'(\hat{a}_\mathrm{DLS})= & {} \frac{ 2 }{ (1+ \hat{a}^2 _\mathrm{DLS})^2 } \left[ \frac{ \hat{r}_y^2 }{ \hat{r}_{yx} } + \frac{ \hat{r}_y }{ \hat{r}_{yx} } (\hat{r}_x - \hat{r}_y) - \hat{r}_{yx} \right] \nonumber \\= & {} \frac{ 2 }{ (1+ \hat{a}^2 _\mathrm{DLS})^2 } \frac{ \hat{r}_x \hat{r}_y - \hat{r}_{yx}^2 }{ \hat{r}_{yx} } \ge 0 \;, \end{aligned}$$
(2.104)

which completes the proof of (2.101). See also Sect. 2.1.1. \(\blacksquare \)
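
The ordering (2.95) is also easy to observe numerically. A sketch under illustrative assumptions (numpy; simulated data with noise on both variables; \(\hat{a}_\mathrm{TLS}\) taken as the minimizing root of \(V'(a)=0\) in (2.102)):

```python
import numpy as np

rng = np.random.default_rng(2)
N, a0 = 5000, 1.3
x0 = rng.normal(1.0, 1.0, N)
x = x0 + rng.normal(0.0, 0.7, N)
y = a0 * x0 + rng.normal(0.0, 0.5, N)

r_x, r_y, r_yx = np.mean(x**2), np.mean(y**2), np.mean(x * y)   # as in (2.96)
a_ls = r_yx / r_x                                               # (2.97)
a_dls = r_y / r_yx                                              # (2.97)
# TLS: the larger root of a^2 r_yx + a (r_x - r_y) - r_yx = 0, cf. (2.102)
a_tls = ((r_y - r_x) + np.hypot(r_x - r_y, 2 * r_yx)) / (2 * r_yx)
print(a_ls, a_tls, a_dls)   # observe a_ls <= a_tls <= a_dls, as in (2.95)
```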

Next evaluate the asymptotic expressions for the estimates \( \hat{a} _\mathrm{LS}\), \( \hat{a} _\mathrm{DLS}\), and \( \hat{a} _\mathrm{TLS}\) when N, the number of data points, tends to infinity.

Example 2.6

Assume

$$\begin{aligned} \mathsf{E} \left\{ x^2_{0i} \right\} = r \;. \end{aligned}$$
(2.105)

Then in the limit as \(N \rightarrow \infty \)

$$\begin{aligned} \begin{array}{rcl} r_x &{} = &{} r + \lambda _x \;, \\ r_y &{} = &{} a_0^2 r + \lambda _y \;, \\ r_{yx} &{} = &{} a_0 r \;. \end{array} \end{aligned}$$
(2.106)

Hence,

$$\begin{aligned} \hat{a}_\mathrm{LS}= & {} \frac{ a_0 r}{ r + \lambda _x} = a_0 + \frac{ - a_0 \lambda _x}{ r + \lambda _x} \;, \end{aligned}$$
(2.107)
$$\begin{aligned} \hat{a}_\mathrm{DLS}= & {} \frac{ a_0^2 r + \lambda _y }{ a_0 r } = a_0 + \frac{ \lambda _y}{ a_0 r } \;. \end{aligned}$$
(2.108)

Apparently, both \(\hat{a}_\mathrm{LS}\) and \(\hat{a}_\mathrm{DLS}\) differ from \(a_0\).

The estimate \(\hat{a}_\mathrm{TLS}\) is the solution to, cf. (2.15),

$$\begin{aligned} a^2 a_0 r + a \left( r + \lambda _x - a_0 ^2 r - \lambda _y \right) - a_0 r = 0 \;, \end{aligned}$$

which can be rearranged as

$$\begin{aligned} \left( a - a_0 \right) \left( 1 + a a_0 \right) r + a \left( \lambda _x - \lambda _y \right) = 0 \;. \end{aligned}$$
(2.109)

In particular one finds that \(\hat{a}_\mathrm{TLS} = a_0\) only if \( \lambda _x = \lambda _y\).

In summary, all three estimates under consideration are biased and not consistent. \(\blacksquare \)
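
The asymptotic values (2.107)–(2.109) can be evaluated directly. A minimal sketch, with illustrative parameter values (not from the text):

```python
import numpy as np

a0, r, lam_x, lam_y = 2.0, 1.0, 0.5, 0.25   # illustrative values

a_ls_lim = a0 * r / (r + lam_x)              # (2.107): biased towards zero
a_dls_lim = (a0**2 * r + lam_y) / (a0 * r)   # (2.108): biased away from zero
# TLS limit: the relevant (here positive) root of the quadratic preceding (2.109)
A, B, C = a0 * r, r - a0**2 * r + lam_x - lam_y, -a0 * r
a_tls_lim = (-B + np.sqrt(B**2 - 4 * A * C)) / (2 * A)
print(a_ls_lim, a_tls_lim, a_dls_lim)
# All three differ from a0 = 2 unless lam_x == lam_y, in which case TLS is exact, cf. (2.109).
```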

It is important to note that the line-fitting examples above, though very simple, are still somewhat special. A specific property is that the line is constrained to pass through the origin. Another way to express this constraint is to regard the origin as an additional given data point with no measurement error. Now generalize to an arbitrary straight line.

Example 2.7

Consider the same situation as in Example 2.3, but do not require the line to pass through the origin; that is, the model is changed to

$$\begin{aligned} y = a x + b \;, \end{aligned}$$
(2.110)

and the data is assumed to fulfill

$$\begin{aligned} \begin{array}{rcl} y_i &{} = &{} y_{0i} + \tilde{y}_i \;, \\ x_i &{} = &{} x_{0i} + \tilde{x}_i \;, \\ y_{0i} &{} = &{} a_0 x_{0i} + b_0 \;, \end{array} i = 1, \ldots , N \;. \end{aligned}$$
(2.111)

To examine the identifiability properties from first- and second-order moments, let \(x_{0i}\) have mean m and variance \(\sigma \). Let the noise terms \(\tilde{y}_i\) and \(\tilde{x}_i\) be independent of \(x_{0j}\) (for all i and j) and have zero means and variances \(\lambda _y\) and \(\lambda _x\), respectively. The first-order moments of the observations are

$$\begin{aligned} m_y = \mathsf{E} \left\{ y_i \right\}= & {} b_0 + a_0 m \;, \end{aligned}$$
(2.112)
$$\begin{aligned} m_x = \mathsf{E} \left\{ x_i \right\}= & {} m \;, \end{aligned}$$
(2.113)

while the second-order moments are

$$\begin{aligned} r_y = \mathrm{var} (y_i)= & {} a_0^2 \sigma + \lambda _y \;, \end{aligned}$$
(2.114)
$$\begin{aligned} r_x = \mathrm{var} (x_i)= & {} \sigma + \lambda _x \;, \end{aligned}$$
(2.115)
$$\begin{aligned} r_{yx} = \mathrm{cov} (y_i, x_i)= & {} a_0 \sigma \;. \end{aligned}$$
(2.116)

Now there are 6 unknowns (\(a_0,\ b_0,\ m,\ \sigma ,\lambda _x\), and \(\lambda _y\)) and still only 5 equations. Hence identifiability is lost.

To illustrate an attempt at estimation, consider the least squares estimate of \( \varvec{\theta }= \left( \begin{array}{cc} a_0&b_0 \end{array} \right) ^ T \):

$$\begin{aligned} \hat{\varvec{\theta }}_\mathrm{LS} = \mathrm{arg \ } \min _{\theta } \frac{1}{N} \sum _{i=1}^ N \left( y_i - a x_i - b \right) ^ 2, \end{aligned}$$
(2.117)

which leads to the normal equations

$$\begin{aligned} \frac{1}{N} \left( \begin{array}{cc} \sum _i x_i^2 &{} \sum _i x_i \\ \sum _i x_i &{} \sum _i 1 \end{array} \right) \left( \begin{array}{c} a \\ b \end{array} \right) = \frac{1}{N} \left( \begin{array}{c} \sum _i x_i y_i \\ \sum _i y_i \end{array} \right) . \end{aligned}$$
(2.118)

In the limit when \(N \rightarrow \infty \), the solution to (2.118) becomes

$$\begin{aligned} \left( \begin{array}{c} \hat{a} \\ \hat{b} \end{array} \right)= & {} \left( \begin{array}{cc} m^2 + \sigma + \lambda _x &{} m \\ m &{} 1 \end{array} \right) ^{-1} \left( \begin{array}{c} a_0 (m^2 + \sigma ) + b_0 m \\ a_0 m + b_0 \end{array} \right) \nonumber \\= & {} \left( \begin{array}{c} a_0 \\ b_0 \end{array} \right) + \frac{ a_0 \lambda _x}{ \sigma + \lambda _x } \left( \begin{array}{c} -1 \\ m \end{array} \right) . \end{aligned}$$
(2.119)

It is apparent that the estimate is biased, also asymptotically. This illustrates what is already known: the system is not identifiable. \(\blacksquare \)
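
A quick simulation check of the asymptotic least squares solution (2.119); numpy assumed, and all constants are illustrative:

```python
import numpy as np

a0, b0, m, sigma, lam_x, lam_y, N = 1.5, 0.7, 2.0, 1.0, 0.6, 0.3, 1_000_000
rng = np.random.default_rng(3)
x0 = m + np.sqrt(sigma) * rng.standard_normal(N)
x = x0 + np.sqrt(lam_x) * rng.standard_normal(N)
y = a0 * x0 + b0 + np.sqrt(lam_y) * rng.standard_normal(N)

# Least squares fit of y = a x + b, i.e. the solution of the normal equations (2.118)
Phi = np.column_stack([x, np.ones(N)])
a_hat, b_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]

# Asymptotic values predicted by (2.119)
bias = a0 * lam_x / (sigma + lam_x)
print(a_hat, a0 - bias)       # the slope shrinks towards zero
print(b_hat, b0 + m * bias)   # the intercept compensates accordingly
```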

Can the situation improve in the multivariable case? Unfortunately, the answer is negative, as shown in the following example.

Example 2.8

Consider the general affine case, where \(\mathbf x\) is a vector of dimension \(n_x\) and \(\mathbf y\) is a vector of dimension \(n_y\). Postulate the model

$$\begin{aligned} \begin{array}{rcl} \mathbf y_i &{} = &{} \mathbf y_{0i} + \tilde{\mathbf y}_i \;, \\ \mathbf x_i &{} = &{} \mathbf x_{0i} + \tilde{\mathbf x}_i \;, \\ \mathbf y_{0i} &{} = &{} \mathbf A_0 \mathbf x_{0i} + \mathbf b_0 \;, \end{array} i = 1, \ldots , N \,, \end{aligned}$$
(2.120)

where \(\mathbf b_0\) is an \(n_y\)-dimensional vector, and \(\mathbf A_0\) is an \(n_y \times n_x\)-dimensional matrix. Let \(\mathbf x_{0i}\) have mean \(\mathbf m\) and a positive definite covariance matrix \(\mathbf {R}_x\), and let \(\tilde{\mathbf {x}}_i\) have zero mean and covariance matrix \(\varvec{\varLambda }_{\mathbf x} > 0\). Similarly, let \(\tilde{\mathbf y}_i\) have zero mean and covariance matrix \(\varvec{\varLambda }_{\mathbf y} > 0\). For this case the first-order moments become, comparing (2.112) and (2.113),

$$\begin{aligned} \mathsf{E} \left\{ \mathbf y_i \right\}= & {} \mathbf b_0 + \mathbf A_0 \mathbf m\;, \end{aligned}$$
(2.121)
$$\begin{aligned} \mathsf{E} \left\{ \mathbf x_i \right\}= & {} \mathbf m\;, \end{aligned}$$
(2.122)

while the second-order moments become

$$\begin{aligned} \mathrm{cov} (\mathbf y_i)= & {} \mathbf A_0 \mathbf {R}_{\mathbf x} \mathbf A_0^ T + \varvec{\varLambda }_{\mathbf y} \;, \end{aligned}$$
(2.123)
$$\begin{aligned} \mathrm{cov} (\mathbf x_i)= & {} \mathbf {R}_{\mathbf x} + \varvec{\varLambda }_{\mathbf x} \;, \end{aligned}$$
(2.124)
$$\begin{aligned} \mathrm{cov} (\mathbf y_i, \mathbf x_i)= & {} \mathbf A_0 \mathbf {R}_{\mathbf x} \;. \end{aligned}$$
(2.125)

The number of equations in (2.121)–(2.125) is altogether (taking the symmetry of (2.123) and (2.124) into account)

$$\begin{aligned} \# \mathrm{equations \ }= & {} n_y + n_x + \frac{ n_y (n_y + 1) }{2} + \frac{ n_x (n_x + 1) }{2} + n_y n_x \nonumber \\= & {} \frac{n_x ^2 }{2} + \frac{ 3 n_x}{2} + \frac{n_y ^2 }{2} + \frac{ 3 n_y}{2} + n_y n_x \; . \end{aligned}$$
(2.126)

As both \(\mathbf {R}_x\) and \(\varvec{\varLambda }_x\) have \(n_x(n_x+1)/2\) unknowns each, the total number of unknowns (for \(\mathbf A_0,\ \mathbf b_0,\ \mathbf m,\ \varvec{\varLambda }_y,\ \mathbf {R}_x,\ \varvec{\varLambda }_x\)) becomes

$$\begin{aligned} \# \mathrm{unknowns \ }= & {} n_y n_x + n_y + n_x + \frac{ n_y (n_y + 1) }{2} + 2 \times \frac{ n_x (n_x + 1) }{2} \nonumber \\= & {} n_x^2 + 2 n_x + \frac{n_y^2}{2} + \frac{ 3 n_y}{2} + n_y n_x \;, \end{aligned}$$
(2.127)

which apparently always exceeds the number of equations given in (2.126). More precisely, the number of degrees of freedom in the solution will be

$$\begin{aligned} \# \mathrm{unknowns \ } - \# \mathrm{equations \ } = \frac{ n_x (n_x + 1) }{2} \;. \end{aligned}$$
(2.128)

Therefore, the system is not identifiable. The degrees of freedom in the solution will grow with the dimension of \(\mathbf x\). \(\blacksquare \)
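
The counting argument (2.126)–(2.128) is easy to reproduce. A small sketch (plain Python; the chosen dimensions are arbitrary examples):

```python
def n_equations(nx: int, ny: int) -> int:
    # first-order moments plus symmetric second-order moment blocks, cf. (2.126)
    return ny + nx + ny * (ny + 1) // 2 + nx * (nx + 1) // 2 + ny * nx

def n_unknowns(nx: int, ny: int) -> int:
    # A0, b0, m, Lambda_y, R_x, Lambda_x, cf. (2.127)
    return ny * nx + ny + nx + ny * (ny + 1) // 2 + 2 * (nx * (nx + 1) // 2)

for nx, ny in [(1, 1), (2, 1), (3, 2)]:
    deficit = n_unknowns(nx, ny) - n_equations(nx, ny)
    print(nx, ny, deficit, nx * (nx + 1) // 2)   # the deficit equals n_x(n_x+1)/2 as in (2.128)
```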

In the above examples, the model parameters (a and b) were considered to be estimated from second-order moments of the data (the time series \(\{x_i\}\) and \(\{y_i\}\)). The unknown x coordinates were characterized by their mean (m) and variance (\(\sigma \)). It is also possible, however, to change the setting and formulate another identification problem. This is done next.

Example 2.9

Assume that the measurement noises \(\tilde{x}_i\) and \(\tilde{y}_i\) are independent and Gaussian, and consider the maximum likelihood (ML) estimation of the unknowns. The unknowns are a and b in the model (2.110). In addition, choose to treat the noise-free x-values \(\{ x_{0i} \}_{i= 1}^N\) as auxiliary unknowns.

The likelihood function L turns out to satisfy

$$\begin{aligned} \log ( L )= & {} - \frac{1}{2 \lambda _x} \sum _{i=1}^N \left( x_i - x_{0i} \right) ^2 - \frac{1}{2 \lambda _y} \sum _{i=1}^N \left( y_i - a x_{0i} -b \right) ^2 \nonumber \\&- \frac{N}{2} \log \lambda _x - \frac{N}{2} \log \lambda _y - \frac{N}{2} \log (2 \pi ) \;. \end{aligned}$$
(2.129)

The ML estimates are obtained as the parameter values that maximize the likelihood function.

However, in this case, if L is evaluated for \(x_{0i} = x_i,\ i = 1, \ldots , N\), then apparently \(L \rightarrow \infty \) when \(\lambda _x \rightarrow 0\). Hence the optimization problem is not well posed. In fact, the ML estimate does not exist here, and the parameters are not identifiable. \(\blacksquare \)
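
The degeneracy is easy to see numerically by evaluating (2.129) along the offending direction. A sketch (numpy; the simulated data and constants are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b, lam_y = 1.0, 0.5, 0.2
x = rng.normal(0.0, 1.0, 200)
y = a * x + b + np.sqrt(lam_y) * rng.standard_normal(200)

def log_L(lam_x, lam_y, a, b, x0, x, y):
    """Log-likelihood (2.129), with the auxiliary unknowns x0 treated as free variables."""
    n = len(x)
    return (-np.sum((x - x0)**2) / (2 * lam_x)
            - np.sum((y - a * x0 - b)**2) / (2 * lam_y)
            - n / 2 * np.log(lam_x) - n / 2 * np.log(lam_y) - n / 2 * np.log(2 * np.pi))

for lam_x in (1.0, 1e-2, 1e-4, 1e-8):
    print(lam_x, log_L(lam_x, lam_y, a, b, x0=x, x=x, y=y))
# With x0 = x the first sum vanishes exactly, so log L grows without bound
# as lam_x -> 0: the ML problem is ill posed.
```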

Return to the situation treated in Example 2.9. One might believe that the root of the identifiability problem is that all the N unknowns \(\{x_{0i}\}\) need to be determined as auxiliary unknowns. This is not so, as shown next.

Example 2.10

Reconsider the situation treated in Example 2.9, but assume that the noise ratio

$$\begin{aligned} \lambda _y / \lambda _x = r \end{aligned}$$
(2.130)

is known.

Then the parameters a and b are identifiable. This can be seen in three different ways.

Geometrically, the assumption (2.130) means that one can scale the measurements so that the uncertainties in the x and y directions are the same. Then it will be feasible to apply orthogonal regression, given by the total least squares estimate.

Treating the problem algebraically instead, one now has 6 equations (namely (2.112)–(2.116) and (2.130)) and still 6 unknowns. As the equations are nonlinear, a more detailed examination is needed to conclude whether or not there is a unique solution with respect to \(a_0\) and \(b_0\). First eliminating b, \(\lambda _y\), and m, one arrives at three equations for determining \(a, \ \sigma \), and \(\lambda _x\):

$$\begin{aligned} \left\{ \begin{array}{rcl} r_y &{} = &{} a^2 \sigma + r \lambda _x \;, \\ r_x &{} = &{} \sigma + \lambda _x \;, \\ r_{yx} &{} = &{} a \sigma \;. \end{array} \right. \end{aligned}$$
(2.131)

One can next solve for \(\sigma \) and \(\lambda _x\), and finally arrive at a second-order equation for a:

$$\begin{aligned} 0 = a^2 r_{yx} + a (r r_x - r_ y) - r r_{yx} \;. \end{aligned}$$
(2.132)

This equation has two roots: one equal to the true value \(a = a_0\), and one false, \( a = - r/ a_0\). The false root can, however, be discarded, as it leads to a negative estimate of \(\sigma \), which must be positive since it represents a variance.

For a statistical treatment of the problem, make use of (2.130) in the logarithm of the likelihood function (2.129), which now becomes (removing a trivial constant term)

$$\begin{aligned} \log (L) =&- \frac{1}{2 \lambda _x} \sum _{i=1}^N \left( x_i - x_{0i} \right) ^2 - \frac{1}{2 r \lambda _x} \sum _{i=1}^N \left( y_i - a x_{0i} -b \right) ^2 \nonumber \\&- \frac{N}{2} \log \lambda _x - \frac{N}{2} \log (r \lambda _x) - \frac{N}{2} \log (2 \pi ) \nonumber \\ {\mathop { = }\limits ^{\varDelta }}&- \frac{N}{2 \lambda _x} V(a,b, x_{0i}) - \frac{N}{2} \left[ 2 \log (\lambda _x) + \log (r) \right] - \frac{N}{2} \log (2 \pi ) \;.\nonumber \\ \end{aligned}$$
(2.133)

where

$$\begin{aligned} V(a, b, x_{0i}) = \frac{1}{N} \sum _{i=1}^N \left[ (x_i - x_{0i})^2 + \frac{1}{r} (y_i - a x_{0i} - b)^2 \right] \;. \end{aligned}$$
(2.134)

One can then find the ML estimates of \(a, \ b\) and \(x_{0i}, \ i = 1, \ldots N \) by minimizing \(V(a, b, x_{0i})\). First minimize with respect to \(x_{0i}\):

$$\begin{aligned} 0 = \frac{ \partial V}{\partial x_{0i} } = - 2 (x_i - x_{0i}) - \frac{2 a}{r} ( y_i - a x_{0i} - b) \; , \end{aligned}$$
(2.135)

leading directly to

$$\begin{aligned} x_{0i}= & {} \frac{r}{a^2 + r} x_i + \frac{a}{a^2 + r} (y_i - b) \;, \end{aligned}$$
(2.136)
$$\begin{aligned} x_i - x_{0i}= & {} \frac{a}{a^2 + r} \left( a x_i - y_i + b \right) \;, \end{aligned}$$
(2.137)
$$\begin{aligned} y_i - a x_{0i} - b= & {} \frac{r}{a^2 + r} \left( y_i - a x_i - b \right) \;. \end{aligned}$$
(2.138)

Inserting this into (2.134) leads to a criterion depending on just the primary parameters a and b,

$$\begin{aligned} \bar{V}_N(a, b) {\mathop {=}\limits ^{\varDelta }}&\min _{x_{0i}} V(a, b, x_{0i}) \nonumber \\ =&\frac{1}{N} \sum _{i=1}^N \frac{1}{ (a^2 + r)^2 } \left[ a^2 \left( a x_i - y_i + b \right) ^2 + \frac{1}{r} r^2 \left( y_i - a x_i - b \right) ^2 \right] \nonumber \\ =&\frac{1}{ (a^2 + r) } \frac{1}{N} \sum _{i=1}^N \left( y_i - a x_i - b \right) ^2 \;. \end{aligned}$$
(2.139)

The minimizing arguments of \(\bar{V}_N (a, b)\) are precisely the ML estimates of a and b. Comparing with (2.14), one finds that \(\bar{V}_N (a, b)\) is nothing but the orthogonal regression (or total least squares) criterion.

Letting the true values of the parameters be denoted \(a_0\) and \(b_0\), one gets, in the asymptotic case when the number of data points tends to infinity,

$$\begin{aligned} \bar{V}_{\infty }(a,b)= & {} \lim _{N \rightarrow \infty } \bar{V}_N (a, b) \nonumber \\= & {} \frac{1}{a^2 + r} \lim _{N \rightarrow \infty } \frac{1}{N} \sum _{i= 1} ^N \left[ a_0 x_{0i} + b_0 + \tilde{y}_i - a (x_{0i} + \tilde{x}_i) - b \right] ^2 \nonumber \\= & {} \frac{1}{a^2 + r} \left[ \left( (a_0 - a) m + (b_0 - b) \right) ^2 + (a_0 - a)^2 \sigma + (r + a^2) \lambda _x \right] \;, \nonumber \\ \end{aligned}$$
(2.140)

which apparently has its minimum at the true values,

$$\begin{aligned} a = a_0, \ \ b = b_0 \;. \end{aligned}$$

This analysis illustrates the consistency of the parameter estimates. \(\blacksquare \)
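
Under the known-ratio assumption (2.130), the estimates can be computed from sample moments by solving (2.132) and then using (2.112)–(2.113) for b. A sketch with illustrative constants (numpy assumed; the root with \(\mathrm{sign}(a) = \mathrm{sign}(r_{yx})\) is kept so that \(\sigma = r_{yx}/a > 0\)):

```python
import numpy as np

a0, b0, m, sigma, lam_x, N = 1.2, -0.5, 1.0, 2.0, 0.4, 500_000
r = 2.0                        # known ratio lam_y / lam_x, cf. (2.130)
rng = np.random.default_rng(5)
x0 = m + np.sqrt(sigma) * rng.standard_normal(N)
x = x0 + np.sqrt(lam_x) * rng.standard_normal(N)
y = a0 * x0 + b0 + np.sqrt(r * lam_x) * rng.standard_normal(N)

# Sample counterparts of (2.112)-(2.116)
m_x, m_y = x.mean(), y.mean()
r_x, r_y, r_yx = x.var(), y.var(), np.cov(y, x, bias=True)[0, 1]

# Solve the quadratic (2.132): a^2 r_yx + a (r r_x - r_y) - r r_yx = 0,
# keeping the root that gives a positive sigma = r_yx / a.
a_hat = (r_y - r * r_x + np.sqrt((r * r_x - r_y)**2 + 4 * r * r_yx**2)) / (2 * r_yx)
b_hat = m_y - a_hat * m_x      # from (2.112)-(2.113)
print(a_hat, b_hat)            # close to (a0, b0): identifiability is restored
```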

The estimate of the noise variance \(\lambda _x\) is treated next.

Example 2.11

It is of interest to continue Example 2.10 to find also the estimate of the noise variance \(\lambda _x\). The estimate is obtained by maximizing (2.133) with respect to \(\lambda _x\):

$$\begin{aligned} \hat{\lambda }_x = \mathrm{arg} \max _{\lambda _x} \log (L) \;. \end{aligned}$$
(2.141)

Direct differentiation of (2.133) leads to

$$\begin{aligned} \frac{N}{2 \hat{\lambda }_x^2 } V(\hat{a}, \hat{b}, \hat{x}_{0i}) - \frac{N}{\hat{\lambda }_x} = 0 \;, \end{aligned}$$

and hence

$$\begin{aligned} \hat{\lambda }_x = \frac{1}{2} V(\hat{a}, \hat{b}, \hat{x}_{0i}) = \frac{1}{2} \bar{V}_N (\hat{a}, \hat{b}) \;. \end{aligned}$$

In the asymptotic case therefore

$$\begin{aligned} \lim _{N \rightarrow \infty } \hat{\lambda }_x = \frac{1}{2} \bar{V}_{\infty } (a_0, b_0) = \frac{1}{2} \lambda _x \;. \end{aligned}$$
(2.142)

It is striking that the factor 1/2 appears in (2.142) and that the noise variance estimate is not consistent. This is a classical result; see, for example, Lindley (1947). An intuitive explanation may be that both the number of observed data and the number of free variables increase linearly with N. This makes the ML problem differ from the ‘standard’ case, where the number of independent parameters does not depend on N. \(\blacksquare \)
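
The factor 1/2 shows up clearly in simulation. A minimal sketch (numpy; constants illustrative), evaluating the concentrated criterion (2.139) at the true parameters, to which the ML estimates converge:

```python
import numpy as np

a0, b0, m, sigma, lam_x, r, N = 1.2, -0.5, 1.0, 2.0, 0.4, 2.0, 500_000
rng = np.random.default_rng(6)
x0 = m + np.sqrt(sigma) * rng.standard_normal(N)
x = x0 + np.sqrt(lam_x) * rng.standard_normal(N)
y = a0 * x0 + b0 + np.sqrt(r * lam_x) * rng.standard_normal(N)

# Concentrated criterion (2.139) evaluated at (a0, b0) for simplicity
V_bar = np.mean((y - a0 * x - b0)**2) / (a0**2 + r)
lam_x_hat = V_bar / 2                   # the ML estimate of lam_x, cf. (2.142)
print(lam_x_hat, lam_x / 2, lam_x)      # lam_x_hat is close to lam_x / 2, not to lam_x
```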

So far, linear models have been considered. When even a simple nonlinearity is introduced, the complexity increases. This is illustrated in the next example.

Example 2.12

Consider the following generalization of Example 2.7:

$$\begin{aligned} \begin{array}{rcl} y_i &{} = &{} y_{0i} + \tilde{y}_i \;, \\ x_i &{} = &{} x_{0i} + \tilde{x}_i \;, \\ y_{0i} &{} = &{} a x_{0i}^2 + b x_{0i} + c \;, \end{array} \ \ \ i = 1, \dots , N \;. \end{aligned}$$
(2.143)

Assume that \(\tilde{y}_i\) and \(\tilde{x}_i\) are zero-mean Gaussian noises with variances \(\lambda _y\) and \(\lambda _x\), respectively. Further, let \(x_{0i}\) have zero mean and a symmetric distribution. Set

$$\begin{aligned} r_k = \mathsf{E} \left\{ x_{0i}^k \right\} \;. \end{aligned}$$
(2.144)

Obviously, \(r_k = 0\) for k odd.

The first-order moments of the data (they can of course be estimated from time records) are given by

$$\begin{aligned} \mathsf{E} \left\{ y_i \right\}= & {} \mathsf{E} \left\{ y_{0i} \right\} = a r_2 + c \;, \end{aligned}$$
(2.145)
$$\begin{aligned} \mathsf{E} \left\{ x_i \right\}= & {} 0 \;. \end{aligned}$$
(2.146)

Of course, equation (2.146) does not bring any information at all. From first-order moments, there is hence 1 equation (namely (2.145)) and 4 unknowns (namely a, b, c, \(r_2\)).

Considering second-order moments of the data leads to the additional equations

$$\begin{aligned} \mathsf{E} \left\{ y_i^2 \right\}= & {} \mathsf{E} \left\{ y_{0i}^2 \right\} + \lambda _y = a^2 r_4 + b^2 r_2 + c^2 + 2 a c r_2 + \lambda _y \;, \end{aligned}$$
(2.147)
$$\begin{aligned} \mathsf{E} \left\{ y_i x_i \right\}= & {} \mathsf{E} \left\{ y_{0i} x_{0i} \right\} = b r_2 \;, \end{aligned}$$
(2.148)
$$\begin{aligned} \mathsf{E} \left\{ x_{i}^2 \right\}= & {} r_2 + \lambda _x \;. \end{aligned}$$
(2.149)

Considering now equations (2.145)–(2.149), there are 4 useful equations but 7 unknowns (a, b, c, \(r_2\), \(r_4\), \(\lambda _y\), \(\lambda _x\)). Hence there are still three degrees of freedom in the solution (leaving aside whether or not there is any further ambiguity when solving for the unknowns).

By adding some further moments one gets also

$$\begin{aligned} \mathsf{E} \left\{ y_i x_i^2 \right\}= & {} \mathsf{E} \left\{ y_{0i} \left( x_{0i}^2 + \tilde{x}_i^2 \right) \right\} = \lambda _x (a r_2 + c) + a r_4 + c r_2 \;, \end{aligned}$$
(2.150)
$$\begin{aligned} \mathsf{E} \left\{ x_i^4 \right\}= & {} \mathsf{E} \left\{ x_{0i}^4 + 6 x_{0i}^2 \tilde{x}_i^2 + \tilde{x}_i^4 \right\} = r_4 + 6 r_2 \lambda _x + 3 \lambda _x^2 \;, \end{aligned}$$
(2.151)
$$\begin{aligned} \mathsf{E} \left\{ y_i^2 x_i \right\}= & {} \mathsf{E} \left\{ y_{0i}^2 x_{0i} \right\} = 2 a b r_4 + 2 b c r_2 \;. \end{aligned}$$
(2.152)

Then the number of equations becomes equal to the number of unknowns (and equal to 7).

As the purpose of the example is to illustrate that the complexity of the equations grows rapidly when nonlinear models are treated, the issue of whether or not the above 7 equations have a unique solution is not considered here. \(\blacksquare \)
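
The moment relations (2.145)–(2.152) are easy to verify by Monte Carlo simulation. A sketch (numpy; all constants and the uniform distribution for \(x_{0i}\) are illustrative choices):

```python
import numpy as np

a, b, c, lam_x, lam_y, N = 0.8, -1.1, 0.4, 0.3, 0.2, 2_000_000
rng = np.random.default_rng(7)
x0 = rng.uniform(-2.0, 2.0, N)            # zero mean, symmetric
x = x0 + np.sqrt(lam_x) * rng.standard_normal(N)
y = a * x0**2 + b * x0 + c + np.sqrt(lam_y) * rng.standard_normal(N)
r2, r4 = np.mean(x0**2), np.mean(x0**4)   # taken from x0 here, just to check the algebra

checks = {
    "(2.145)": (np.mean(y),        a * r2 + c),
    "(2.147)": (np.mean(y**2),     a**2 * r4 + b**2 * r2 + c**2 + 2 * a * c * r2 + lam_y),
    "(2.148)": (np.mean(y * x),    b * r2),
    "(2.149)": (np.mean(x**2),     r2 + lam_x),
    "(2.150)": (np.mean(y * x**2), lam_x * (a * r2 + c) + a * r4 + c * r2),
    "(2.151)": (np.mean(x**4),     r4 + 6 * r2 * lam_x + 3 * lam_x**2),
    "(2.152)": (np.mean(y**2 * x), 2 * a * b * r4 + 2 * b * c * r2),
}
for eq, (sample, theory) in checks.items():
    print(eq, round(sample, 3), round(theory, 3))   # sample and theoretical values agree
```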

Needless to say, a parabola as treated in Example 2.12 is an extremely simple case of a nonlinear static model. Basing an estimate of a general nonlinear EIV model on higher-order moments of the data quickly becomes very complex when more advanced parameterizations are considered, if it is feasible at all.

2.A.2 Consistency of the CFA Estimate

Consider here the general CFA-based estimator introduced in Sect. 2.2. In order to emphasize the influence of the sample length N, introduce the notations \(V_N\) for the loss function used and \(\hat{\mathbf {R}}_N\) for the sample covariance matrix of the data.

Assume the model to be identifiable. This means that

$$\begin{aligned} \mathbf {R}(\varvec{\vartheta }) = \mathbf {R}(\varvec{\vartheta }_0) \ \ \Rightarrow \ \ \varvec{\vartheta }= \varvec{\vartheta }_0 \;. \end{aligned}$$
(2.153)

According to the general assumptions, the data is assumed ergodic, and then

$$\begin{aligned} \lim _{N \rightarrow \infty } \hat{\mathbf {R}}_N = \mathbf {R}_{\infty } = \mathbf {R}(\varvec{\vartheta }_0) \;. \end{aligned}$$
(2.154)

Consistency is hence essentially proved if the asymptotic loss function \(V_{\infty }(\varvec{\vartheta })\) has a global minimum at \(\varvec{\vartheta }= \varvec{\vartheta }_0\), which means that the following inequality holds:

$$\begin{aligned} V_{\infty } (\varvec{\vartheta }) \ge V_{\infty } (\varvec{\vartheta }_0) \;. \end{aligned}$$
(2.155)
  • For the criteria \(V_2(\varvec{\vartheta })\), (2.49), and \(V_3(\varvec{\vartheta })\), (2.51), it is trivial to see that (2.155) applies.

  • For the criterion \(V_1(\varvec{\vartheta })\), (2.48), the inequality (2.155) turns out to be equivalent to

    $$\begin{aligned} \mathrm{tr}\left( \mathbf {R}(\varvec{\vartheta }_0) \mathbf {R}^{-1}(\varvec{\vartheta }) \right) + \log (\det \mathbf {R}(\varvec{\vartheta }))\ge & {} n + \log (\det \mathbf {R}(\varvec{\vartheta }_0)) \Leftrightarrow \nonumber \\ \mathrm{tr}\left( \mathbf {R}(\varvec{\vartheta }_0) \mathbf {R}^{-1}(\varvec{\vartheta }) \right) + \ \log (\det \mathbf {R}(\varvec{\vartheta }) \mathbf {R}^{-1}(\varvec{\vartheta }_0))\ge & {} n \;. \end{aligned}$$
    (2.156)

    Now set

    $$\begin{aligned} \mathbf S= \mathbf {R}^{1/2}(\varvec{\vartheta }_0) \mathbf {R}^{-1}(\varvec{\vartheta }) \mathbf {R}^{1/2}(\varvec{\vartheta }_0) \;, \end{aligned}$$
    (2.157)

    which is positive definite by construction. Let its eigenvalues be denoted \(\lambda _1, \ldots , \lambda _n\). The relation (2.156) is equivalent to

    $$\begin{aligned}&\mathrm{tr} (\mathbf S) - \log (\det \mathbf S) \ge n \Leftrightarrow \nonumber \\&\sum _{i=1}^n \lambda _i - \log \left( \prod _{i=1}^n \lambda _i \right) \ge n \Leftrightarrow \nonumber \\&\sum _{i=1}^n \left[ \lambda _i - \log (\lambda _i) -1 \right] \ge 0 \;, \end{aligned}$$
    (2.158)

    which holds true, as

    $$\begin{aligned} \lambda > 0 \Rightarrow \log (\lambda ) \le \lambda - 1 \;. \end{aligned}$$
    (2.159)

    Note that in this analysis it is crucial that \(\lambda \) is positive. This also illustrates that in the numerical search for minimizing \(V(\varvec{\vartheta })\), only values of \(\varvec{\vartheta }\) that keep \(\mathbf {R}(\varvec{\vartheta })\) positive definite should be considered. If values of \(\varvec{\vartheta }\) making the matrix \(\mathbf {R}(\varvec{\vartheta })\) indefinite were allowed, values lower than the right-hand side of (2.155) could be obtained. A small numerical check of the inequality is sketched below.
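
A numerical illustration of (2.156)–(2.159), assuming numpy and randomly generated positive definite matrices (purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4

def random_spd(n):
    """A random symmetric positive definite n x n matrix."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

R0, R = random_spd(n), random_spd(n)
# S = R^{-1} R0 has the same (positive) eigenvalues as R0^{1/2} R^{-1} R0^{1/2} in (2.157)
S = np.linalg.solve(R, R0)
sign, logdet = np.linalg.slogdet(S)
lhs = np.trace(S) - logdet
print(lhs >= n)   # always True, since lambda - log(lambda) - 1 >= 0 for every lambda > 0, cf. (2.159)
```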
