
Asymptotic Normality of Nonlinear Least Squares under Singular Experimental Designs

Chapter in: Optimal Design and Related Areas in Optimization and Statistics

Part of the book series: Springer Optimization and Its Applications (SOIA, volume 28)

Summary

We study the consistency and asymptotic normality of the LS estimator of a function h(θ) of the parameters θ in a nonlinear regression model with observations \(y_i=\eta(x_i,\theta) +\varepsilon_i\), \(i=1,2,\ldots\), and independent errors \(\varepsilon_i\). Optimum experimental design for the estimation of h(θ) frequently yields singular information matrices, which is the situation considered here. The difficulties caused by such singular designs are illustrated by a simple example: depending on the true value of the model parameters and on the type of convergence of the sequence of design points \(x_1,x_2,\ldots\) to the limiting singular design measure ξ, the convergence of the estimator of h(θ) may be slower than \(1/\sqrt{n}\); moreover, when convergence is at the rate \(1/\sqrt{n}\) and the estimator is asymptotically normal, its asymptotic variance may differ from that obtained for the limiting design ξ (a situation we call irregular asymptotic normality of the estimator). For that reason we focus our attention on two types of design sequences: those that converge strongly to a discrete measure and those that correspond to sampling randomly from ξ. We then give assumptions on the limiting expectation surface of the model and on the estimated function h which, for the designs considered, are sufficient to ensure the regular asymptotic normality of the LS estimator of h(θ).
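The singular-design phenomenon described above can be made concrete with a small numerical sketch. The Python snippet below is a hypothetical illustration, not taken from the chapter (the model and the function h are made up): it builds the rank-deficient information matrix of a one-point design for a two-parameter model and checks that h(θ) = η(x₀, θ) remains estimable, in the sense that its gradient lies in the range of the information matrix.

```python
import numpy as np

# Hypothetical two-parameter model eta(x, theta) = theta_1*x + theta_2*x**2,
# observed under the singular one-point design xi = delta_{x0}.
def f(x):
    return np.array([x, x ** 2])        # sensitivity vector d eta / d theta

x0 = 1.0
M = np.outer(f(x0), f(x0))              # information matrix M(xi) of the design
print(np.linalg.matrix_rank(M))         # prints 1: the design is singular

# h(theta) = eta(x0, theta) has gradient c = f(x0); h is estimable under xi
# since c lies in the range of M, i.e. M M^+ c = c.
c = f(x0)
c_proj = M @ np.linalg.pinv(M) @ c
print(np.allclose(c_proj, c))           # prints True
```

Estimability of h under a singular design is exactly the range condition checked in the last lines; for directions outside the range of M no \(\sqrt{n}\)-consistent estimator exists.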


References

  • Atkinson, A., Donev, A. (1992). Optimum Experimental Designs. Oxford University Press, NY, USA.
  • Bierens, H. (1994). Topics in Advanced Econometrics. Cambridge University Press, Cambridge.
  • Billingsley, P. (1971). Weak Convergence of Measures: Applications in Probability. SIAM, Philadelphia.
  • Elfving, G. (1952). Optimum allocation in linear regression. The Annals of Mathematical Statistics, 23, 255–262.
  • Fedorov, V. (1972). Theory of Optimal Experiments. Academic Press, New York.
  • Gallant, A. (1987). Nonlinear Statistical Models. Wiley, New York.
  • Hero, A., Fessler, J., Usman, M. (1996). Exploring estimator bias-variance tradeoffs using the uniform CR bound. IEEE Transactions on Signal Processing, 44, 2026–2041.
  • Ivanov, A. (1997). Asymptotic Theory of Nonlinear Regression. Kluwer, Dordrecht.
  • Jennrich, R. (1969). Asymptotic properties of nonlinear least squares estimation. The Annals of Mathematical Statistics, 40, 633–643.
  • Kiefer, J., Wolfowitz, J. (1959). Optimum designs in regression problems. The Annals of Mathematical Statistics, 30, 271–294.
  • Lehmann, E., Casella, G. (1998). Theory of Point Estimation. Springer, Heidelberg.
  • Pázman, A. (1980). Singular experimental designs. Math. Operationsforsch. Statist., Ser. Statistics, 16, 137–149.
  • Pázman, A. (1986). Foundations of Optimum Experimental Design. Reidel (Kluwer group), Dordrecht (co-pub. VEDA, Bratislava).
  • Pázman, A., Pronzato, L. (1992). Nonlinear experimental design based on the distribution of estimators. Journal of Statistical Planning and Inference, 33, 385–402.
  • Pázman, A., Pronzato, L. (2006). On the irregular behavior of LS estimators for asymptotically singular designs. Statistics & Probability Letters, 76, 1089–1096.
  • Pronzato, L., Pázman, A. (1994). Second-order approximation of the entropy in nonlinear least-squares estimation. Kybernetika, 30(2), 187–198. Erratum: 32(1), 104, 1996.
  • Shiryaev, A. (1996). Probability. Springer, Berlin.
  • Silvey, S. (1980). Optimal Design. Chapman & Hall, London.
  • Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., Hjalmarsson, H., Juditsky, A. (1995). Nonlinear black-box modeling in system identification: a unified overview. Automatica, 31(12), 1691–1724.
  • Spivak, M. (1965). Calculus on Manifolds. A Modern Approach to Classical Theorems of Advanced Calculus. W. A. Benjamin, Inc., New York.
  • Stoica, P. (2001). Parameter estimation problems with singular information matrices. IEEE Transactions on Signal Processing, 49, 87–90.
  • Wu, C.-F. (1980). Characterizing the consistent directions of least squares estimates. The Annals of Statistics, 8(4), 789–801.
  • Wu, C.-F. (1981). Asymptotic theory of nonlinear least squares estimation. The Annals of Statistics, 9(3), 501–513.
  • Wu, C.-F. (1983). Further results on the consistent directions of least squares estimators. The Annals of Statistics, 11(4), 1257–1262.
  • Wynn, H. (1972). Results in the theory and construction of D-optimum experimental designs. Journal of the Royal Statistical Society B, 34, 133–147.


Author information

Correspondence to A. Pázman.

Appendix. Proofs of Lemmas 1–3

Proof of Lemma 1. We can write

$$\begin{array}{l}\left| \frac{1}{N} \ \sum\limits_{k=1}^N a(x_k,\theta) \alpha_k - {\rm I}\!{\rm E}\{\alpha_1\} \ \sum\limits_{x\in S_\xi} a(x,\theta) \ \xi(\{x\}) \right| \\\leq \left| \frac{1}{N} \ \sum\limits_{k=1, x_k \not\in S_\xi}^N a(x_k,\theta) \alpha_k\right| \\\quad + \sum\limits_{x\in S_\xi} \sup_{\theta \in \Theta } \left| a(x,\theta) \right| \ \left| \frac{N(x)}{N} \left( \frac{1}{N(x)} \ \sum\limits_{k=1, \ x_k=x}^N \alpha_k \right) - {\rm I}\!{\rm E}\{\alpha_1\} \xi(\{x\}) \right|,\end{array}$$

where N(x)/N is the relative frequency of the point x in the sequence \(x_1,x_2,\ldots,x_N\). The last sum for \(x \in S_\xi\) tends to zero a.s. and uniformly on Θ, since N(x)/N tends to \(\xi(\{x\})\), and \([1/N(x)] \sum_{k=1, \ x_k=x}^N\alpha_k\) converges a.s. to \({\rm I}\!{\rm E}\{\alpha_1\}\). The first sum on the right-hand side is bounded by

$$\sup_{x\in {\mathcal X}, \ \theta \in \Theta}\left| a(x,\theta) \right| \ \frac{N({\mathcal X}\setminus S_\xi)}{N} \ \frac{1}{N({\mathcal X}\setminus S_\xi)} \ \sum\limits_{k=1,\,x_k\in{\mathcal X}\setminus S_\xi}\alpha_k.$$

This expression tends a.s. to zero: \(N({\mathcal X} \setminus S_\xi)/N\) tends to zero, while the last factor converges a.s. to \({\rm I}\!{\rm E}\{\alpha_1\}\) by the law of large numbers when \(N({\mathcal X}\setminus S_\xi)\rightarrow \infty\), and remains a.s. bounded otherwise.
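As a quick numerical sanity check of Lemma 1 (with made-up choices of a(x, θ), ξ, and the distribution of the α_k, not from the chapter), the sketch below uses a design sequence that alternates between the two support points of a discrete ξ, with a sparse set of off-support points inserted so that N(𝒳∖S_ξ)/N → 0:

```python
import numpy as np

# xi is discrete with support {0, 1} and xi({0}) = xi({1}) = 1/2.
# Design points alternate between 0 and 1, except at the square indices
# k = m**2, where an off-support point 0.5 is inserted; the fraction of
# off-support points is O(1/sqrt(N)) and so vanishes.
rng = np.random.default_rng(0)
N = 200_000
x = np.where(np.arange(N) % 2 == 0, 0.0, 1.0)
squares = np.arange(1, int(np.sqrt(N)) + 1) ** 2
x[squares[squares < N]] = 0.5                    # sparse off-support visits
alpha = rng.normal(loc=2.0, scale=1.0, size=N)   # i.i.d., E{alpha_1} = 2

theta = 0.7
a = lambda xx, th: np.exp(th * xx)               # arbitrary smooth a(x, theta)

lhs = np.mean(a(x, theta) * alpha)               # (1/N) sum_k a(x_k, theta) alpha_k
rhs = 2.0 * 0.5 * (a(0.0, theta) + a(1.0, theta))  # E{alpha_1} sum_x a(x, theta) xi({x})
print(abs(lhs - rhs))                            # small for large N
```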

Proof of Lemma 2. We use a construction similar to that in Bierens (1994, p. 43). Take some fixed \(\theta^1 \in \Theta\) and consider the set

$${\mathcal B}(\theta^1,\delta) = \left\{ \theta \in \Theta: \left\| \theta -\theta^1\right\| \leq \delta \right\}.$$

Define \(\bar {a}_\delta(z)\) and \(\underline{a}_\delta(z)\) as the maximum and the minimum of \(a(z,\theta)\) over the set \({\mathcal B}(\theta^1,\delta)\).

The expectations \({\rm I}\!{\rm E}\{|\underline{a}_\delta(z)|\}\) and \({\rm I}\!{\rm E}\{ |\bar{a}_\delta(z)|\}\) are bounded by

$${\rm I}\!{\rm E} \{ \max\limits_{\theta \in\Theta} |a(z,\theta)|\}<\infty.$$

Also, \(\bar{a}_\delta(z) - \underline{a}_\delta(z)\) is a nonnegative increasing function of δ. Hence, by dominated convergence (using the bound above), we can interchange the order of the limit and expectation in the following expression

$$\lim_{\delta \searrow 0}\left[ {\rm I}\!{\rm E}\{ \bar{a}_\delta(z) \} - {\rm I}\!{\rm E}\{ \underline{a}_\delta(z)\} \right] = {\rm I}\!{\rm E}\left\{ \lim_{\delta \searrow 0} \left[ \bar{a}_\delta(z) - \underline{a}_\delta(z) \right] \right\} = 0,$$

which proves the continuity of \({\rm I}\!{\rm E}\{ a(z,\theta)\}\) at θ1 and implies

$$\forall \beta >0, \ \exists \delta(\beta) >0 \ \hbox{such that} \ \left| {\rm I}\!{\rm E}\{ \bar{a}_{\delta(\beta)}(z) \} - {\rm I}\!{\rm E}\{\underline{a}_{\delta(\beta)}(z) \} \right| < \frac{\beta}{2}.$$

Hence we can write for every \(\theta \in {\mathcal B}(\theta^1,\delta(\beta))\)

$$\begin{array}{rcl}\frac{1}{N} \sum\limits_k \underline{a}_{\delta(\beta) }(z_k) - {\rm I}\!{\rm E}\{\underline{a}_{\delta(\beta)}(z)\} -\frac{\beta}{2} & \leq & \frac{1}{N} \sum\limits_k \underline{a}_{\delta(\beta)}(z_k) - {\rm I}\!{\rm E}\{\bar{a}_{\delta(\beta)}(z)\} \\& \leq & \frac{1}{N} \sum\limits_k a(z_k,\theta) - {\rm I}\!{\rm E}\{a(z,\theta)\} \\& \leq & \frac{1}{N} \sum\limits_k \bar{a}_{\delta(\beta)}(z_k) - {\rm I}\!{\rm E}\{\underline{a}_{\delta(\beta)}(z) \} \\& \leq & \frac{1}{N} \sum\limits_k \bar{a}_{\delta(\beta)}(z_k) - {\rm I}\!{\rm E}\{\bar{a}_{\delta(\beta)}(z)\} + \frac{\beta}{2}.\end{array}$$

From the strong law of large numbers, we have that \(\forall \gamma>0\), \(\exists N_1(\beta,\gamma)\) such that

$$\begin{array}{rcl}{\rm Prob}\left\{ \forall N>N_1(\beta,\gamma), \ \left| \frac{1}{N} \sum\limits_k \bar{a}_{\delta(\beta)}(z_k) - {\rm I}\!{\rm E}\{\bar{a}_{\delta(\beta)}(z)\} \right| < \frac{\beta}{2} \right\} & > & 1-\frac{\gamma}{2}, \\{\rm Prob}\left\{ \forall N>N_1(\beta,\gamma), \ \left| \frac{1}{N} \sum\limits_k \underline{a}_{\delta(\beta)}(z_k) - {\rm I}\!{\rm E}\{\underline{a}_{\delta(\beta)}(z)\} \right| < \frac{\beta}{2} \right\} & > & 1-\frac{\gamma}{2}.\end{array}$$

Combining with previous inequalities, we obtain

$$\begin{array}{l}{\rm Prob}\left\{ \forall N>N_1(\beta,\gamma), \ \max\limits_{\theta\in{\mathcal B}(\theta^1,\delta(\beta))} \left| \frac{1}{N} \sum\limits_k a(z_k,\theta) - {\rm I}\!{\rm E}\{a(z,\theta)\} \right| < \beta \right\} \\\qquad \qquad \qquad > 1-\gamma.\end{array}$$

It only remains to cover Θ with a finite number of sets \({\mathcal B}(\theta^i,\delta(\beta))\), \(i=1,\ldots,n(\beta)\), which is always possible from the compactness assumption. For any \(\alpha > 0, \beta > 0\), take \(\gamma = \alpha /n(\beta)\) and \(N(\beta) =\max_i N_i(\beta,\gamma)\), where \(N_i(\beta,\gamma)\) denotes the constant \(N_1\) above associated with the ball centered at \(\theta^i\). We obtain

$${\rm Prob}\left\{ \forall N>N(\beta), \ \max\limits_{\theta\in\Theta} \left| \frac{1}{N} \sum\limits_k a(z_k,\theta) - {\rm I}\!{\rm E}\{ a(z,\theta) \} \right| <\beta \right\} >1-\alpha,$$

which completes the proof.
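The uniform strong law just proved can be illustrated numerically. The sketch below uses an arbitrary choice of a(z, θ) and of the distribution of z (not from the chapter): it evaluates the supremum, over a grid of the compact set Θ = [0, 1], of the deviation between the empirical mean and its expectation, which shrinks as N grows.

```python
import numpy as np

# a(z, theta) = cos(theta*z) with z ~ U(-1, 1), so that
# E{a(z, theta)} = sin(theta)/theta (with limit 1 at theta = 0).
rng = np.random.default_rng(1)
thetas = np.linspace(0.0, 1.0, 50)               # grid over Theta = [0, 1]

def sup_error(N):
    z = rng.uniform(-1.0, 1.0, size=N)           # z_k i.i.d. ~ U(-1, 1)
    emp = np.array([np.mean(np.cos(th * z)) for th in thetas])
    exact = np.where(thetas > 0.0,
                     np.sin(thetas) / np.maximum(thetas, 1e-300),
                     1.0)
    return float(np.max(np.abs(emp - exact)))    # sup_theta |empirical - exact|

print(sup_error(100))
print(sup_error(100_000))                        # much smaller for larger N
```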

Proof of Lemma 3. Since \(P_\theta\) is the orthogonal projector onto \({\mathcal L}_\theta\), it is sufficient to prove that \(\bar{\alpha}{\mathop {\sim}\limits^{\xi}} \bar{\theta}\) implies that any element of \({\mathcal L}_{\bar{\alpha}}\) is in \({\mathcal L}_{\bar{\theta}}\).

From \(\{{\bf f}_{{\bar{\theta}}}\}_1,\ldots,\{{\bf f}_{{\bar{\theta}}}\}_p\) we choose r functions that form a linear basis of \({\mathcal L}_{\bar{\theta}}\). Without any loss of generality we can suppose that they are the first r ones. Decompose θ into \(\theta = (\beta, \gamma)\), where β corresponds to the first r components of θ and γ to the \(p-r\) remaining ones. Define similarly \({\bar{\theta}}=(\bar{\beta},\bar{\gamma})\). From A4, the components of \(\partial \eta[x,(\beta,\gamma)] / \partial\gamma\) are linear combinations of components of \(\partial\eta[x,(\beta,\gamma)]/\partial \beta\), not only for \(\theta = \bar{\theta}\) but also for θ in some neighborhood of \(\bar{\theta}\).

Define the following mapping G from \({\mathbb R}^{r+p}\) to \({\mathbb R}^r\) by

$$G(\beta, \alpha) = \int_{{\mathcal X}} \frac{\partial \eta[x,(\beta,\bar{\gamma})] }{\partial \beta} \{ \eta[x,(\beta,\bar{\gamma})] -\eta(x,\alpha) \} \xi({\rm d}x).$$

From \(\bar{\alpha}{\mathop {\sim}\limits^{\xi}} \bar{\theta}\) we obtain \(G(\bar{\beta},\bar{\alpha})=0\). The matrix

$$\frac{\partial G(\beta,\alpha)} {\partial\beta^\top}_{|\bar{\beta},\bar{\alpha}} = \int_{{\mathcal X}} \frac{\partial \eta[x,(\beta,\bar{\gamma})] } {\partial \beta }_{|\bar{\beta}} \frac{\partial \eta[x,(\beta,\bar{\gamma})] } {\partial \beta^\top}_{|{\bar{\beta}}} \xi({\rm d}x)$$

is a nonsingular r × r submatrix of \({\bf M}(\xi,\bar{\theta})\), with rank \([{\bf M}(\xi,\theta)] =r\) for θ in a neighborhood of \(\bar{\theta}\). From the Implicit Function Theorem, see Spivak (1965, Th. 2–12, p. 41), there exist neighborhoods \({\mathcal V}({\bar{\alpha}})\), \({\mathcal W}({\bar{\beta}})\) and a differentiable mapping \(\psi: {\mathcal V}({\bar{\alpha}}) \rightarrow {\mathcal W}({\bar{\beta}})\) such that \(\psi(\bar{\alpha})=\bar{\beta}\) and that \(\alpha \in {\mathcal V}({\bar{\alpha}})\) implies \(G[\psi(\alpha),\alpha]=0\). It follows that

$$\begin{array}{l}\frac{\partial}{\partial \beta} \int_{{\mathcal X}} \{ \eta[x,(\beta,\bar{\gamma})] -\eta(x,\alpha) \}_{\beta =\psi(\alpha)}^2 \ \xi({\rm d}x) \\\qquad = 2 \int_{{\mathcal X}} \left[ \frac{\partial \eta[x,(\beta,\bar{\gamma})]}{\partial \beta}\right]_{\beta =\psi(\alpha) } \{ \eta[x,(\psi(\alpha),\bar{\gamma})] -\eta(x,\alpha)\} \ \xi({\rm d}x) = 0.\end{array}$$
(8.24)

Since the components of \(\partial \eta[x,(\beta,\gamma)]/\partial \gamma\) are linear combinations of the components of \(\partial \eta[x,(\beta,\gamma)] /\partial \beta\) for any \(\theta = (\beta,\gamma)\) in some neighborhood of \(\bar{\theta}\), we obtain from (8.24)

$$\begin{array}{l}\frac{\partial}{\partial \gamma} \int_{{\mathcal X}} \{\eta[x,(\beta,\gamma)] -\eta(x,\alpha) \}_{\beta =\psi(\alpha),\gamma =\bar{\gamma}}^2 \ \xi({\rm d}x) = \\2 \ \int_{{\mathcal X}} \left[ \frac{\partial \eta[x,(\beta,\gamma)] }{\partial \gamma }\right]_{\beta =\psi(\alpha),\gamma =\bar{\gamma}} \ \{ \eta[x,(\psi(\alpha),\bar{\gamma})] -\eta(x,\alpha) \} \ \xi({\rm d}x) = 0.\end{array}$$

Combining with (8.24) we obtain that

$$\left\{ \frac{\partial}{\partial \theta}\int_{{\mathcal X}} [\eta(x,\theta) - \eta(x,\alpha) ]^2 \ \xi({\rm d}x) \right\}_{\theta =[\psi(\alpha),\bar{\gamma}]}=0$$

for all α belonging to some neighborhood \({\mathcal U}({\bar{\alpha}})\). We can make \({\mathcal U}({\bar{\alpha}})\) small enough to satisfy the inequality \(\left\| \eta[x,(\psi(\alpha),\bar{\gamma})] -\eta(x,\bar{\theta}) \right\|_\xi^2<\epsilon\) required in A5. It follows that \((\psi(\alpha),\bar{\gamma}) {\mathop {\sim}\limits^{\xi}} \alpha\), that is, \(\eta(\cdot,\alpha) {\mathop {=}\limits^{\xi}} \eta[\cdot,(\psi(\alpha),\bar{\gamma})]\) for all α in a neighborhood of \(\bar{\alpha}\). By taking derivatives we then obtain

$$\begin{array}{rcl}\left[ \frac{\partial \eta(\cdot,\alpha) }{\partial \alpha^\top} \right]_{\bar{\alpha}} & {\mathop {=}\limits^{\xi}} & \left[ \frac{\partial \eta[\cdot,(\psi(\alpha),\bar{\gamma})] } {\partial \alpha^\top} \right]_{\bar{\alpha}} \\& {\mathop {=}\limits^{\xi}} & \left[ \frac{\partial \eta[\cdot,(\beta,\bar{\gamma})]} {\partial \beta^\top}\right]_{(\psi(\bar{\alpha}),\bar{\gamma})} \, \left[\frac{\partial \psi(\alpha)} {\partial \alpha^\top}\right]_{\bar{\alpha}},\end{array}$$

that is, \({\mathcal L}_{\bar{\alpha}}{\mathop {\subset}\limits^{\xi}} {\mathcal L}_{(\psi( \bar{ \alpha}), \bar{\gamma}) } = {\mathcal L}_{\bar{\theta}}\).

By interchanging \(\bar{\alpha}\) with \(\bar{\theta}\) we obtain \({\mathcal L}_{\bar{\theta}}{\mathop {\subset}\limits^{\xi}} {\mathcal L}_{\bar{\alpha}}\).
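A toy numerical check of the conclusion of Lemma 3 (the model below is made up, not from the chapter): in the overparameterized model η(x, θ) = exp((θ₁ + θ₂)x) the sensitivities with respect to θ₁ and θ₂ coincide (the kind of degeneracy covered by A4), and two parameter points with the same sum θ₁ + θ₂ are observationally equivalent under any design; their tangent spaces, and hence the orthogonal projectors P_θ, then agree.

```python
import numpy as np

support = np.array([0.5, 1.0, 2.0])      # support points of a discrete design xi

def jac(theta):
    s = theta[0] + theta[1]
    col = support * np.exp(s * support)  # d eta/d theta_1 = d eta/d theta_2
    return np.stack([col, col], axis=1)  # Jacobian on the design support

def projector(theta):
    J = jac(theta)
    return J @ np.linalg.pinv(J)         # orthogonal projector onto L_theta

theta_bar = np.array([1.0, 2.0])
alpha_bar = np.array([2.5, 0.5])         # same sum -> same eta under xi
print(np.allclose(projector(theta_bar), projector(alpha_bar)))   # True
```

Parameter points with a different sum give a different response and a different tangent space, so the projectors would then disagree; the equality holds precisely for observationally equivalent points, as Lemma 3 states.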


Copyright information

© 2009 Springer Science+Business Media LLC

About this chapter

Cite this chapter

Pázman, A., Pronzato, L. (2009). Asymptotic Normality of Nonlinear Least Squares under Singular Experimental Designs. In: Pronzato, L., Zhigljavsky, A. (eds) Optimal Design and Related Areas in Optimization and Statistics. Springer Optimization and Its Applications, vol 28. Springer, New York, NY. https://doi.org/10.1007/978-0-387-79936-0_8
