Summary
We study the consistency and asymptotic normality of the LS estimator of a function h(θ) of the parameters θ in a nonlinear regression model with observations \(y_i=\eta(x_i,\theta)+\varepsilon_i\), \(i=1,2,\ldots\), and independent errors \(\varepsilon_i\). Optimum experimental design for the estimation of h(θ) frequently yields singular information matrices, which is the situation considered here. The difficulties caused by such singular designs are illustrated by a simple example: depending on the true value of the model parameters and on the type of convergence of the sequence of design points \(x_1,x_2,\ldots\) to the limiting singular design measure ξ, the estimator of h(θ) may converge more slowly than \(1/\sqrt{n}\); moreover, when convergence is at the rate \(1/\sqrt{n}\) and the estimator is asymptotically normal, its asymptotic variance may differ from that obtained for the limiting design ξ (a situation we call irregular asymptotic normality of the estimator). For that reason we focus our attention on two types of design sequences: those that converge strongly to a discrete measure and those that correspond to sampling randomly from ξ. We then give assumptions on the limiting expectation surface of the model and on the estimated function h which, for the designs considered, are sufficient to ensure the regular asymptotic normality of the LS estimator of h(θ).
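To make the slow-rate phenomenon concrete, here is a small Monte Carlo sketch in a toy model of our own (not the chapter's example): \(y_i=\theta^2+\varepsilon_i\), where \(\partial\eta/\partial\theta=2\theta\) vanishes at the true value \(\theta=0\), so the information is singular there and the LS estimator of θ converges at rate \(n^{-1/4}\), while \(h(\theta)=\theta^2\) is still estimated at the parametric rate.

```python
import numpy as np

# Toy model (not the chapter's example): y_i = theta**2 + eps_i, true theta = 0.
# d eta / d theta = 2*theta vanishes at theta = 0, so the information is
# singular there and the LS estimator of theta converges at rate n**(-1/4).
rng = np.random.default_rng(0)

def ls_theta_hat(n, sigma=1.0):
    y = rng.normal(0.0, sigma, size=n)          # observations with theta = 0
    beta_hat = max(0.0, y.mean())               # LS estimate of beta = theta**2
    return np.sqrt(beta_hat)                    # LS estimate of |theta|

def mean_abs_error(n, reps=2000):
    return float(np.mean([ls_theta_hat(n) for _ in range(reps)]))

e_small, e_large = mean_abs_error(100), mean_abs_error(10_000)
# A 1/sqrt(n) rate would give a ratio near 10; n**(-1/4) gives about 3.16.
print(f"mean-error ratio, n=100 vs n=10000: {e_small / e_large:.2f}")
```

The printed ratio comes out near \(100^{1/4}\approx 3.16\) rather than the value 10 that a \(1/\sqrt{n}\) rate would give when n grows by a factor of 100.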
References
Atkinson, A., Donev, A. (1992). Optimum Experimental Design. Oxford University Press, New York.
Bierens, H. (1994). Topics in Advanced Econometrics. Cambridge University Press, Cambridge.
Billingsley, P. (1971). Weak Convergence of Measures: Applications in Probability. SIAM, Philadelphia.
Elfving, G. (1952). Optimum allocation in linear regression. The Annals of Mathematical Statistics, 23, 255–262.
Fedorov, V. (1972). Theory of Optimal Experiments. Academic Press, New York.
Gallant, A. (1987). Nonlinear Statistical Models. Wiley, New York.
Hero, A., Fessler, J., Usman, M. (1996). Exploring estimator bias-variance tradeoffs using the uniform CR bound. IEEE Transactions on Signal Processing, 44, 2026–2041.
Ivanov, A. (1997). Asymptotic Theory of Nonlinear Regression. Kluwer, Dordrecht.
Jennrich, R. (1969). Asymptotic properties of nonlinear least squares estimation. The Annals of Mathematical Statistics, 40, 633–643.
Kiefer, J., Wolfowitz, J. (1959). Optimum designs in regression problems. The Annals of Mathematical Statistics, 30, 271–294.
Lehmann, E., Casella, G. (1998). Theory of Point Estimation. Springer, Heidelberg.
Pázman, A. (1980). Singular experimental designs. Math. Operationsforsch. Statist., Ser. Statistics, 16, 137–149.
Pázman, A. (1986). Foundations of Optimum Experimental Design. Reidel (Kluwer group), Dordrecht (co-pub. VEDA, Bratislava).
Pázman, A., Pronzato, L. (1992). Nonlinear experimental design based on the distribution of estimators. Journal of Statistical Planning and Inference, 33, 385–402.
Pázman, A., Pronzato, L. (2006). On the irregular behavior of LS estimators for asymptotically singular designs. Statistics & Probability Letters, 76, 1089–1096.
Pronzato, L., Pázman, A. (1994). Second-order approximation of the entropy in nonlinear least-squares estimation. Kybernetika, 30(2), 187–198. Erratum: 32(1), 104, 1996.
Shiryaev, A. (1996). Probability. Springer, Berlin.
Silvey, S. (1980). Optimal Design. Chapman & Hall, London.
Sjöberg, J., Zhang, Q., Ljung, L., Benveniste, A., Delyon, B., Glorennec, P.-Y., Hjalmarsson, H., Juditsky, A. (1995). Nonlinear black-box modeling in system identification: a unified overview. Automatica, 31(12), 1691–1724.
Spivak, M. (1965). Calculus on Manifolds. A Modern Approach to Classical Theorems of Advanced Calculus. W. A. Benjamin, Inc., New York.
Stoica, P. (2001). Parameter estimation problems with singular information matrices. IEEE Transactions on Signal Processing, 49, 87–90.
Wu, C.-F. (1980). Characterizing the consistent directions of least squares estimates. The Annals of Statistics, 8(4), 789–801.
Wu, C.-F. (1981). Asymptotic theory of nonlinear least squares estimation. The Annals of Statistics, 9(3), 501–513.
Wu, C.-F. (1983). Further results on the consistent directions of least squares estimators. The Annals of Statistics, 11(4), 1257–1262.
Wynn, H. (1972). Results in the theory and construction of D-optimum experimental designs. Journal of the Royal Statistical Society B, 34, 133–147.
Appendix. Proofs of Lemmas 1–3
Proof of Lemma 1. We can write
where N(x)/N is the relative frequency of the point x in the sequence \(x_1,x_2,\ldots,x_N\). The last sum for \(x \in S_\xi\) tends to zero a.s. and uniformly on Θ, since N(x)/N tends to \(\xi(\{x\})\), and \([1/N(x)] \sum_{k=1, \ x_k=x}^N\alpha_k\) converges a.s. to \({\rm I}\!{\rm E}\{\alpha_1\}\). The first sum on the right-hand side is bounded by
This expression tends a.s. to zero, since \(N({\mathcal X} \setminus S_\xi)/N\) tends to zero, and the law of large numbers applies for the remaining part in case \(N({\mathcal X}\setminus S_\xi)\rightarrow \infty\).
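The decomposition underlying this step can be sketched as follows; this is our reconstruction from the quantities named in the proof (with \(S_\xi\) the support of the discrete measure ξ, so that \(\sum_{x \in S_\xi}\xi(\{x\})=1\)), not the chapter's actual display:

```latex
\frac{1}{N}\sum_{k=1}^{N}\alpha_k - {\rm I}\!{\rm E}\{\alpha_1\}
 \;=\; \frac{1}{N}\sum_{\substack{k\le N \\ x_k\notin S_\xi}}\alpha_k
 \;+\; \sum_{x\in S_\xi}\left[\frac{N(x)}{N}\cdot
        \frac{1}{N(x)}\sum_{\substack{k\le N \\ x_k=x}}\alpha_k
        \;-\;\xi(\{x\})\,{\rm I}\!{\rm E}\{\alpha_1\}\right].
```

Each bracketed term tends to zero a.s. since \(N(x)/N \rightarrow \xi(\{x\})\) and the inner average obeys the strong law of large numbers, while the first sum is controlled by the bound discussed in the proof.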
Proof of Lemma 2. We use a construction similar to that in Bierens (1994, p. 43). Take some fixed \(\theta^1 \in \Theta\) and consider the set
Define \(\bar {a}_\delta(z)\) and \(\underline{a}_\delta(z)\) as the maximum and the minimum of \(a(z,\theta)\) over the set \({\mathcal B}(\theta^1,\delta)\).
The expectations \({\rm I}\!{\rm E}\{|\underline{a}_\delta(z)|\}\) and \({\rm I}\!{\rm E}\{ |\bar{a}_\delta(z)|\}\) are bounded by
Also, \(\bar{a}_\delta(z) - \underline{a}_\delta(z)\) is an increasing function of δ. Hence, we can interchange the order of the limit and expectation in the following expression
which proves the continuity of \({\rm I}\!{\rm E}\{ a(z,\theta)\}\) at \(\theta^1\) and implies
Hence we can write for every \(\theta \in {\mathcal B}(\theta^1,\delta(\beta))\)
From the strong law of large numbers, we have that \(\forall \gamma>0\), \(\exists N_1(\beta,\gamma)\) such that
Combining with previous inequalities, we obtain
It only remains to cover Θ with a finite number of sets \({\mathcal B}(\theta^i,\delta(\beta))\), \(i=1,\ldots,n(\beta)\), which is always possible by the compactness assumption. For any \(\alpha > 0\), \(\beta > 0\), take \(\gamma = \alpha /n(\beta)\) and \(N(\beta) =\max_i N_i(\beta,\gamma)\). We obtain
which completes the proof.
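The sandwich inequality at the heart of this covering argument can be sketched as follows (our reconstruction from the definitions above, with \(z_k\) denoting the i.i.d. observations to which the strong law of large numbers is applied):

```latex
\underline{a}_\delta(z) \;\le\; a(z,\theta) \;\le\; \bar{a}_\delta(z)
\quad\text{for all }\theta\in{\mathcal B}(\theta^1,\delta),
\qquad\text{hence}\qquad
\frac{1}{N}\sum_{k=1}^{N}\underline{a}_\delta(z_k)
 \;\le\; \frac{1}{N}\sum_{k=1}^{N}a(z_k,\theta)
 \;\le\; \frac{1}{N}\sum_{k=1}^{N}\bar{a}_\delta(z_k).
```

By the strong law of large numbers each bound converges a.s. to its expectation, and by the monotone-limit step both expectations can be brought within β of \({\rm I}\!{\rm E}\{a(z,\theta^1)\}\) by choosing \(\delta=\delta(\beta)\) small enough; covering Θ by finitely many such balls then yields the uniform statement.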
Proof of Lemma 3. Since \(P_\theta\) is the orthogonal projector onto \({\mathcal L}_\theta\), it is sufficient to prove that \(\bar{\alpha}{\mathop {\sim}\limits^{\xi}} \bar{\theta}\) implies that any element of \({\mathcal L}_{\bar{\alpha}}\) is in \({\mathcal L}_{\bar{\theta}}\).
From \(\{{\bf f}_{{\bar{\theta}}}\}_1,\ldots,\{{\bf f}_{{\bar{\theta}}}\}_p\) we choose r functions that form a linear basis of \({\mathcal L}_{\bar{\theta}}\). Without any loss of generality we can suppose that they are the first r ones. Decompose θ into \(\theta = (\beta, \gamma)\), where β corresponds to the first r components of θ and γ to the \(p-r\) remaining ones. Define similarly \({\bar{\theta}}=(\bar{\beta},\bar{\gamma})\). From A4, the components of \(\partial \eta[x,(\beta,\gamma)] / \partial\gamma\) are linear combinations of components of \(\partial\eta[x,(\beta,\gamma)]/\partial \beta\) not only for \(\theta = \bar{\theta}\) but also for θ in some neighborhood of \(\bar{\theta}\).
Define the following mapping G from \({\mathbb R}^{r+p}\) to \({\mathbb R}^r\) by
From \(\bar{\alpha}{\mathop {\sim}\limits^{\xi}} \bar{\theta}\) we obtain \(G(\bar{\beta},\bar{\alpha})=0\). The matrix
is a nonsingular \(r \times r\) submatrix of \({\bf M}(\xi,\bar{\theta})\), with \({\rm rank}[{\bf M}(\xi,\theta)] =r\) for θ in a neighborhood of \(\bar{\theta}\). From the Implicit Function Theorem, see Spivak (1965, Th. 2–12, p. 41), there exist neighborhoods \({\mathcal V}({\bar{\alpha}})\), \({\mathcal W}({\bar{\beta}})\) and a differentiable mapping \(\psi: {\mathcal V}({\bar{\alpha}}) \rightarrow {\mathcal W}({\bar{\beta}})\) such that \(\psi(\bar{\alpha})=\bar{\beta}\) and that \(\alpha \in {\mathcal V}({\bar{\alpha}})\) implies \(G[\psi(\alpha),\alpha]=0\). It follows that
Since the components of \(\partial \eta[x,(\beta,\gamma)]/\partial \gamma\) are linear combinations of the components of \(\partial \eta[x,(\beta,\gamma)] /\partial \beta\) for any \(\theta = (\beta,\gamma)\) in some neighborhood of \(\bar{\theta}\), we obtain from (8.24)
Combining with (8.24) we obtain that
for all α belonging to some neighborhood \({\mathcal U}({\bar{\alpha}})\). We can make \({\mathcal U}({\bar{\alpha}})\) small enough to satisfy the inequality \(\left\| \eta[x,(\psi(\alpha),\bar{\gamma})] -\eta(x,\bar{\theta}) \right\|_\xi^2<\epsilon\) required in A5. It follows that \((\psi(\alpha),\bar{\gamma}) {\mathop {\sim}\limits^{\xi}} \alpha\), that is, \(\eta(\cdot,\alpha) {\mathop {=}\limits^{\xi}} \eta[\cdot,(\psi(\alpha),\bar{\gamma})]\) for all α in a neighborhood of \(\bar{\alpha}\). By taking derivatives we then obtain
that is, \({\mathcal L}_{\bar{\alpha}}{\mathop {\subset}\limits^{\xi}} {\mathcal L}_{(\psi( \bar{ \alpha}), \bar{\gamma}) } = {\mathcal L}_{\bar{\theta}}\).
By interchanging \(\bar{\alpha}\) with \(\bar{\theta}\) we obtain \({\mathcal L}_{\bar{\theta}}{\mathop {\subset}\limits^{\xi}} {\mathcal L}_{\bar{\alpha}}\), which completes the proof.
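For readers reconstructing the argument: a plausible form of the mapping G used in this proof (an assumption on our part, since the display is not reproduced here) is the vector of r "normal-equation" components

```latex
% Plausible reconstruction (not the chapter's display):
G_j(\beta,\alpha) \;=\; \int_{\mathcal X}
   \frac{\partial \eta[x,(\beta,\bar{\gamma})]}{\partial \beta_j}\,
   \bigl(\eta(x,\alpha)-\eta[x,(\beta,\bar{\gamma})]\bigr)\,\xi({\rm d}x),
   \qquad j=1,\ldots,r.
```

With this choice \(G(\bar{\beta},\bar{\alpha})=0\) follows from \(\bar{\alpha}{\mathop {\sim}\limits^{\xi}} \bar{\theta}\), and \(-\partial G/\partial\beta^{\rm T}\) evaluated at \((\bar{\beta},\bar{\alpha})\) is the leading \(r\times r\) block of \({\bf M}(\xi,\bar{\theta})\), which is nonsingular, so the Implicit Function Theorem applies exactly as invoked above.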
Copyright information
© 2009 Springer Science+Business Media LLC
Cite this chapter
Pázman, A., Pronzato, L. (2009). Asymptotic Normality of Nonlinear Least Squares under Singular Experimental Designs. In: Pronzato, L., Zhigljavsky, A. (eds) Optimal Design and Related Areas in Optimization and Statistics. Springer Optimization and Its Applications, vol 28. Springer, New York, NY. https://doi.org/10.1007/978-0-387-79936-0_8
Print ISBN: 978-0-387-79935-3
Online ISBN: 978-0-387-79936-0