In our proofs, C denotes a generic positive constant which can take different values even on the same line.
Proof of Theorem 1
Consider the class of functions
$$\begin{aligned} \mathcal {F}= & {} \left\{ \left( \mathbf{x}\mathbf{z}^{\mathrm{T}}\mathbf{g}'(\mathbf{x}^{\mathrm{T}}\varvec{\beta })-\mathbf{H}(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\mathbf{z}\right) \left( \tau -I\left\{ y-\mathbf{g}^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\mathbf{z}\le 0\right\} \right) , \varvec{\beta }\in \mathcal {B}, \right. \\&\hbox { and for some } \alpha>3/2, \alpha '>1/2, \hbox { and } M>0, \\&\left. \hbox {entries of } \mathbf{g}\hbox { are in } \mathcal {C}^{\alpha }(M), \hbox { and entries of } \mathbf{H}\hbox { are in } \mathcal {C}^{\alpha '}(M)\right\} . \end{aligned}$$
For
\(\alpha >1/2\), define
$$\begin{aligned} \mathcal {F}_1=\left\{ h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }):\varvec{\beta }\in \mathcal {B},h\in \mathcal {C}^{\alpha }(M)\right\} . \end{aligned}$$
For any fixed \(\varvec{\beta }\), the class \(\mathcal {F}_1(\varvec{\beta }):=\{h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }): h\in \mathcal {C}^\alpha (M) \}\) satisfies \(\log N(\delta ,\mathcal {F}_1(\varvec{\beta }),\Vert .\Vert _\infty )\le \log N(\delta ,\mathcal {C}^{\alpha }(M),\Vert .\Vert _\infty )\le C\delta ^{-1/\alpha }\) by Theorem 2.7.1 of van der Vaart and Wellner (1996). Since \(\Vert h(\mathbf{x}^{\mathrm{T}}\varvec{\beta })-h(\mathbf{x}^{\mathrm{T}}\varvec{\beta }')\Vert _\infty \le C\Vert \varvec{\beta }-\varvec{\beta }'\Vert ^{s}\) with \(s=\min (\alpha ,1)>1/2\) (because \(\mathbf{x}\) lies in a compact set), the \(\delta \)-entropy of \(\mathcal {F}_1\) in the \(L_\infty \) norm is bounded, up to constants, by the sum of \(C\delta ^{-1/\alpha }\) and the \(\delta ^{1/s}\)-entropy of \(\mathcal {B}\) in the Euclidean norm, the latter being \(C\log (1/\delta )\). Thus, since the resulting entropy integral is finite for \(\alpha >1/2\), Theorem 19.14 of van der Vaart (1998) implies that \(\mathcal {F}_1\) is a Donsker class.
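The entropy bound just obtained can be turned into the integral that the Donsker criterion controls. As a sketch (with C the generic constant from above, and using \(\sqrt{a+b}\le \sqrt{a}+\sqrt{b}\)), the \(\delta ^{-1/\alpha }\) part is integrable near zero exactly when \(\alpha >1/2\), while the \(\log (1/\delta )\) part coming from \(\mathcal {B}\) is integrable for every \(\alpha \):
$$\begin{aligned} \int _0^1\sqrt{\log N(\delta ,\mathcal {F}_1,L_\infty )}\,d\delta &\le C\int _0^1\delta ^{-1/(2\alpha )}\,d\delta +C\int _0^1\sqrt{\log (1/\delta )}\,d\delta \\&=\frac{2\alpha }{2\alpha -1}C+\frac{\sqrt{\pi }}{2}C<\infty \quad \hbox {for } \alpha >1/2. \end{aligned}$$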
Furthermore, consider
$$\begin{aligned} \mathcal {F}_2=\left\{ I\{y-\mathbf{g}^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\mathbf{z}\le 0\}: \varvec{\beta }\in \mathcal {B},\hbox { entries of } \mathbf{g}\hbox { in } \mathcal {C}^{\alpha }(M)\right\} . \end{aligned}$$
We have
$$\begin{aligned}&E\left( I\left\{ y-\mathbf{g}_1^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_1)\mathbf{z}\le 0\right\} -I\left\{ y-\mathbf{g}_2^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_2)\mathbf{z}\le 0\right\} \right) ^2\\&\qquad \le C E|\mathbf{g}_1^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_1)\mathbf{z}-\mathbf{g}_2^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_2)\mathbf{z}|\\&\qquad \le C \left( E\Vert \mathbf{g}_1(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_1)-\mathbf{g}_2(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_2)\Vert ^2\right) ^{1/2}. \end{aligned}$$
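The first inequality in the last display can be verified directly. The sketch below assumes, as that step requires, that the conditional density of y given the covariates is bounded by a constant C; \(a_j:=\mathbf{g}_j^{\mathrm{T}}(\mathbf{x}^{\mathrm{T}}\varvec{\beta }_j)\mathbf{z}\) (j=1,2) is shorthand introduced only here, and \(F(.|\mathbf{x})\) denotes the conditional distribution function of y, with \(\mathbf{z}\) treated as known given \(\mathbf{x}\):
$$\begin{aligned} E\left( I\{y\le a_1\}-I\{y\le a_2\}\right) ^2&=E\,I\{\min (a_1,a_2)<y\le \max (a_1,a_2)\}\\&=E\left| F(a_1|\mathbf{x})-F(a_2|\mathbf{x})\right| \le C\,E|a_1-a_2|. \end{aligned}$$
The second inequality is then the Cauchy–Schwarz inequality, \(E|(\mathbf{g}_1-\mathbf{g}_2)^{\mathrm{T}}\mathbf{z}|\le (E\Vert \mathbf{g}_1-\mathbf{g}_2\Vert ^2)^{1/2}(E\Vert \mathbf{z}\Vert ^2)^{1/2}\) (arguments suppressed), with \(E\Vert \mathbf{z}\Vert ^2\), assumed finite, absorbed into C.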
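Thus \(\log N(\delta ,\mathcal {F}_2,L_2)\le C\log N(C\delta ^2,\mathcal {F}_1,L_2)\le C\delta ^{-2/\alpha }\), so \(\mathcal {F}_2\) is Donsker if \(\alpha >1\). Moreover, the entries of \(\mathbf{g}'\) lie in \(\mathcal {C}^{\alpha -1}(M)\) with \(\alpha -1>1/2\), and the entries of \(\mathbf{H}\) lie in \(\mathcal {C}^{\alpha '}(M)\) with \(\alpha '>1/2\), so the argument used for \(\mathcal {F}_1\) shows that the corresponding classes \(\{\mathbf{g}'(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\}\) and \(\{\mathbf{H}(\mathbf{x}^{\mathrm{T}}\varvec{\beta })\}\) are Donsker as well. Combining these facts with the Donsker property of \(\mathcal {F}_2\), standard permanence properties of Donsker classes show that \(\mathcal {F}\) is also a Donsker class.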
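The smoothness conditions in the definition of \(\mathcal {F}\) can be traced through the same kind of calculation. As a sketch with generic C: the bound \(\log N(\delta ,\mathcal {F}_2,L_2)\le C\delta ^{-2/\alpha }\) gives a finite entropy integral only when \(\alpha >1\) (for \(\alpha \le 1\) the integral diverges at zero), and applying the \(\mathcal {F}_1\) argument to \(\mathbf{g}'\), whose entries have smoothness \(\alpha -1\), needs \(\alpha -1>1/2\); the entries of \(\mathbf{H}\) need \(\alpha '>1/2\) directly:
$$\begin{aligned} \int _0^1\sqrt{C\delta ^{-2/\alpha }}\,d\delta =\frac{\alpha \sqrt{C}}{\alpha -1}\quad (\alpha >1),\qquad \alpha -1>\frac{1}{2}\iff \alpha >\frac{3}{2}. \end{aligned}$$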
First we prove consistency. Let \(F(.|\mathbf{X})\) be the conditional cumulative distribution function of \(Y\) given \(\mathbf{X}\). Uniformly for all \(\varvec{\beta }\in \mathcal {B}\), we have
$$\begin{aligned}&\Phi (\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))-\Phi (\varvec{\beta };\mathbf{m}(.;\varvec{\beta }))\\&\quad =\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \\&\quad =\mathbf{J}^{\mathrm{T}}E \left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta }) -\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta }) -\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \\&\quad =o_p(1), \end{aligned}$$
(13)
using Proposition 1. Furthermore, since \(\mathcal {F}\) is Donsker, it is also Glivenko–Cantelli, so \(\sup _{f\in \mathcal {F}}|(P_n-P)f|=o_p(1)\). Thus, uniformly for \(\varvec{\beta }\in \mathcal {B}\),
$$\begin{aligned} \Vert \Phi _n(\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))-\Phi (\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))\Vert =o_p(1). \end{aligned}$$
(14)
Thus
$$\begin{aligned}&\Vert \Phi (\widehat{\varvec{\beta }};\mathbf{m}(.;\widehat{\varvec{\beta }}))\Vert \\&\quad =\Vert \Phi (\widehat{\varvec{\beta }};\widehat{\mathbf{m}}(.;\widehat{\varvec{\beta }}))\Vert +o_p(1)\\&\quad = \Vert \Phi _n(\widehat{\varvec{\beta }};\widehat{\mathbf{m}}(.;\widehat{\varvec{\beta }}))\Vert +o_p(1)\\&\quad \le \Vert \Phi _n(\varvec{\beta }_0;\widehat{\mathbf{m}}(.;\varvec{\beta }_0))\Vert +o_p(1)\\&\quad =\Vert \Phi (\varvec{\beta }_0;\widehat{\mathbf{m}}(.;\varvec{\beta }_0))\Vert +o_p(1)\\&\quad =\Vert \Phi (\varvec{\beta }_0;\mathbf{m}(.;\varvec{\beta }_0))\Vert +o_p(1), \end{aligned}$$
where the first and last equalities use (13), the second and third equalities use (14), and the inequality follows from the definition of \(\widehat{\varvec{\beta }}\) as an approximate minimizer of \(\Vert \Phi _n(\varvec{\beta };\widehat{\mathbf{m}}(.;\varvec{\beta }))\Vert \). Since \(F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) =\tau \), we have \(\Phi (\varvec{\beta }_0;\mathbf{m}(.;\varvec{\beta }_0))=0\), so that \(\Vert \Phi (\widehat{\varvec{\beta }};\mathbf{m}(.;\widehat{\varvec{\beta }}))\Vert =o_p(1)\). Thus, by assumption (A4), \(\Vert \widehat{\varvec{\beta }}-\varvec{\beta }_0\Vert <\epsilon \) with probability approaching one for any \(\epsilon >0\); that is, \(\Vert \widehat{\varvec{\beta }}-\varvec{\beta }_0\Vert =o_p(1)\).
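The rewriting in (13), and again the first step of the expansion in Step 2, rests on an elementary identity. Writing, only for this remark, \(\widehat{\mathbf{A}}\) and \(\mathbf{A}\) for the factors \(\mathbf{X}\mathbf{Z}^{\mathrm{T}}\widehat{\mathbf{g}}'-\widehat{\mathbf{H}}\mathbf{Z}\) and \(\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'-\mathbf{H}\mathbf{Z}\), and \(\widehat{F}\), \(F\) for the conditional distribution function evaluated at \(\widehat{\mathbf{g}}^{\mathrm{T}}\mathbf{Z}\) and \(\mathbf{g}^{\mathrm{T}}\mathbf{Z}\) (arguments suppressed), one has
$$\begin{aligned} \widehat{\mathbf{A}}(\tau -\widehat{F})-\mathbf{A}(\tau -F)&=(\widehat{\mathbf{A}}-\mathbf{A})(\tau -\widehat{F})+\mathbf{A}(\tau -\widehat{F})-\mathbf{A}(\tau -F)\\&=(\widehat{\mathbf{A}}-\mathbf{A})(\tau -\widehat{F})-\mathbf{A}(\widehat{F}-F), \end{aligned}$$
so the difference is driven only by \(\widehat{\mathbf{A}}-\mathbf{A}\) and \(\widehat{F}-F\), which is how Proposition 1 enters the o_p(1) bound in (13).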
Now we consider asymptotic normality. For readability, we split the proof into several steps.
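Step 1 Since \(\mathcal {F}\) is Donsker and, by consistency, \(P(\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})^2\rightarrow 0\) in probability, Lemma 19.24 in van der Vaart (1998) implies that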
$$\begin{aligned} {\mathbb {G}}_n(\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=o_p(1), \end{aligned}$$
(15)
where
\({\mathbb {G}}_n=\sqrt{n}(P_n-P)\) is the empirical process.
Step 2 We show
$$\begin{aligned} \sqrt{n}P(\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=\varvec{\Psi }_1\sqrt{n}(\widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)})+o_p(\sqrt{n}(\widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}))+o_p(1). \end{aligned}$$
(16)
In the proof of (16), we write \(\widehat{\varvec{\beta }}\) as \(\varvec{\beta }\) for simplicity of notation. Writing \(P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})+P(\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}})\), we first compute \(P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})\) as follows.
$$\begin{aligned}&P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}})\\&\quad =\mathbf{J}^{\mathrm{T}}E \left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\quad = \mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( \tau -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta }) -\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right. \\&\qquad \left. 
-F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\quad =\mathbf{J}^{\mathrm{T}}E \left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) -F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad +\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad +\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\left( \widehat{\mathbf{g}}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) -\left( \widehat{\mathbf{H}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\right) \mathbf{Z}\right) \\&\qquad \cdot \left( \tau -F\left( 
\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) \right) \nonumber \\&\qquad -\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \left( F\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right. \nonumber \\&\qquad \left. -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) .\nonumber \\ \end{aligned}$$
By Proposition 1, the first term above is \(o_p(n^{-1/2})\) and the second term is \(o_p(\Vert \varvec{\beta }-\varvec{\beta }_0\Vert )\). The third term is zero since \(F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) =\tau \). Finally, writing \(f(.|\mathbf{X})\) for the conditional density of \(Y\) given \(\mathbf{X}\), a Taylor expansion shows that the last term equals
$$\begin{aligned}&-\mathbf{J}^{\mathrm{T}}E\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \\&\qquad \times \left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) \\&\qquad +O_p\left( E\left( \widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\right) ^2\right) . \end{aligned}$$
The first term above is zero since
\(\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\) is the projection of
\(\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\) onto
\(\mathcal {M}_{\varvec{\beta }}\) while
\(\widehat{\mathbf{g}}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\in \mathcal {M}_{\varvec{\beta }}\). The second term above is
\(o_p(n^{-1/2})\) by Proposition
1.
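Collecting the four terms, as a summary of the bounds just stated (with \(\varvec{\beta }\) still standing for \(\widehat{\varvec{\beta }}\)):
$$\begin{aligned} P(\phi _{\varvec{\beta },\widehat{\mathbf{m}}}-\phi _{\varvec{\beta },\mathbf{m}}) =\underbrace{o_p(n^{-1/2})}_{\hbox {first}}+\underbrace{o_p(\Vert \varvec{\beta }-\varvec{\beta }_0\Vert )}_{\hbox {second}}+\underbrace{0}_{\hbox {third}}+\underbrace{0+o_p(n^{-1/2})}_{\hbox {last}} =o_p(n^{-1/2})+o_p(\Vert \varvec{\beta }-\varvec{\beta }_0\Vert ). \end{aligned}$$
After multiplication by \(\sqrt{n}\), this part contributes only remainder terms to (16); the leading term comes from \(P(\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}})\).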
Now we compute
\(P(\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}})\). We have
$$\begin{aligned}&E[\phi _{\varvec{\beta },\mathbf{m}}-\phi _{\varvec{\beta }_0,\mathbf{m}}|\mathbf{X}]\\&\quad =\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \left( \tau -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \\&\quad =\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \left( F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) -F\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \right) \\&\quad =\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \\&\qquad \times \left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)-\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\right) \mathbf{Z}+o_p\left( n^{-1/2}\right) \\&\quad =-\mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \\&\qquad \times \left( \frac{d\mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })}{d\varvec{\beta }}\mathbf{Z}\right) ^{\mathrm{T}}( \varvec{\beta }-\varvec{\beta }_0)+o_p\left( n^{-1/2}\right) \\&\quad =-\left( \mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }; \varvec{\beta })\mathbf{Z}\right) \right) ^{\otimes 2}f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}|\mathbf{X}\right) \left( \varvec{\beta }^{(-1)}-\varvec{\beta }_0^{(-1)}\right) \\&\qquad +o_p\left( \Vert \varvec{\beta }-\varvec{\beta }_0\Vert \right) +o_p\left( n^{-1/2}\right) \\&\quad =-\left( \mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}\right) \right) ^{\otimes 2}f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) \\&\qquad \times \left( \varvec{\beta }^{(-1)}-\varvec{\beta }_0^{(-1)}\right) +o_p\left( \Vert \varvec{\beta }-\varvec{\beta }_0\Vert \right) +o_p\left( n^{-1/2}\right) , \end{aligned}$$
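where in the second-to-last equality we used the identity
$$\begin{aligned} \frac{d \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })}{d \varvec{\beta }}\mathbf{Z}=\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-E\left[ \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })|\mathcal {M}_{\varvec{\beta }}\right] , \end{aligned}$$
derived in the previous subsection; since \(\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\) is the projection of \(\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\) onto \(\mathcal {M}_{\varvec{\beta }}\), the right-hand side equals \(\mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}\). In the last equality, \(\varvec{\beta }\) is replaced by \(\varvec{\beta }_0\) in the leading factor, which by consistency only adds a term of order \(o_p(\Vert \varvec{\beta }-\varvec{\beta }_0\Vert )\).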
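Taking expectations over \(\mathbf{X}\) in the last display, and combining with the first part of Step 2, identifies the matrix in (16). The expression below is read off from this derivation with the sign convention for \(\phi \) used here (the statement of Theorem 1 is not reproduced in this section):
$$\begin{aligned} \varvec{\Psi }_1=-E\left[ \left( \mathbf{J}^{\mathrm{T}}\left( \mathbf{X}\mathbf{Z}^{\mathrm{T}}\mathbf{g}'(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)-\mathbf{H}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}\right) \right) ^{\otimes 2}f\left( \mathbf{g}^{\mathrm{T}}(\mathbf{X}^{\mathrm{T}}\varvec{\beta }_0;\varvec{\beta }_0)\mathbf{Z}|\mathbf{X}\right) \right] . \end{aligned}$$
Since \(\mathbf{a}^{\otimes 2}=\mathbf{a}\mathbf{a}^{\mathrm{T}}\), this matrix is symmetric, and it is invertible, as Step 3 requires, whenever the expectation is positive definite.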
Step 3 Let \(\widetilde{\varvec{\beta }}^{(-1)}=\varvec{\beta }_0^{(-1)}-\varvec{\Psi }_1^{-1}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}\). Then
$$\begin{aligned} {\mathbb {G}}_n\left( \phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) =o_p(1). \end{aligned}$$
Indeed, since \(P\phi _{\varvec{\beta }_0,\mathbf{m}}=0\), the central limit theorem gives \(\Vert \widetilde{\varvec{\beta }}-\varvec{\beta }_0\Vert =O_p\left( n^{-1/2}\right) \), and the proof is then the same as in Step 1 (in fact, only \(\Vert \widetilde{\varvec{\beta }}-\varvec{\beta }_0\Vert =o_p(1)\) is needed here).
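The central limit theorem step can be written out explicitly. As a sketch, let \(\varvec{\Sigma }=P\phi _{\varvec{\beta }_0,\mathbf{m}}\phi _{\varvec{\beta }_0,\mathbf{m}}^{\mathrm{T}}\) (a symbol introduced here, assumed finite). Since \(P\phi _{\varvec{\beta }_0,\mathbf{m}}=0\), the definition of \(\widetilde{\varvec{\beta }}\) gives
$$\begin{aligned} \sqrt{n}\left( \widetilde{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\right) =-\varvec{\Psi }_1^{-1}\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}=-\varvec{\Psi }_1^{-1}{\mathbb {G}}_n\phi _{\varvec{\beta }_0,\mathbf{m}},\qquad {\mathbb {G}}_n\phi _{\varvec{\beta }_0,\mathbf{m}}\rightsquigarrow N(\mathbf{0},\varvec{\Sigma }), \end{aligned}$$
so that \(\sqrt{n}(\widetilde{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)})=O_p(1)\).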
Step 4 \(\sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}=o_p(1)\).
Rewriting the result in Step 3 as
$$\begin{aligned} \sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}=\sqrt{n}P\left( \phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) +\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p(1), \end{aligned}$$
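and using the same arguments as in Step 2 together with \(\Vert \widetilde{\varvec{\beta }}-\varvec{\beta }_0\Vert =O_p(n^{-1/2})\) (so that the term \(o_p(\sqrt{n}(\widetilde{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}))\) in (16) is \(o_p(1)\)), we have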
$$\begin{aligned} \sqrt{n}P\left( \phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) =\varvec{\Psi }_1\sqrt{n}(\widetilde{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)})+o_p(1), \end{aligned}$$
and thus
$$\begin{aligned} \sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}= \varvec{\Psi }_1\sqrt{n}\left( \widetilde{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\right) +\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p(1)=o_p(1), \end{aligned}$$
by the definition of
\(\widetilde{\varvec{\beta }}\).
Step 5 Finish the proof. Since \(\widehat{\varvec{\beta }}\) minimizes \(\Vert \sqrt{n}P_n\phi _{\varvec{\beta },\widehat{\mathbf{m}}}\Vert \) up to an \(o_p(1)\) term, we have \(\Vert \sqrt{n}P_n\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}\Vert \le \Vert \sqrt{n}P_n\phi _{\widetilde{\varvec{\beta }},\widehat{\mathbf{m}}}\Vert +o_p(1)=o_p(1)\).
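Thus, writing (15) as \(\sqrt{n}P_n\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}-\sqrt{n}P(\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}})=o_p(1)\) and using \(\sqrt{n}P_n\phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}=o_p(1)\), we obtain
$$\begin{aligned} \sqrt{n}P\left( \phi _{\widehat{\varvec{\beta }},\widehat{\mathbf{m}}}-\phi _{\varvec{\beta }_0,\mathbf{m}}\right) =-\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p(1). \end{aligned}$$
(17)
Using the result of Step 2 on the left-hand side of (17), we have
$$\begin{aligned} \sqrt{n}\left( \widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\right) =-\varvec{\Psi }_1^{-1}\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}+o_p\left( \sqrt{n}\left( \widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\right) \right) +o_p(1). \end{aligned}$$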
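Since \(\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}=O_p(1)\) by the central limit theorem, the last display first yields \(\sqrt{n}\Vert \widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\Vert =O_p(1)\), that is, root-n consistency of \(\widehat{\varvec{\beta }}^{(-1)}\). The term \(o_p(\sqrt{n}(\widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}))\) is then \(o_p(1)\), and asymptotic normality follows from the central limit theorem applied to \(\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}\).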
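For concreteness, the limiting distribution can be written in sandwich form. As a sketch, with \(\varvec{\Sigma }=P\phi _{\varvec{\beta }_0,\mathbf{m}}\phi _{\varvec{\beta }_0,\mathbf{m}}^{\mathrm{T}}\) (our notation, assumed finite), root-n consistency makes the remainder terms in the last display \(o_p(1)\), and \(\sqrt{n}P_n\phi _{\varvec{\beta }_0,\mathbf{m}}\rightsquigarrow N(\mathbf{0},\varvec{\Sigma })\) then gives
$$\begin{aligned} \sqrt{n}\left( \widehat{\varvec{\beta }}^{(-1)}-\varvec{\beta }_0^{(-1)}\right) \rightsquigarrow N\left( \mathbf{0},\varvec{\Psi }_1^{-1}\varvec{\Sigma }\left( \varvec{\Psi }_1^{-1}\right) ^{\mathrm{T}}\right) . \end{aligned}$$
This covariance does not depend on the sign in front of \(\varvec{\Psi }_1^{-1}\); the form stated in Theorem 1 itself is not reproduced in this section.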
\(\square \)

Define \(m_i(\varvec{\beta })=\mathbf{g}^{\mathrm{T}}(\mathbf{X}_i^{\mathrm{T}}\varvec{\beta };\varvec{\beta })\mathbf{Z}_i\) and \(e_i(\varvec{\beta })=Y_i-m_i(\varvec{\beta })\). Note that \(\tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \) does not have mean zero in general (unless \(\varvec{\beta }=\varvec{\beta }_0\)), but \(\mathbf{Z}_i\left( \tau -I\left\{ e_i(\varvec{\beta })\le 0\right\} \right) \) still has mean zero, as in (4), which suffices for our purposes.