Penalized empirical likelihood for partially linear errors-in-variables panel data models with fixed effects

Abstract

For partially linear errors-in-variables panel data models with fixed effects, we study in this paper the asymptotic distributions of a corrected empirical log-likelihood ratio and of the maximum empirical likelihood estimator of the regression parameter. In addition, we propose a penalized empirical likelihood (PEL) and a variable selection procedure for the parametric component when the number of parameters diverges. Using an appropriate penalty function, we show that the PEL estimators have the oracle property. A PEL ratio for the vector of regression coefficients is also defined, and its limiting distribution under the null hypothesis is shown to be chi-square. Moreover, an empirical log-likelihood ratio for the nonparametric part is investigated. Monte Carlo simulations illustrate the finite-sample performance of the proposed estimators.

References

  • Arellano M (2003) Panel data econometrics. Oxford University Press, New York

  • Baltagi BH (2005) Econometric analysis of panel data, 2nd edn. Wiley, New York

  • Baltagi BH, Li D (2002) Series estimation of partially linear panel data models with fixed effects. Ann Econ Fin 3(1):103–116

  • Cai Z, Li Q (2008) Nonparametric estimation and varying coefficient dynamic panel data models. Econom Theory 24:1321–1342

  • Carroll RJ, Ruppert D, Stefanski LA (1995) Measurement error in nonlinear models. Chapman & Hall, London

  • Chen J, Gao J, Li D (2013) Estimation in partially linear single-index panel data models with fixed effects. J Bus Econ Stat 31(3):315–330

  • Fan J, Huang T (2005) Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 11(6):1031–1057

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B 70(5):849–911

  • Fan J, Peng H, Huang T (2005) Semilinear high-dimensional model for normalization of microarrays data: a theoretical analysis and partial consistency. J Am Stat Assoc 100:781–796

  • Fan GL, Liang HY, Wang JF (2013) Statistical inference for partially time-varying coefficient errors-in-variables models. J Stat Plan Inference 143:505–519

  • Fan GL, Liang HY, Shen Y (2016) Penalized empirical likelihood for high-dimensional partially linear varying coefficient model with measurement errors. J Multivar Anal 147:183–201

  • Hall P, Heyde CC (1980) Martingale limit theory and its application. Academic Press, New York

  • Hanfelt JJ, Liang KY (1997) Approximate likelihood for generalized linear errors-in-variables models. J R Stat Soc Ser B 59(3):627–637

  • Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York

  • He BQ, Hong XJ, Fan GL (2017) Block empirical likelihood for partially linear panel data models with fixed effects. Stat Probab Lett 123(4):128–138

  • Henderson DJ, Carroll RJ, Li Q (2008) Nonparametric estimation and testing of fixed effects panel data models. J Econom 144:257–275

  • Horowitz JL, Lee S (2004) Semiparametric estimation of a panel data proportional hazards model with fixed effects. J Econom 119:155–198

  • Hsiao C (2003) Analysis of panel data, 2nd edn. Cambridge University Press, New York

  • Hu XM (2014) Estimation in a semi-varying coefficient model for panel data with fixed effects. J Syst Sci Complex 27:594–604

  • Hu XM (2017) Semi-parametric inference for semi-varying coefficient panel data model with individual effects. J Multivar Anal 154:262–281

  • Hu XM, Wang Z, Zhao Z (2009) Empirical likelihood for semiparametric varying-coefficient partially linear errors-in-variables models. Stat Probab Lett 79(8):1044–1052

  • Leng C, Tang CY (2012) Penalized empirical likelihood and growing dimensional general estimating equations. Biometrika 99(3):703–716

  • Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36(1):261–286

  • Li DG, Chen J, Gao JT (2011) Non-parametric time-varying coefficient panel data models with fixed effects. Econom J 14:387–408

  • Li G, Lin L, Zhu L (2012) Empirical likelihood for a varying coefficient partially linear model with diverging number of parameters. J Multivar Anal 105:85–111

  • Liang H, Härdle W, Carroll RJ (1999) Estimation in a semiparametric partially linear errors-in-variables model. Ann Stat 27:1519–1535

  • Nakamura T (1990) Corrected score function for errors-in-variables model: methodology and application to generalized linear models. Biometrika 77(1):127–137

  • Owen AB (1988) Empirical likelihood ratio confidence intervals for a single functional. Biometrika 75:237–249

  • Owen AB (1990) Empirical likelihood ratio confidence regions. Ann Stat 18:90–120

  • Ren Y, Zhang X (2011) Variable selection using penalized empirical likelihood. Sci China Math 54(9):1829–1845

  • Rodriguez-Poo JM, Soberon A (2014) Direct semi-parametric estimation of fixed effects panel data varying coefficient models. Econom J 17(1):107–138

  • Schennach SM (2007) Instrumental variable estimation of nonlinear errors-in-variables models. Econometrica 75:201–239

  • Su LJ, Ullah A (2006) Profile likelihood estimation of partially linear panel data models with fixed effects. Econom Lett 92:75–81

  • Su L, Ullah A (2007) More efficient estimation of nonparametric panel data models with random effects. Econom Lett 96(3):375–380

  • Tang CY, Leng C (2010) Penalized high-dimensional empirical likelihood. Biometrika 97(4):905–920

  • Wang S, Xiang L (2017) Penalized empirical likelihood inference for sparse additive hazards regression with a diverging number of covariates. Stat Comput 27(5):1347–1364

  • Wang N, Carroll RJ, Liang KY (1996) Quasi-likelihood and variance functions in measurement error models with replicates. Biometrics 52:423–432

  • Xue LG, Zhu LX (2008) Empirical likelihood-based inference in a partially linear model for longitudinal data. Sci China Ser A 51(1):115–130

  • You JH, Chen GM (2006) Estimation in a semiparametric varying-coefficient partially linear errors-in-variables model. J Multivar Anal 97:324–341

  • Zhou HB, You JH, Zhou B (2010) Statistical inference for fixed-effects partially linear regression models with errors in variables. Stat Papers 51:629–650

  • Zhu LX, Cui HJ (2003) A semiparametric regression model with errors in variables. Scand J Stat 30:429–442

Acknowledgements

The authors are grateful to two anonymous referees for providing detailed lists of comments and suggestions which greatly improved the presentation of the paper. This research is supported by the National Social Science Fund of China (18BTJ034).

Corresponding author

Correspondence to Xing-Jian Hong.

Appendix: Proofs of the main results

We use the Frobenius norm of a matrix A, defined as \(||A||=\{tr(A^{\tau }A)\}^{1/2}\). Before giving the details of the proofs, we present some regularity conditions.

  1. (B1)

    The random vector \(Z_{it}\) has a continuous density function \(f(\cdot )\) with a bounded support \(\mathcal {Z}\), and \( 0<\inf _{z\in \mathcal {Z}}f(z)\le \sup _{z\in \mathcal {Z}}f(z)<\infty \).

  2. (B2)

    The functions \(E(X_{it}|Z_{it}=z)\) and \(g(\cdot )\) have two bounded and continuous derivatives on \(\mathcal {Z}\).

  3. (B3)

    The kernel K(v) is a symmetric probability density function with a continuous derivative and compact support.

  4. (B4)

    \((\mu _i,W_{it}, Z_{it}, \varepsilon _{it}),~ i=1,\ldots ,n,~t=1,\ldots ,T\) are i.i.d. \(E(\varepsilon |W,Z,\mu )=0\) almost surely. Furthermore, for some integer \(k\ge 4,\) \(E(||W\varepsilon ||^k)< \infty ,\) \(E(||W||^k)< \infty ,\) \(E(|\varepsilon |^k)< \infty .\)

  5. (B5)

    \(E|\breve{X}_{it}|^{2+\delta }<\infty \),  \(\Sigma =E[\breve{X}_{it}\breve{X}_{it}^{\tau }]\) is non-singular, where \(\breve{X}_{it}=X_{it}-E(X_{it}|Z_{it}).\)

  6. (B6)

    The bandwidth h satisfies \(h\rightarrow 0\), \(Nh^8\rightarrow 0\) and \(Nh^2/(\log N)^2\rightarrow \infty \) as \(n\rightarrow \infty \).

  7. (B7)

    \(\Sigma _1\) and \(\Sigma _2\) are positive definite matrices with all eigenvalues being uniformly bounded away from zero and infinity.

  8. (B8)

    Let \(\varpi _1=\sum _{t=1}^T\frac{T-1}{T}(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)\), \(\varpi _2=\sum _{t=1}^T\frac{T-1}{T}\nu _{it}\varepsilon _{it}\), \(\varpi _3=\sum _{t=1}^T\frac{T-1}{T}(\nu _{it}\nu _{it}^{\tau }-\Sigma _{\nu })\beta _0\), and let \(\varpi _{sj}, j=1,\ldots ,p\), denote the j-th component of \(\varpi _{s}\). For the k in Condition (B4), there is a positive constant c such that as \(n\rightarrow \infty \), \(E(||\varpi _{s}/\sqrt{p}||^k)\le c, s=1,2,3\).

  9. (B9)

    The penalty function \(p_{\lambda }(\cdot )\) satisfies \(\mathop {max}\limits _{j\in \mathcal {B}}p_{\lambda }'(|\beta _{j0}|)=o((np)^{-1/2})\) and \( \mathop {max}\limits _{j\in \mathcal {B}}p_{\lambda }''(|\beta _{j0}|)=o(p^{-1/2}) \).

Note that the above conditions are assumed to hold uniformly in \(z\in \mathcal {Z}\). Conditions (B1)–(B9), while looking a bit lengthy, are actually quite mild and can be easily satisfied. (B1)–(B2) are standard in the literature on local linear/polynomial estimation. (B4) implies \(E(\varepsilon _{it}|X_i,Z_i,\mu _i)=E(\varepsilon _{it}|X_{it},Z_{it},\mu _{it})=0\). (B1)–(B5) can be found in Su and Ullah (2006). (B6) and (B7) have been used in Zhou et al. (2010).
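As a concrete example of a penalty satisfying (B9) (an illustration on our part, not one of the paper's assumptions), consider the SCAD penalty of Fan and Li (2001), whose flatness is used in the proof of Lemma A.7 below. For \(\theta >0\) and some \(a>2\), its first derivative is

$$\begin{aligned} p_{\lambda }'(\theta )=\lambda \Big \{I(\theta \le \lambda )+\frac{(a\lambda -\theta )_{+}}{(a-1)\lambda }I(\theta >\lambda )\Big \}, \end{aligned}$$

so that \(p_{\lambda }'(\theta )=p_{\lambda }''(\theta )=0\) whenever \(\theta >a\lambda \). Hence, if \(\lambda \rightarrow 0\) and \(\mathop {min}\nolimits _{j\in \mathcal {B}}|\beta _{j0}|\) stays bounded away from zero, both requirements in (B9) hold for all sufficiently large n.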

For convenience and simplicity, let \(\vartheta _k=\int z^kK(z)dz\), \(c_N=\{\log (1/h)/(Nh)\}^{1/2}+h^2\), \(\widetilde{M}_{\widetilde{D}}=\widetilde{D}(\widetilde{D}^{\tau }\widetilde{D})^{-1}\widetilde{D}^{\tau }\), and

$$\begin{aligned} a_n= & {} \max _{1\le j\le p}\{|p_{\lambda }'(\beta _{j0})|:\beta _{j0}\ne 0\},~~~~b_n=\max _{1\le j\le p}\{|p_{\lambda }''(\beta _{j0})|:\beta _{j0}\ne 0\},\\ B_n= & {} \{\beta : ||\beta -\beta _0||\ge d_n\},~~~d_n=n^{-1/3-\delta }+a_n,~~~0<\delta <1/6. \end{aligned}$$

Lemma A.1

Suppose that Assumptions (B1)–(B6) hold. Then

$$\begin{aligned}&\displaystyle G^{\tau }(z,h)W_h(z)G(z,h)=Nf(z)\times \begin{pmatrix}1&{}\quad 0\\ 0&{}\quad v_2\end{pmatrix}\{1+O_p(c_N)\},\\&\displaystyle G^{\tau }(z,h)W_h(z)X=Nf(z)E(X|Z)\times (1,0)^{\tau }\{1+O_p(c_N)\}, \end{aligned}$$

Proof

Note that

$$\begin{aligned}&G^{\tau }(z,h)W_h(z)G(z,h)\\&\quad =\left( \begin{array}{cc} \sum _{i=1}^n\sum _{t=1}^TK_h(Z_{it}-z)&{}\quad \sum _{i=1}^n\sum _{t=1}^T(\frac{Z_{it}-z}{h})K_h(Z_{it}-z)\\ \sum _{i=1}^n\sum _{t=1}^T(\frac{Z_{it}-z}{h})K_h(Z_{it}-z) &{}\quad \sum _{i=1}^n\sum _{t=1}^T(\frac{Z_{it}-z}{h})^2K_h(Z_{it}-z) \end{array}\right) . \end{aligned}$$

Each element of the above matrix is in the form of a kernel regression. Similar to the proof of Lemma A.2 in Fan and Huang (2005), we can derive the desired result. \(\square \)
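For completeness, the standard kernel calculation behind this step is the following (a sketch under (B1), (B3) and (B6), using the notation \(\vartheta _k=\int z^kK(z)dz\) introduced above; it is not meant to reproduce the argument of Fan and Huang 2005):

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\Big (\frac{Z_{it}-z}{h}\Big )^kK_h(Z_{it}-z)=f(z)\vartheta _k+O_p(c_N),\qquad k=0,1,2, \end{aligned}$$

with \(\vartheta _0=1\) and \(\vartheta _1=0\) by the symmetry of K, which yields the displayed diagonal form with \(v_2=\vartheta _2\).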

Lemma A.2

Suppose that Assumptions (B1)–(B6) hold. Then

$$\begin{aligned} E|g(Z_{it})-\sum _{k=1}^n\sum _{l=1}^TS_{kl}g(Z_{kl})|^2=O(h^4). \end{aligned}$$
(A.1)

Proof

The proof is similar to that of Lemma 5.1 in He et al. (2017). \(\square \)
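To fix ideas, the weights \(S_{kl}\) used here are the rows of the local linear smoother built from \(G(z,h)\) and \(W_h(z)\) of Lemma A.1. The following NumPy sketch constructs such a smoother matrix; it is an illustration only (the Epanechnikov kernel and the small ridge term are our stand-in choices), not the authors' implementation:

    import numpy as np

    def local_linear_smoother(Z, h, kernel=lambda v: 0.75 * np.maximum(1.0 - v ** 2, 0.0)):
        """N x N local linear smoother matrix S: row j maps the pooled responses to the
        local linear fit of the nonparametric component at z = Z[j].
        Z is the pooled vector of Z_it (length N = nT); Epanechnikov kernel by default."""
        N = Z.shape[0]
        S = np.zeros((N, N))
        for j in range(N):
            u = (Z - Z[j]) / h
            w = kernel(u) / h                        # diagonal of W_h(z): K_h(Z_it - z)
            G = np.column_stack((np.ones(N), u))     # local linear design [1, (Z_it - z)/h]
            GtW = G.T * w                            # G'(z, h) W_h(z)
            A = GtW @ G + 1e-10 * np.eye(2)          # G'WG, with a tiny ridge for stability
            S[j, :] = np.linalg.solve(A, GtW)[0, :]  # e_1'(G'WG)^{-1} G'W
        return S

    # Example: the smoothed g has small bias for a smooth g, in line with Lemma A.2.
    rng = np.random.default_rng(0)
    Z = rng.uniform(size=200)
    g = np.sin(2 * np.pi * Z)
    S = local_linear_smoother(Z, h=0.2)
    print(np.mean(np.abs(S @ g - g)))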

Lemma A.3

Suppose that Assumptions (B1)–(B6) hold. Then

$$\begin{aligned} \frac{1}{N}\widetilde{{W}}^{\tau }H\widetilde{{W}}{\mathop {\rightarrow }\limits ^{p}}\frac{T-1}{T}(\Sigma _2+\Sigma _{\nu }), \end{aligned}$$

where \(\Sigma _2=E\{[{X_{11}}-E({X_{11}}|Z_{11})]^{\tau }[{X_{11}}-E({X_{11}}|Z_{11})]\}.\)

Proof

By Lemma A.1, we can obtain

$$\begin{aligned}{}[I_q~~0^\tau _q](G^{\tau }(z,h)W_h(z)G(z,h))^{-1}G^{\tau }(z,h)W_h(z)X=E(X|Z)+O_p(c_N). \end{aligned}$$

Then we have

$$\begin{aligned} \widetilde{{X}}= & {} [{X}_{11}-E(X_{11}|Z_{11}),\ldots ,{X}_{1T}-E(X_{1T}|Z_{1T}),\ldots ,{X}_{nT}-E(X_{nT}|Z_{nT})]^{\tau }\\&+\,O_p(c_N), \end{aligned}$$

and

$$\begin{aligned} \widetilde{{W}}=\widetilde{{X}}+\nu +O_p(c_N)\triangleq A+O_p(c_N). \end{aligned}$$

By the law of large numbers, we have

$$\begin{aligned} \frac{1}{N}\widetilde{{W}}^{\tau }\widetilde{{W}}= & {} \frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\big \{[{X}_{it}-E(X_{it}|Z_{it})]^{\tau }[{X}_{it}-E(X_{it}|Z_{it})]+\nu _{it}^{\tau }\nu _{it}\big \}\nonumber \\&+\,O_p(c_N) {\mathop {\rightarrow }\limits ^{p}}\Sigma _2+\Sigma _{\nu }. \end{aligned}$$
(A.2)

Hence, to prove the lemma, we consider the limit of \(N^{-1}\widetilde{{W}}^{\tau }\widetilde{M}_{\widetilde{D}}\widetilde{{W}}\). It is easy to show that \(N^{-1}\widetilde{{W}}^{\tau }\widetilde{M}_{\widetilde{D}}\widetilde{{W}}=N^{-1}A^{\tau }\widetilde{M}_{\widetilde{D}}A+O_p(c_N)\). Let \((\widetilde{M}_{\widetilde{D}})_{e_{kl}e_{it}}\triangleq m_{e_{kl}e_{it}}\) and \((A)_{it}\triangleq a_{it}=\widetilde{{W}}_{it}\), where \(e_{kl}=(k-1)T+l\). Then

$$\begin{aligned} \frac{1}{N}A^{\tau }\widetilde{M}_{\widetilde{D}}A&=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\sum _{k=1}^n\sum _{l=1}^T a_{kl}m_{e_{kl}e_{it}}a_{it}=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Ta_{it}m_{e_{it}e_{it}}a_{it}\\&\quad +\,\frac{1}{N}\sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^Ta_{kl}m_{e_{kl}e_{it}}a_{it} \triangleq I_1+I_2. \end{aligned}$$

For the term \(I_2\), we have

$$\begin{aligned} EI_2^2=\frac{1}{N^2}E\left[ \sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^T\sum _{e_{rs}\ne e_{uv}}\sum _{r=1}^n\sum _{s=1}^Ta_{kl}m_{e_{kl}e_{it}}a_{it}a_{rs}m_{e_{rs}e_{uv}}a_{uv}\right] . \end{aligned}$$

Note that \((X_{11},Z_{11}),\ldots ,(X_{nT},Z_{nT})\) are i.i.d. and \(E(a_{it}|Z_{it})=0\). When \(e_{kl}\ne e_{rs}\) and \(e_{it}\ne e_{uv}\), we have

$$\begin{aligned}&E(a_{kl}m_{e_{kl}e_{it}}a_{it}a_{rs}m_{e_{rs}e_{uv}}a_{uv})\\&\quad =E[m_{e_{kl}e_{it}}m_{e_{rs}e_{uv}}E(a_{kl}a_{it}a_{rs}a_{uv}|Z_{kl},Z_{it},Z_{rs},Z_{uv})]\\&\quad =E[m_{e_{kl}e_{it}}m_{e_{rs}e_{uv}}E(a_{it}a_{rs}a_{uv}|Z_{it},Z_{rs},Z_{uv})E(a_{kl}|Z_{kl})]=0. \end{aligned}$$

Using the same argument and \(m_{e_{kl}e_{it}}=m_{e_{it}e_{kl}}\), we have

$$\begin{aligned} EI_2^2=\frac{1}{N^2}\sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^TE(a_{kl}m_{e_{kl}e_{it}}a_{it})^2+\frac{1}{N^2}\sum _{e_{kl}\ne e_{it}}\sum _{k=1}^n\sum _{l=1}^TE(m_{e_{kl}e_{it}}^2a_{kl}a_{it}a_{it}a_{kl}). \end{aligned}$$

By Condition (B3), we obtain

$$\begin{aligned} EI_2^2\le \frac{2c}{N^2}\sum _{e_{kl}\ne e_{it}}E(m_{e_{kl}e_{it}})^2 \le \frac{2c}{N^2}tr(\widetilde{M}_{\widetilde{D}}^2)\le \frac{2c}{N}, \end{aligned}$$

where c is a constant. Hence

$$\begin{aligned} I_2=o_p(1). \end{aligned}$$
(A.3)

Note that \(I_1\) can be decomposed as

$$\begin{aligned} I_1=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Tm_{e_{it}e_{it}} [a_{it}a_{it}-E(a_{it}a_{it})]+\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Tm_{e_{it}e_{it}}E(a_{it}a_{it})\triangleq \Pi _1+\Pi _2. \end{aligned}$$

By the definition of S, it is easy to show that

$$\begin{aligned} S=(S_{11},\ldots ,S_{1T},S_{21},\ldots ,S_{nT})^{\tau }[I+diag\{O_p(c_n)\}], \end{aligned}$$

where

$$\begin{aligned} S_{it}=\left( \frac{K_h(Z_{11}-Z_{it})}{Nf(Z_{it})},\ldots ,\frac{K_h(Z_{1T}-Z_{it})}{Nf(Z_{it})},\ldots ,\frac{K_h(Z_{nT}-Z_{it})}{Nf(Z_{it})}\right) ^{\tau }. \end{aligned}$$

Let \(D_1\) be the first column vector of D. Then we have

$$\begin{aligned} D_1^{\tau }(I-S^{\tau })(I-S)D_1= & {} \left\{ T-2\sum _{t=1}^T \left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{1t})}{Nf(Z_{1t})}\right] \right. \\&\left. +\sum _{i=1}^n\sum _{t=1}^T\left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{it})}{Nf(Z_{it})}\right] ^2\right\} \{1+O_p(c_N)\}. \end{aligned}$$

Because

$$\begin{aligned} \sum _{t=1}^T \left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{1t})}{Nf(Z_{1t})}\right] \{1+O_p(c_N)\}=O_p\left( \frac{1}{Nh}\right) , \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^n\sum _{t=1}^T\left[ \sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{it})}{Nf(Z_{it})}\right] ^2\{1+O_p(c_N)\}=O_p\left( \frac{1}{Nh}\right) , \end{aligned}$$

we have

$$\begin{aligned} D_1^{\tau }(I-S^{\tau })(I-S)D_1 =T\left[ 1+O_p\left( \frac{1}{Nh}\right) \right] . \end{aligned}$$

Considering the projection matrix, for \(i=1,\ldots ,T\) we obtain

$$\begin{aligned} (\widetilde{M}_{{\widetilde{D}}_1})_{ii}= & {} (I-S)D_1[D_1^{\tau }(I-S^{\tau })(I-S)D_1]^{-1}D_1^{\tau }(I-S^{\tau })\\= & {} \frac{1}{T}\left[ 1+O_p\left( \frac{1}{Nh}\right) \right] \left[ 1-\sum _{l=1}^T \frac{K_h(Z_{1l}-Z_{it})}{Nf(Z_{it})}\{1+O_p(c_N)\}\right] ^2\\= & {} \frac{1}{T}+O_p\left( \frac{1}{Nh}\right) . \end{aligned}$$

Because \({\widetilde{D}}_1\) is the first column vector of \({\widetilde{D}}\), it is easy to show that \(\widetilde{M}_{\widetilde{D}}\widetilde{M}_{{\widetilde{D}}_1}=\widetilde{M}_{{\widetilde{D}}_1}\widetilde{M}_{\widetilde{D}}=\widetilde{M}_{{\widetilde{D}}_1}\). Hence, \(\widetilde{M}_{\widetilde{D}}-\widetilde{M}_{{\widetilde{D}}_1}\) is also a projection matrix. Thus \(\widetilde{M}_{\widetilde{D}}-\widetilde{M}_{\widetilde{D}_1}=(\widetilde{M}_{\widetilde{D}}-\widetilde{M}_{{\widetilde{D}}_1})^2\ge 0 \). We obtain \((\widetilde{M}_{\widetilde{D}})_{ii}\ge (\widetilde{M}_{{\widetilde{D}}_1})_{ii}=\frac{1}{T}+O_p(\frac{1}{Nh}),~i=1,\ldots ,T\). By a similar argument, we can show that \((\widetilde{M}_{\widetilde{D}})_{ii}\ge \frac{1}{T}+O_p(\frac{1}{Nh}),i=T+1,\ldots ,N\). Thus, we have

$$\begin{aligned} 1\ge m_{e_{it}e_{it}}\ge \frac{1}{T}+O_p\left( \frac{1}{Nh}\right) , \end{aligned}$$

then, it is easy to show that

$$\begin{aligned} tr(\widetilde{M}_{\widetilde{D}})=\sum _{i=1}^{n}\sum _{t=1}^{T} m_{e_{it}e_{it}}\ge \frac{N}{T}+O_p\left( \frac{1}{Nh}\right) . \end{aligned}$$

Hence, we have

$$\begin{aligned} \Pi _2=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^Tm_{e_{it}e_{it}}E(a_{it}a_{it})=\frac{1}{T}(\Sigma _2+\Sigma _{\nu })+O_p\left( \frac{1}{Nh}\right) . \end{aligned}$$
(A.4)

By (A.2), it is easy to show that

$$\begin{aligned} \frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T(m_{e_{it}e_{it}}-T^{-1})=O_p\left( \frac{1}{Nh^2}\right) , \end{aligned}$$

By the law of large numbers, \(\Pi _1\) is bounded as

$$\begin{aligned} \Pi _1&=\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T(m_{e_{it}e_{it}}-T^{-1})(a_{it}a_{it}-E(a_{it}a_{it}))\nonumber \\&\quad +\frac{1}{NT}\sum _{i=1}^n\sum _{t=1}^T(a_{it}a_{it}-E(a_{it}a_{it}))\nonumber \\&\le \frac{1}{N}\left[ \sum _{i=1}^n\sum _{t=1}^T(m_{e_{it}e_{it}}-T^{-1})^2\right] ^{1/2}+o_p(1)=o_p(1). \end{aligned}$$
(A.5)

By (A.4) and (A.5), we have

$$\begin{aligned} I_1=\frac{1}{T}(\Sigma _2+\Sigma _{\nu })+o_p(1). \end{aligned}$$
(A.6)

By (A.2), (A.3) and (A.6), the lemma holds. \(\square \)

Lemma A.4

Under the conditions of Theorem 2.1, if \(\beta \) is the true value of the parameter, we have

$$\begin{aligned} \sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}= & {} o_p(N^{1/2}), \end{aligned}$$
(A.7)
$$\begin{aligned} \sum _{i=1}^n\sum _{t=1}^T \varepsilon _{it}H\widetilde{g}_{it}= & {} o_p(N^{1/2}), \end{aligned}$$
(A.8)

Proof

Since the proof of (A.8) is similar to that of (A.7), we prove only (A.7) here. Let \(\zeta _N=N^{1/2}/\log (N)\). Then

$$\begin{aligned} P\left( |\sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}|>\zeta _N\right)\le & {} P\left( |\sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}|>\zeta _N, max_{i,t}|\widetilde{g}_{it}|\le ch^4\right) \nonumber \\&+P(max_{i,t}|\widetilde{g}_{it}|\ge ch^4) \end{aligned}$$
(A.9)

The second term is \(o_p(1)\) by Lemma A.2. For the first term, let \(R_{it}\) be the event that \(|\widetilde{g}_{it}|\le ch^4\). Then

$$\begin{aligned}&P\left( |\sum _{i=1}^n\sum _{t=1}^T \nu _{it}H\widetilde{g}_{it}|>\zeta _N, \{I(R_{it})=1,\forall i,t\}\right) \\&\quad \le \zeta _N^{-2}\sum _{i=1}^n\sum _{t=1}^T E[\nu _{it}H\widetilde{g}_{it}\{I(R_{it})=1\}]^2\\&\qquad +\,\,\zeta _N^{-2}\sum _{i\ne k}^n\sum _{t\ne s}^TE[\nu _{it}H\widetilde{g}_{it}\nu _{ks}H\widetilde{g}_{ks}\{I(R_{ks})=1\}]. \end{aligned}$$

Since \(\widetilde{g}_{it}\{I(R_{it})=1\}\le ch^4\) is independent of \(\nu _{it}\), the first term is \(O\{N\zeta _N^{-2}c^2h^8\}=o(1)\). The second term is easily seen to equal zero. \(\square \)

Lemma A.5

Under the conditions of Theorem 2.1, if \(\beta _0\) is the true value of the parameter, we have

$$\begin{aligned}&\displaystyle \mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta _0)||=o_p(\sqrt{n/p}), \end{aligned}$$
(A.10)
$$\begin{aligned}&\displaystyle \frac{\{N^{-1/2}\sum _{i=1}^n\Gamma _i^{\tau }(\beta _0)\}\Sigma _1^{-1}\{N^{-1/2}\sum _{i=1}^n\Gamma _i(\beta _0)\}-p}{\sqrt{2p}}{\mathop {\rightarrow }\limits ^{d}}N(0,1). \end{aligned}$$
(A.11)

Proof

From the definition of \(\Gamma _{i}(\beta )\) in (2.7), a simple calculation yields

$$\begin{aligned} \frac{1}{\sqrt{N}}\sum _{i=1}^n\Gamma _i(\beta _0)= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\big \{ \widetilde{X}_{it}H\widetilde{g}_{it}-\widetilde{X}_{it}H \widetilde{\nu }_{it}\beta _0+\widetilde{X}_{it}H\widetilde{\varepsilon }_{it}\\&+\,\widetilde{\nu }_{it}H\widetilde{g}_{it}-\widetilde{\nu }_{it}H \widetilde{\nu }_{it}^{\tau }\beta _0+\widetilde{\nu }_{it}H\widetilde{\varepsilon }_{it}+\Sigma _{\nu }\beta _0\big \}. \end{aligned}$$

By Lemma A.1, we have \(S\varepsilon =O_p(c_N)\). Similar to the proof of Lemma A.3 and under Assumption (B7), we have \(\frac{1}{\sqrt{N}}\widetilde{{X}}^{\tau }HS\varepsilon =O_p(\sqrt{N}c_N^2)=o_p(1)\). Therefore

$$\begin{aligned} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{X}_{it}H\widetilde{\varepsilon }_{it}=\frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T \frac{T-1}{T}(X_{it}-E(X_{it}|Z_{it}))\varepsilon _{it} \end{aligned}$$
(A.12)

Similar to the proof of (A.12), we can derive that

$$\begin{aligned} \displaystyle \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{X}_{it}H \widetilde{\nu }_{it}= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}(X_{it}-E(X_{it}|U_{it}))\nu _{it},\\ \displaystyle \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{\nu }_{it}H \widetilde{\nu }_{it}^{\tau }= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\nu _{it}\nu _{it}^{\tau },\\ \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\widetilde{\nu }_{it}H\widetilde{\varepsilon }_{it}= & {} \frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\nu _{it}\varepsilon _{it}, \end{aligned}$$

which, combined with Lemma A.4, easily yields

$$\begin{aligned}&\frac{1}{\sqrt{N}}\sum _{i=1}^n\Gamma _i(\beta _0)\nonumber \\&\quad =\frac{1}{\sqrt{N}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\big \{(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)+\nu _{it}\varepsilon _{it}\nonumber \\&\qquad +(\Sigma _{\nu }-\nu _{it} \nu _{it}^{\tau })\beta _0\big \}+o_p(1)\nonumber \\&\quad =\frac{1}{\sqrt{N}}\sum _{i=1}^nG_{i}(\beta _0)+o_p(1). \end{aligned}$$
(A.13)

Therefore, we have

$$\begin{aligned} \Gamma _i(\beta _0)&=\frac{T-1}{T}\sum _{t=1}^T\big \{(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)+\nu _{it}\varepsilon _{it}\nonumber \\&\quad +(\Sigma _{\nu }-\nu _{it} \nu _{it}^{\tau })\beta _0\big \}+o_p(1)\nonumber \\&=\sum _{t=1}^T\left\{ \frac{T-1}{T}(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)+\frac{T-1}{T}\nu _{it}\varepsilon _{it}\right. \nonumber \\&\quad \left. +\,\frac{T-1}{T}(\nu _{it}\nu _{it}^{\tau }-\Sigma _{\nu })\beta _0\right\} +o_p(1)=\varpi _1+\varpi _2+\varpi _3+o_p(1). \end{aligned}$$
(A.14)

Let \(\varpi _s^{*}=\mathop {max}\limits _{1\le i\le n}||\varpi _{si}||,~s=1,2,3\), and note that \(\{\varpi _{si}, i=1,\ldots , n \}\) is a sequence of independent random variables with a common distribution. For any \(\epsilon > 0\), we have

$$\begin{aligned} P\{\varpi _1^{*}\ge (p)^{1/2}n^{1/k}\epsilon \}\le & {} \sum _{i=1}^nP\{||\varpi _{1i}||\ge (p)^{1/2}n^{1/k}\epsilon \}\\\le & {} \frac{1}{np^{k/2}\epsilon ^k}\sum _{i=1}^nE||\varpi _{1i}||^k\\\le & {} \frac{1}{\epsilon ^k}E||\varpi _{11}/p^{1/2}||^k. \end{aligned}$$

Conditions (B4) and (B7) together with the Cauchy–Schwarz inequality yield \(\varpi _1^{*}=o_p(\sqrt{p}n^{1/k}).\) By the condition \(p=o(n^{(k-2)/(2k)})\) in Theorem 2.1, it is easy to check that \(\varpi _1^{*}=o_p(\sqrt{n/p}\,n^{-(k-2)/(2k)}p)=o_p(\sqrt{n/p})\). In the same way, we obtain \(\varpi _2^{*}=o_p(\sqrt{n/p})\) and \(\varpi _3^{*}=o_p(\sqrt{n/p})\). Then \(\mathop {max}\nolimits _{1\le i\le n}||\Gamma _i(\beta _0)||=o_p(\sqrt{n/p}).\)

By applying the martingale central limit theorem as given in Hall and Heyde (1980) together with (A.13), it is easy to obtain (A.11). The proof of Lemma A.5 is thus completed. \(\square \)
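It may help to recall the elementary fact that motivates the centering and scaling in (A.11) (a standard observation, not a step of the proof): if \(\chi ^2_p\) denotes a chi-square random variable with p degrees of freedom, then

$$\begin{aligned} \frac{\chi ^2_p-p}{\sqrt{2p}}{\mathop {\rightarrow }\limits ^{d}}N(0,1)\qquad (p\rightarrow \infty ), \end{aligned}$$

so (A.11) is the growing-dimension analogue of the usual \(\chi ^2_p\) calibration of the empirical likelihood ratio.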

Lemma A.6

Under the conditions of Theorem 2.1, denote \(D_n=\{\beta :||\beta -\beta _0||\le ca_n\}\). Then \(||\gamma (\beta )||=O_p(a_n)\) for \(\beta \in D_n\).

Proof

For \(\beta \in D_n\), let \(\gamma (\beta )=\rho \theta \), where \(\rho \ge 0, \theta \in R^p\), and \(||\theta ||=1\). Set

$$\begin{aligned} J(\beta )=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta )\Gamma _i^{\tau }(\beta ),~~\bar{\Gamma }(\beta )=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta ), ~\Gamma ^{*}(\beta )=\mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta )||. \end{aligned}$$

From (2.7), we can obtain

$$\begin{aligned} 0= & {} \frac{1}{n}\sum _{i=1}^n\frac{\theta ^{\tau }\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}= \frac{1}{n}\sum _{i=1}^n\theta ^{\tau }\Gamma _i(\beta )-\rho \frac{1}{n}\sum _{i=1}^n\frac{(\theta ^{\tau }\Gamma _i(\beta ))^2}{1+\rho \theta ^{\tau }\Gamma _i(\beta )}\\\le & {} \theta ^{\tau }\bar{\Gamma }(\beta )-\frac{\rho }{1+\rho \Gamma ^{*}}\theta ^{\tau }J(\beta )\theta . \end{aligned}$$

Then

$$\begin{aligned} \rho \left[ \theta ^{\tau }J(\beta )\theta -\mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta )||n^{-1}\left| \sum _{i=1}^n\theta ^{\tau }\Gamma _i(\beta )\right| \right] \le |\theta ^{\tau }\bar{\Gamma }(\beta )|. \end{aligned}$$

Observe that

$$\begin{aligned} \Gamma ^{*}(\beta )\le \Gamma ^{*}(\beta _0)+\mathop {max}\limits _{1\le i\le n}\left\| \frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\widetilde{W}_{it}H\widetilde{W}_{it}(\beta -\beta _0)\right\| . \end{aligned}$$

Let \(\mathcal {X}_{it}=\widetilde{W}_{it}H\widetilde{W}_{it}\). According to Condition (B7) and the Minkowski inequality, we have

$$\begin{aligned} Var(||\mathcal {X}_{it}||^{r/2})\le & {} E(||\mathcal {X}_{it}||^r)=E\left[ \left( \sum _{j=1}^p\mathcal {X}_{itj}^2\right) ^{r/2}\right] \\\le & {} \left\{ \sum _{j=1}^pE[\mathcal {X}_{itj}^r]^{2/r}\right\} ^{r/2}=O(p^{r/2}). \end{aligned}$$

Then we obtain that

$$\begin{aligned}&\mathop {max}\limits _{1\le i\le n}||\mathcal {X}_{it}||\\&\quad \le \Big [\{Var(||\mathcal {X}_{it}||^{r/2})\}^{1/2}\mathop {max}\limits _{1\le i\le n}\frac{||\mathcal {X}_{it}||^{r/2}-E||\mathcal {X}_{it}||^{r/2}}{\{Var(||\mathcal {X}_{it}||^{r/2})\}^{1/2}}+E||\mathcal {X}_{it}||^{r/2}\Big ]^{2/r}\\&\quad \le o_p(\sqrt{p}n^{1/r})+O_p(\sqrt{p})=o_p(\sqrt{p}n^{1/r}). \end{aligned}$$

Combining this with (A.10), we have

$$\begin{aligned} \Gamma ^{*}(\beta )=o_p(n/p) \end{aligned}$$
(A.15)

For \(\bar{\Gamma }(\beta )\), it is easy to see that

$$\begin{aligned} \bar{\Gamma }(\beta )=\bar{\Gamma }(\beta _0)+\frac{1}{N}\sum _{i=1}^n\sum _{t=1}^T\widetilde{W}_{it}H\widetilde{W}_{it}(\beta -\beta _0) \end{aligned}$$

Similar to the proof of (A.10) in Fan et al. (2016), we obtain

$$\begin{aligned} \theta ^{\tau }\bar{\Gamma }(\beta )=O_p(a_n) \end{aligned}$$
(A.16)

Therefore, it follows from (A.15) and (A.16) that \(\mathop {max}\nolimits _{1\le i\le n}||\Gamma _i(\beta )||\cdot n^{-1}\left| \sum _{i=1}^n\theta ^{\tau }\Gamma _i(\beta )\right| =o_p(1)\).

From (2.7), similar to the proof of (A.11) in Fan et al. (2016) and Lemma B.4 in Li et al. (2012), we can derive \(tr[(J(\beta _0)-\Sigma _1)^2]=O_p(p^2(c_n^4+1/n))\), which means that all the eigenvalues of \(J(\beta _0)\) converge to those of \(\Sigma _1\) at the rate \(O_p(p^2(c_n^4+1/n))\). Therefore, by Lemma A.2, (2.7), (A.16), together with Condition (B7), we have

$$\begin{aligned} J(\beta )&=\frac{1}{n}\sum _{i=1}^n\sum _{t=1}^T\big \{\widetilde{W}_{it}H(\widetilde{Y}_{it}-\widetilde{W}_{it}\beta _0)-(T-1)\Sigma _{\nu }\beta _0 -\widetilde{W}_{it}H\widetilde{W}_{it}^{\tau }(\beta -\beta _0)\nonumber \\&\quad +(T-1)\Sigma _{\nu }(\beta -\beta _0)\big \}^{\oplus 2}\nonumber \\&=J(\beta _0)+\frac{1}{n}\sum _{i=1}^n\sum _{t=1}^T\big \{\widetilde{W}_{it}H\widetilde{W}_{it}^{\tau }(\beta -\beta _0)+(T-1)\Sigma _{\nu }(\beta -\beta _0)\big \}^{\oplus 2} \nonumber \\&\quad -\frac{2}{n}\sum _{i=1}^n\sum _{t=1}^T\big \{\widetilde{W}_{it}H(\widetilde{Y}_{it}-\widetilde{W}_{it}\beta _0)-(T-1)\Sigma _{\nu }\beta _0\big \}\big \{\widetilde{W}_{it}H\widetilde{W}_{it}^{\tau }(\beta -\beta _0)\nonumber \\&\quad +(T-1)\Sigma _{\nu }(\beta -\beta _0)\big \} \nonumber \\&= \Sigma _1+O_p(p^2(c_n^4+1/n)). \end{aligned}$$
(A.17)

Hence \(\theta ^{\tau }J(\beta )\theta =\theta ^{\tau }\Sigma _1\theta +o_p(1)\), which is bounded away from zero in probability by Condition (B7). Therefore, \(\rho \le c|\theta ^{\tau }\bar{\Gamma }(\beta )|=O_p(a_n)\), and then \(||\gamma (\beta )||=O_p(a_n)\). \(\square \)

Lemma A.7

Under the conditions of Theorem 2.1, as \(n\rightarrow \infty \), with probability tending to 1, \(\mathcal {L}_{n}(\beta )\) has a minimum in \(D_n\).

Proof

For \(\beta \in D_n\),

$$\begin{aligned} H_{1n}(\beta ,\gamma )=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}=0 \end{aligned}$$

According to Lemma A.6, we have \(\gamma ^\tau \Gamma _i(\beta )=o_p(1)\). Applying a Taylor expansion to \(H_{1n}(\beta ,\gamma )\), we obtain \( \bar{\Gamma }(\beta )-J(\beta )\gamma +\delta _n=0,\) where \(\bar{\Gamma }(\beta )=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta )\) and \(\delta _n=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta )(\gamma ^\tau \Gamma _i(\beta ))^2/[1+\zeta _i]^3\) for some \(|\zeta _i|\le |\gamma ^\tau \Gamma _i(\beta )|\). We have \(\gamma =J(\beta )^{-1}\bar{\Gamma }(\beta )+J(\beta )^{-1}\delta _n\). Substituting \(\gamma \) into (2.6), it is easy to see that

$$\begin{aligned} 2R_{n}(\beta )=n\bar{\Gamma }(\beta )^{\tau }J(\beta )^{-1}\bar{\Gamma }(\beta )-n\delta _n^{\tau }J(\beta )^{-1}\delta _n+\frac{2}{3}\sum _{i=1}^n(\gamma ^\tau \Gamma _i(\beta ))^3(1+\zeta _i)^{-4} \end{aligned}$$
(A.18)

For \(\beta \in \partial D_n\), where \(\partial D_n\) denotes the boundary of \(D_n\), write \(\beta =\beta _0+ca_n\phi \), where \(\phi \) is a unit vector. Then we have the decomposition \(2R_{n}(\beta )=\Pi _0+\Pi _1+\Pi _2\), where \(\Pi _0=n\bar{\Gamma }(\beta _0)^{\tau }\Sigma _1^{-1}\bar{\Gamma }(\beta _0)\), \(\Pi _1=n(\bar{\Gamma }(\beta )-\bar{\Gamma }(\beta _0))^{\tau }J(\beta )^{-1}(\bar{\Gamma }(\beta )-\bar{\Gamma }(\beta _0))\), and \(\Pi _2=n[\bar{\Gamma }(\beta _0)^{\tau }(J(\beta )^{-1}-\Sigma _1^{-1})\bar{\Gamma }(\beta _0)+2\bar{\Gamma }(\beta _0)^{\tau }J(\beta )^{-1}(\bar{\Gamma }(\beta )-\bar{\Gamma }(\beta _0))]-n\delta _n^{\tau }J(\beta )^{-1}\delta _n+\frac{2}{3}\sum _{i=1}^n(\gamma ^\tau \Gamma _i(\beta ))^3(1+\zeta _i)^{-4}\). As \(n\rightarrow \infty \), we see that

$$\begin{aligned} \Pi _1= & {} n\left\{ \left[ \frac{1}{n}\widetilde{{W}}^{\tau }H\widetilde{{W}}-(T-1) \Sigma _{\nu }\right] (\beta -\beta _0)\right\} ^{\tau }J(\beta )^{-1}\left\{ \left[ \frac{1}{n}\widetilde{{W}}^{\tau }H\widetilde{{W}}\right. \right. \\&\left. \left. -\,(T-1)\Sigma _{\nu }\right] (\beta -\beta _0)\right\} \\= & {} c^2na_n^2\phi ^{\tau }\Sigma _0\Sigma _1^{-1}\Sigma _0\phi \{1+o_p(1)\}=O_p(na_n^2), \end{aligned}$$

\(\Pi _2/\Pi _1{\mathop {\rightarrow }\limits ^{P}}0\) and \(2R_{n}(\beta _0)-\Pi _0=o_p(1)\). This implies that for any given c, as \(n\rightarrow \infty \), \(Pr\{2[R_{n}(\beta )- R_{n}(\beta _0)]\ge c\}\rightarrow 1\). In addition, note that for n large,

$$\begin{aligned} \mathcal {L}_{n}(\beta )- \mathcal {L}_{n}(\beta _0)= & {} R_{n}(\beta )- R_{n}(\beta _0)+n\sum _{j=1}^p\{p_{\lambda }(|\beta _j|)-p_{\lambda }(|\beta _{j0}|)\}\\\ge & {} R_{n}(\beta )- R_{n}(\beta _0)+n\sum _{j\in \mathcal {B}}\{p_{\lambda }(|\beta _j|)-p_{\lambda }(|\beta _{j0}|)\}\ge R_{n}(\beta )- R_{n}(\beta _0), \end{aligned}$$

where the last inequality holds due to Condition (B9) and the unbiasedness property of the SCAD penalty, so that for \(j\in \mathcal {B}\), \(p_{\lambda }(|\beta _j|)=p_{\lambda }(|\beta _{j0}|)\) when n is large. Hence, \(Pr\{\mathcal {L}_{n}(\beta )\ge \mathcal {L}_{n}(\beta _0)\}\rightarrow 1\) for \(\beta \in \partial D_n\), which establishes Lemma A.7. \(\square \)

Proof of Theorem 2.1

Let \(U_i=\gamma ^\tau \Gamma _i(\beta _0)\). Applying a Taylor expansion to (2.10), we have

$$\begin{aligned} 0= \frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)\left( 1-U_i+\frac{U_i^2}{1+U_i}\right) =\bar{\Gamma }(\beta _0)-J(\beta _0)\gamma +\delta _n, \end{aligned}$$
(A.19)

where \(\delta _n=\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)U_i^2-\frac{1}{n}\sum _{i=1}^n\Gamma _i(\beta _0)\frac{U_i^3}{1+U_i}\).

From (A.10) and Lemma A.6, we have

$$\begin{aligned} \mathop {max}\limits _{1\le i\le n}|U_i|\le ||\gamma (\beta )||\mathop {max}\limits _{1\le i\le n}||\Gamma _i(\beta _0)||=O_p(p/n^{1/2-1/r}). \end{aligned}$$

Similar to the proof of (A.19) in Li et al. (2012), we can get \(||\delta _n ||=o_p(p^{5/2}n^{-1}(n^{-1/2}+c_n^2))+o_p(p^2n^{-1}c_n).\) From (A.19), we obtain \(\gamma =J(\beta _0)^{-1}\bar{\Gamma }(\beta _0)+J(\beta _0)^{-1}\delta _n\). A Taylor expansion implies \(\ln (1+U_i)=U_i-U_i^2/2+U_i^3/\{3(1+\varsigma _i)^4\}\) for some \(\varsigma _i\) such that \(|\varsigma _i|\le |U_i|\). Therefore, combining these facts with some elementary calculations, we have

$$\begin{aligned} 2R_{n}(\beta _0)&=2\sum _{i=1}^n\ln \{1+U_i\}=n\bar{\Gamma }^{\tau }(\beta _0)J(\beta _0)^{-1}\bar{\Gamma }(\beta _0)-n\delta _n^{\tau }J(\beta _0)^{-1}\delta _n\nonumber \\&\quad +\frac{2}{3}\mathcal {R}_n\{1+o_p(1)\}\nonumber \\&=n\bar{\Gamma }^{\tau }(\beta _0)\Sigma _1^{-1}\bar{\Gamma }(\beta _0)+n\bar{\Gamma }^{\tau }(\beta _0)(J(\beta _0)^{-1}-\Sigma _1^{-1})\bar{\Gamma }(\beta _0)-n\delta _n^{\tau }J(\beta _0)^{-1}\delta _n\nonumber \\&\quad +\frac{2}{3}\mathcal {R}_n\{1+o_p(1)\} \end{aligned}$$
(A.20)

where \(\mathcal {R}_n=\sum _{i=1}^n[\gamma ^\tau \Gamma _i(\beta _0)]^3\). By using the method of proof of (A.22) and Lemma B.6 in Li et al. (2012), we can easily derive \(n\delta _n^{\tau }J(\beta _0)^{-1}\delta _n=o_p(\sqrt{p})\) and \(n\bar{\Gamma }^{\tau }(\beta _0)(J(\beta _0)^{-1}-\Sigma _1^{-1})\bar{\Gamma }(\beta _0)=o_p(\sqrt{p})\). The proof of Theorem 2.1 is concluded from the above results together with (A.11). \(\square \)
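For readers who wish to compute the quantities appearing in Theorem 2.1, the inner step defining \(\gamma (\beta )\) in (2.7) and the log-ratio \(R_n(\beta )\) can be carried out with a damped Newton iteration on the concave dual, in the spirit of Owen (1990). The NumPy sketch below is generic (its input G stacks the estimating functions \(\Gamma _i(\beta )\), which the user must supply) and is not the authors' implementation:

    import numpy as np

    def el_lagrange_multiplier(G, max_iter=100, tol=1e-10):
        """Solve (1/n) sum_i Gamma_i / (1 + gamma' Gamma_i) = 0 for gamma, where the
        rows of G are the Gamma_i(beta).  gamma maximizes the concave dual
        sum_i log(1 + gamma' Gamma_i); backtracking keeps every 1 + gamma' Gamma_i > 0."""
        n, p = G.shape
        gamma = np.zeros(p)
        for _ in range(max_iter):
            w = 1.0 + G @ gamma                      # 1 + gamma' Gamma_i, all > 0
            grad = (G / w[:, None]).sum(axis=0)      # sum_i Gamma_i / w_i
            if np.linalg.norm(grad) < tol:
                break
            Gw = G / w[:, None]
            step = np.linalg.solve(Gw.T @ Gw, grad)  # Newton direction for the dual
            t = 1.0
            while np.any(1.0 + G @ (gamma + t * step) <= 0):
                t *= 0.5                             # damp the step to stay feasible
            gamma = gamma + t * step
        return gamma

    def el_log_ratio(G):
        """R_n(beta) = sum_i log(1 + gamma' Gamma_i) with gamma solving (2.7)."""
        gamma = el_lagrange_multiplier(G)
        return np.sum(np.log(1.0 + G @ gamma))

Evaluating el_log_ratio at \(\beta _0\) and centering and scaling as in Theorem 2.1 then gives the statistic whose limit is established above.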

Proof of Theorem 2.2

Let \(H_{1n}(\beta ,\gamma )=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}\) and \(H_{2n}(\beta ,\gamma )=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )}(\frac{\partial \Gamma _i(\beta )}{\partial \beta }^{\tau })^{\tau }\gamma \). Note that \(\hat{\beta }\) and \(\hat{\gamma }\) satisfy \(H_{1n}(\hat{\beta },\hat{\gamma })=0\) and \(H_{2n}(\hat{\beta },\hat{\gamma })=0\). Let \(\varphi =(\beta ^{\tau },\gamma ^{\tau })^{\tau }\), \(\varphi _0=(\beta _0^{\tau },0)^{\tau }\) and \(\hat{\varphi }_0=(\hat{\beta }_0^{\tau },\hat{\gamma }^{\tau }_0)^{\tau }\). Then, by a Taylor expansion,

$$\begin{aligned} 0=H_{jn}(\hat{\beta },\hat{\gamma })=H_{jn}(\beta _0,0)+\frac{\partial H_{jn}(\beta _0,0)}{\partial \beta }(\hat{\beta }-\beta _0)+\frac{\partial H_{jn}(\beta _0,0)}{\partial \gamma }(\hat{\gamma }-0)+\delta _{jn}, \end{aligned}$$

where \(\delta _{jn}=\frac{1}{2}(\hat{\varphi }_0-\varphi _0)^{\tau }H_{jn}^{\prime \prime }(\varphi )(\hat{\varphi }_0-\varphi _0)\) for \(j=1,2\), and \(H_{jn}^{\prime \prime }(\varphi )\) denotes the Hessian matrix of \(H_{jn}(\varphi )\). Then

$$\begin{aligned} \left( \begin{array}{cc} \hat{\gamma }\\ \hat{\beta }-\beta _0\\ \end{array}\right) = \left[ \begin{array}{cc} \frac{\partial H_{1n}(\beta ,\gamma )}{\partial \gamma } &{}\frac{\partial H_{1n}(\beta ,\gamma )}{\partial \beta }\\ \frac{\partial H_{2n}(\beta ,\gamma )}{\partial \gamma } &{}\frac{\partial H_{2n}(\beta ,\gamma )}{\partial \beta }\\ \end{array}\right] ^{-1}_{(\beta _0,0)}\left( \begin{array}{cc} H_{1n}(\beta _0,0)+\delta _{1n}\\ H_{2n}(\beta _0,0)+\delta _{2n}\\ \end{array}\right) , \end{aligned}$$

from Lemma A.3, we have

$$\begin{aligned} n^{-1}\sum _{i=1}^n\frac{\partial \Gamma _i(\beta _0)}{\partial \beta }=\frac{1}{n}\widetilde{{W}}^{\tau }H\widetilde{{W}}-(T-1)\Sigma _{\nu }=(T-1)\Sigma _2\{1+o_p(1)\}=\Sigma _0\{1+o_p(1)\}. \end{aligned}$$
(A.21)

Note that \(||\hat{\gamma }(\beta )||=O_p(a_n)\) by Lemma A.6 and \(||\hat{\beta }-\beta _0||=O_p(a_n)\) by Lemma A.7. Then using the Cauchy-Schwarz inequality, we find

$$\begin{aligned} ||\delta _{1n}||^2=\sum _{k=1}^p\delta _{1n,k}^2\le c\sum _{k=1}^p||\hat{\varphi }_0-\varphi _0||^4||H_{jn}^{\prime \prime }(\varphi )||^2=O_p(a_n^4p^3). \end{aligned}$$
(A.22)

Then Condition (B9) yields that

$$\begin{aligned} ||\sqrt{N}\Sigma _0^{-1}\delta _{1n}||^2\le N||\Sigma _0^{-1}||^2||\delta _{1n}||^2=O_p(Na_n^4p^3)=o_p(1). \end{aligned}$$

Combining this with \(H_{1n}(\beta _0,0)=n^{-1}\sum _{i=1}^n\Gamma _i(\beta _0)\), we have

$$\begin{aligned} \sqrt{n}(\widehat{\beta }-\beta _0 )= & {} \Sigma _0^{-1}\cdot \frac{1}{\sqrt{n}}\sum _{i=1}^n\Gamma _i(\beta _0)+o_p(1)\\= & {} \Sigma _0^{-1}\cdot \frac{1}{\sqrt{n}}\sum _{i=1}^n\sum _{t=1}^T\frac{T-1}{T}\big \{(X_{it}-E(X_{it}|U_{it}))(\varepsilon _{it}-\nu _{it}\beta _0)+\nu _{it}\varepsilon _{it}\\&+\,(\Sigma _{\nu }-\nu _{it}\nu _{it}^{\tau })\beta _0\big \} +o_p(1). \end{aligned}$$

Note that

$$\begin{aligned} Cov(\Gamma _i(\beta _0))= & {} (T-1)\big \{E[(X_{11}-E(X_{11}|U_{11}))(\varepsilon _{11}-\nu _{11}\beta _0)]^2 + E[\nu _{11}^{\tau }\varepsilon _{11}]^2\\&+\,E[(\Sigma _{\nu }-\nu _{11}\nu _{11}^{\tau })\beta _0]^2\big \}. \end{aligned}$$

Therefore,

$$\begin{aligned} \mathop {lim}\limits _{n\rightarrow \infty }Cov(\Gamma _i(\beta _0))= & {} (T-1)\big \{E(\varepsilon _{11}-\nu _{11}\beta _0)^2\Sigma +\sigma ^2\Sigma _{\nu }\\&+ \,E[(\nu _{11}\nu _{11}^{\tau }-\Sigma _{\nu })\beta _0]^2\big \}= \Sigma _1. \end{aligned}$$

Invoking the Slutsky theorem and the central limit theorem, we can prove Theorem 2.2. \(\square \)
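Putting the last two displays together, the Slutsky argument yields (a restatement for the reader's convenience; the precise formulation is Theorem 2.2 in the main text)

$$\begin{aligned} \sqrt{n}(\hat{\beta }-\beta _0){\mathop {\rightarrow }\limits ^{d}}N\big (0,\Sigma _0^{-1}\Sigma _1\Sigma _0^{-1}\big ). \end{aligned}$$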

Proof of Theorem 2.3

From Lemma A.7, we note that the minimizer of \(\mathcal {L}_{n}(\beta )\) is in \(D_n\). Considering \(\beta \in D_n\), we have, for each of its components,

$$\begin{aligned} \frac{1}{n}\frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _j}= \frac{1}{n}\sum _{i=1}^n\frac{\gamma \left[ \sum _{t=1}^T\widetilde{W}_{it}H\widetilde{W}_{it}+(T-1)\Sigma _{\nu }\right] }{1+\gamma ^\tau \Gamma _i(\beta )}+p_{\lambda }'(|\beta _j|)sign(\beta _j)=I_j+II_j. \end{aligned}$$
(A.23)

First, \(\mathop {max}\nolimits _{j\in \mathcal {B}}|I_j|\le \gamma \Sigma _j(1+o_p(1))=O_p(a_n)\) because \(\gamma ^\tau \Gamma _i(\beta )=o_p(1)\), where \(\Sigma _j\) denotes the jth column of \(\Sigma \). Since \(\tau (n/p)^{1/2}\rightarrow \infty \), \(Pr(\mathop {max}\nolimits _{j\in \mathcal {B}}|I_j|>\tau /2)\rightarrow 0\). It follows that \(p_{\lambda }'(|\beta _j|)sign(\beta _j)\) dominates the sign of \(\frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _j}\) asymptotically for all \(j\notin \mathcal {B}\). Thus, as \(n\rightarrow \infty \), for any \(j\notin \mathcal {B}\), with probability tending to 1,

$$\begin{aligned} \frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _j}>0,~\beta _j\in (0,ca_n); \qquad \frac{\partial \mathcal {L}_{n}(\beta )}{\partial \beta _j}<0, ~\beta _j\in (-ca_n,0), \end{aligned}$$

which implies that \(\hat{\beta }_j=0\) for all \(j\in \mathcal {B}^c\), with probability tending to 1. Thus part (a) of Theorem 2.3 follows.

Next, we establish part (b). Let \(\Psi _1\) and \(\Psi _2\) be matrices such that \(\Psi _1\beta =\beta _1\) and \(\Psi _2\beta =\beta _2\). Since we have shown that \(Pr(\hat{\beta }_2=0)\rightarrow 1\) as \(n\rightarrow \infty \), by the Lagrange multiplier method, finding the minimizer of \(\mathcal {L}_{n}(\beta )\) is asymptotically equivalent to minimizing the following objective function:

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\log \{1+\gamma ^\tau \Gamma _i(\beta )\}+n\sum _{j=1}^pp_{\lambda }(|\beta _j|)+v^{\tau }\Psi _2\beta , \end{aligned}$$
(A.24)

where v is a \((p-s)\)-dimensional column vector of Lagrange multipliers. Define

$$\begin{aligned}&\displaystyle \tilde{Q}_{1n}(\beta ,\gamma ,v)=\frac{1}{n}\sum _{i=1}^n\frac{\Gamma _i(\beta )}{1+\gamma ^\tau \Gamma _i(\beta )},\\&\displaystyle \tilde{Q}_{2n}(\beta ,\gamma ,v)=\frac{1}{n}\sum _{i=1}^n\frac{\gamma }{1+\gamma ^\tau \Gamma _i(\beta )}+b(\beta )+\Psi _2^{\tau }v, \end{aligned}$$

and \(\tilde{Q}_{3n}(\beta ,\gamma ,v)=\Psi _2\beta \), where

$$\begin{aligned} b(\beta )=\{p_{\lambda }'(|\beta _1|)sign(\beta _1),p_{\lambda }'(|\beta _2|)sign(\beta _2),\ldots ,p_{\lambda }'(|\beta _p|)sign(\beta _p),0^{\tau }\}^{\tau }. \end{aligned}$$

The minimizer \((\beta ,\gamma ,v)\) of (A.24) satisfies \(\tilde{Q}_{in}(\beta ,\gamma ,v)=0~(i=1,2,3)\). Since \(||\gamma ||=O_p(a_n)\) for \(\beta \in D_n\), we can obtain \(||v||=O_p(a_n)\) from \(\tilde{Q}_{2n}(\beta ,\gamma ,v)=0\). In order to expand \(\tilde{Q}_{in}(\beta ,\gamma ,v)~(i=1,2,3)\) around the value \((\beta _0,0,0)\), we first give the following partial derivatives:

$$\begin{aligned}&\displaystyle \frac{\partial \tilde{Q}_{1n}(\beta _0,0,0)}{\partial \gamma }=-J(\beta _0), ~ \frac{\partial \tilde{Q}_{1n}(\beta _0,0,0)}{\partial \beta }=\Sigma (\beta _0),~\frac{\partial \tilde{Q}_{1n}(\beta _0,0,0)}{\partial v}=0,\\&\displaystyle \frac{\partial \tilde{Q}_{2n}(\beta _0,0,0)}{\partial \gamma }=\Sigma (\beta _0),~\frac{\partial \tilde{Q}_{2n}(\beta _0,0,0)}{\partial \beta }=b'(\beta ),~\frac{\partial \tilde{Q}_{2n}(\beta _0,0,0)}{\partial v} =\Psi _2^{\tau },\\&\displaystyle \frac{\partial \tilde{Q}_{3n}(\beta _0,0,0)}{\partial \gamma }=0,~\frac{\partial \tilde{Q}_{3n}(\beta _0,0,0)}{\partial \beta }=\Psi _2,~\frac{\partial \tilde{Q}_{3n}(\beta _0,0,0)}{\partial v} =0. \end{aligned}$$

Then by Taylor expansion, we immediately derive that

$$\begin{aligned} \left( \begin{array}{ccc} \tilde{Q}_{1n}(\beta _0,0,0)\\ 0\\ 0\\ \end{array}\right) = \left( \begin{array}{ccc} -\Sigma _1 &{}\Sigma _0 &{}0\\ \Sigma _0^{\tau }&{}0&{}\Psi _2^{\tau }\\ 0&{}\Psi _2&{}0\\ \end{array}\right) \left( \begin{array}{ccc}\tilde{\gamma }\\ \tilde{\beta }\\ \tilde{v}\\ \end{array}\right) +R_n, \end{aligned}$$
(A.25)

where \(\Sigma (\beta _0)=n^{-1}\sum _{i=1}^n\partial \Gamma _i(\beta _0)/\partial \beta \), \(R_n=\sum _{l=1}^5R_n^{(l)}\), \(R_n^{(1)}=(R_{1n}^{(1)\tau },R_{2n}^{(1)\tau },0)^{\tau }\), \(R_{jn}^{(1)}\in R^p\), and the kth component of \(R_{jn}^{(1)}\) for \(j=1,2\) is given by

$$\begin{aligned} R_{jn,k}^{(1)}=\frac{1}{2}(\hat{\vartheta }-\vartheta _0)^{\tau }\frac{\partial ^2\tilde{Q}_{jn,k}(\tilde{\vartheta })}{\partial \vartheta \partial \vartheta ^{\tau }}(\hat{\vartheta }-\vartheta _0), \end{aligned}$$

with \(\vartheta =(\beta ,\gamma )^{\tau }\) and \(\tilde{\vartheta }=(\tilde{\beta },\tilde{\gamma })^{\tau }\) satisfying \(||\tilde{\vartheta }-\vartheta _0||\le ||\hat{\vartheta }-\vartheta _0||\). Moreover, \(R_n^{(2)}=(0,b^{\tau }(\beta _0),0)^{\tau }\), \(R_n^{(3)}=[0,\{b'(\tilde{\vartheta })(\hat{\vartheta }-\vartheta _0)\},0]^{\tau }\), \(R_n^{(4)}=[\{(J(\beta _0)-\Sigma _1)\hat{\gamma }+(\Sigma (\beta _0)-\Sigma _0)(\hat{\beta }-\beta _0)\}^{\tau },0,0]^{\tau }\) and \(R_n^{(5)}=[0,\{(\Sigma (\beta _0)-\Sigma _0)\hat{\gamma }\}^{\tau },0]^{\tau }\). Similar to the proof of (A.22), we can get \(R_n^{(1)}=o_p(n^{-1/2})\). Given Conditions (B8) and (B9), we see that \(R_n^{(2)}=o_p(n^{-1/2})\) and \(R_n^{(3)}=o_p(n^{-1/2})\). By (A.21) and (A.17), together with Lemma A.6, we get \(R_n^{(4)}=o_p(n^{-1/2})\) and \(R_n^{(5)}=o_p(n^{-1/2})\). Hence, \(R_n^{(k)}=o_p(n^{-1/2}),k=1,\ldots ,5\).

Define \(K_{11}=-\Sigma _1, K_{12}=[\Sigma _0,~~0]\) and \(K_{21}=K_{12}^{\tau }\),

$$\begin{aligned} K_{22}=\left( \begin{array}{ccc}0&{}\quad \Psi _2^{\tau }\\ \Psi _2&{}\quad 0\\ \end{array}\right) ,~K=\left( \begin{array}{ccc}K_{11}&{}\quad K_{12}\\ K_{21}&{}\quad K_{22}\\ \end{array}\right) \end{aligned}$$

and let \(\kappa =(\beta ^{\tau },v^{\tau })^{\tau }\). Then, by inverting (A.25), we find

$$\begin{aligned} \left( \begin{array}{ccc} \hat{\gamma }\\ \hat{\kappa }-\kappa _0\\ \end{array}\right) =K^{-1}\Big \{ \left( \begin{array}{ccc} -\tilde{Q}_{1n}(\beta _0,0,0)\\ 0\\ \end{array}\right) +R_n\Big \}, \end{aligned}$$
(A.26)

As matrix K is partitioned into four blocks, it can be inverted blockwise as follows

$$\begin{aligned} K^{-1}=\left[ \begin{array}{ccc} K_{11}^{-1}+K_{11}^{-1}K_{12}A^{-1}K_{21}K_{11}^{-1}&{}\quad -K_{11}^{-1}K_{12}A^{-1}\\ -A^{-1}K_{21}K_{11}^{-1}&{}\quad A^{-1}\\ \end{array}\right] , \end{aligned}$$

where \(A=K_{22}-K_{21}K_{11}^{-1}K_{12}=\left[ \begin{array}{ccc} \Omega ^{-1} &{}\quad \Psi _2^{\tau }\\ \Psi _2&{}\quad 0\\ \end{array}\right] \) and \(\Omega \) is defined in Theorem 2.2. Thus, we get

$$\begin{aligned} \hat{\kappa }-\kappa _0=A^{-1}K_{21}K_{11}^{-1}\tilde{Q}_{1n}(\beta _0,0,0)+o_p(n^{-1/2}). \end{aligned}$$

Matrix A can also be inverted blockwise by using the analytic inversion formula, i.e.,

$$\begin{aligned} A^{-1}=\left[ \begin{array}{ccc} \Omega -\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega &{}\quad \Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\\ (\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega &{}\quad -(\Psi _2\Omega \Psi _2^{\tau })^{-1}\\ \end{array}\right] , \end{aligned}$$

Further, we have

$$\begin{aligned} \hat{\beta }-\beta _0=[\Omega -\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega ](\Sigma _0\Sigma _1\bar{\Gamma }(\beta _0)+o_p(n^{-1/2})). \end{aligned}$$

It follows by an expansion of \(\hat{\beta }_1\) that

$$\begin{aligned} \hat{\beta }_1-\beta _{10}=[\Psi _1\Omega -\Psi _1\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega ](\Sigma _0\Sigma _1\bar{\Gamma }(\beta _0)+o_p(n^{-1/2})). \end{aligned}$$
(A.27)

Then similar to the proof of Theorem 2.3 in Fan et al. (2016), we have \(n^{1/2}W_n\Omega _p^{-1/2}(\widehat{\beta }_1-\beta _{10}){\mathop {\rightarrow }\limits ^{d}}N(0,G)\), which completes the proof of Theorem 2.3. \(\square \)

Proof of Theorem 2.4

Let \(\hat{\beta }\) be the minimizer of (2.13) and \(U_i=\hat{\gamma }^{\tau }\Gamma _i(\beta )\). A Taylor expansion gives

$$\begin{aligned} \mathcal {L}_{n}(\beta )=\sum _{i=1}^nU_i-\sum _{i=1}^nU_i^2/2+\sum _{i=1}^nU_i^3/\{3(1+\xi _i)^4\}+o_p(1), \end{aligned}$$

where \(|\xi _i|\le |U_i|\) and the \(o_p(1)\) term is due to the penalty function. From (A.26), we have \(\hat{\gamma }=[\Sigma _1^{-1}+\Sigma _1^{-1}\Sigma _0\{\Omega -\Omega \Psi _2^{\tau }(\Psi _2\Omega \Psi _2^{\tau })^{-1}\Psi _2\Omega \}\Sigma _0^{\tau }\Sigma _1^{-1}][\bar{\Gamma }(\beta _0)+o_p(n^{-1/2})].\)

Similar to Tang and Leng (2010), substituting the expansions of \(\hat{\gamma }\) and \(\hat{\beta }\) given above into \(U_i\), we can show that

$$\begin{aligned} 2\mathcal {L}_{n}(\hat{\beta })= n\bar{\Gamma }(\beta _0)^{\tau }\Psi _2^{\tau }(\Psi _2\Omega ^{-1}\Psi _2^{\tau })^{-1}\Psi _2\bar{\Gamma }(\beta _0)+o_p(1) \end{aligned}$$
(A.28)

Under the null hypothesis, because \(L_nL_n^{\tau }=I_q\), there exists \(\tilde{\Psi }_2\) such that \(\tilde{\Psi }_2\beta =0\) and \(\tilde{\Psi }_2\tilde{\Psi }_2^{\tau }=I_{p-d+q}.\) Now, by repeating the proof of Theorem 2.3, we establish that under the null hypothesis the estimator of \(\beta \) can be obtained by minimizing (A.24) with \(\Psi _2\) replaced by \(\tilde{\Psi }_2\), and we can easily obtain that

$$\begin{aligned} 2\mathcal {L}_{n}(\tilde{\beta })= n\bar{\Gamma }(\beta _0)^{\tau }\tilde{\Psi }_2^{\tau }(\tilde{\Psi }_2\Omega ^{-1}\tilde{\Psi }_2^{\tau })^{-1}\tilde{\Psi }_2\bar{\Gamma }(\beta _0)+o_p(1). \end{aligned}$$

Combining this with (A.28), we have

$$\begin{aligned} \mathcal {L}_{n}= n\bar{\Gamma }(\beta _0)^{\tau }\Omega ^{-1/2}(P_1-P_2)\Omega ^{-1/2}\bar{\Gamma }(\beta _0)+o_p(1). \end{aligned}$$

where

$$\begin{aligned} P_1=\Omega ^{-1/2}\Psi _2^{\tau }(\Psi _2\Omega ^{-1}\Psi _2^{\tau })^{-1}\Psi _2\Omega ^{-1/2}, \end{aligned}$$

and

$$\begin{aligned} P_2=\Omega ^{-1/2}\tilde{\Psi }_2^{\tau }(\tilde{\Psi }_2\Omega ^{-1}\tilde{\Psi }_2^{\tau })^{-1}\tilde{\Psi }_2\Omega ^{-1/2}, \end{aligned}$$

are two idempotent matrices. As the rank of \(P_1-P_2\) is q, \(P_1-P_2\) can be written as \(\Upsilon ^{\tau }\Upsilon \), where \(\Upsilon \) is a \(q\times p\) matrix such that \(\Upsilon \Upsilon ^{\tau }=I_q\). Further, we see that

$$\begin{aligned} \sqrt{n}\Upsilon \Omega ^{-1/2}\bar{\Gamma }(\beta _0){\mathop {\rightarrow }\limits ^{d}}N(0,I_q). \end{aligned}$$

Then

$$\begin{aligned} n\bar{\Gamma }(\beta _0)^{\tau }\Omega ^{-1/2}(P_1-P_2)\Omega ^{-1/2}\bar{\Gamma }(\beta _0){\mathop {\rightarrow }\limits ^{d}}\chi _q^2. \end{aligned}$$

This completes the proof of Theorem 2.4. \(\square \)
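In applications (a usage remark on our part, not a step of the paper's argument), Theorem 2.4 is employed in the standard way: the null hypothesis on \(\beta \) is rejected at asymptotic level \(\alpha \) when the PEL ratio statistic exceeds the \((1-\alpha )\) quantile of the \(\chi ^2_q\) distribution, that is, when

$$\begin{aligned} \mathcal {L}_{n}>\chi ^2_{q,1-\alpha }. \end{aligned}$$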

Lemma A.8

Under the conditions of Theorem 2.5, for a given z, if g(z) is the true value of the parameter, then

$$\begin{aligned}&\displaystyle \frac{1}{\sqrt{Nh}}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}-b(z){\mathop {\rightarrow }\limits ^{d}}N(0,R). \end{aligned}$$
(A.29)
$$\begin{aligned}&\displaystyle \frac{1}{Nh}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}\hat{\Xi }_{i}^{\tau }\{g(z)\}{\mathop {\rightarrow }\limits ^{P}}R . \end{aligned}$$
(A.30)
$$\begin{aligned}&\displaystyle \mathop {max}\limits _{1\le i\le n}||\hat{\Xi }_{i}\{g(z)\}||=o_p(\sqrt{Nh}), \phi =O_p(N^{-1/2}). \end{aligned}$$
(A.31)

where \(b(z)=\left( \frac{N}{h}\right) ^{1/2}\frac{T-1}{T}E[g(Z_{it})-g(z)]f(z)\int K(z)dz\) and \(R=\sigma ^2f(z)\int K^2(z)dz\).

Proof

Observe that

$$\begin{aligned} \frac{1}{\sqrt{Nh}}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}-b(z)=S_1(z)+S_2(z)+S_3(z) \end{aligned}$$

where

$$\begin{aligned}&\displaystyle S_1(z)=\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\sum _{t=1}^T(I_{it}-Q)K_h(Z_{it}-z)\varepsilon _{it},\\&\displaystyle S_2(z)=\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\sum _{t=1}^T(I_{it}-Q)\{[g(Z_{it})-g(z)]K_h(Z_{it}-z)-(h/N)^{1/2}b(z)\},\\&\displaystyle S_3(z)=\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\sum _{t=1}^T(I_{it}-Q)X_{it}^{\tau }(\beta -\hat{\beta })K_h(Z_{it}-z). \end{aligned}$$

It is not difficult to prove \(E[S_1(z)]=0\) and \(Var[S_1(z)]=R+o(1)\). \(S_1(z)\) satisfies the conditions of the Cramer–Wold theorem and the Lindeberg condition. Therefore, we get

$$\begin{aligned} S_1(z){\mathop {\rightarrow }\limits ^{d}}N(0,R). \end{aligned}$$
(A.32)

We can also prove that

$$\begin{aligned} S_2(z)=o_p(1). \end{aligned}$$
(A.33)

Theorem 2.2 and Condition (B8) imply that \(\beta -\hat{\beta }=O_p(N^{-1/2})\). Therefore, we get \(S_3(z)=O_p(h^{1/2})\). This, together with (A.32) and (A.33), proves (A.29).

Analogously to the proof of (A.29), we can verify (A.30) easily. As to (A.31), we find

$$\begin{aligned} \mathop {max}\limits _{1\le i\le n}||\hat{\Xi }_{i}\{g(z)\}||\le & {} \mathop {max}\limits _{1\le i\le n}||(I_{it}-Q)K_h(Z_{it}-z)\varepsilon _{it}||\\&+\mathop {max}\limits _{1\le i\le n}||(I_{it}-Q)[g(Z_{it})-g(z)]K_h(Z_{it}-z)||\\&+\mathop {max}\limits _{1\le i\le n}||(I_{it}-Q)X_{it}^{\tau }(\beta -\hat{\beta })K_h(Z_{it}-z)||=J_1+J_2+J_3 \end{aligned}$$

From the Markov inequality and Conditions (B3) and (B4), one can obtain

$$\begin{aligned} P(J_1\ge \sqrt{Nh})\le (Nh)^{-s}\sum _{i=1}^nE[(I_{it}-Q)\varepsilon _{it}K_h(Z_{it}-z)]^{2s}\le C(Nh)^{1-s}\rightarrow 0 \end{aligned}$$

which implies that \(J_1=o_p(\sqrt{Nh})\). Using some arguments similar to those used in the proof of Lemma A.6, we can prove \(J_2=o_p(\sqrt{Nh})\) and \(J_3=o_p(\sqrt{Nh})\). Therefore we obtain that \(\mathop {max}\nolimits _{1\le i\le n}||\hat{\Xi }_{i}\{g(z)\}||=o_p(\sqrt{Nh}).\)

Applying (A.30) and the proof in Owen (1990), one can derive that \(\phi =O_p(N^{-1/2})\), which completes the proof of Lemma A.8. \(\square \)

Proof of Theorem 2.5

Invoking arguments similar to those used in the proof of Theorem 2.4, we can prove that

$$\begin{aligned} 2\mathcal {Q}_{n}(g(z))=\left[ \frac{1}{Nh}\sum _{i=1}^n\hat{\Xi }_{i}^2\{g(z)\}\right] ^{-1}\Big \{\frac{1}{\sqrt{Nh}}\sum _{i=1}^n\hat{\Xi }_{i}\{g(z)\}-b(z)\Big \}^2 \end{aligned}$$

From Lemma A.8, we can prove that \(2\mathcal {Q}_{n}(g(z)){\mathop {\rightarrow }\limits ^{d}}\chi _1^2.\) \(\square \)
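A standard consequence of Theorem 2.5 (a usage note on our part, not one of the paper's stated results) is a pointwise confidence interval for the nonparametric part: for a fixed z,

$$\begin{aligned} I_{\alpha }(z)=\{g:2\mathcal {Q}_{n}(g)\le \chi ^2_{1,1-\alpha }\} \end{aligned}$$

has asymptotic coverage probability \(1-\alpha \) for g(z), provided the centering term b(z) of Lemma A.8 is handled in the usual way, for instance by undersmoothing the bandwidth.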

Cite this article

He, BQ., Hong, XJ. & Fan, GL. Penalized empirical likelihood for partially linear errors-in-variables panel data models with fixed effects. Stat Papers 61, 2351–2381 (2020). https://doi.org/10.1007/s00362-018-1049-2
