Asymptotic properties of a spatial autoregressive stochastic frontier model

Abstract

This paper considers asymptotic properties of a spatial autoregressive stochastic frontier model. Relying on the asymptotic theory for nonlinear spatial NED processes, we establish the consistency and derive the asymptotic distribution of the maximum likelihood estimator under regularity conditions. When inefficiency exists, all parameter estimators have the \(\sqrt{n}\)-rate of convergence and are asymptotically normal. However, when there is no inefficiency, only some parameter estimators have the \(\sqrt{n}\)-rate of convergence, and the rest have slower convergence rates. We also investigate a corrected two stage least squares estimator that is computationally simple, and derive the asymptotic distributions of the score and likelihood ratio test statistics for the existence of inefficiency.

Introduction

In spatial econometrics, there are several popular modeling strategies to take into account cross sectional dependence: in a spatial autoregressive (SAR) or spatial lag model (Cliff and Ord 1973, 1981), the outcome of a spatial unit is specified as a weighted sum of neighbors’ outcomes, i.e., a spatial lag of the dependent variable; in a spatial error model, the SAR process is specified on the error terms; in a spatial Durbin model, weighted sums of neighbors’ characteristics are included as explanatory variables. We may also consider spatial dependence in the dependent variable, exogenous variables and/or error terms simultaneously, e.g., a SAR model with SAR disturbances. The SAR model captures global spillovers which can have a structural economic interpretation, the spatial Durbin model captures in addition local spillovers, and the spatial error model reflects spillovers in unobserved variables. Ignoring spatial dependence can lead to inconsistent estimation and/or incorrect inference. This is also the case for stochastic frontier (SF) models. This paper considers large sample properties of a SAR SF (SARSF) model which contains a spatial lag of the dependent variable and a half normal inefficiency term.

Our research in this paper is motivated by some existing papers in the literature on SF models with spatial dependence. Druska and Horrace (2004) consider an SF model for panel data with fixed effects and spatial error dependence, and calculate efficiency using fixed effects. Glass et al. (2013, 2014) use a similar strategy for a SARSF panel data model with fixed effects. Papers on SF models with spatial dependence in error terms include, among others, Schmidt et al. (2009), Pavlyuk (2011), Areal et al. (2012), Fusco and Vidoli (2013), Tsionas and Michaelides (2016), Vidoli et al. (2016) and Carvalho (2018) (footnote 1). Brehm (2013) and Adetutu et al. (2015) consider SF models with local spatial dependence. Pavlyuk (2013) and Glass et al. (2016) study SARSF models.

We notice that the above papers have not considered large sample properties of SF models with spatial dependence. The asymptotic theory for such models is of interest as they are nonlinear SAR models, which cannot be analyzed by laws of large numbers (LLN) and central limit theorems (CLT) designed for linear processes. But they might be studied by recently developed asymptotic theories for nonlinear spatial models. For the consistency of the maximum likelihood estimator (MLE) of our SARSF model with a half normal inefficiency term, due to the composite error term in the model, the usual limit theorems for linear-quadratic forms of disturbances (Kelejian and Prucha 2001) for a linear SAR model would not be applicable.

We provide a first rigorous analysis of asymptotic properties of this SARSF model in this paper. For nonlinear spatial econometrics, Jenish and Prucha (2012) introduce the near-epoch dependence (NED) concept of spatial processes and develop a useful LLN. We use their LLN to prove the consistency of the MLE under regularity conditions. For the general case with technical inefficiency, the asymptotic distribution of the MLE can be derived as usual by expanding the first order condition and applying an NED CLT. However, there is a specific case in which there is no inefficiency (though this is unknown to the investigator). For such an irregular case, the asymptotic distribution might be different. For the half-normal SF model with no spatial dependence, there is an irregular feature that the information matrix is singular when there is no inefficiency (Lee 1993). This is also the case for the SARSF model. We shall show that the presence of spatial dependence generally does not create extra irregularity. We derive the asymptotic distribution of the MLE in both the cases with and without inefficiency. If inefficiency exists, the information matrix is nonsingular, and all parameter estimators have the \(\sqrt{n}\)-rate of convergence and an asymptotically normal distribution. But if there is no inefficiency, only some parameter estimators have the \(\sqrt{n}\)-rate of convergence, and the rest can have slower rates of convergence. The asymptotic distribution of the MLE in the irregular case with no inefficiency is derived by reparameterizing the model into one with a nonsingular information matrix, and the analysis essentially relies on higher order Taylor expansions of the original log likelihood function (Lee 1993; Rotnitzky et al. 2000).

We also investigate the score and likelihood ratio (LR) tests for the existence of inefficiency. All the analysis takes into account the spatial correlation of the observed dependent variables. These tests are useful since the asymptotic distribution of the MLE depends on whether inefficiency exists or not. Because the inefficiency parameter is nonnegative, the score test is left sided and its test statistic is asymptotically normal, similar to the SF model with no spatial dependence (Lee and Chesher 1986). But the asymptotic distribution of the LR test statistic is a mixture of a chi-square distribution with one degree of freedom and a degenerate distribution with a unit mass at 0, in accordance with the result in Lee (1993).

It is possible to consider other distributions of the inefficiency term, e.g., the exponential distribution (Meeusen and van den Broeck 1977), the truncated normal distribution (Stevenson 1980) or the Gamma distribution (Greene 1990), but the half-normal distribution in Aigner et al. (1977) is arguably the most popular in empirical applications. We may also consider spatial Durbin terms and/or spatial error dependence. Spatial Durbin terms and spatial lags or spatial moving averages of disturbances are linear spatial dependence processes, so the analysis would be similar. Kumbhakar et al. (2013) consider a subgroup approach that can allow for a mixture of both fully efficient and inefficient firms, which is useful for empirical research. Large sample properties of models with alternative specifications are of interest for future research.

The rest of this paper is organized as follows. Section 2 studies large sample properties of the MLE. A computationally simple corrected two stage least squares estimator is also investigated. Score and likelihood ratio tests of whether frontier functions are fully efficient are proposed. Section 3 reports Monte Carlo results for the estimators and test statistics. Section 4 concludes. Proofs of propositions are collected in an appendix.

MLE

Consider the following SARSF model:

$$\begin{aligned} y_{ni}=\lambda _{0}w_{n,i\cdot }Y_{n}+x_{ni}'\beta _{0}+\epsilon _{ni},\quad \epsilon _{ni}=v_{ni}-u_{ni},\quad i=1,\dotsc ,n, \end{aligned}$$
(2.1)

where \(y_{ni}\) is the logged value of a dependent variable for the ith unit, \(w_{n,i\cdot }\) is the ith row of an \(n\times n\) spatial weights matrix \(W_{n}=[w_{n,ij}]\), \(Y_{n}=[y_{n1},\dotsc ,y_{nn}]'\), \(x_{ni}=[x_{ni,1},\dots ,x_{ni,k_{x}}]'\) is a \(k_{x}\times 1\) vector of exogenous variables in logarithm, \(\lambda _{0}\) is a scalar spatial dependence parameter, \(\beta _{0}\) is a \(k_{x}\times 1\) parameter vector, \(v_{ni}\) follows the normal distribution \(N(0,\sigma _{v0}^{2})\), \(u_{ni}\) follows the nonnegative half normal distribution \(|N(0,\sigma _{u0}^{2})|\), \(u_{ni}\) and \(v_{ni}\) are independent, and \([u_{ni},v_{ni}]\)’s are i.i.d. for all i. The \(x_{ni}\) typically includes an intercept term, so we let \(x_{ni}=[1,x_{2ni}']'\) and \(\beta _{0}=[\beta _{10},\beta _{20}']'\).

With a nonnegative inefficiency term \(u_{ni}\), model (2.1) can be for production, revenue, profit frontiers and so on. For cost or distance frontiers, \(v_{ni}-u_{ni}\) can be replaced by \(v_{ni}+u_{ni}\) to capture cost inefficiency, but the analysis is similar. Such a model has been introduced in the empirical literature of frontier functions, e.g., Glass et al. (2016), where the maximum likelihood estimation is described and various efficiency measures such as direct, indirect and total relative efficiencies are proposed. Model (2.1) can be extended to a panel data set by introducing a subscript t, as in Glass et al. (2016). Without loss of generality, we consider model (2.1) for cross sectional data. This model is similar to the SAR model except for the composite error term \(\epsilon _{ni}\) with a nonzero mean. However, due to the half normal distribution of \(u_{ni}\), estimates of the model parameters are in general not functions of only linear and quadratic forms of independent disturbances. So asymptotic analysis and results for the linear SAR model would not be applicable. One has to resort to nonlinear spatial asymptotic theories for estimation and testing (footnote 2).
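For concreteness, the reduced form \(Y_{n}=(I_{n}-\lambda _{0}W_{n})^{-1}(X_{n}\beta _{0}+v_{n}-u_{n})\) implied by (2.1) is straightforward to simulate. Below is a minimal sketch in Python; the toy weights matrix and all function and variable names are illustrative assumptions, not part of the model.

```python
# A minimal sketch of simulating model (2.1); W and all names are illustrative.
import numpy as np

def simulate_sarsf(W, beta0, lam0, sd_v0, sd_u0, rng):
    """Draw one sample (Y, X) from y_i = lam0*w_i.Y + x_i'beta0 + v_i - u_i."""
    n = W.shape[0]
    X = np.column_stack([np.ones(n), rng.standard_normal((n, len(beta0) - 1))])
    v = rng.normal(0.0, sd_v0, n)           # symmetric noise, N(0, sd_v0^2)
    u = np.abs(rng.normal(0.0, sd_u0, n))   # half-normal inefficiency |N(0, sd_u0^2)|
    # Reduced form: Y = (I - lam0*W)^{-1} (X beta0 + v - u)
    Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + v - u)
    return Y, X

rng = np.random.default_rng(0)
n = 100
W = np.kron(np.eye(n // 2), np.array([[0.0, 1.0], [1.0, 0.0]]))  # toy pairwise weights
Y, X = simulate_sarsf(W, np.array([0.5, 0.5, 0.5]), 0.2, 1.0, 1.0, rng)
```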

Let \([\lambda ,\beta ',\sigma _{u}^{2},\sigma _{v}^{2}]\) be an arbitrary parameter vector and the corresponding true parameter vector be \([\lambda _{0},\beta _{0}',\sigma _{u0}^{2},\sigma _{v0}^{2}]\). Denote \(\sigma ^{2}=\sigma _{u}^{2}+\sigma _{v}^{2}\), \(\delta =\sigma _{u}/\sigma _{v}\), and \(\theta =[\lambda ,\beta ',\sigma ^{2},\delta ]'\). The log likelihood function of \(\theta \) for model (2.1) is

$$\begin{aligned} \ln L_{n}(\theta )&= {} n\ln 2-\frac{n}{2}\ln (2\pi \sigma ^{2})+\ln |I_{n}-\lambda W_{n}|\nonumber \\&\quad -\frac{1}{2\sigma ^{2}}\sum _{i=1}^{n}(y_{ni}-\lambda w_{n,i\cdot }Y_{n}-x_{ni}'\beta )^{2}\nonumber \\&\quad +\sum _{i=1}^{n}\ln \Phi \left( -\frac{\delta }{\sigma }(y_{ni}-\lambda w_{n,i\cdot }Y_{n}-x_{ni}'\beta )\right) , \end{aligned}$$
(2.2)

where \(\Phi (\cdot )\) is the distribution function of a standard normal random variable, whose presence is due to the stochastic frontier disturbance \(u_{ni}\). The log likelihood function \(\ln L_{n}(\theta )\) involves the log determinant \(\ln |I_{n}-\lambda W_{n}|\), which can be computed cheaply once the eigenvalues of \(W_{n}\) are obtained (Ord 1975), or approximated by a Taylor series as suggested in LeSage and Pace (2009), even when the sample size is large. Note that \(\Phi \left( -\frac{\delta }{\sigma }(y_{ni}-\lambda w_{n,i\cdot }Y_{n}-x_{ni}'\beta )\right) \) is a nonlinear function of \(y_{n1},\dotsc ,y_{nn}\), so the limit theorems for linear-quadratic forms of disturbances in spatial econometrics, originating from Kelejian and Prucha (2001), are not applicable. Instead, we rely on the asymptotic theory in Jenish and Prucha (2012) for near-epoch dependent (NED) random fields, which generalize the corresponding concepts in the time series literature (footnote 3). The spatial NED property is preserved under certain transformations such as summation, multiplication, and Lipschitz transformation and some of its generalizations. We note that some terms in (2.2) are similar to those in the log likelihood function of a SAR Tobit model in Xu and Lee (2015), so some of their analysis on NED properties of relevant terms can be adapted to investigate large sample properties of model (2.1).
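To fix ideas, the following sketch evaluates (2.2) directly, with the eigenvalues of \(W_{n}\) precomputed once as in Ord (1975); the function and variable names are our own illustrative assumptions.

```python
# A minimal sketch of evaluating the log likelihood (2.2); names are illustrative.
import numpy as np
from scipy.stats import norm

def sarsf_loglik(theta, Y, X, W, eig_W):
    """theta = [lam, beta (k_x entries), sigma^2, delta]; eig_W = eigenvalues of W."""
    kx = X.shape[1]
    lam, beta = theta[0], theta[1:1 + kx]
    sigma2, delta = theta[1 + kx], theta[2 + kx]
    n = len(Y)
    eps = Y - lam * (W @ Y) - X @ beta                    # epsilon_i(lam, beta)
    logdet = np.sum(np.log(1.0 - lam * eig_W + 0j)).real  # ln|I - lam*W| via eigenvalues
    return (n * np.log(2.0) - 0.5 * n * np.log(2.0 * np.pi * sigma2) + logdet
            - eps @ eps / (2.0 * sigma2)
            + norm.logcdf(-(delta / np.sqrt(sigma2)) * eps).sum())
```

In practice, the MLE would then be computed by maximizing this function numerically subject to \(\delta \ge 0\), e.g., with a bound-constrained optimizer.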

The following assumptions are maintained for model (2.1).

Assumption 1

Individual units in the economy are located or living in a region \(D_{n}\subset D\subset {\mathbb {R}}^{d}\), where D is a (possibly) unevenly spaced lattice, and the cardinality \(|D_{n}|\) of the finite set \(D_{n}\) satisfies \(\lim _{n\rightarrow \infty }|D_{n}|=\infty \). The distance \(d(i,j)\) between any two different individuals i and j is larger than or equal to a positive constant, which is assumed to be 1 for convenience.

Assumption 2

\(c_{1}\equiv \lambda _{m}\sup _{n}\Vert W_{n}\Vert _{\infty }<1\), and \([-\lambda _{m},\lambda _{m}]\) is the compact parameter space of \(\lambda \) on the real line.

Assumption 3

In addition to the diagonal elements of \(W_{n}\) being zero, i.e., \(w_{n,ii}=0\) for all i, the elements of \(W_{n}\) satisfy at least one of the following two conditions:

  (a)

    Only individuals whose distances are less than or equal to some positive constant \(d_{0}\) may affect each other directly, i.e., \(w_{n,ij}\ne 0\) only if \(d(i,j)\le d_{0}\).

  (b)

    (i) For every n, the number of columns \(w_{n,\cdot j}\) of \(W_{n}\) with \(|\lambda _{0}|\sum _{i=1}^{n}|w_{n,ij}|>c_{1}\) is less than or equal to some fixed nonnegative integer that does not depend on n (footnote 4); (ii) there exists an \(\alpha >d\) and a constant \(c_{2}\) such that \(|w_{n,ij}|\le c_{2}/d(i,j)^{\alpha }\).

Assumption 4

(a) \(v_{ni}\sim N(0,\sigma _{v0}^{2})\) and \(u_{ni}\sim |N(0,\sigma _{u0}^{2})|\) is a half normal random variable; (b) \(x_{ni}\), \(v_{ni}\) and \(u_{ni}\) are mutually independent; (c) \([v_{ni},u_{ni}]\)'s are i.i.d.

Assumption 5

(a) \(\sup _{1\le k\le k_{x},i,n}{\text {E}}[|x_{ni,k}|^{4+\iota }]<\infty \) for some \(\iota >0\); (b) \(\{x_{ni}\}_{i=1}^{n}\) is an \(\alpha \)-mixing random field with \(\alpha \)-mixing coefficient \(\alpha (u,v,s)\le (u+v)^{c_{3}}\hat{\alpha }(s)\) for some \(c_{3}\ge 0\), where \(\hat{\alpha }(s)\) satisfies \(\sum _{s=1}^{\infty }s^{d-1}\hat{\alpha }(s)<\infty \).

Assumption 6

\(\lim \sup _{n\rightarrow \infty }\frac{1}{n}[{\text {E}}\ln L_{n}(\theta )-{\text {E}}\ln L_{n}(\theta _{0})]<0\) for any \(\theta \ne \theta _{0}\).

Assumption 7

The parameter space of \([\beta ',\sigma ^{2},\delta ]'\) is a compact subset of \({\mathbb {R}}^{k_{x}+2}\) and \(\delta \ge 0\).

Assumption 1 is introduced by Jenish and Prucha (2009, 2012) for spatial mixing and NED processes. As the distance between two units can be a geographical distance, an economic distance or a mixture of both, the space D is allowed to be high dimensional as a subset of \({\mathbb {R}}^{d}\), and the distance can be induced from any norm in \({\mathbb {R}}^{d}\). The increasing domain asymptotics imposed in Assumption 1 are natural for a regional study, and are usually needed for regular asymptotic properties of estimators. Since the distance between any two different individuals is larger than or equal to some positive constant, the sample region must expand as the sample size increases. An alternative asymptotic framework is the so-called infill asymptotics, where the growth of the sample size is achieved by sampling points arbitrarily densely in a fixed sample region. Under infill asymptotics, even some popular estimators, such as the least squares and the method of moments estimators, may not be consistent (see, e.g., Lahiri 1996).

Assumptions 2–3 are from Xu and Lee (2015). Assumption 2 is used in Xu and Lee (2015) to establish the NED property of a term similar to \(\Phi \left( -\frac{\delta }{\sigma }(y_{ni}-\lambda w_{n,i\cdot }Y_{n}-x_{ni}'\beta )\right) \) in (2.2), so we also impose it, although it is stronger than the condition that \(\lambda \) is in a compact subset of \((1/\mu _{\min },1/\mu _{\max })\), where \(\mu _{\min }\) and \(\mu _{\max }\) are, respectively, the smallest and largest real eigenvalues of the spatial weights matrix for a linear SAR model, as discussed in, e.g., LeSage and Pace (2009) and Kelejian and Prucha (2010). Assumption 2 also implies the existence of the reduced form of \(Y_{n}\) in (2.1) and the Neumann series expansion \((I_{n}-\lambda W_{n})^{-1}=I_{n}+\lambda W_{n}+\lambda ^{2}W_{n}^{2}+\dots \) for any \(\lambda \) in that parameter space. The compactness of the parameter space for \(\lambda \) in Assumption 2 and that for the rest of the parameters in Assumption 7 are typically maintained for extremum estimators. Assumption 3 rules out self-influence, i.e., \(w_{n,ii}=0\) for all i, and also requires the interaction of units i and j in terms of \(w_{n,ij}\) to decline fast enough. While Assumption 3(a) requires no direct interaction between two units when they are far enough from each other, Assumption 3(b)(ii) possibly allows all non-diagonal elements of \(W_{n}\) to be nonzero as long as the interactions decline sufficiently fast with distance. Assumption 3(b)(i) is a condition on the column sums of \(W_{n}\) in absolute value, i.e., the total effects of each spatial unit on those who are connected to (or nominate) him/her. Only a fixed number of spatial units are allowed to have large aggregated effects on other units. In a network setting, the units with large aggregated effects on other units are referred to as stars.

Assumptions 4 and 5 summarize the exogeneity of explanatory variables and distributional assumptions on the disturbances. The conditions in Assumption 5 are needed for the NED properties of relevant terms. The mixing coefficient for the random field \(\{x_{ni}\}_{i=1}^{n}\) in Assumption 5(b) depends not only on the distance between two separate subsets of spatial units but also on their sizes (footnote 5). Assumption 6 is an identification condition for the model (footnote 6). As a ratio of two standard deviations, \(\delta \) is necessarily nonnegative, as stated in Assumption 7. 
Under the above assumptions, pointwise and uniform LLNs can be applied to prove the uniform convergence of the sample average log likelihood function. With identification uniqueness of the true parameters and equicontinuity of the limiting expected log likelihood function in parameters, the MLE \(\hat{\theta }\) will be consistent (White 1994). The detailed proof is in Appendix 1.

Proposition 2.1

Under Assumptions 1–7, the MLE \(\hat{\theta }\) of model (2.1) is consistent.

We next investigate the asymptotic distribution of the MLE. Let \(G_{n}(\lambda )=W_{n}(I_{n}-\lambda W_{n})^{-1}\), \(\epsilon _{ni}(\lambda ,\beta )=y_{ni}-\lambda w_{n,i\cdot }Y_{n}-x_{ni}'\beta \), and \(f(t)=\phi (t)/\Phi (t)\) be the inverse Mills ratio, where \(\phi (t)\) is the density function of the standard normal distribution. The first order derivatives of the log likelihood function on its parameters are

$$\begin{aligned} \frac{\partial \ln L_{n}(\theta )}{\partial \lambda }&=-{\text {tr}}[G_{n}(\lambda )]+\frac{1}{\sigma ^{2}}\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}\epsilon _{ni}(\lambda ,\beta )\nonumber \\&\quad +\frac{\delta }{\sigma }\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(2.3)
$$\begin{aligned} \frac{\partial \ln L_{n}(\theta )}{\partial \beta }&=\frac{1}{\sigma ^{2}}\sum _{i=1}^{n}x_{ni}\epsilon _{ni}(\lambda ,\beta )+\frac{\delta }{\sigma }\sum _{i=1}^{n}x_{ni}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(2.4)
$$\begin{aligned} \frac{\partial \ln L_{n}(\theta )}{\partial \sigma ^{2}}&=-\frac{n}{2\sigma ^{2}}+\frac{1}{2\sigma ^{4}}\sum _{i=1}^{n}\epsilon _{ni}^{2}(\lambda ,\beta )+\frac{\delta }{2\sigma ^{3}}\sum _{i=1}^{n}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \epsilon _{ni}(\lambda ,\beta ), \end{aligned}$$
(2.5)
$$\begin{aligned} \frac{\partial \ln L_{n}(\theta )}{\partial \delta }&=-\frac{1}{\sigma }\sum _{i=1}^{n}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \epsilon _{ni}(\lambda ,\beta ). \end{aligned}$$
(2.6)

These scores and the second order derivatives used to construct the information matrix are regular when \(\delta _{0}>0\). However, there are some irregularities when \(\delta _{0}\) happens to be zero. Here we consider both situations. If \(\delta _{0}=0\) but this is unknown to the investigator, then this constraint would not be imposed for estimation. For this case, we see that \(\frac{\partial \ln L_{n}(\eta ,0)}{\partial \delta }=-\sigma \sqrt{\frac{2}{\pi }}\frac{\partial \ln L_{n}(\eta ,0)}{\partial \beta _{1}}\), where \(\eta =[\lambda ,\beta _{1},\beta _{2}',\sigma ^{2}]'\) and \(\beta _{1}\) is the first component of \(\beta =[\beta _{1},\beta _{2}']'\), because \(f(0)=\frac{\phi (0)}{\Phi (0)}=\sqrt{\frac{2}{\pi }}\). Thus, when the true value \(\delta _{0}\) is zero, the scores of model (2.1) are linearly dependent and the information matrix is singular, which is similar to the SF model with no spatial dependence. The SARSF model has an additional score \(\frac{\partial \ln L_{n}(\eta ,0)}{\partial \lambda }\), but it does not create additional linear dependence on the other derivatives: \(\frac{\partial \ln L_{n}(\eta ,0)}{\partial \lambda }=\frac{1}{\sigma ^{2}}[G_{n}(\lambda )X_{n}\beta ]'\epsilon _{n}(\lambda ,\beta )+\frac{1}{\sigma ^{2}}\epsilon _{n}'(\lambda ,\beta )G_{n}(\lambda )\epsilon _{n}(\lambda ,\beta )-{\text {tr}}[G_{n}(\lambda )]\) is linear-quadratic in \(\epsilon _{n}(\lambda ,\beta )\), where \(\epsilon _{n}(\lambda ,\beta )=[\epsilon _{n1}(\lambda ,\beta ),\dotsc ,\epsilon _{nn}(\lambda ,\beta )]'\) and \(X_{n}=[x_{n1},\dotsc ,x_{nn}]'\), so the additional score \(\frac{\partial \ln L_{n}(\eta _{0},0)}{\partial \lambda }\) due to the presence of spatial dependence is not linearly dependent on the other scores, which do not have a quadratic term. With \(\delta _{0}=0\), however, the asymptotic distribution of the MLE can still be derived, by reparameterizing the model into one with a nonsingular information matrix, as in the usual SF model without spatial interactions (Lee 1993). When \(\delta _{0}\ne 0\), the scores are generally not linearly dependent, so the asymptotic distribution of the MLE can be derived as usual by a mean value theorem expansion. In the following, we first consider the regular case with \(\delta _{0}\ne 0\) and then the irregular one with \(\delta _{0}=0\).
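The linear dependence of the scores at \(\delta =0\) is easy to verify numerically. The following is a minimal sketch implementing (2.3)–(2.6), with illustrative names and an assumed inverse Mills ratio helper; it is not part of the formal analysis.

```python
# A minimal sketch of the scores (2.3)-(2.6); names are illustrative.
import numpy as np
from scipy.stats import norm

def mills(t):
    """Inverse Mills ratio f(t) = phi(t)/Phi(t), computed on the log scale."""
    return np.exp(norm.logpdf(t) - norm.logcdf(t))

def sarsf_score(theta, Y, X, W):
    kx = X.shape[1]
    lam, beta = theta[0], theta[1:1 + kx]
    sigma2, delta = theta[1 + kx], theta[2 + kx]
    sigma = np.sqrt(sigma2)
    n = len(Y)
    WY = W @ Y
    eps = Y - lam * WY - X @ beta
    f = mills(-(delta / sigma) * eps)
    G = W @ np.linalg.inv(np.eye(n) - lam * W)              # G_n(lam)
    d_lam = -np.trace(G) + WY @ eps / sigma2 + (delta / sigma) * (WY @ f)
    d_beta = X.T @ eps / sigma2 + (delta / sigma) * (X.T @ f)
    d_sig2 = (-n / (2 * sigma2) + eps @ eps / (2 * sigma2**2)
              + delta / (2 * sigma**3) * (f @ eps))
    d_del = -(f @ eps) / sigma
    return np.concatenate([[d_lam], d_beta, [d_sig2, d_del]])

# At delta = 0 the last score equals -sigma*sqrt(2/pi) times the intercept score:
# s = sarsf_score(theta, Y, X, W)   # with theta[-1] = 0
# np.isclose(s[-1], -np.sqrt(theta[-2]) * np.sqrt(2 / np.pi) * s[1])  # -> True
```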

Asymptotic distribution under \(\delta _{0}\ne 0\)

When \(\delta _{0}\ne 0\), the information matrix \({\text {E}}(\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta '})=-{\text {E}}(\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '})\) is generally nonsingular and we assume that it is so in the limit.

Assumption 8

\(\lim _{n\rightarrow \infty }{\text {E}}(-\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '})\) is positive definite.

The asymptotic distribution of \(\hat{\theta }\) follows by a mean value theorem expansion of its first order condition. The following regularity conditions are needed in the analysis.

Assumption 9

(a) When \(\delta _{0}>0\), the true value of \(\theta \) is in the interior of its parameter space. (b) For the case with \(\delta _{0}=0\), the true values of all remaining parameters are in the interior of their parameter subspace.

Assumption 10

(a) \(\sup _{1\le k\le k_{x},i,n}{\text {E}}[|x_{ni,k}|^{6}]<\infty \); (b) \(\alpha >5d\), where \(\alpha \) is the one in Assumption 3(b)(ii); (c) for the \(\alpha \)-mixing coefficients of \(\{x_{ni}\}_{i=1}^{n}\) in Assumption 5, \(\hat{\alpha }(s)\) satisfies \(\sum _{s=1}^{\infty }s^{d[1+c_{3}\iota ^{*}/(2+\iota ^{*})]-1}[\hat{\alpha }(s)]^{\iota ^{*}/(4+2\iota ^{*})}<\infty \) for some \(0<\iota ^{*}<1\), where the constant \(c_{3}\) is the one in Assumption 5(b).

Assumption 9 is a familiar condition required to derive asymptotic distributions of estimators. With \(\delta _{0}>0\), all components of the true value \(\theta _{0}\) are in the interior of the parameter space. On the other hand, if \(\delta _{0}=0\), the remaining true parameters are not subject to boundary constraints, so they are in the interior of their parameter subspace. The moment condition in Assumption 10(a) is used to obtain the convergence of the second order derivatives of \(\frac{1}{n}\ln L_{n}(\theta )\) at a consistent estimator of \(\theta _{0}\). In addition to \(\sup _{1\le k\le k_{x},i,n}{\text {E}}[|x_{ni,k}|^{6}]<\infty \), Assumption 10(b)–(c) strengthen the decline rate \(\alpha \) of \(w_{n,ij}\) in Assumption 3 and the condition on \(\hat{\alpha }(s)\) accordingly for the applicability of the CLT for an NED process in Jenish and Prucha (2012).

Proposition 2.2

Under Assumptions 1–10 and \(\delta _{0}\ne 0\),

$$\begin{aligned} \sqrt{n}(\hat{\theta }-\theta _{0})\xrightarrow {d}N\Bigl (0,\lim _{n\rightarrow \infty }\left( -\frac{1}{n}{\text {E}}\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '}\right) ^{-1}\Bigr ). \end{aligned}$$

This proposition gives the asymptotic distribution of the MLE \(\hat{\theta }\) of \(\theta _{0}\) when there is inefficiency in the stochastic frontier function. This is the regular case of the model. It remains to consider the irregular case in which the production function of each firm is efficient. For that case, the analysis is relatively complicated and is presented in the next subsection.

The boundary case with \(\delta _{0}=0\)

When \(\delta _{0}=0\), it lies on the boundary of its parameter space and the scores are linearly dependent, as shown above. Then the usual analysis of asymptotic distributions does not work, but we can provide an analysis based on reparameterizations.

Let \(\beta _{1}^{\dagger }=\beta _{1}-\delta \sigma \sqrt{\frac{2}{\pi }}\) be a reparameterization. Then the log likelihood function in terms of \([\lambda ,\beta _{1}^{\dagger },\beta _{2}',\sigma ^{2},\delta ]'\) is

$$\begin{aligned} \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )\equiv \ln L_{n}\Bigl (\lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \Bigr ). \end{aligned}$$

Denote \(\eta ^{\dagger }=[\lambda ,\beta _{1}^{\dagger },\beta _{2}',\sigma ^{2}]'\). The derivatives of \(\ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )\) are

$$\begin{aligned} \frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )}{\partial \lambda }&=\frac{\partial \ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \lambda },\\ \frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )}{\partial \beta _{1}^{\dagger }}&=\frac{\partial \ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \beta _{1}},\\ \frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )}{\partial \beta _{2}}&=\frac{\partial \ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \beta _{2}},\\ \frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )}{\partial \sigma ^{2}}&=\frac{\delta }{2\sigma }\sqrt{\frac{2}{\pi }}\frac{\partial \ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \beta _{1}}\\&+\frac{\partial \ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \sigma ^{2}},\\ \frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )}{\partial \delta }&=\sigma \sqrt{\frac{2}{\pi }}\frac{\partial \ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \beta _{1}}\\&+\frac{\partial \ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \delta }. \end{aligned}$$

At \(\delta =0\), because \(\beta _{1}^{\dagger }=\beta _{1}\) and \(\eta ^{\dagger }=\eta \), we have \(\frac{\partial \ln L_{2n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \eta ^{\dagger }}=\frac{\partial \ln L_{n}(\eta ,0)}{\partial \eta }\), but

$$\begin{aligned} \frac{\partial \ln L_{2n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \delta }=0 \end{aligned}$$
(2.7)

identically for all possible values of \(\eta \) due to the linear dependence of \(\frac{\partial \ln L_{n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \theta }\) in (2.3)–(2.6). Thus, as \(\frac{\partial \ln L_{2n}(\eta _{0},0)}{\partial \delta }\) is not the leading order term of a Taylor expansion in deriving the asymptotic distribution of the MLE that maximizes \(\ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )\), we need to investigate the second order derivative \(\frac{\partial ^{2}\ln L_{2n}(\eta _{0},0)}{\partial \delta ^{2}}\). Since

$$\begin{aligned} \frac{\partial ^{2}\ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{2},\delta )}{\partial \delta ^{2}}&=\frac{2\sigma ^{2}}{\pi }\frac{\partial ^{2}\ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \beta _{1}^{2}}\\&\quad +2\sigma \sqrt{\frac{2}{\pi }}\frac{\partial ^{2}\ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \beta _{1}\partial \delta }\\&\quad +\frac{\partial ^{2}\ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\delta \sigma \sqrt{\frac{2}{\pi }},\beta _{2},\sigma ^{2},\delta \right) }{\partial \delta ^{2}}, \end{aligned}$$

by the second order derivatives of \(\ln L_{n}(\theta )\) in Appendix A, \(\frac{\partial ^{2}\ln L_{2n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \delta ^{2}}=-\frac{2}{\pi \sigma ^{2}}\sum _{i=1}^{n}[\epsilon _{ni}^{2}(\lambda ,\beta )-\sigma ^{2}]\). Then

$$\begin{aligned} \frac{\partial ^{2}\ln L_{2n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \delta ^{2}}=-\frac{4\sigma ^{2}}{\pi }\frac{\partial \ln L_{2n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \sigma ^{2}} \end{aligned}$$
(2.8)

is linearly dependent on the score with respect to \(\sigma ^{2}\). Let \(\sigma ^{\dagger 2}=\sigma ^{2}/(1+\frac{2}{\pi }\delta ^{2})\) be another reparameterization, and

$$\begin{aligned} \ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )&\equiv \ln L_{2n}\Bigl (\lambda ,\beta _{1}^{\dagger },\beta _{2},\left( 1+\frac{2}{\pi }\delta ^{2}\right) \sigma ^{\dagger 2},\delta \Bigr )\\&=\ln L_{n}\Bigl (\lambda ,\beta _{1}^{\dagger }+\delta \sigma ^{\dagger }\left( \frac{2}{\pi }+\frac{4}{\pi ^{2}}\delta ^{2}\right) ^{1/2},\beta _{2},\left( 1+\frac{2}{\pi }\delta ^{2}\right) \sigma ^{\dagger 2},\delta \Bigr ). \end{aligned}$$

Then,

$$\begin{aligned} \frac{\partial \ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )}{\partial \lambda }&=\frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \lambda }, \end{aligned}$$
(2.9)
$$\begin{aligned} \frac{\partial \ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )}{\partial \beta _{1}^{\dagger }}&=\frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \beta _{1}^{\dagger }}, \end{aligned}$$
(2.10)
$$\begin{aligned} \frac{\partial \ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )}{\partial \beta _{2}}&=\frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \beta _{2}}, \end{aligned}$$
(2.11)
$$\begin{aligned} \frac{\partial \ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )}{\partial \sigma ^{\dagger 2}}&=\left( 1+\frac{2}{\pi }\delta ^{2}\right) \frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \sigma ^{2}}, \end{aligned}$$
(2.12)
$$\begin{aligned} \frac{\partial \ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )}{\partial \delta }&=\frac{4\delta \sigma ^{\dagger 2}}{\pi }\frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \sigma ^{2}}\nonumber \\&\quad +\frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \delta }; \end{aligned}$$
(2.13)

and hence,

$$\begin{aligned}&\frac{\partial ^{2}\ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )}{\partial \delta ^{2}}\nonumber \\&=\frac{4\sigma ^{\dagger 2}}{\pi }\frac{\partial \ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \sigma ^{2}}\nonumber \\&\quad +\frac{16\delta ^{2}\sigma ^{\dagger 4}}{\pi ^{2}}\frac{\partial ^{2}\ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \sigma ^{4}}\nonumber \\&\quad +\frac{8\delta \sigma ^{\dagger 2}}{\pi }\frac{\partial ^{2}\ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \sigma ^{2}\partial \delta }\nonumber \\&\quad +\frac{\partial ^{2}\ln L_{2n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},(1+\frac{2}{\pi }\delta ^{2})\sigma ^{\dagger 2},\delta )}{\partial \delta ^{2}}. \end{aligned}$$
(2.14)

It follows from (2.8) and (2.14) that

$$\begin{aligned} \frac{\partial ^{2}\ln L_{3n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \delta ^{2}}=0. \end{aligned}$$

Also from (2.7)–(2.12) at \(\delta _{0}=0\),

$$\begin{aligned} \frac{\partial \ln L_{3n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \eta ^{\ddagger }}=\frac{\partial \ln L_{n}(\eta ,0)}{\partial \eta }, \end{aligned}$$

where \(\eta ^{\ddagger }=[\lambda ,\beta _{1}^{\dagger },\beta _{2}',\sigma ^{\dagger 2}]'\). Furthermore, by (2.7) and (2.13),

$$\begin{aligned} \frac{\partial \ln L_{3n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \delta }=0. \end{aligned}$$

Thus, neither \(\frac{\partial \ln L_{3n}(\eta _{0},0)}{\partial \delta }\) nor \(\frac{\partial ^{2}\ln L_{3n}(\eta _{0},0)}{\partial \delta ^{2}}\) is the leading order term of a Taylor expansion in deriving the asymptotic distribution of the MLE that maximizes \(\ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\delta )\), and the third order derivative \(\frac{\partial ^{3}\ln L_{3n}(\eta _{0},0)}{\partial \delta ^{3}}\) needs to be examined (footnote 7). It follows that, by one more reparameterization, the model can be transformed into one with a nonsingular information matrix so that the asymptotic distribution of the MLE of the reparameterized coefficients can be derived as usual (Lee 1993). Let \(\tau =\delta ^{3}\) and

$$\begin{aligned} \ln L_{4n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\tau )&\equiv \ln L_{3n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\tau ^{1/3})\nonumber \\&=\ln L_{n}\Bigl (\lambda ,\beta _{1}^{\dagger }+\tau ^{1/3}\sigma ^{\dagger }\left( \frac{2}{\pi }+\frac{4}{\pi ^{2}}\tau ^{2/3}\right) ^{1/2},\nonumber \\&\qquad \beta _{2},\left( 1+\frac{2}{\pi }\tau ^{2/3}\right) \sigma ^{\dagger 2},\tau ^{1/3}\Bigr ). \end{aligned}$$
(2.15)

Then by Proposition 3 in Lee (1993), \(\frac{\partial \ln L_{4n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \tau }=\frac{1}{6}\frac{\partial ^{3}\ln L_{3n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \delta ^{3}}\). It then follows by some calculation that

$$\begin{aligned} \frac{\partial \ln L_{4n}(\lambda ,\beta _{1},\beta _{2},\sigma ^{2},0)}{\partial \tau }&=\frac{1}{6\sigma ^{3}}\left( 1-\frac{4}{\pi }\right) \sqrt{\frac{2}{\pi }}\sum _{i=1}^{n}\epsilon _{ni}^{3}(\lambda ,\beta )+\frac{2}{\pi \sigma }\sqrt{\frac{2}{\pi }}\sum _{i=1}^{n}\epsilon _{ni}(\lambda ,\beta ). \end{aligned}$$
(2.16)

In addition,

$$\begin{aligned} \frac{\partial \ln L_{4n}(\lambda _{0},\beta _{10},\beta _{20},\sigma _{0}^{2},0)}{\partial \eta ^{\ddagger }}&=\begin{pmatrix}\frac{1}{\sigma _{0}^{2}}\epsilon _{n}'G_{n}\epsilon _{n}-{\text {tr}}(G_{n})+\frac{1}{\sigma _{0}^{2}}(G_{n}X_{n}\beta _{0})'\epsilon _{n}\\ \frac{1}{\sigma _{0}^{2}}X_{n}'\epsilon _{n}\\ \frac{1}{2\sigma _{0}^{4}}(\epsilon _{n}'\epsilon _{n}-n\sigma _{0}^{2}) \end{pmatrix}, \end{aligned}$$

where \(G_{n}=G_{n}(\lambda _{0})\) and \(\epsilon _{n}=[\epsilon _{n1},\dotsc ,\epsilon _{nn}]'\). For the log likelihood function \(\ln L_{4n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\tau )\), the information matrix is

$$\begin{aligned} \Delta _{n}=\begin{pmatrix}\frac{1}{\sigma _{0}^{2}}{\text {E}}[(G_{n}X_{n}\beta _{0})'(G_{n}X_{n}\beta _{0})]+{\text {tr}}(G_{n}G_{n}^{(s)}) &{} * &{} * &{} *\\ \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'G_{n}X_{n}\beta _{0}) &{} \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'X_{n}) &{} * &{} *\\ \frac{1}{\sigma _{0}^{2}}{\text {tr}}(G_{n}) &{} 0 &{} \frac{n}{2\sigma _{0}^{4}} &{} *\\ \frac{1}{\sqrt{2\pi \sigma _{0}^{2}}}{\text {E}}(l_{n}'G_{n}X_{n}\beta _{0}) &{} \frac{1}{\sqrt{2\pi \sigma _{0}^{2}}}{\text {E}}(l_{n}'X_{n}) &{} 0 &{} \frac{n}{6\pi }(5-\frac{16}{\pi }+\frac{32}{\pi ^{2}}) \end{pmatrix}, \end{aligned}$$
(2.17)

where \(A^{(s)}=A+A'\) for any square matrix A, and \(l_{n}\) is an \(n\times 1\) vector of ones. Under the following assumption, \(\frac{1}{n}\Delta _{n}\) is positive definite for a large enough n.

Assumption 11

Either (a) \(\lim _{n\rightarrow \infty }\frac{1}{n}{\text {E}}[(G_{n}X_{n}\beta _{0},X_{n})'T_{n}(G_{n}X_{n}\beta _{0},X_{n})]\) is positive definite, where \(T_{n}=I_{n}-\frac{3}{n(5-\frac{16}{\pi }+\frac{32}{\pi ^{2}})}l_{n}l_{n}'\), or (b) \(\lim _{n\rightarrow \infty }\frac{1}{n}{\text {E}}(X_{n}'T_{n}X_{n})\) is positive definite and \(\lim _{n\rightarrow \infty }[\frac{1}{n}{\text {tr}}(G_{n}^{(s)}G_{n}^{(s)})-\frac{1}{n^{2}}{\text {tr}}^{2}(G_{n}^{(s)})]>0\).

The above assumption is similar to one for the SAR model in Lee (2004), except for the presence of the matrix \(T_{n}\), which is due to the inclusion of \(u_{ni}\) in the SARSF model. Note that \(T_{n}=(I_{n}-\frac{1}{n}l_{n}l_{n}')+\frac{2-16/\pi +32/\pi ^{2}}{n(5-16/\pi +32/\pi ^{2})}l_{n}l_{n}'\) is positive definite because \(2-16/\pi +32/\pi ^{2}\) is positive. In addition, \(\frac{1}{n}{\text {tr}}(G_{n}^{(s)}G_{n}^{(s)})\ge \frac{1}{n^{2}}{\text {tr}}^{2}(G_{n}^{(s)})\) by the Cauchy-Schwarz inequality.
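These two numerical facts are easy to confirm; a quick check (an illustrative sketch, not part of the formal argument): the eigenvalues of \(T_{n}\) are 1 with multiplicity \(n-1\) and \(1-3/(5-16/\pi +32/\pi ^{2})>0\).

```python
# Numerical check (illustrative) that T_n is positive definite.
import numpy as np

c = 5 - 16 / np.pi + 32 / np.pi**2             # ~ 3.149
print(2 - 16 / np.pi + 32 / np.pi**2)          # ~ 0.149 > 0
n = 20
T = np.eye(n) - (3 / (n * c)) * np.ones((n, n))
print(np.linalg.eigvalsh(T).min())             # ~ 1 - 3/c ~ 0.047 > 0
```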

As \(\delta \ge 0\), \(\tau \ge 0\). The MLE \([\hat{\lambda },\hat{\beta }_{1}^{\dagger },\hat{\beta }_{2},\hat{\sigma }^{\dagger 2},\hat{\tau }]\) of \([\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\tau ]\) maximizes \(\ln L_{4n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\tau )\) on the transformed parameter space with \(\tau \ge 0\). It is possible that the MLE occurs at the boundary with \(\hat{\tau }=0\). Let \(\check{\eta }\) be the MLE of \(\eta _{0}\) for the SAR model, i.e., model (2.1) with \(\epsilon _{ni}=v_{ni}\). Then the MLE \([\hat{\eta }^{\ddagger \prime },\hat{\tau }]\) is equal to \([\check{\eta }',0]\) if and only if \(\frac{\partial \ln L_{4n}(\check{\eta },0)}{\partial \tau }\le 0\), as in Waldman (1982). As \(\frac{\partial \ln L_{4n}(\check{\eta },0)}{\partial \tau }=\frac{1}{6}\frac{\partial ^{3}\ln L_{3n}(\check{\eta },0)}{\partial \delta ^{3}}\), by (2.16), \(\frac{\partial \ln L_{4n}(\check{\eta },0)}{\partial \tau }\le 0\) if and only if \(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\ge 0\), where \(\check{\epsilon }_{ni}=y_{ni}-\check{\lambda }w_{n,i\cdot }Y_{n}-x_{ni}'\check{\beta }\) (footnote 8). The restricted MLE \(\check{\eta }\) satisfies \(\frac{\partial \ln L_{4n}(\check{\eta },0)}{\partial \eta ^{\ddagger }}=0\). Then under regularity conditions, by the CLT for NED processes in Jenish and Prucha (2012),

$$\begin{aligned} \sqrt{n}(\check{\eta }-\eta _{0})&=\left( \frac{1}{n}\Delta _{n,11}\right) ^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\lambda _{0},\beta _{10},\beta _{20},\sigma _{0}^{2},0)}{\partial \eta ^{\ddagger }}+o_{p}(1)\nonumber \\&\xrightarrow {d}N\Bigl (0,\lim _{n\rightarrow \infty }\left( \frac{1}{n}\Delta _{n,11}\right) ^{-1}\Bigr ), \end{aligned}$$
(2.18)

where

$$\begin{aligned} \Delta _{n,11}=\begin{pmatrix}\frac{1}{\sigma _{0}^{2}}{\text {E}}[(G_{n}X_{n}\beta _{0})'(G_{n}X_{n}\beta _{0})]+{\text {tr}}(G_{n}G_{n}^{(s)}) &{} * &{} *\\ \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'G_{n}X_{n}\beta _{0}) &{} \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'X_{n}) &{} *\\ \frac{1}{\sigma _{0}^{2}}{\text {tr}}(G_{n}) &{} 0 &{} \frac{n}{2\sigma _{0}^{4}} \end{pmatrix}. \end{aligned}$$

Let \(J=[J_{1},J_{2},J_{3}',J_{4}]'\) be the multivariate normal vector \(N(0,\lim _{n\rightarrow \infty }(\frac{1}{n}\Delta _{n,11})^{-1})\), where \(J_{1}\), \(J_{2}\) and \(J_{4}\) are univariate normal random variables. Under regularity conditions, by a Taylor expansion and (2.18),

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}&=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\epsilon _{ni}^{3} -\frac{3\sigma _{0}^{2}}{n}[{\text {E}}(l_{n}'G_{n}X_{n}\beta _{0},l_{n}'X_{n},0)]\nonumber \\&\qquad \sqrt{n}(\check{\eta }-\eta _{0})+o_{p}(1)=\Gamma _{n}+o_{p}(1), \end{aligned}$$
(2.19)

where \(\Gamma _{n}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\epsilon _{ni}^{3}-\frac{3\sigma _{0}^{2}}{n}[{\text {E}}(l_{n}'G_{n}X_{n}\beta _{0},l_{n}'X_{n},0)]\left( \frac{1}{n}\Delta _{n,11}\right) ^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\lambda _{0},\beta _{10},\beta _{20},\sigma _{0}^{2},0)}{\partial \eta ^{\ddagger }}\). Since

$$\begin{aligned} {\text {E}}\Bigl [\Bigl (\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\epsilon _{ni}^{3}\Bigr )\Bigl (\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\lambda _{0},\beta _{10},\beta _{20},\sigma _{0}^{2},0)}{\partial \eta ^{\ddagger \prime }}\Bigr )\Bigr ]=\frac{3\sigma _{0}^{2}}{n}{\text {E}}[l_{n}'G_{n}X_{n}\beta _{0},l_{n}'X_{n},0], \end{aligned}$$

\(\Gamma _{n}\) is uncorrelated with the leading order term \(\left( \frac{1}{n}\Delta _{n,11}\right) ^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\lambda _{0},\beta _{10},\beta _{20},\sigma _{0}^{2},0)}{\partial \eta ^{\ddagger }}\) of \(\sqrt{n}(\check{\eta }-\eta _{0})\). Then \(\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\) is asymptotically uncorrelated with \(\sqrt{n}(\check{\eta }-\eta _{0})\). By the CLT in Jenish and Prucha (2012), \([\sqrt{n}(\check{\eta }-\eta _{0})',\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}]'\) converges in distribution to the normal vector \([J',K]'\), where \(K\sim N(0,6\sigma _{0}^{6})\) is independent of J; therefore, the event \(\frac{\partial \ln L_{4n}(\check{\eta },0)}{\partial \tau }\le 0\) is asymptotically independent of J.

When \(\hat{\tau }>0\), the MLE \([\hat{\eta }^{\ddagger \prime },\hat{\tau }]'\) satisfies the first order conditions \(\frac{\partial \ln L_{4n}(\hat{\eta }^{\ddagger },\hat{\tau })}{\partial \eta ^{\ddagger }}=0\) and \(\frac{\partial \ln L_{4n}(\hat{\eta }^{\ddagger },\hat{\tau })}{\partial \tau }=0\). Let \(F=[F_{1},F_{2},F_{3}',F_{4},F_{5}]'\) be the normal vector distributed as \(N(0,\lim _{n\rightarrow \infty }(\frac{1}{n}\Delta _{n})^{-1})\), where \(F_{1}\), \(F_{2}\), \(F_{4}\) and \(F_{5}\) are univariate random variables, and the distribution of \(N(0,\lim _{n\rightarrow \infty }(\frac{1}{n}\Delta _{n})^{-1})\) is the asymptotic distribution of \((\frac{1}{n}\Delta _{n})^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\lambda _{0},\beta _{10},\beta _{20},\sigma _{0}^{2},0)}{\partial \theta ^{\ddagger }}\) for \(\theta ^{\ddagger }=[\eta ^{\ddagger \prime },\tau ]'\). We may show that conditional on \(\hat{\tau }>0\), \(\sqrt{n}[\hat{\eta }^{\ddagger \prime }-\eta _{0}',\hat{\tau }]'\) converges in distribution to the random vector \([F_{1},F_{2},F_{3}',F_{4},|F_{5}|]'\), where \(|F_{5}|\) represents the truncated normal of \(F_{5}\) on the nonnegative axis.

Denote \(\hat{K}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\). Since \(\delta =\tau ^{1/3}\), conditional on \(\hat{K}\ge 0\), \(\hat{\delta }=0\); conditional on \(\hat{K}<0\), \((n^{1/6}\hat{\delta })^{3}=n^{1/2}\hat{\tau }\xrightarrow {d}|F_{5}|\), which implies that \(n^{1/6}\hat{\delta }=O_{p}(1)\). With the asymptotic distribution of \([\hat{\eta }^{\ddagger \prime },\hat{\tau }]'\), and the relations \(\beta _{1}=\beta _{1}^{\dagger }+\tau ^{1/3}\sigma ^{\dagger }(\frac{2}{\pi }+\frac{4}{\pi ^{2}}\tau ^{2/3})^{1/2}\) and \(\sigma ^{2}=(1+\frac{2}{\pi }\tau ^{2/3})\sigma ^{\dagger 2}\), conditional on \(\hat{K}\ge 0\), the asymptotic distributions of \(\hat{\beta }_{1}\) and \(\hat{\sigma }^{2}\) are the same as those of \(\hat{\beta }_{1}^{\dagger }\) and \(\hat{\sigma }^{\dagger 2}\); conditional on \(\hat{K}<0\),

$$\begin{aligned} n^{1/6}(\hat{\beta }_{1}-\beta _{10})&=n^{1/6}(\hat{\beta }_{1}^{\dagger }-\beta _{10})+n^{1/6}\hat{\tau }^{1/3}\hat{\sigma }^{\dagger }\left( \frac{2}{\pi }+\frac{4}{\pi ^{2}}\hat{\tau }^{2/3}\right) ^{1/2}\\&=O_{p}(n^{1/6-1/2})+n^{1/6}\hat{\delta }\sigma _{0}\sqrt{2/\pi }+o_{p}(1)\\&=(n^{1/6}\hat{\delta })\sigma _{0}\sqrt{2/\pi }+o_{p}(1) \end{aligned}$$

and

$$\begin{aligned} n^{1/3}(\hat{\sigma }^{2}-\sigma _{0}^{2})&=n^{1/3}(\hat{\sigma }^{\dagger 2}-\sigma _{0}^{2})+\frac{2}{\pi }n^{1/3}\hat{\tau }^{2/3}\hat{\sigma }^{\dagger 2}\\&=O_{p}(n^{1/3-1/2})+\frac{2}{\pi }(n^{1/6}\hat{\delta })^{2}\sigma _{0}^{2}+o_{p}(1)\\&=\frac{2}{\pi }(n^{1/6}\hat{\delta })^{2}\sigma _{0}^{2}+o_{p}(1). \end{aligned}$$

The analysis requires the following assumption.

Assumption 12

(a) \(\sup _{1\le k\le k_{x},i,n}{\text {E}}[|x_{ni,k}|^{14}]<\infty \); (b) \(\alpha >\frac{17}{5}d\); (c) for the \(\alpha \)-mixing coefficients of \(\{x_{ni}\}_{i=1}^{n}\) in Assumption 5, \(\hat{\alpha }(s)\) satisfies \(\sum _{s=1}^{\infty }s^{d[1+c_{3}\iota ^{*}/(2+\iota ^{*})]-1}[\hat{\alpha }(s)]^{\iota ^{*}/(4+2\iota ^{*})}<\infty \) for some \(0<\iota ^{*}<5\).

Since \(\frac{\partial \ln L_{3n}(\eta ^{\ddagger },0)}{\partial \delta }=\frac{\partial ^{2}\ln L_{3n}(\eta ^{\ddagger },0)}{\partial \delta ^{2}}=0\), the analysis of the asymptotic distribution of the MLE \(\hat{\theta }\) essentially relies on higher order Taylor expansions of \(\ln L_{3n}(\eta ^{\ddagger },\delta )\); thus Assumption 12(a) is needed so that the orders of terms in a proper higher order Taylor expansion can be derived. With \(\sup _{1\le k\le k_{x},i,n}{\text {E}}[|x_{ni,k}|^{14}]<\infty \), Assumption 12(b)–(c) are conditions for the applicability of the CLT in Jenish and Prucha (2012).

Proposition 2.3

Under Assumptions 1–7, 9, 11–12 and \(\delta _{0}=0\),

(i):

conditional on \(\hat{K}\ge 0\), \(\hat{\delta }=0\) and \(\sqrt{n}(\hat{\eta }-\eta _{0})\xrightarrow {d}J\), where J is independent of K, the limit in distribution of \(\hat{K}\); and

(ii):

conditional on \(\hat{K}<0\), \((n^{1/6}\hat{\delta })^{3}\xrightarrow {d}|F_{5}|\), \(n^{1/6}(\hat{\beta }_{1}-\beta _{10})=\sqrt{\frac{2}{\pi }}\sigma _{0}(n^{1/6}\hat{\delta })+o_{p}(1)\), \(n^{1/3}(\hat{\sigma }^{2}-\sigma _{0}^{2})=\frac{2}{\pi }\sigma _{0}^{2}(n^{1/6}\hat{\delta })^{2}+o_{p}(1)\), and \(\sqrt{n}[\hat{\lambda }-\lambda _{0},\hat{\beta }_{2}'-\beta _{20}']'\xrightarrow {d}[F_{1},F_{3}']'\).

Tests for \(H_{0}\): \(\delta _{0}=0\)

As the asymptotic distribution of the MLE depends on whether \(\delta _{0}=0\) or not, we consider LR and score tests of \(\delta _{0}=0\). For the LR test, using the relation \(\ln L_{4n}(\lambda ,\beta _{1}^{\dagger },\beta _{2},\sigma ^{\dagger 2},\tau )=\ln L_{n}\left( \lambda ,\beta _{1}^{\dagger }+\tau ^{1/3}\sigma ^{\dagger }(\frac{2}{\pi }+\frac{4}{\pi ^{2}}\tau ^{2/3})^{1/2},\beta _{2},(1+\frac{2}{\pi }\tau ^{2/3})\sigma ^{\dagger 2},\tau ^{1/3}\right) \) in (2.15), we have \(2[\ln L_{n}(\hat{\eta },\hat{\delta })-\ln L_{n}(\check{\eta },0)]=2[\ln L_{4n}(\hat{\eta }^{\ddagger },\hat{\tau })-\ln L_{4n}(\check{\eta },0)]\cdot I(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}<0)\). Then the asymptotic distribution of \(2[\ln L_{n}(\hat{\eta },\hat{\delta })-\ln L_{n}(\check{\eta },0)]\) is \(\chi ^{2}(0)\cdot I(K\ge 0)+\chi ^{2}(1)\cdot I(K<0)\), which is derived by a Taylor expansion, where \(\chi ^{2}(0)\) is degenerate with a unit mass at zero. Due to the irregular feature of \(L_{n}(\theta )\), a corresponding score test should be constructed with a higher order derivative of \(\ln L_{3n}(\eta ^{\ddagger },\delta )\) at \([\check{\eta },0]\) (Lee and Chesher 1986). Equivalently, we may construct a score test with \(\frac{\partial \ln L_{4n}(\check{\eta },0)}{\partial \tau }\). By (2.16) and (2.19), the score test statistic, which turns out to be \(\frac{n\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}}{\sqrt{6}(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{2})^{3/2}}\), is asymptotically standard normal, and the test is left sided since \(\delta \ge 0\).

Proposition 2.4

Under Assumptions 1–7, 9 and 11–12, when \(\delta _{0}=0\), we have

(a):

\(2[\ln L_{n}(\hat{\eta },\hat{\delta })-\ln L_{n}(\check{\eta },0)]\xrightarrow {d}\chi ^{2}(0)\cdot I(K\ge 0)+\chi ^{2}(1)\cdot I(K<0)\); and

(b):

the score test statistic \(\frac{n\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}}{\sqrt{6}(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{2})^{3/2}}\xrightarrow {d}N(0,1)\), and \(H_{0}\): \(\delta _{0}=0\) is rejected if \(\frac{n\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}}{\sqrt{6}(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{2})^{3/2}}<c_{\varsigma }\), where \(c_{\varsigma }\) satisfies \(\Phi (c_{\varsigma })=\varsigma \) for a chosen level of significance \(\varsigma \).

The LR test involves both the unrestricted and restricted MLEs, while the score test only involves the restricted MLE. If the LR test is used, both MLEs must be computed; if the score test is used, then in the case that the null hypothesis of \(\delta _{0}=0\) is not rejected, a researcher might further consider the choice of an appropriate model before computing the MLE of the SARSF model. By performing a test and then choosing the estimator based on its outcome, the final estimator is subject to a pretesting problem if the level of significance is fixed and does not depend on the sample size. However, one might argue that it is reasonable to let the level of significance decrease as the sample size increases; then asymptotically the pretesting problem is no longer an issue. In practice, we therefore suggest such a testing procedure followed by the appropriate estimation.
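For implementation, both tests reduce to simple statistics of the restricted SAR residuals and the maximized log likelihoods. A minimal sketch (illustrative names; the LR critical value uses the equal-weights mixture implied by Proposition 2.4, since \(P(K\ge 0)\rightarrow 1/2\) under the null):

```python
# Minimal sketches (illustrative) of the left-sided score test and the LR test
# for H0: delta_0 = 0.
import numpy as np
from scipy.stats import norm, chi2

def score_test(eps_r, level=0.05):
    """eps_r: residuals from the restricted (delta = 0) SAR MLE."""
    n = len(eps_r)
    stat = n * np.sum(eps_r**3) / (np.sqrt(6.0) * np.sum(eps_r**2)**1.5)
    return stat, stat < norm.ppf(level)   # reject for large negative values

def lr_test(loglik_unrestricted, loglik_restricted, level=0.05):
    lr = 2.0 * (loglik_unrestricted - loglik_restricted)  # equals 0 when tau_hat = 0
    # Null distribution: 0.5 * chi2(0) + 0.5 * chi2(1), so the level-alpha
    # critical value is the (1 - 2*alpha) quantile of chi2(1)
    return lr, lr > chi2.ppf(1.0 - 2.0 * level, df=1)
```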

Also, a consistent estimator of \([\lambda _{0},\beta _{20}']'\) can be obtained by two stage least squares (2SLS). While the 2SLS estimate of the intercept term might not be consistent, it can be adjusted to achieve consistency. Overall, a corrected 2SLS estimator (C2SLSE) of \(\theta _{0}\) can be derived similarly to the corrected ordinary least squares estimator for the SF model with no spatial dependence (Aigner et al. 1977). The details are in the next subsection. The C2SLSE is consistent under regularity conditions but might not be asymptotically efficient. It is also computationally simple for large sample sizes, since it avoids the computation of the determinant \(|I_{n}-\lambda W_{n}|\) (footnote 9).

Corrected 2SLS estimation

Let \(Q_{n}\) be an IV matrix for \(Z_{n}=[W_{n}Y_{n},X_{n}]\), which can consist of, e.g., linearly independent columns of \([X_{n},W_{n}X_{n},W_{n}^{2}X_{n}]\). Then the 2SLS estimate \(\tilde{\kappa }\) of \(\kappa _{0}=[\lambda _{0},\beta _{0}']'\) is \(\tilde{\kappa }=(Z_{n}'P_{n}Z_{n})^{-1}Z_{n}'P_{n}Y_{n}\), where \(P_{n}=Q_{n}(Q_{n}'Q_{n})^{-1}Q_{n}'\). Let \(\tilde{\epsilon }_{ni}=y_{ni}-\tilde{\lambda }w_{n,i\cdot }Y_{n}-x_{ni}'\tilde{\beta }\). We can estimate \(\sigma _{u0}^{2}\) by \(\tilde{\sigma }_{u}^{2}=\left[ \frac{\pi }{\pi -4}\sqrt{\frac{\pi }{2}}\left( \frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}\right) \right] ^{2/3}\) if \(\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}<0,\) and \(\tilde{\sigma }_{u}^{2}=0\) otherwise. The \(\sigma _{v0}^{2}\) can be estimated by \(\tilde{\sigma }_{v}^{2}=\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{2}-\frac{\pi -2}{\pi }\tilde{\sigma }_{u}^{2}\). Then estimates of \(\sigma _{0}^{2}\) and \(\delta _{0}\) are, respectively, \(\tilde{\sigma }^{2}=\tilde{\sigma }_{u}^{2}+\tilde{\sigma }_{v}^{2}\) and \(\tilde{\delta }=\tilde{\sigma }_{u}/\tilde{\sigma }_{v}\). To derive a consistent estimate of \(\beta _{10}\), adjust the 2SLS estimate \(\tilde{\beta }_{1}\) to be \(\tilde{\beta }_{1c}=\tilde{\beta }_{1}+\sqrt{\frac{2}{\pi }}\tilde{\sigma }_{u}\). The C2SLSE of \(\theta _{0}\) is \(\tilde{\theta }_{c}=[\tilde{\lambda },\tilde{\beta }_{1c},\tilde{\beta }_{2}',\tilde{\sigma }^{2},\tilde{\delta }]'\). We maintain the following assumption for the C2SLSE, which would be satisfied by proper selection of IVs.

Assumption 13

(a) \(\frac{1}{n}Q_{n}'[\epsilon _{n}-{\text {E}}(\epsilon _{n})]=O_{p}(n^{-1/2})\); (b) \({\text {plim}}_{n\rightarrow \infty }\frac{1}{n}Q_{n}'G_{n}[\epsilon _{n}-{\text {E}}(\epsilon _{n})]=0\);

(c)  \({\text {plim}}_{n\rightarrow \infty }\frac{1}{n}Q_{n}'[G_{n}l_{n}(\beta _{10}-\sigma _{u0}\sqrt{2/\pi })+G_{n}X_{2n}\beta _{20},X_{n}]\) has full column rank; (d) \({\text {plim}}_{n\rightarrow \infty }\frac{1}{n}Q_{n}'Q_{n}\) is positive definite.

Assumption 13(a) imposes a condition on the IV matrix \(Q_{n}\) that is stronger, in terms of the rate of convergence, than the exogeneity condition \({\text {plim}}_{n\rightarrow \infty }\frac{1}{n}Q_{n}'[\epsilon _{n}-{\text {E}}(\epsilon _{n})]=0\). This is because we investigate the convergence rate of the C2SLSE below. Assumption 13(b) is needed due to the presence of the spatial lag \(W_{n}Y_{n}\). These two conditions would be satisfied by a proper selection of \(Q_{n}\), e.g., \(Q_{n}\) consisting of linearly independent columns of \([X_{n},W_{n}X_{n},W_{n}^{2}X_{n}]\), so that the LLN can be applied and such averages converge at the \(\sqrt{n}\)-rate. Assumption 13(c) requires that the instruments are relevant. It takes into account the nonzero mean of \(\epsilon _{ni}\) when \(\sigma _{u0}\ne 0\). When \(\beta _{10}=\sigma _{u0}\sqrt{2/\pi }\) and \(\beta _{20}=0\), Assumption 13(c) would not hold as there is no valid IV for \(W_{n}Y_{n}\). In the case that \(W_{n}\) is normalized to have row sums equal to one, as \(G_{n}l_{n}\) is proportional to \(l_{n}\) and \(X_{n}\) also contains \(l_{n}\), Assumption 13(c) requires \(\beta _{20}\) to be nonzero in order to avoid possible multicollinearity in the regressors (of the reduced form equation). Assumption 13(d) is standard.

Proposition 2.5

Under Assumptions 1–5 and 13,

(i):

if \(\delta _{0}>0\), \(\tilde{\theta }_{c}=\theta _{0}+O_{p}(n^{-1/2})\);

(ii):

if \(\delta _{0}=0\), \(\tilde{\lambda }=\lambda _{0}+O_{p}(n^{-1/2})\), \(\tilde{\beta }_{2}=\beta _{20}+O_{p}(n^{-1/2})\), \(\tilde{\delta }=O_{p}(n^{-1/6})\), \(\tilde{\beta }_{1c}=\beta _{10}+O_{p}(n^{-1/6})\) and \(\tilde{\sigma }^{2}=\sigma _{0}^{2}+O_{p}(n^{-1/3})\).

When \(\delta _{0}>0\), the C2SLSE has the \(\sqrt{n}\)-rate of convergence; when \(\delta _{0}=0\), only \(\tilde{\lambda }\) and \(\tilde{\beta }_{2}\) have the \(\sqrt{n}\)-rate of convergence, and other parameter estimators have slower rates of convergence, with the rates equal to the corresponding ones of the MLE.
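A minimal sketch of the C2SLSE procedure described in this subsection follows; names are illustrative, the IV matrix is built from \([X_{n},W_{n}X_{n},W_{n}^{2}X_{n}]\), and edge cases such as \(\tilde{\sigma }_{v}^{2}\le 0\) are not handled.

```python
# A minimal sketch (illustrative) of the corrected 2SLS estimator.
import numpy as np

def c2slse(Y, X, W):
    """X has the intercept in its first column; returns (lam, beta1c, beta2, sigma2, delta)."""
    Q = np.column_stack([X, W @ X[:, 1:], W @ W @ X[:, 1:]])  # IVs; drop repeated intercepts
    P = Q @ np.linalg.solve(Q.T @ Q, Q.T)                     # projection onto IV space
    Z = np.column_stack([W @ Y, X])
    kappa = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ Y)         # 2SLS: [lam, beta']
    eps = Y - Z @ kappa
    m3 = np.mean(eps**3)
    # sigma_u^2 estimate from the third moment; zero if the residual skew is "wrong"
    s2u = (np.pi / (np.pi - 4) * np.sqrt(np.pi / 2) * m3) ** (2 / 3) if m3 < 0 else 0.0
    s2v = np.mean(eps**2) - (np.pi - 2) / np.pi * s2u
    beta1c = kappa[1] + np.sqrt(2 / np.pi) * np.sqrt(s2u)     # corrected intercept
    return kappa[0], beta1c, kappa[2:], s2u + s2v, np.sqrt(s2u) / np.sqrt(s2v)
```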

Monte Carlo

In this section, we report some Monte Carlo results on the estimates and tests considered in this paper.

We generate data from model (2.1). The spatial weights matrix \(W_{n}\) is based on the queen criterion and normalized to have row sums equal to one (footnote 10). There are three variables in \(x_{ni}\): a constant term and two variables randomly drawn from the standard normal distribution. The true value of \(\beta =[\beta _{1},\beta _{2},\beta _{3}]'\) is \([0.5,0.5,0.5]'\), \(\lambda _{0}\) is either 0.2 or 0.6, \(\sigma _{0}^{2}\) is either 1 or 2, and \(\delta _{0}\) is either 0, 0.5, 1, 1.5, 2 or 2.5. The sample size n is either 144 or 400. The number of Monte Carlo repetitions is 5,000. Due to a small percentage of outliers, we report the following robust measures of central tendency and dispersion of the MLEs and C2SLSEs: the median bias (MB), the median absolute deviation (MAD), and the interdecile range (IDR) (footnote 11). For the estimates of \(\delta \), we also report the percentages of estimates equal to zero, and the MBs, MADs and IDRs with zero estimates excluded.
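For reference, a minimal sketch (illustrative) of a row-normalized queen-contiguity weights matrix on an m-by-m lattice, matching the designs here (m = 12 gives n = 144 and m = 20 gives n = 400):

```python
# A minimal sketch (illustrative) of a row-normalized queen-contiguity matrix.
import numpy as np

def queen_weights(m):
    n = m * m
    W = np.zeros((n, n))
    for i in range(m):
        for j in range(m):
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di, dj) != (0, 0) and 0 <= i + di < m and 0 <= j + dj < m:
                        W[i * m + j, (i + di) * m + (j + dj)] = 1.0  # shares edge or vertex
    return W / W.sum(axis=1, keepdims=True)  # row-normalize
```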

The estimation results when \(\delta _{0}=0\) are reported in Table 1. The MLE-r is a restricted MLE with \(\delta =0\) imposed, i.e., the MLE of a standard SAR model. The MLE-r is \(\sqrt{n}\)-consistent under regularity conditions (Lee 2004). Recall that the information matrix of model (2.1) is singular when \(\delta _{0}=0\). As a result, only the MLEs and C2SLSEs of \(\lambda \), \(\beta _{2}\) and \(\beta _{3}\) have the \(\sqrt{n}\)-rate of convergence, and those of \(\beta _{1}\), \(\sigma ^{2}\) and \(\delta \) have slower rates of convergence. Table 1 shows that MLE-r performs the best; MLE has performance similar to that of MLE-r for \(\lambda \), \(\beta _{2}\) and \(\beta _{3}\), but performs worse than MLE-r for \(\beta _{1}\) and \(\sigma ^{2}\). The MLEs and C2SLSEs of \(\lambda \), \(\beta _{2}\) and \(\beta _{3}\) have relatively small MBs in all cases, while those of \(\beta _{1}\), \(\sigma ^{2}\) and \(\delta \) can have larger MBs, especially those of \(\beta _{1}\). For \(\lambda \), the C2SLSEs have much larger MBs, MADs and IDRs than those of the MLEs; for \(\beta _{1}\), the C2SLSEs also have larger MBs, MADs and IDRs than those of the MLEs except when \(n=144\) and \(\lambda _{0}=0.6\); for \(\beta _{2}\) and \(\beta _{3}\), the C2SLSEs have similar MBs, MADs and IDRs to those of the MLEs; for \(\sigma ^{2}\), the C2SLSEs have larger MBs and MADs than those of the MLEs, but slightly smaller IDRs; for \(\delta \), the C2SLSEs have slightly smaller IDRs than those of the MLEs, but neither the MLE nor the C2SLSE has a dominating performance in terms of the MB and MAD. Note that some MBs and MADs of the estimates of \(\delta \) are 0.000. This is because more than \(50\%\) of the estimates are exactly zero.

Table 2 reports the estimation results when \(\delta _{0}\ne 0\) and \(n=144\). In addition to MBs, MADs and IDRs, coverage probabilities (CP) of \(95\%\) confidence intervals are also reported for the MLE-r and MLE. As the MLE-r imposes the incorrect restriction \(\delta =0\), it has large biases and extremely low CPs for \(\beta _{1}\) and \(\sigma ^{2}\). The MLEs and C2SLSEs have the \(\sqrt{n}\)-rate of convergence when \(\delta _{0}\ne 0\). The MBs of the MLEs are relatively small in all cases, and they are generally smaller than those of the C2SLSEs. For \(\lambda \), \(\beta _{1}\), \(\beta _{2}\) and \(\beta _{3}\), the MLEs have smaller MADs and IDRs than the C2SLSEs in most cases; for \(\sigma ^{2}\) and \(\delta \), the C2SLSEs have smaller MADs and IDRs in some cases. When \(\delta _{0}=1\), the CPs of the MLEs for \(\lambda \), \(\beta _{2}\) and \(\beta _{3}\) are close to the nominal \(95\%\), while the CPs for \(\beta _{1}\), \(\sigma ^{2}\) and \(\delta \) are significantly lower than \(95\%\); when \(\delta _{0}=2\), all CPs of the MLEs are close to \(95\%\). Note that in all cases, more than a quarter of the MLEs and C2SLSEs of \(\delta \) are estimated as zero when \(\delta _{0}=1\), but less than \(2.5\%\) are estimated as zero when \(\delta _{0}=2\). The large percentages of zero estimates when \(\delta _{0}=1\) explain why the CPs for some parameters are much lower than the nominal level, while the small percentages of zero estimates when \(\delta _{0}=2\) explain why the CPs are close to the nominal level. Thus, with the sample size \(n=144\), we observe a relatively severe wrong skew problem for \(\delta _{0}=1\), as discussed in footnote 8.

Table 3 reports the estimation results when \(\delta _{0}\ne 0\) for the larger sample size \(n=400\). The MBs, MADs and IDRs are smaller than those in Table 2. Compared to the results with \(n=144\), the C2SLSEs of \(\sigma ^{2}\) and \(\delta \) have smaller MADs and IDRs than the MLEs in far fewer cases; other patterns are similar. Note that, with the sample size \(n=400\), in all cases, less than \(15\%\) of the MLEs and C2SLSEs of \(\delta \) are estimated as zero for \(\delta _{0}=1\), and almost all are positive for \(\delta _{0}=2\). For \(\delta _{0}=1\), once zero estimates are excluded, the MBs are larger but the MADs and IDRs are much smaller.
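The robust summary measures are straightforward to compute from the Monte Carlo draws; a sketch follows, taking the MAD around the sample median (an assumption, as the MAD can also be centered at the true value).

```python
import numpy as np

def robust_summaries(estimates, true_value):
    """MB, MAD and IDR of a vector of Monte Carlo estimates. The MAD is
    taken around the sample median here (an assumption); the IDR is the
    90% quantile minus the 10% quantile (footnote 11)."""
    e = np.asarray(estimates)
    med = np.median(e)
    mb = med - true_value                # median bias
    mad = np.median(np.abs(e - med))     # median absolute deviation
    q10, q90 = np.quantile(e, [0.10, 0.90])
    return mb, mad, q90 - q10            # interdecile range
```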

Empirical sizes of the score and LR tests are reported in Table 4. With \(n=144\), at the \(5\%\) level of significance, the size distortions of the score and LR tests are within 0.9 and 0.8 percentage points, respectively; at the \(10\%\) level of significance, they are within 1.6 and 1.3 percentage points, respectively. Size distortions generally decrease as n increases from 144 to 400.

Table 5 reports empirical powers of the tests. The score and LR tests have similar powers. Powers increase as \(\delta _{0}\) or the sample size increases. For \(n=144\) and \(\delta _{0}=0.5\), the powers are small and close to the significance level; for \(n=400\) and \(\delta _{0}=2.5\), the powers are all close to 1.
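Empirical sizes and powers are rejection frequencies across replications. The sketch below assumes one-sided critical values from the \(\frac{1}{2}\chi _{0}^{2}+\frac{1}{2}\chi _{1}^{2}\) mixture that is commonly used when the tested parameter lies on the boundary of the parameter space; the exact null distributions are those derived for the score and LR statistics in this paper, so the mixture is an assumption of this illustration only.

```python
import numpy as np
from scipy.stats import chi2

def rejection_rate(stats, alpha):
    """Fraction of replications whose statistic exceeds the level-alpha
    critical value of the 0.5*chi2(0) + 0.5*chi2(1) mixture, i.e. the
    (1 - 2*alpha) quantile of chi2(1); the mixture null is an assumption
    here, not taken from the paper."""
    crit = chi2.ppf(1.0 - 2.0 * alpha, df=1)
    return float(np.mean(np.asarray(stats) > crit))
```

For instance, at \(\alpha =0.05\) the assumed critical value is \(\chi _{1}^{2}(0.90)\approx 2.71\).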

Table 1 MBs, MADs and IDRs of parameter estimates when \(\delta _0=0\)
Table 2 MBs, MADs and IDRs of parameter estimates when \(\delta _0\ne 0\) and \(n=144\)
Table 3 MBs, MADs and IDRs of parameter estimates when \(\delta _0\ne 0\) and \(n=400\)
Table 4 Empirical sizes of the score and LR tests
Table 5 Empirical powers of the score and LR tests

Conclusion

We study asymptotic properties of the MLE and a corrected 2SLSE for the SARSF model in this paper. When inefficiency exists, all model parameter estimators are \(\sqrt{n}\)-consistent and asymptotically normal; when there is no inefficiency, only some parameter estimators are \(\sqrt{n}\)-consistent, and the rest have slower rates of convergence. We also derive the asymptotic distributions of the score and likelihood ratio test statistics that test for the existence of inefficiency.

For the SARSF model with an exponentially distributed inefficiency term, some very preliminary investigation indicates that the information matrix might remain nonsingular when there is no inefficiency, so the rate of convergence of the ML estimator would still be regular. It is thus likely that the irregularity of estimates for an SF model and a SARSF model, when all firms operate efficiently, depends on the parametric form of the one-sided distribution of the possibly inefficient disturbance, a feature particular to stochastic frontier models. Our analysis does not allow for distributional misspecification of the inefficiency and disturbance terms. It is of interest to extend the analysis to the case with unknown distributions of the inefficiency and disturbance terms in future research.

Notes

  1. Some papers consider SF models with cross sectional dependence in error terms using a factor-based approach, e.g., Mastromarco et al. (2013), Mastromarco et al. (2016).

  2. In the existing frontier function literature with interactions, there are several papers on model specification and empirical estimation, but there are no rigorous asymptotic studies.

  3. The definition of an NED random field is given in Appendix B.

  4. The same \(c_{1}\) as in Assumption 2 is used for simplicity. It can be any positive number smaller than 1.

  5. See Jenish and Prucha (2012) for the detailed definition.

  6. Due to the nonlinearity of model (2.1), a primitive identification condition is not obvious.

  7. See also Rotnitzky et al. (2000) for such an analysis on models with i.i.d. data.

  8. When \(\delta _{0}>0\), \(\frac{1}{n}\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\) has a negative probability limit, by a proof similar to that for \(\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}\) in the proof of Proposition 2.5. However, for a finite sample size in practice, it can be the case that \(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\ge 0\), which implies that \([\check{\eta }',0]'\) is a stationary point of the log likelihood function. This is the so-called “wrong skew” problem (see, e.g., Olson et al. 1980; Waldman 1982; Simar and Wilson 2010). Horrace and Wright (2019) study conditions for the existence of stationary points in parametric stochastic frontier models.

  9. It can also be used as the starting point in optimization subroutines for the search of the MLE.

  10. Connectivity for the queen criterion is based on a grid of cells. Each cell corresponds to a reference location, so the sample size is \(k^{2}\) for a \(k\times k\) grid. The spatial weight \(w_{n,ij}\) is 1 if cell i and cell j share a common side or vertex, and \(w_{n,ij}=0\) otherwise. See Kelejian and Robinson (1995) for more details and definitions of various types of spatial weights matrices.

  11. The IDR is the difference between the \(90\%\) quantile and \(10\%\) quantile in the empirical distribution.

References

  1. Adetutu M, Glass AJ, Kenjegalieva K, Sickles RC (2015) The effects of efficiency and TFP growth on pollution in Europe: a multistage spatial analysis. J Prod Anal 43:307–326

  2. Aigner D, Lovell CK, Schmidt P (1977) Formulation and estimation of stochastic frontier production function models. J Econ 6:21–37

  3. Andrews DWK (1992) Generic uniform convergence. Econ Theory 8:241–257

  4. Areal FJ, Balcombe K, Tiffin R (2012) Integrating spatial dependence into stochastic frontier analysis. Aust J Agric Resource Econ 56:521–541

  5. Brehm S (2013) Fiscal incentives, public spending, and productivity–county-level evidence from a Chinese province. World Dev 46:92–103

  6. Carvalho A (2018) Efficiency spillovers in Bayesian stochastic frontier models: application to electricity distribution in New Zealand. Spat Econ Anal 13:171–190

  7. Cliff A, Ord JK (1973) Spatial autocorrelation. Pion, London

  8. Cliff A, Ord JK (1981) Spatial process: models and applications. Pion, London

  9. Druska V, Horrace WC (2004) Generalized moments estimation for spatial panel data: Indonesian rice farming. Am J Agric Econ 86:185–198

  10. Fusco E, Vidoli F (2013) Spatial stochastic frontier models: controlling spatial global and local heterogeneity. Int Rev Appl Econ 27:679–694

  11. Glass A, Kenjegalieva K, Paez-Farrell J (2013) Productivity growth decomposition using a spatial autoregressive frontier model. Econ Lett 119:291–295

  12. Glass A, Kenjegalieva K, Sickles RC (2014) Estimating efficiency spillovers with state level evidence for manufacturing in the US. Econ Lett 123:154–159

  13. Glass AJ, Kenjegalieva K, Sickles RC (2016) A spatial autoregressive stochastic frontier model for panel data with asymmetric efficiency spillovers. J Econ 190:289–300

  14. Greene WH (1990) A Gamma-distributed stochastic frontier model. J Econ 46:141–163

  15. Horrace WC, Wright IA (2019) Stationary points for parametric stochastic frontier models. J Bus Econ Stat 56:999

  16. Jenish N, Prucha IR (2009) Central limit theorems and uniform laws of large numbers for arrays of random fields. J Econ 150:86–98

  17. Jenish N, Prucha IR (2012) On spatial processes and asymptotic inference under near-epoch dependence. J Econ 170:178–190

  18. Kelejian HH, Prucha IR (2001) On the asymptotic distribution of the Moran \({I}\) test statistic with applications. J Econ 104:219–257

  19. Kelejian HH, Prucha IR (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J Econ 157:53–67

  20. Kelejian HH, Robinson DP (1995) Spatial correlation: a suggested alternative to the autoregressive model. In: Anselin L, Florax RJ (eds) New directions in spatial econometrics. Springer, Berlin

  21. Kumbhakar SC, Parmeter CF, Tsionas EG (2013) A zero inefficiency stochastic frontier model. J Econ 172:66–76

  22. Lahiri SN (1996) On inconsistency of estimators based on spatial data under infill asymptotics. Sankhyā 58:403–417

  23. Lee LF (1993) Asymptotic distribution of the maximum likelihood estimator for a stochastic frontier function model with a singular information matrix. Econ Theory 9:413–430

  24. Lee LF (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72:1899–1925

  25. Lee LF, Chesher A (1986) Specification testing when score test statistics are identically zero. J Econ 31:121–149

  26. LeSage J, Pace RK (2009) Introduction to spatial econometrics. Chapman and Hall/CRC, London

  27. Mastromarco C, Serlenga L, Shin Y (2013) Globalisation and technological convergence in the EU. J Prod Anal 40:15–29

  28. Mastromarco C, Serlenga L, Shin Y (2016) Modelling technical efficiency in cross sectionally dependent stochastic frontier panels. J Appl Econ 31:281–297

  29. Meeusen W, van Den Broeck J (1977) Efficiency estimation from Cobb-Douglas production functions with composed error. Int Econ Rev 18:435–444

  30. Olson JA, Schmidt P, Waldman DM (1980) A Monte Carlo study of estimators of stochastic frontier production functions. J Econ 13:67–82

  31. Ord K (1975) Estimation methods for models of spatial interaction. J Am Stat Assoc 70:120–126

  32. Pavlyuk D (2011) Application of the spatial stochastic frontier model for analysis of a regional tourism sector. Transp Telecommun 12:28–38

  33. Pavlyuk D (2013) Distinguishing between spatial heterogeneity and inefficiency: spatial stochastic frontier analysis of European airports. Transp Telecommun 14:29–38

  34. Rotnitzky A, Cox DR, Bottai M, Robins J (2000) Likelihood-based inference with singular information matrix. Bernoulli 6:243–284

  35. Schmidt AM, Moreira ARB, Helfand SM, Fonseca TCO (2009) Spatial stochastic frontier models: accounting for unobserved local determinants of inefficiency. J Prod Anal 31:101–112

  36. Simar L, Wilson PW (2010) Inferences from cross-sectional, stochastic frontier models. Econ Rev 29:62–98

  37. Stevenson RE (1980) Likelihood functions for generalized stochastic frontier estimation. J Econ 13:57–66

  38. Tsionas EG, Michaelides PG (2016) A spatial stochastic frontier model with spillovers: evidence from Italian regions. Scottish J Political Econ 63:243–257

  39. Vidoli F, Cardillo C, Fusco E, Canello J (2016) Spatial nonstationarity in the stochastic frontier model: an application to the Italian wine industry. Reg Sci Urban Econ 61:153–164

  40. Waldman DM (1982) A stationary point for the stochastic frontier likelihood. J Econ 18:275–279

  41. White H (1994) Estimation, inference and specification analysis. Cambridge University Press, New York

  42. Xu X, Lee LF (2015) Maximum likelihood estimation of a spatial autoregressive Tobit model. J Econ 188:264–280

Acknowledgements

We are grateful to the editor James LeSage and three anonymous referees for helpful comments that led to improvements of this paper. Fei Jin gratefully acknowledges financial support from the National Natural Science Foundation of China (No. 71973030 and No. 71833004).

Author information

Correspondence to Fei Jin.

Appendices

Appendix A: Second order derivatives of \(\ln L_{n}(\theta )\)

$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \lambda ^{2}}&=-{\text {tr}}[G_{n}^{2}(\lambda )]-\frac{1}{\sigma ^{2}}\sum _{i=1}^{n}(w_{n,i\cdot }Y_{n})^{2}+\frac{\delta ^{2}}{\sigma ^{2}}\sum _{i=1}^{n}(w_{n,i\cdot }Y_{n})^{2}f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.1)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \lambda \partial \beta }&=-\frac{1}{\sigma ^{2}}\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}x_{ni}+\frac{\delta ^{2}}{\sigma ^{2}}\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}x_{ni}f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.2)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \lambda \partial \sigma ^{2}}&=-\frac{1}{\sigma ^{4}}\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}\epsilon _{ni}(\lambda ,\beta )-\frac{\delta }{2\sigma ^{3}}\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \nonumber \\&\quad +\frac{\delta ^{2}}{2\sigma ^{4}}\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}\epsilon _{ni}(\lambda ,\beta )f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.3)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \lambda \partial \delta }&=\frac{1}{\sigma }\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) -\frac{\delta }{\sigma ^{2}}\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}\epsilon _{ni}(\lambda ,\beta )f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.4)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \beta \partial \beta '}&=-\frac{1}{\sigma ^{2}}\sum _{i=1}^{n}x_{ni}x_{ni}'+\frac{\delta ^{2}}{\sigma ^{2}}\sum _{i=1}^{n}x_{ni}x_{ni}'f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.5)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \beta \partial \sigma ^{2}}&=-\frac{1}{\sigma ^{4}}\sum _{i=1}^{n}x_{ni}\epsilon _{ni}(\lambda ,\beta )-\frac{\delta }{2\sigma ^{3}}\sum _{i=1}^{n}x_{ni}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \nonumber \\&\quad +\frac{\delta ^{2}}{2\sigma ^{4}}\sum _{i=1}^{n}x_{ni}\epsilon _{ni}(\lambda ,\beta )f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.6)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \beta \partial \delta }&=\frac{1}{\sigma }\sum _{i=1}^{n}x_{ni}f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) -\frac{\delta }{\sigma ^{2}}\sum _{i=1}^{n}x_{ni}\epsilon _{ni}(\lambda ,\beta )f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.7)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial (\sigma ^{2})^{2}}&=\frac{n}{2\sigma ^{4}}-\frac{1}{\sigma ^{6}}\sum _{i=1}^{n}\epsilon _{ni}^{2}(\lambda ,\beta )-\frac{3\delta }{4\sigma ^{5}}\sum _{i=1}^{n}\epsilon _{ni}(\lambda ,\beta )f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \nonumber \\&\quad +\frac{\delta ^{2}}{4\sigma ^{6}}\sum _{i=1}^{n}\epsilon _{ni}^{2}(\lambda ,\beta )f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.8)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \sigma ^{2}\partial \delta }&=\frac{1}{2\sigma ^{3}}\sum _{i=1}^{n}\epsilon _{ni}(\lambda ,\beta )f\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) -\frac{\delta }{2\sigma ^{4}}\sum _{i=1}^{n}\epsilon _{ni}^{2}(\lambda ,\beta )f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.9)
$$\begin{aligned} \frac{\partial ^{2}\ln L_{n}(\theta )}{\partial \delta ^{2}}&=\frac{1}{\sigma ^{2}}\sum _{i=1}^{n}\epsilon _{ni}^{2}(\lambda ,\beta )f^{(1)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) , \end{aligned}$$
(A.10)

where \(f^{(1)}(t)=\frac{\partial f(t)}{\partial t}=-tf(t)-f^{2}(t)\).
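The expression for \(f^{(1)}\) can be checked directly when \(f\) is the standard normal ratio \(f(t)=\phi (t)/\Phi (t)\), as the \(\ln \Phi \) terms in the log likelihood suggest: since \(\phi ^{(1)}(t)=-t\phi (t)\),

$$\begin{aligned} f^{(1)}(t)=\frac{\phi ^{(1)}(t)}{\Phi (t)}-\frac{\phi ^{2}(t)}{\Phi ^{2}(t)}=-t\frac{\phi (t)}{\Phi (t)}-f^{2}(t)=-tf(t)-f^{2}(t). \end{aligned}$$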

Appendix B: Proofs

For any random variable t with a finite pth absolute moment, where \(p\ge 1\), denote its \(L_{p}\)-norm by \(\Vert t\Vert _{p}={\text {E}}[|t|^{p}]^{1/p}\). Let \(g=\{g_{ni},i\in D_{n},n\ge 1\}\) and \(\nu =\{\nu _{ni},i\in D_{n},n\ge 1\}\) be two random fields, where \(D_{n}\) satisfies Assumption 1. Assume that g is uniformly \(L_{p}\) bounded for some \(p\ge 1\), i.e., \(\sup _{i,n}\Vert g_{ni}\Vert _{p}<\infty \). Let \({\mathcal {F}}_{ni}(s)\) be the \(\sigma \)-field generated by the random variables \(\nu _{nj}\)’s with units j’s located within the ball \(B_{i}(s)\), where \(B_{i}(s)\) is centered at i with radius s. The random field g is said to be \(L_{p}\)-NED on \(\nu \) if \(\Vert g_{ni}-{\text {E}}(g_{ni}|{\mathcal {F}}_{ni}(s))\Vert _{p}\le d_{ni}\psi (s)\) for some array of finite positive constants \(\{d_{ni},i\in D_{n},n\ge 1\}\) and some sequence \(\psi (s)\ge 0\) such that \(\lim _{s\rightarrow \infty }\psi (s)=0\); \(\psi (s)\) is called the NED coefficient. If, in addition, \(\sup _{n}\sup _{i\in D_{n}}d_{ni}<\infty \), g is said to be uniformly \(L_{p}\)-NED on \(\nu \).

The results in the following lemmas are frequently used in subsequent proofs. For \(j\ge 0\), \(f^{(j)}(t)\) denotes the jth derivative of f(t). In the following proofs, c denotes a generic positive constant that may differ across occurrences.

Lemma B.1

Suppose that Assumptions 1–4 and 7 hold. Let \(\Lambda \) and \({\mathcal {B}}\) be, respectively, the parameter spaces of \(\lambda \) and \(\beta \).

  (a) If \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p\ge 1\), then \(\sup _{i,n}\Vert y_{ni}\Vert _{p}<\infty \), \(\sup _{i,n}\Vert w_{n,i\cdot }Y_{n}\Vert _{p}<\infty \), and \(\sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}\Vert \epsilon _{ni}(\lambda ,\beta )\Vert _{p}<\infty .\)

  (b) If \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p\ge 2\), then \(\sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}\Vert \ln \Phi (-\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta ))\Vert _{p/2}<\infty \); if \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p\ge j+1\) with \(j\ge 0\), then

    $$\begin{aligned} \sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}\left\| f^{(j)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \right\| _{p/(j+1)}<\infty . \end{aligned}$$

Proof

(a) The reduced form of \(y_{ni}\) is \(y_{ni}=\sum _{j=1}^{n}t_{n,ij}(x_{nj}'\beta _{0}+v_{nj}-u_{nj})\), where \(t_{n,ij}\) is the (i, j)th element of \((I_{n}-\lambda _{0}W_{n})^{-1}\). For any matrix \(A=[a_{ij}]\), let \(abs(A)=[|a_{ij}|]\). Note that \(abs((I_{n}-\lambda _{0}W_{n})^{-1})\le ^{*}(I_{n}-abs(\lambda _{0}W_{n}))^{-1}\), where \(B\le ^{*}C\) for conformable matrices \(B=[b_{ij}]\) and \(C=[c_{ij}]\) means that \(b_{ij}\le c_{ij}\) for all i, j. Denote \(M_{n}=(I_{n}-abs(\lambda _{0}W_{n}))^{-1}=[m_{n,ij}]\). Then \(\sup _{i,n}\Vert y_{ni}\Vert _{p}\le \sup _{i,n}\sum _{j=1}^{n}m_{n,ij}(\sum _{k=1}^{k_{x}}\Vert x_{nj,k}\Vert _{p}|\beta _{0k}|+\Vert v_{nj}\Vert _{p}+\Vert u_{nj}\Vert _{p})\) by the Minkowski inequality. As \(\lambda _{m}\sup _{n}\Vert W_{n}\Vert _{\infty }<\infty \), \(\sup _{1\le k\le k_{x},j,n}\Vert x_{nj,k}\Vert _{p}<\infty \), \(\sup _{j,n}\Vert v_{nj}\Vert _{p}<\infty \) and \(\sup _{j,n}\Vert u_{nj}\Vert _{p}<\infty \), we have \(\sup _{i,n}\Vert y_{ni}\Vert _{p}<\infty \). The same holds for \(\{w_{n,i\cdot }Y_{n}\}_{i=1}^{n}\). As \(\epsilon _{ni}(\lambda ,\beta )=y_{ni}-\lambda w_{n,i\cdot }Y_{n}-x_{ni}'\beta \), by the Minkowski inequality, \(\sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}\Vert \epsilon _{ni}(\lambda ,\beta )\Vert _{p}<\infty \).

(b) By the proof of Lemma A.9 in Xu and Lee (2015), \(|\ln \Phi (t)|\le c(t^{2}+|t|+1)\) and \(|f(t)|\le 2|t|+c\). Then the first result follows by the Minkowski inequality. Since \(f^{(1)}(t)=-tf(t)-f^{2}(t)\), \(f^{(j)}(t)\) for \(j>1\) can be derived recursively and regarded as a \((j+1)\)th-order polynomial function of \([t,f(t)]\). Thus, \(|f^{(j)}(t)|\le c(|t|^{j+1}+\cdots +1)\). With \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p\ge j+1\), the second result follows by the Minkowski inequality and (a). \(\square \)
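As a concrete instance of the recursion used in the proof of (b), differentiating \(f^{(1)}(t)=-tf(t)-f^{2}(t)\) once more gives

$$\begin{aligned} f^{(2)}(t)=-f(t)-(t+2f(t))f^{(1)}(t)=(t^{2}-1)f(t)+3tf^{2}(t)+2f^{3}(t), \end{aligned}$$

a polynomial of order \(j+1=3\) in \([t,f(t)]\), consistent with the bound \(|f^{(j)}(t)|\le c(|t|^{j+1}+\cdots +1)\).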

Lemma B.2

Suppose that Assumptions 1–4 and 7 hold. Let \(\Lambda \) and \({\mathcal {B}}\) be, respectively, the parameter spaces of \(\lambda \) and \(\beta \).

  (a) If \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p\ge 2\), \(\{y_{ni}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) with NED coefficient \(\psi (s)=c_{1}^{s/d_{0}}\) under Assumption 3(a), where \(c_{1}\) is defined in Assumption 2, and \(\psi (s)=s^{-(\alpha -d)}\) under Assumption 3(b). The same holds for \(\{w_{n,i\cdot }Y_{n}\}_{i=1}^{n}\) and \(\{\epsilon _{ni}(\lambda ,\beta )\}_{i=1}^{n}\).

  (b) If \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p>4\), then \(\{w_{n,i\cdot }Y_{n}\epsilon _{ni}\}_{i=1}^{n}\), \(\{w_{n,i\cdot }Y_{n}x_{ni,j}\}_{i=1}^{n}\) and \(\{w_{n,i\cdot }Y_{n}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\}_{i=1}^{n}\) are uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) with NED coefficient \(\psi (s)=c_{1}^{s(p-4)/[d_{0}(2p-4)]}\) under Assumption 3(a), and \(\psi (s)=s^{-(\alpha -d)(p-4)/(2p-4)}\) under Assumption 3(b).

  (c) If \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p>4\), \(\{\epsilon _{ni}^{2}(\lambda ,\beta )\}_{i=1}^{n}\) and \(\{\ln \Phi (-\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta ))\}_{i=1}^{n}\) are uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\); if \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p>2j\), \(\{(w_{n,i\cdot }Y_{n})^{j}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\); if \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for some \(p>6\), \(\{f(-\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta ))\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED.

Proof

(a) By Proposition 1 in Jenish and Prucha (2012), \(\{y_{ni}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) if \(\lim _{s\rightarrow \infty }\sup _{i,n}\sum _{j:d(i,j)>s}m_{n,ij}=0\) and \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{2}<\infty \). By the proof of Proposition 1 in Xu and Lee (2015), \(\sup _{i,n}\sum _{j:d(i,j)>s}m_{n,ij}\le cc_{1}^{s/d_{0}}\) under Assumption 3(a), and \(\sup _{i,n}\sum _{j:d(i,j)>s}m_{n,ij}\le cs^{-(\alpha -d)}\) under Assumption 3(b). Thus \(\{y_{ni}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) with NED coefficient \(\psi (s)=c_{1}^{s/d_{0}}\) under Assumption 3(a) and \(\psi (s)=s^{-(\alpha -d)}\) under Assumption 3(b). With the NED property of \(\{y_{ni}\}_{i=1}^{n}\), by the proof of Proposition 1 in Xu and Lee (2015), \(\{w_{n,i\cdot }Y_{n}\}_{i=1}^{n}\) is also uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) with the same NED coefficient as that of \(\{y_{ni}\}_{i=1}^{n}\). The same holds for \(\epsilon _{ni}(\lambda ,\beta )\) as \(\epsilon _{ni}(\lambda ,\beta )=y_{ni}-\lambda w_{n,i\cdot }Y_{n}-x_{ni}'\beta \) is linear in \(y_{ni}\), \(w_{n,i\cdot }Y_{n}\) and \(x_{ni}\).

(b) With the NED property of \(\{w_{n,i\cdot }Y_{n}\}_{i=1}^{n}\) in (a), the results follow directly from Lemma A.2 in Xu and Lee (2015).

(c) By the mean value theorem, \(|t_{1}^{j}-t_{2}^{j}|\le j|\bar{t}|^{j-1}\cdot |t_{1}-t_{2}|\le j(|t_{1}|^{j-1}+|t_{2}|^{j-1})\cdot |t_{1}-t_{2}|\), \(|\ln \Phi (t_{1})-\ln \Phi (t_{2})|\le |f(\dot{t})|\cdot |t_{1}-t_{2}|\le (2|t_{1}|+2|t_{2}|+c)|t_{1}-t_{2}|\) and \(|f(t_{1})-f(t_{2})|\le |f^{(1)}(\ddot{t})|\cdot |t_{1}-t_{2}|\le c(t_{1}^{2}+t_{2}^{2}+1)\cdot |t_{1}-t_{2}|\), where \(\bar{t}\), \(\dot{t}\) and \(\ddot{t}\) are between \(t_{1}\) and \(t_{2}\). By Lemma B.1(a), if \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{p}<\infty \), then \(\sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}\Vert \epsilon _{ni}(\lambda ,\beta )\Vert _{p}<\infty \) and \(\sup _{i,n}\Vert w_{n,i\cdot }Y_{n}\Vert _{p}<\infty \). The NED results in the lemma on functions of \(\epsilon _{ni}(\lambda ,\beta )\) and \(w_{n,i\cdot }Y_{n}\) then follow by Lemma A.4 in Xu and Lee (2015). \(\square \)

Proof of Proposition 2.1

We first prove the uniform convergence of \(\frac{1}{n}\ln L_{n}(\theta )\): \(\sup _{\theta \in \Theta }\frac{1}{n}|\ln L_{n}(\theta )-{\text {E}}[\ln L_{n}(\theta )]|=o_{p}(1)\). Note that

$$\begin{aligned} \frac{1}{n}[\ln L_{n}(\theta )-{\text {E}}[\ln L_{n}(\theta )]]=a_{1n}(\theta )+a_{2n}(\theta ), \end{aligned}$$

where \(a_{1n}(\theta )=-\frac{1}{2n\sigma ^{2}}\sum _{i=1}^{n}\{\epsilon _{ni}^{2}(\lambda ,\beta )-{\text {E}}[\epsilon _{ni}^{2}(\lambda ,\beta )]\}\) and \(a_{2n}(\theta )=\frac{1}{n}\sum _{i=1}^{n}\left\{ \ln \Phi \left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) -{\text {E}}\left[ \ln \Phi \left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \right] \right\} .\) We shall prove that \(\sup _{\theta \in \Theta }|a_{1n}(\theta )|=o_{p}(1)\) and \(\sup _{\theta \in \Theta }|a_{2n}(\theta )|=o_{p}(1)\). Under Assumptions 1–5, by Lemma B.1, \(\{\epsilon _{ni}^{2}(\lambda ,\beta )\}_{i=1}^{n}\) and \(\{\ln \Phi (-\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta ))\}_{i=1}^{n}\) are uniformly \(L_{2+\iota /2}\) bounded; by Lemma B.2(c), they are uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\). Then \(a_{1n}(\theta )=o_{p}(1)\) and \(a_{2n}(\theta )=o_{p}(1)\) by Theorem 1 in Jenish and Prucha (2012). Since \(a_{1n}(\theta )\) is quadratic in \([\lambda ,\beta ']\) and the parameter space of \(\sigma ^{2}\) is compact, \(\sup _{\theta \in \Theta }|a_{1n}(\theta )|=o_{p}(1)\). For the proof of \(\sup _{\theta \in \Theta }|a_{2n}(\theta )|=o_{p}(1)\), with \(a_{2n}(\theta )=o_{p}(1)\), by Theorem 1 in Andrews (1992), we only need to prove that \(a_{2n}(\theta )\) is stochastically equicontinuous (SE). Denote \(\dot{\lambda }=-\frac{\delta }{\sigma }\lambda \), \(\dot{\beta }=-\frac{\delta }{\sigma }\beta \), \(\dot{\delta }=-\frac{\delta }{\sigma }\) and \(\dot{\theta }=[\dot{\lambda },\dot{\beta }',\sigma ^{2},\dot{\delta }]'\). Since the parameter space of \(\theta \) is compact, the elements of \(\frac{\partial \dot{\lambda }}{\partial \theta }\), \(\frac{\partial \dot{\beta }}{\partial \theta '}\) and \(\frac{\partial \dot{\delta }}{\partial \theta }\) are bounded. Then by Lemma A.5 in Xu and Lee (2015), we only need to prove that \(\frac{1}{n}\sum _{i=1}^{n}\{\ln \Phi (\dot{\epsilon }_{ni}(\dot{\theta }))-{\text {E}}[\ln \Phi (\dot{\epsilon }_{ni}(\dot{\theta }))]\}\) is SE, where \(\dot{\epsilon }_{ni}(\dot{\theta })=\dot{\delta }y_{ni}-\dot{\lambda }w_{n,i\cdot }Y_{n}-x_{ni}'\dot{\beta }\). By the proof of Lemma B.2, \(|\ln \Phi (t_{1})-\ln \Phi (t_{2})|\le (2|t_{1}|+2|t_{2}|+c)|t_{1}-t_{2}|\). Then

$$\begin{aligned}&\Bigl |\frac{1}{n}\sum _{i=1}^{n}\ln \Phi (\dot{\epsilon }_{ni}(\dot{\theta }_{1}))-\frac{1}{n}\sum _{i=1}^{n}\ln \Phi (\dot{\epsilon }_{ni}(\dot{\theta }_{2}))\Bigr |\\&\le \frac{1}{n}\sum _{i=1}^{n}[2|\dot{\epsilon }_{ni}(\dot{\theta }_{1})|+2|\dot{\epsilon }_{ni}(\dot{\theta }_{2})|+c]\cdot |\dot{\epsilon }_{ni}(\dot{\theta }_{1})-\dot{\epsilon }_{ni}(\dot{\theta }_{2})|\\&\le \frac{1}{n}\sum _{i=1}^{n}[2|\dot{\epsilon }_{ni}(\dot{\theta }_{1})|+2|\dot{\epsilon }_{ni}(\dot{\theta }_{2})|+c]\Bigl [|y_{ni}|+|w_{n,i\cdot }Y_{n}|+\sum _{k=1}^{k_{x}}|x_{ni,k}|\Bigr ]\\&\quad \times \Bigl (|\dot{\delta }_{1}-\dot{\delta }_{2}|+|\dot{\lambda }_{1}-\dot{\lambda }_{2}|+\sum _{k=1}^{k_{x}}|\dot{\beta }_{1k}-\dot{\beta }_{2k}|\Bigr ). \end{aligned}$$

Since \(\{y_{ni}\}_{i=1}^{n}\), \(\{w_{n,i\cdot }Y_{n}\}_{i=1}^{n}\) and \(\{x_{ni,k}\}_{i=1}^{n}\) are uniformly \(L_{4}\) bounded, each term of \([2|\dot{\epsilon }_{ni}(\dot{\theta }_{1})|+2|\dot{\epsilon }_{ni}(\dot{\theta }_{2})|+c][|y_{ni}|+|w_{n,i\cdot }Y_{n}|+\sum _{k=1}^{k_{x}}|x_{ni,k}|]\) is uniformly \(L_{2}\) bounded by the Cauchy-Schwarz inequality. It follows that \(\frac{1}{n}\sum _{i=1}^{n}{\text {E}}[\ln \Phi (\dot{\epsilon }_{ni}(\dot{\theta }))]\) is equicontinuous. Thus, \(\frac{1}{n}\sum _{i=1}^{n}\{\ln \Phi (\dot{\epsilon }_{ni}(\dot{\theta }))-{\text {E}}[\ln \Phi (\dot{\epsilon }_{ni}(\dot{\theta }))]\}\) is SE by Lemma 1(a) in Andrews (1992). As the parameter space of \(\sigma ^{2}\) is compact and \(\epsilon _{ni}(\lambda ,\beta )\) is linear in \([\lambda ,\beta ']'\), \(-\frac{1}{2n\sigma ^{2}}\sum _{i=1}^{n}{\text {E}}[\epsilon _{ni}^{2}(\lambda ,\beta )]\) is equicontinuous. It follows that \(\frac{1}{n}{\text {E}}[\ln L_{n}(\theta )]\) is equicontinuous.

With the identification condition in Assumption 6, the uniform convergence \(\sup _{\theta \in \Theta }\frac{1}{n}|\ln L_{n}(\theta )-{\text {E}}[\ln L_{n}(\theta )]|=o_{p}(1)\) and the equicontinuity of \(\frac{1}{n}{\text {E}}[\ln L_{n}(\theta )]\), the consistency of the MLE follows. \(\square \)

Proof of Proposition 2.2

By the mean value theorem, \(0=\frac{\partial \ln L_{n}(\hat{\theta })}{\partial \theta }=\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }+\frac{\partial ^{2}\ln L_{n}(\bar{\theta })}{\partial \theta \partial \theta '}(\hat{\theta }-\theta _{0})\), where \(\bar{\theta }\) lies between \(\hat{\theta }\) and \(\theta _{0}\). Then

$$\begin{aligned} \sqrt{n}(\hat{\theta }-\theta _{0})=\left( -\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\bar{\theta })}{\partial \theta \partial \theta '}\right) ^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }. \end{aligned}$$

We first prove the asymptotic normality of \(\frac{1}{\sqrt{n}}\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\). The elements of \(\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\) have the following forms: \(c\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}\epsilon _{ni}\), \(c\sum _{i=1}^{n}w_{n,i\cdot }Y_{n}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\), \(c\sum _{i=1}^{n}x_{ni}\epsilon _{ni}\), \(c\sum _{i=1}^{n}x_{ni}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\), \(c\sum _{i=1}^{n}\epsilon _{ni}^{2}\), \(c\sum _{i=1}^{n}\epsilon _{ni}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\) and c. By Lemma B.1(a) and the Cauchy-Schwarz inequality, \(\{w_{n,i\cdot }Y_{n}\epsilon _{ni}\}_{i=1}^{n}\) is uniformly \(L_{p/2}\) bounded; by Lemma B.2(b), \(\{w_{n,i\cdot }Y_{n}\epsilon _{ni}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) with NED coefficient \(\psi (s)=c_{1}^{s(p-4)/[d_{0}(2p-4)]}\) under Assumption 3(a) and \(\psi (s)=s^{-(\alpha -d)(p-4)/(2p-4)}\) under Assumption 3(b), where \(p=6\) under Assumption 10(a). To apply the CLT in Theorem 2 of Jenish and Prucha (2012) to \(\{w_{n,i\cdot }Y_{n}\epsilon _{ni}\}_{i=1}^{n}\), \(\psi (s)\) should satisfy \(\sum _{s=1}^{\infty }s^{d-1}\psi (s)<\infty \). If \(\psi (s)=c_{1}^{s(p-4)/[d_{0}(2p-4)]}\), then \(\sum _{s=1}^{\infty }s^{d-1}\psi (s)<\infty \) as \(0<c_{1}^{(p-4)/[d_{0}(2p-4)]}<1\); if \(\psi (s)=s^{-(\alpha -d)(p-4)/(2p-4)}\), \(\sum _{s=1}^{\infty }s^{d-1}\psi (s)<\infty \) requires \(d-(\alpha -d)(p-4)/(2p-4)<0\), i.e., \(\alpha >(3+\frac{4}{p-4})d\). With \(p=6\), \(\alpha >5d\) is maintained in Assumption 10(b). In addition, the \(\alpha \)-mixing coefficient \(\hat{\alpha }(s)\) of \(\{x_{ni}\}_{i=1}^{n}\) should satisfy Assumption 3 in Jenish and Prucha (2012): \(\sum _{s=1}^{\infty }s^{d[1+c_{3}\iota ^{*}/(2+\iota ^{*})]-1}[\hat{\alpha }(s)]^{\iota ^{*}/(4+2\iota ^{*})}<\infty \), where \(\iota ^{*}\) is some positive number smaller than \(p/2-2\) so that Assumption 4(a) in Jenish and Prucha (2012) is satisfied. This condition is maintained in Assumption 10(c). By Lemma B.2, \(\{w_{n,i\cdot }Y_{n}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\}_{i=1}^{n}\) has the same NED property as \(\{w_{n,i\cdot }Y_{n}\epsilon _{ni}\}_{i=1}^{n}\). The remaining random fields \(\{cx_{ni}\epsilon _{ni}\}_{i=1}^{n}\), \(\{cx_{ni}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\}_{i=1}^{n}\), \(\{c\epsilon _{ni}^{2}\}_{i=1}^{n}\) and \(\{c\epsilon _{ni}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\}_{i=1}^{n}\) involved in \(\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\) are trivially \(L_{2}\)-NED with \(\psi (s)=0\). Then by the CLT in Theorem 2 of Jenish and Prucha (2012), \(\frac{1}{\sqrt{n}}\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\xrightarrow {d}N(0,\lim _{n\rightarrow \infty }\frac{1}{n}{\text {E}}(\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta '}))\).

As \(\frac{1}{n}{\text {E}}(\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '})=-\frac{1}{n}{\text {E}}(\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta '})\), it remains to prove that \(\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\bar{\theta })}{\partial \theta \partial \theta '}={\text {E}}(\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '})+o_{p}(1)\). We shall prove that (i) \(\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\bar{\theta })}{\partial \theta \partial \theta '}=\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '}+o_{p}(1)\) and (ii) \(\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '}={\text {E}}(\frac{1}{n}\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '})+o_{p}(1)\). To prove (i), we first prove a general result on the order of the jth derivative of \(\frac{1}{n}\ln L_{n}(\theta )\). Except \(-{\text {tr}}(G_{n}^{k}(\lambda ))=O(n)\) for \(1\le k\le j\), other terms in the jth derivative of \(\ln L_{n}(\theta )\) have the form \(c(\theta )\sum _{i=1}^{n}h_{ni,1}\cdots h_{ni,j}\) or \(c(\theta )\sum _{i=1}^{n}h_{ni,1}\cdots h_{ni,j}f^{(k)}(-\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta ))\), where \(c(\theta )\) is a function of \(\theta \), \(0\le k\le j-1\), and \(h_{ni,r}\) for \(1\le r\le j\) is either 1, \(\epsilon _{ni}(\lambda ,\beta )\), \(w_{n,i\cdot }Y_{n}\), or an element of \(x_{ni}\). By Lemma B.1(b), \(\sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}\Vert f^{(k)}(-\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta ))\Vert _{p/(k+1)}<\infty \) if \(\sup _{i,k,n}\Vert x_{ni,k}\Vert _{p}<\infty \) for \(p\ge k+1\). If \(\sup _{i,r,n}{\text {E}}(|h_{ni,r}|^{2j})<\infty \), then \(\sup _{i,n}{\text {E}}[|h_{ni,1}\cdots h_{ni,j}|]\le \sup _{i,n}\Vert h_{ni,1}\Vert _{j}\cdots \Vert h_{ni,j}\Vert _{j}<\infty \) and

$$\begin{aligned}&\sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}{\text {E}}\left[ \left| h_{ni,1}\cdots h_{ni,j}f^{(k)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \right| \right] \\&\le \sup _{i,n,\lambda \in \Lambda ,\beta \in {\mathcal {B}}}\Vert h_{ni,1}\Vert _{2j}\cdots \Vert h_{ni,j}\Vert _{2j}\left\| f^{(k)}\left( -\frac{\delta }{\sigma }\epsilon _{ni}(\lambda ,\beta )\right) \right\| _{2}<\infty \end{aligned}$$

for \(1\le k\le j-1\) by the generalized Hölder’s inequality and Lemma B.1(a). Thus, if

$$\begin{aligned} \sup _{1\le k\le k_{x},i,n}{\text {E}}(|x_{ni,k}|^{2j})<\infty , \end{aligned}$$

the jth derivative of \(\frac{1}{n}\ln L_{n}(\theta )\) is \(O_{p}(1)\). In particular, if \(\sup _{1\le k\le k_{x},i,n}{\text {E}}(|x_{ni,k}|^{6})<\infty \), the third derivative of \(\frac{1}{n}\ln L_{n}(\theta )\) is \(O_{p}(1)\). Hence, (i) holds by the mean value theorem under Assumption 10(a).

We next prove (ii). By the above argument, except \(-{\text {tr}}(G_{n}^{2})\), elements of \(\frac{\partial ^{2}\ln L_{n}(\theta _{0})}{\partial \theta \partial \theta '}\) have the form \(c\sum _{i=1}^{n}h_{ni,1}h_{ni,2}\), \(c\sum _{i=1}^{n}h_{ni,1}h_{ni,2}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\) or \(c\sum _{i=1}^{n}h_{ni,1}h_{ni,2}f^{(1)}(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\), where \(h_{ni,r}\) for \(r=1,2\) is either 1, \(\epsilon _{ni}\), \(w_{n,i\cdot }Y_{n}\), or an element of \(x_{ni}\). By Lemma B.1 and the Cauchy-Schwarz inequality, each \(\{h_{ni,1}h_{ni,2}\}_{i=1}^{n}\) is uniformly \(L_{3}\) bounded; by Lemma B.2, each \(\{h_{ni,1}h_{ni,2}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED. By the generalized Hölder’s inequality,

$$\begin{aligned} \left\| h_{ni,1}h_{ni,2}f^{(1)}\left( -\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni}\right) \right\| _{3/2}\le \Vert h_{ni,1}\Vert _{6}\Vert h_{ni,2}\Vert _{6}\left\| f^{(1)}\left( -\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni}\right) \right\| _{3}. \end{aligned}$$

Thus, each \(\{h_{ni,1}h_{ni,2}f^{(1)}(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\}_{i=1}^{n}\) is uniformly \(L_{3/2}\) bounded. Since

$$\begin{aligned}&\left\Vert h_{ni,1}h_{ni,2}f^{(1)}\left( -\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni}\right) -{\text {E}}\left[ h_{ni,1}h_{ni,2}f^{(1)}\left( -\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni}\right) \bigr |{\mathcal {F}}_{ni}(s)\right] \right\Vert _{1}\\&=\Bigl \Vert [h_{ni,1}h_{ni,2}-{\text {E}}(h_{ni,1}h_{ni,2}|{\mathcal {F}}_{ni}(s))]f^{(1)}\left( -\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni}\right) \Bigr \Vert _{1}\\&\le \Vert h_{ni,1}h_{ni,2}-{\text {E}}[h_{ni,1}h_{ni,2}|{\mathcal {F}}_{ni}(s)]\Vert _{2}\Bigl \Vert f^{(1)}\left( -\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni}\right) \Bigr \Vert _{2}, \end{aligned}$$

each \(\{h_{ni,1}h_{ni,2}f^{(1)}(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\}_{i=1}^{n}\) is uniformly \(L_{1}\)-NED. Similarly, each \(\{h_{ni,1}h_{ni,2}f(-\frac{\delta _{0}}{\sigma _{0}}\epsilon _{ni})\}_{i=1}^{n}\) is uniformly \(L_{3/2}\) bounded and uniformly \(L_{1}\)-NED. Thus, by the LLN in Theorem 1 of Jenish and Prucha (2012), (ii) holds. Consequently, the asymptotic distribution of \(\hat{\theta }\) in the proposition follows. \(\square \)

Proof of Proposition 2.3

As the proof is sketched in the main text, here we only verify claims that were not proved there.

We first prove that \(\lim\nolimits _{n\rightarrow \infty }\frac{1}{n}\Delta _{n}\) is positive definite (PD) under Assumption 11. For a block matrix \(E=\left( {\begin{matrix}A &{} B\\ C &{} D \end{matrix}}\right) \), where A and D are square matrices and D is invertible,

$$\begin{aligned} \begin{pmatrix}I &{} -BD^{-1}\\ 0 &{} I \end{pmatrix}\begin{pmatrix}A &{} B\\ C &{} D \end{pmatrix}\begin{pmatrix}I &{} 0\\ -D^{-1}C &{} I \end{pmatrix}=\begin{pmatrix}A-BD^{-1}C &{} 0\\ 0 &{} D \end{pmatrix}. \end{aligned}$$
(A.11)

If E is symmetric and D is PD, then E is PD when \(A-BD^{-1}C\) is PD. Partition \(\Delta _{n}\) in (2.17) into a \(2\times 2\) block matrix so that the (2, 2)th block is the scalar \(\frac{n}{6\pi }(5-\frac{16}{\pi }+\frac{32}{\pi ^{2}})\), which corresponds to the D block in (A.11). By (A.11), \(\Delta _{n}\) is PD if

$$\begin{aligned}&\begin{pmatrix}\frac{1}{\sigma _{0}^{2}}{\text {E}}[(G_{n}X_{n}\beta _{0})'T_{n}(G_{n}X_{n}\beta _{0})]+{\text {tr}}(G_{n}G_{n}^{(s)}) &{} * &{} *\\ \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'T_{n}G_{n}X_{n}\beta _{0}) &{} \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'T_{n}X_{n}) &{} *\\ \frac{1}{\sigma _{0}^{2}}{\text {tr}}(G_{n}) &{} 0 &{} \frac{n}{2\sigma _{0}^{4}} \end{pmatrix}\nonumber \\&\qquad +\frac{3}{n\sigma _{0}^{2}(5-\frac{16}{\pi }+\frac{32}{\pi ^{2}})}[{\text {E}}(\Xi _{1n}'\Xi _{1n})-{\text {E}}(\Xi _{1n}'){\text {E}}(\Xi _{1n})] \end{aligned}$$
(A.12)

is PD, where \(T_{n}=I_{n}-\frac{3}{n(5-\frac{16}{\pi }+\frac{32}{\pi ^{2}})}l_{n}l_{n}'\) is PD and \(\Xi _{1n}=[l_{n}'G_{n}X_{n}\beta _{0},l_{n}'X_{n},0]\). As \({\text {E}}(\Xi _{1n}'\Xi _{1n})-{\text {E}}(\Xi _{1n}'){\text {E}}(\Xi _{1n})\) is positive semidefinite, applying (A.11) to the first matrix in (A.12), \(\Delta _{n}\) is PD if

$$\begin{aligned} \begin{pmatrix}\frac{1}{\sigma _{0}^{2}}{\text {E}}[(G_{n}X_{n}\beta _{0})'T_{n}(G_{n}X_{n}\beta _{0})]+{\text {tr}}(G_{n}G_{n}^{(s)})-\frac{2}{n}{\text {tr}}^{2}(G_{n}) &{} *\\ \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'T_{n}G_{n}X_{n}\beta _{0}) &{} \frac{1}{\sigma _{0}^{2}}{\text {E}}(X_{n}'T_{n}X_{n}) \end{pmatrix} \end{aligned}$$
(A.13)

is PD. If \(\lim _{n\rightarrow \infty }\frac{1}{n}{\text {E}}[(G_{n}X_{n}\beta _{0},X_{n})'T_{n}(G_{n}X_{n}\beta _{0},X_{n})]\) is PD, since

$$\begin{aligned} {\text {tr}}(G_{n}G_{n}^{(s)})-\frac{2}{n}{\text {tr}}^{2}(G_{n})=\frac{1}{2}{\text {tr}}(G_{n}^{(s)}G_{n}^{(s)})-\frac{1}{2n}{\text {tr}}^{2}(G_{n}^{(s)})\ge 0, \end{aligned}$$

\(\lim _{n\rightarrow \infty }\frac{1}{n}\Delta _{n}\) is PD. Alternatively, if \({\text {E}}(X_{n}'T_{n}X_{n})\) is PD, applying (A.11) to (A.13), \(\Delta _{n}\) is PD when \(\frac{1}{\sigma _{0}^{2}}\Xi _{2n}+[{\text {tr}}(G_{n}G_{n}^{(s)})-\frac{2}{n}{\text {tr}}^{2}(G_{n})]>0\), where

$$\begin{aligned} \Xi _{2n}={\text {E}}[(G_{n}X_{n}\beta _{0})'T_{n}(G_{n}X_{n}\beta _{0})]-{\text {E}}[(G_{n}X_{n}\beta _{0})'T_{n}X_{n}][{\text {E}}(X_{n}'T_{n}X_{n})]^{-1}{\text {E}}(X_{n}'T_{n}G_{n}X_{n}\beta _{0})\ge 0. \end{aligned}$$

Thus, \(\lim \limits _{n\rightarrow \infty }\frac{1}{n}\Delta _n\) is PD under Assumption 11.

We next prove (2.18). By the mean value theorem, \(0=\frac{\partial \ln L_{4n}(\check{\eta },0)}{\partial \eta ^{\ddagger }}=\frac{\partial \ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }}+\frac{\partial ^{2}\ln L_{4n}(\bar{\eta },0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}(\check{\eta }-\eta _{0})\), where \(\bar{\eta }\) lies between \(\check{\eta }\) and \(\eta _{0}\). Then \(\sqrt{n}(\check{\eta }-\eta _{0})=-(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\bar{\eta },0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }})^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }}\). We first prove the asymptotic normality of \(\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }}\). As \(\frac{\partial \ln L_{4n}(\eta ,0)}{\partial \eta ^{\ddagger }}=\frac{\partial \ln L_{n}(\eta ,0)}{\partial \eta }\) is a subvector of \(\frac{\partial \ln L_{n}(\theta _{0})}{\partial \theta }\) with \(\delta _{0}=0\), by the proof of Proposition 2.2, \(\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }}\xrightarrow {d}N(0,\lim _{n\rightarrow \infty }\frac{1}{n}\Delta _{n,11})\) under Assumption 12(a): with \(p=14\), the CLT requires \(\alpha >(3+\frac{4}{p-4})d=\frac{17}{5}d\) and \(0<\iota ^{*}<p/2-2=5\), which are maintained in Assumption 12(b)–(c).

As \({\text {E}}(\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }})=-\Delta _{n,11}\), it remains to prove that \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\bar{\eta },0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}={\text {E}}(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }})+o_{p}(1)\). We shall prove that \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}={\text {E}}(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }})+o_{p}(1)\) and \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\check{\eta },0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}=\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}+o_{p}(1)\). By (A.1)–(A.10), except \(-{\text {tr}}(G_{n}^{2})\), elements of \(\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}\) have the form \(c\sum _{i=1}^{n}\zeta _{ni}\), where \(\zeta _{ni}\) is either \((w_{n,i\cdot }Y_{n})^{2}\), \(w_{n,i\cdot }Y_{n}x_{ni,j}\), \(w_{n,i\cdot }Y_{n}\epsilon _{ni}\), \(w_{n,i\cdot }Y_{n}\), \(x_{ni,j}x_{ni,k}\), \(x_{ni,j}\epsilon _{ni}\), \(\epsilon _{ni}^{2}\), \(\epsilon _{ni}\) or \(x_{ni,j}\). By Lemma B.2 and the Cauchy-Schwarz inequality, each \(\{\zeta _{ni}\}_{i=1}^{n}\) is uniformly \(L_{2}\) bounded. Furthermore, each \(\{\zeta _{ni}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\). Thus, \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}={\text {E}}(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }})+o_{p}(1)\). As \(\frac{\partial ^{2}\ln L_{4n}(\check{\eta },0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}=\frac{\partial ^{2}\ln L_{n}(\check{\eta },0)}{\partial \eta \partial \eta ^{\prime }}\) and each term in the third derivative of \(\frac{1}{n}\ln L_{n}(\theta )\) is shown to be \(O_{p}(1)\) in the proof of Proposition 2.2, by the mean value theorem, \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\check{\eta },0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}=\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }\partial \eta ^{\ddagger \prime }}+o_{p}(1)\).

We further prove that (2.19) holds. Let \(z_{ni}=[w_{n,i\cdot }Y_{n},x_{ni}']'\), \(\kappa _{0}=[\lambda _{0},\beta _{0}']'\) and \(\check{\kappa }=[\check{\lambda },\check{\beta }']'\). Then \(\check{\epsilon }_{ni}=\epsilon _{ni}+z_{ni}'(\kappa _{0}-\check{\kappa })\). It follows that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}&=\frac{1}{n}\sum _{i=1}^{n}\epsilon _{ni}^{3}+\frac{3}{n}\sum _{i=1}^{n}\epsilon _{ni}^{2}z_{ni}'(\kappa _{0}-\check{\kappa })+\frac{3}{n}\sum _{i=1}^{n}\epsilon _{ni}(\kappa _{0}-\check{\kappa })'z_{ni}z_{ni}'(\kappa _{0}-\check{\kappa })\\&\quad +\frac{1}{n}\sum _{i=1}^{n}(\kappa _{0}-\check{\kappa })'z_{ni}z_{ni}'(\kappa _{0}-\check{\kappa })z_{ni}'(\kappa _{0}-\check{\kappa }). \end{aligned}$$

By the Lindeberg-Lévy CLT, \(\frac{1}{n}\sum _{i=1}^{n}\epsilon _{ni}^{3}={\text {E}}(\epsilon _{ni}^{3})+O_{p}(n^{-1/2})=O_{p}(n^{-1/2})\). By Theorem 1 in Jenish and Prucha (2012), \(\frac{1}{n}\sum _{i=1}^{n}\epsilon _{ni}^{2}z_{ni}=\frac{1}{n}\sum _{i=1}^{n}{\text {E}}(\epsilon _{ni}^{2}z_{ni})+o_{p}(1)=\frac{\sigma _{0}^{2}}{n}{\text {E}}[(G_{n}X_{n}\beta _{0},X_{n})'l_{n}]+o_{p}(1)=O_{p}(1)\). Our proof above shows that \(\frac{1}{n}\sum _{i=1}^{n}z_{ni}z_{ni}'z_{nij}=O_{p}(1)\) and \(\frac{1}{n}\sum _{i=1}^{n}z_{ni}z_{ni}'\epsilon _{ni}=O_{p}(1)\), where \(z_{nij}\) is the jth element of \(z_{ni}\). Hence, (2.19) holds.

We continue to prove that the asymptotic distribution of \(\sqrt{n}[\hat{\eta }^{\ddagger \prime },\hat{\tau }]'\) conditional on \(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\le 0\) is \([F_{1},F_{2},F_{3}',F_{4},|F_{5}|]'.\) Since \(\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\) is asymptotically normal with mean zero, we only need to prove that the MLE \(\dot{\omega }\) of \(\omega =[\eta ^{\ddagger \prime },\tau ]'\) from the log likelihood function \(\ln L_{4n}(\omega )\) with no nonnegativity restriction on \(\tau \) has the asymptotic distribution \(N(0,\lim _{n\rightarrow \infty }(\frac{1}{n}\Delta _{n})^{-1})\). By the mean value theorem, \(0=\frac{\partial \ln L_{4n}(\dot{\omega })}{\partial \omega }=\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }+\frac{\partial ^{2}\ln L_{4n}(\bar{\omega })}{\partial \omega \partial \omega '}(\dot{\omega }-\omega _{0}),\) where \(\bar{\omega }\) lies between \(\dot{\omega }\) and \(\omega _{0}\). Then \(\sqrt{n}(\dot{\omega }-\omega _{0})=(-\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\bar{\omega })}{\partial \omega \partial \omega '})^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\). Compared to \(\frac{\partial \ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }}\), by (2.16), \(\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\) has the additional element \(\sum _{i=1}^{n}[\frac{1}{6\sigma _{0}^{3}}(1-\frac{4}{\pi })\sqrt{\frac{2}{\pi }}\epsilon _{ni}^{3}+\frac{2}{\pi \sigma _{0}}\sqrt{\frac{2}{\pi }}\epsilon _{ni}]\), which is a sum of i.i.d. elements. Thus, as shown above for \(\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\eta _{0},0)}{\partial \eta ^{\ddagger }}\), by Theorem 2 in Jenish and Prucha (2012), \(\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\xrightarrow {d}N(0,\lim _{n\rightarrow \infty }\frac{1}{n}\Delta _{n})\).

For the asymptotic distribution of \(\sqrt{n}[\hat{\eta }^{\ddagger \prime },\hat{\tau }]'\) conditional on \(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}\le 0\), it remains to prove that \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\bar{\omega })}{\partial \omega \partial \omega '}=-{\text {E}}(\frac{1}{n}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega '})+o_{p}(1)\). For that purpose, we shall prove the following:

$$\begin{aligned} \frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\bar{\omega })}{\partial \omega \partial \omega '}&=\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '}+o_{p}(1), \end{aligned}$$
(A.14)
$$\begin{aligned} \frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '}&={\text {E}}\left( \frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '}\right) +o_{p}(1), \end{aligned}$$
(A.15)
$$\begin{aligned} {\text {E}}\left( \frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '}\right)&=-{\text {E}}\left( \frac{1}{n}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega '}\right) . \end{aligned}$$
(A.16)

We first prove (A.16). As \(\frac{\partial \ln L_{4n}(\omega )}{\partial \tau }=\frac{1}{3}\tau ^{-2/3}\frac{\partial \ln L_{3n}(\eta ^{\ddagger },\tau ^{1/3})}{\partial \delta }\),

$$\begin{aligned} \frac{\partial ^{2}\ln L_{4n}(\omega )}{\partial \tau ^{2}}=\frac{1}{9}\tau ^{-4/3}\frac{\partial ^{2}\ln L_{3n}(\eta ^{\ddagger },\tau ^{1/3})}{\partial \delta ^{2}}-\frac{2}{9}\tau ^{-5/3}\frac{\partial \ln L_{3n}(\eta ^{\ddagger },\tau ^{1/3})}{\partial \delta }. \end{aligned}$$

Thus, by L’Hôpital’s rule,

$$\begin{aligned} \frac{\partial ^{2}\ln L_{4n}(\eta ^{\ddagger },0)}{\partial \tau ^{2}}&=\lim _{\delta \rightarrow 0}\left( \frac{1}{9\delta ^{4}}\frac{\partial ^{2}\ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta ^{2}}-\frac{2}{9\delta ^{5}}\frac{\partial \ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta }\right) \\&=\lim _{\delta \rightarrow 0}\frac{1}{9\delta ^{5}}\left( \delta \frac{\partial ^{2}\ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta ^{2}}-2\frac{\partial \ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta }\right) \\&=\lim _{\delta \rightarrow 0}\frac{1}{45\delta ^{4}}\left( \delta \frac{\partial ^{3}\ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta ^{3}}-\frac{\partial ^{2}\ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta ^{2}}\right) \\&=\lim _{\delta \rightarrow 0}\frac{1}{180\delta ^{2}}\frac{\partial ^{4}\ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta ^{4}}\\&=\frac{1}{360}\frac{\partial ^{6}\ln L_{3n}(\eta ^{\ddagger },0)}{\partial \delta ^{6}}, \end{aligned}$$

and

$$\begin{aligned} \frac{\partial ^{2}\ln L_{4n}(\eta ^{\ddagger },0)}{\partial \tau \partial \eta ^{\ddagger }}&=\lim _{\delta \rightarrow 0}\frac{1}{3\delta ^{2}}\frac{\partial ^{2}\ln L_{3n}(\eta ^{\ddagger },\delta )}{\partial \delta \partial \eta ^{\ddagger }}=\frac{1}{6}\frac{\partial ^{4}\ln L_{3n}(\eta ^{\ddagger },0)}{\partial \delta ^{3}\partial \eta ^{\ddagger }}. \end{aligned}$$

As \(\frac{\partial \ln L_{3n}(\eta ^{\ddagger },0)}{\partial \delta }=\frac{\partial ^{2}\ln L_{3n}(\eta ^{\ddagger },0)}{\partial \delta ^{2}}=0\), by Lemma 1 in Rotnitzky et al. (2000),

$$\begin{aligned} \frac{\partial ^{6}\ln L_{3n}(\omega _{0})}{\partial \delta ^{6}}=\frac{1}{L_{3n}(\omega _{0})}\frac{\partial ^{6}L_{3n}(\omega _{0})}{\partial \delta ^{6}}-\frac{6!}{2\times (3!)^{2}}\left( \frac{\partial ^{3}\ln L_{3n}(\omega _{0})}{\partial \delta ^{3}}\right) ^{2} \end{aligned}$$

and \(\frac{\partial ^{4}\ln L_{3n}(\omega _{0})}{\partial \delta ^{3}\partial \eta ^{\ddagger }}=\frac{1}{L_{3n}(\omega _{0})}\frac{\partial ^{4}L_{3n}(\omega _{0})}{\partial \delta ^{3}\partial \eta ^{\ddagger }}-\frac{\partial ^{3}\ln L_{3n}(\omega _{0})}{\partial \delta ^{3}}\frac{\partial \ln L_{3n}(\omega _{0})}{\partial \eta ^{\ddagger }}\). As \(\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \tau }=\frac{1}{6}\frac{\partial ^{3}\ln L_{3n}(\omega _{0})}{\partial \delta ^{3}}\) and \(\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \eta ^{\ddagger }}=\frac{\partial \ln L_{3n}(\omega _{0})}{\partial \eta ^{\ddagger }}\) by Proposition 3 in Lee (1993), we have

$$\begin{aligned} {\text {E}}\left( \frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \tau ^{2}}\right) =-{\text {E}}\left[ \left( \frac{\partial \ln L_{4n}(\omega _{0})}{\partial \tau }\right) ^{2}\right] =-\frac{1}{36}{\text {E}}\left[ \left( \frac{\partial ^{3}\ln L_{3n}(\omega _{0})}{\partial \delta ^{3}}\right) ^{2}\right] \end{aligned}$$

and \({\text {E}}(\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \tau \partial \eta ^{\ddagger }})=-{\text {E}}(\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \tau }\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \eta ^{\ddagger }})=-\frac{1}{6}{\text {E}}(\frac{\partial ^{3}\ln L_{3n}(\omega _{0})}{\partial \delta ^{3}}\frac{\partial \ln L_{3n}(\omega _{0})}{\partial \eta ^{\ddagger }})\). Hence, (A.16) holds.

We next prove (A.14). As shown above, as \(\tau \rightarrow 0\), \(\frac{\partial ^{2}\ln L_{4n}(\omega )}{\partial \omega \partial \omega '}\) involves the sixth order derivatives of \(\ln L_{n}(\theta )\). Under Assumption 12, by the proof of Proposition 2.2, each term in the seventh order derivatives of \(\frac{1}{n}\ln L_{n}(\theta )\) is \(O_{p}(1)\). Then by the mean value theorem, \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\bar{\omega })}{\partial \omega \partial \omega '}=\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '}+o_{p}(1)\).

For (A.15), note that except \(-{\text {tr}}(G_{n}^{2})\), each element of \(\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '}\) has the form

$$\begin{aligned} c\sum _{i=1}^{n}(w_{n,i\cdot }Y_{n})^{j}h_{ni,1}\cdots h_{ni,k}, \end{aligned}$$

where \(0\le j\le 6\), \(j+k\le 6\), and \(h_{ni,r}\) for \(1\le r\le k\) is either \(\epsilon _{ni}\) or an element of \(x_{ni}\). We shall prove that each \(\{(w_{n,i\cdot }Y_{n})^{j}h_{ni,1}\cdots h_{ni,k}\}_{i=1}^{n}\) with \(1\le j\le 6\) and \(j+k\le 6\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) under Assumption 12. By Lemma B.2(c), since \(\sup _{1\le k\le k_{x},i,n}\Vert x_{ni,k}\Vert _{14}<\infty \), \(\{(w_{n,i\cdot }Y_{n})^{j}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) for \(1\le j\le 6\). Then for \(1\le j\le 3\) and \(1\le k\le 3\), \(\{(w_{n,i\cdot }Y_{n})^{j}h_{ni,1}\cdots h_{ni,k}\}_{i=1}^{n}\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\) by Lemma A.2 in Xu and Lee (2015), since \(\sup _{i,n}\Vert (w_{n,i\cdot }Y_{n})^{j}\Vert _{14/3}<\infty \) and \(\sup _{i,n}\Vert h_{ni,1}\cdots h_{ni,k}\Vert _{14/3}<\infty \). Similarly, \(\{w_{n,i\cdot }Y_{n}h_{ni,1}\cdots h_{ni,k}\}_{i=1}^{n}\) for \(k=4\) or 5, \(\{(w_{n,i\cdot }Y_{n})^{2}h_{ni,1}\cdots h_{ni,4}\}_{i=1}^{n}\), \(\{(w_{n,i\cdot }Y_{n})^{4}h_{ni,1}\cdots h_{ni,k}\}_{i=1}^{n}\) for \(k=1\) or 2, and \(\{(w_{n,i\cdot }Y_{n})^{5}h_{ni,1}\}_{i=1}^{n}\) are all uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\), since the random variables in these random fields can be written as, respectively, \([w_{n,i\cdot }Y_{n}h_{ni,1}h_{ni,2}]\cdot [h_{ni,3}\cdots h_{ni,k}]\), \([(w_{n,i\cdot }Y_{n})^{2}h_{ni,1}]\cdot [h_{ni,2}h_{ni,3}h_{ni,4}]\), \([(w_{n,i\cdot }Y_{n})^{3}]\cdot [w_{n,i\cdot }Y_{n}h_{ni,1}\cdots h_{ni,k}]\), and \([(w_{n,i\cdot }Y_{n})^{3}]\cdot [(w_{n,i\cdot }Y_{n})^{2}h_{ni,1}]\), where each term in the square brackets is \(L_{14/3}\)-bounded uniformly in \(i\) and \(n\) by the generalized Hölder’s inequality and is uniformly \(L_{2}\)-NED. Hence, each \(\{(w_{n,i\cdot }Y_{n})^{j}h_{ni,1}\cdots h_{ni,k}\}_{i=1}^{n}\) with \(1\le j\le 6\) and \(j+k\le 6\) is uniformly \(L_{2}\)-NED on \(\{x_{ni},v_{ni},u_{ni}\}_{i=1}^{n}\). It then follows from the LLN for NED random fields that \(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '}={\text {E}}(\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\omega _{0})}{\partial \omega \partial \omega '})+o_{p}(1)\).

Proof of Proposition 2.4

The asymptotic distribution of the score test statistic follows from (2.19), so it remains to derive the asymptotic distribution of the LR test statistic. Denote \(\omega =[\eta ^{\ddagger \prime },\tau ]'\) and \(\check{\omega }=[\check{\eta }',0]'\). When \(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}<0\), \(\sqrt{n}(\hat{\omega }-\omega _{0})=(\frac{1}{n}\Delta _{n})^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }+o_{p}(1)\). As \(\sqrt{n}(\check{\eta }-\eta _{0})=(\frac{1}{n}\Delta _{n,11})^{-1}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \eta ^{\ddagger }}+o_{p}(1)\), it follows that \(\sqrt{n}(\check{\omega }-\hat{\omega })=\Xi _{3n}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }+o_{p}(1)\) when \(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}<0\), where \(\Xi _{3n}=(\frac{1}{n}\Delta _{n})^{-1}-\left( {\begin{matrix}(\frac{1}{n}\Delta _{n,11})^{-1} &{} 0\\ 0 &{} 0 \end{matrix}}\right) \). By a second-order Taylor expansion around \(\hat{\omega }\), at which the score vanishes when \(\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}<0\),

$$\begin{aligned} 2[\ln L_{n}(\hat{\theta })-\ln L_{n}(\check{\eta },0)]&=-2[\ln L_{4n}(\check{\omega })-\ln L_{4n}(\hat{\omega })]\\&=-\sqrt{n}(\check{\omega }-\hat{\omega })'\frac{1}{n}\frac{\partial ^{2}\ln L_{4n}(\bar{\omega })}{\partial \omega \partial \omega '}\sqrt{n}(\check{\omega }-\hat{\omega })\\&=\left( \frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega '}\right) \Xi _{3n}\left( \frac{1}{n}\Delta _{n}\right) \Xi _{3n}\left( \frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\right) I\Bigl (\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}<0\Bigr )+o_{p}(1)\\&=\left[ \left( \frac{1}{n}\Delta _{n}\right) ^{-1/2}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\right] '\left( \frac{1}{n}\Delta _{n}\right) ^{1/2}\Xi _{3n}\left( \frac{1}{n}\Delta _{n}\right) \Xi _{3n}\left( \frac{1}{n}\Delta _{n}\right) ^{1/2}\\&\quad \cdot \left[ \left( \frac{1}{n}\Delta _{n}\right) ^{-1/2}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\right] I\Bigl (\sum _{i=1}^{n}\check{\epsilon }_{ni}^{3}<0\Bigr )+o_{p}(1), \end{aligned}$$

where \(\bar{\omega }\) lies between \(\check{\omega }\) and \(\hat{\omega }\), and \((\frac{1}{n}\Delta _{n})^{-1/2}\frac{1}{\sqrt{n}}\frac{\partial \ln L_{4n}(\omega _{0})}{\partial \omega }\xrightarrow {d}N(0,I_{k_{x}+3})\). Partition \(\Delta _{n}\) into a \(2\times 2\) block matrix \(\Delta _{n}=\left( {\begin{matrix}\Delta _{n,11} &{} \Delta _{n,12}\\ \Delta _{n,21} &{} \Delta _{n,22} \end{matrix}}\right) \), where \(\Delta _{n,11}\) is the \((k_{x}+2)\times (k_{x}+2)\) block corresponding to \(\eta ^{\ddagger }\) and \(\Delta _{n,22}\) is the scalar corresponding to \(\tau \). It can be shown by the block matrix inverse formula that

$$\begin{aligned} \Xi _{3n}\left( \frac{1}{n}\Delta _{n}\right) \Xi _{3n}&=[-\Delta _{n,21}\Delta _{n,11}^{-1},1]'\left( \frac{1}{n}\Delta _{n,22}-\frac{1}{n}\Delta _{n,21}\Delta _{n,11}^{-1}\Delta _{n,12}\right) ^{-1}[-\Delta _{n,21}\Delta _{n,11}^{-1},1], \end{aligned}$$

and \(\frac{1}{n}\Delta _{n,22}-\frac{1}{n}\Delta _{n,21}\Delta _{n,11}^{-1}\Delta _{n,12}=[-\Delta _{n,21}\Delta _{n,11}^{-1},1]\frac{1}{n}\Delta _{n}[-\Delta _{n,21}\Delta _{n,11}^{-1},1]'\). Thus,

$$\begin{aligned} \left(\frac{1}{n}\Delta _{n}\right)^{1/2}\Xi _{3n}\left(\frac{1}{n}\Delta _{n}\right)\Xi _{3n}\left(\frac{1}{n}\Delta _{n}\right)^{1/2} \end{aligned}$$

is a projection matrix of rank 1. Hence, \(2[\ln L_{n}(\hat{\theta })-\ln L_{n}(\check{\eta },0)]\xrightarrow {d}\chi ^{2}(0)\cdot I(K\ge 0)+\chi ^{2}(1)\cdot I(K<0)\), where \(\chi ^{2}(0)\) denotes a point mass at zero.
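To use this limit in practice, note that if \(K\) is symmetric about zero in the limit, the null distribution of the LR statistic is the familiar 50:50 mixture of a point mass at zero and \(\chi ^{2}(1)\), so \(P(\text {LR}>c)=\frac{1}{2}P(\chi ^{2}(1)>c)\). A minimal computational sketch under that mixture assumption (the function names are illustrative):

```python
from scipy.stats import chi2

def lr_mixture_critical_value(alpha: float) -> float:
    # P(LR > c) = 0.5 * P(chi2(1) > c) under the 50:50 mixture, so the
    # level-alpha critical value is the (1 - 2*alpha) quantile of chi2(1).
    return chi2.ppf(1 - 2 * alpha, df=1)

def lr_mixture_pvalue(lr_stat: float) -> float:
    # The mixture puts mass 1/2 at zero, so for a positive statistic the
    # p-value is half the usual chi2(1) tail probability.
    return 0.5 * chi2.sf(lr_stat, df=1) if lr_stat > 0 else 1.0

# At the 5% level the critical value is about 2.706, noticeably smaller
# than the usual chi2(1) value of 3.841.
print(lr_mixture_critical_value(0.05))  # ~2.7055
```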

Proof of Proposition 2.5

Note that \({\text {E}}(\epsilon _{ni})=-\sigma _{u0}\sqrt{\frac{2}{\pi }}\). Denote \(\kappa _{a}=[\lambda _{0},\beta _{10}-\sigma _{u0}\sqrt{\frac{2}{\pi }},\beta _{20}']'\), which absorbs \({\text {E}}(\epsilon _{ni})\) into the intercept, so that \(Z_{n}\kappa _{0}+\epsilon _{n}=Z_{n}\kappa _{a}+[\epsilon _{n}-{\text {E}}(\epsilon _{n})]\). The 2SLS estimator \(\tilde{\kappa }\) satisfies

$$\begin{aligned} \tilde{\kappa }&=(Z_{n}'P_{n}Z_{n})^{-1}Z_{n}'P_{n}(Z_{n}\kappa _{0}+\epsilon _{n})=\kappa _{a}+(Z_{n}'P_{n}Z_{n})^{-1}Z_{n}'P_{n}[\epsilon _{n}-{\text {E}}(\epsilon _{n})]\\&=\kappa _{a}+\left[ \frac{1}{n}Z_{n}'Q_{n}\left( \frac{1}{n}Q_{n}'Q_{n}\right) ^{-1}\frac{1}{n}Q_{n}'Z_{n}\right] ^{-1}\frac{1}{n}Z_{n}'Q_{n}\left( \frac{1}{n}Q_{n}'Q_{n}\right) ^{-1}\frac{1}{n}Q_{n}'[\epsilon _{n}-{\text {E}}(\epsilon _{n})], \end{aligned}$$

where \(\frac{1}{n}Q_{n}'Z_{n}=\frac{1}{n}Q_{n}'[G_{n}(X_{n}\beta _{0}+{\text {E}}(\epsilon _{n})),X_{n}]+\frac{1}{n}Q_{n}'G_{n}[\epsilon _{n}-{\text {E}}(\epsilon _{n}),0]\). Thus, under Assumption 13, \(\tilde{\kappa }=\kappa _{a}+O_{p}(n^{-1/2})\). As \(\tilde{\epsilon }_{ni}=y_{ni}-z_{ni}'\kappa _{a}+z_{ni}'(\kappa _{a}-\tilde{\kappa })=\zeta _{ni}+z_{ni}'(\kappa _{a}-\tilde{\kappa })\), where \(z_{ni}=[w_{n,i\cdot }Y_{n},x_{ni}']'\) and \(\zeta _{ni}=v_{ni}-(u_{ni}-\sigma _{u0}\sqrt{\frac{2}{\pi }})\), it follows that

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}&=\frac{1}{n}\sum _{i=1}^{n}\zeta _{ni}^{3}+\frac{3}{n}\sum _{i=1}^{n}\zeta _{ni}^{2}z_{ni}'(\kappa _{a}-\tilde{\kappa })+\frac{3}{n}\sum _{i=1}^{n}\zeta _{ni}(\kappa _{a}-\tilde{\kappa })'z_{ni}z_{ni}'(\kappa _{a}-\tilde{\kappa })\\&\quad +\frac{1}{n}\sum _{i=1}^{n}(\kappa _{a}-\tilde{\kappa })'z_{ni}z_{ni}'(\kappa _{a}-\tilde{\kappa })z_{ni}'(\kappa _{a}-\tilde{\kappa }). \end{aligned}$$

By the Lindeberg-Lévy CLT, \(\frac{1}{n}\sum _{i=1}^{n}\zeta _{ni}^{3}={\text {E}}(\zeta _{ni}^{3})+O_{p}(n^{-1/2})=\frac{(\pi -4)\sigma _{u0}^{3}}{\pi }\sqrt{\frac{2}{\pi }}+O_{p}(n^{-1/2})\). For \(1\le j\le 3\), let \(h_{nij}\) be 1 or an element of \(z_{ni}\). Then by the generalized Hölder’s inequality, \(\sup _{i,n}{\text {E}}|\zeta _{ni}^{k}h_{ni1}h_{ni2}h_{ni3}|\le \sup _{i,n}[{\text {E}}(\zeta _{ni}^{4k})]^{1/4}[{\text {E}}(h_{ni1}^{4})]^{1/4}[{\text {E}}(h_{ni2}^{4})]^{1/4}[{\text {E}}(h_{ni3}^{4})]^{1/4}<\infty \), where \(k=0,1\) or 2. Thus, \(\frac{1}{n}\sum _{i=1}^{n}\zeta _{ni}^{2}z_{ni}'=O_{p}(1)\), \(\frac{1}{n}\sum _{i=1}^{n}\zeta _{ni}z_{ni}z_{ni}'=O_{p}(1)\), and \(\frac{1}{n}\sum _{i=1}^{n}z_{ni}z_{ni}'z_{nij}=O_{p}(1)\). Hence, \(\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}=\frac{(\pi -4)\sigma _{u0}^{3}}{\pi }\sqrt{\frac{2}{\pi }}+O_{p}(n^{-1/2})\). In addition,

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{2}&=\frac{1}{n}\sum _{i=1}^{n}\zeta _{ni}^{2}+\frac{2}{n}\sum _{i=1}^{n}\zeta _{ni}z_{ni}'(\kappa _{a}-\tilde{\kappa })+\frac{1}{n}\sum _{i=1}^{n}(\kappa _{a}-\tilde{\kappa })'z_{ni}z_{ni}'(\kappa _{a}-\tilde{\kappa })\\&=\frac{\pi -2}{\pi }\sigma _{u0}^{2}+\sigma _{v0}^{2}+O_{p}(n^{-1/2}), \end{aligned}$$

since \(\frac{1}{n}\sum _{i=1}^{n}\zeta _{ni}^{2}=\frac{\pi -2}{\pi }\sigma _{u0}^{2}+\sigma _{v0}^{2}+O_{p}(n^{-1/2})\).
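These composite-error moments are straightforward to verify by simulation. A quick Monte Carlo sketch (the parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
s_u, s_v, n = 1.0, 0.5, 2_000_000          # illustrative parameter values
u = np.abs(rng.normal(0.0, s_u, n))        # half-normal inefficiency
v = rng.normal(0.0, s_v, n)                # symmetric noise
zeta = v - (u - s_u * np.sqrt(2 / np.pi))  # centered composite error

# E(zeta^3) = (pi - 4) * sigma_u^3 / pi * sqrt(2/pi)  (negative skew)
print(np.mean(zeta**3), (np.pi - 4) / np.pi * np.sqrt(2 / np.pi) * s_u**3)
# E(zeta^2) = (pi - 2)/pi * sigma_u^2 + sigma_v^2
print(np.mean(zeta**2), (np.pi - 2) / np.pi * s_u**2 + s_v**2)
```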

If \(\sigma _{u0}\ne 0\), then \({\text {plim}}_{n\rightarrow \infty }\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}=\frac{(\pi -4)\sigma _{u0}^{3}}{\pi }\sqrt{\frac{2}{\pi }}<0\). Thus \(\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}<0\) with probability approaching one and \(\tilde{\sigma }_{u}^{2}=\sigma _{u0}^{2}+o_{p}(1)\). For the function \(g(t)=t^{2/3}\), by the mean value theorem, \(\tilde{\sigma }_{u}^{2}=g(\frac{\pi }{\pi -4}\sqrt{\frac{\pi }{2}}(\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}))=g(\sigma _{u0}^{3})+\frac{2}{3}(\bar{\sigma }_{u}^{3})^{-1/3}[\frac{\pi }{\pi -4}\sqrt{\frac{\pi }{2}}(\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3})-\sigma _{u0}^{3}],\) where \(\bar{\sigma }_{u}^{3}\) lies between \(\frac{\pi }{\pi -4}\sqrt{\frac{\pi }{2}}(\frac{1}{n}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3})\) and \(\sigma _{u0}^{3}\). Thus \(\tilde{\sigma }_{u}^{2}=\sigma _{u0}^{2}+O_{p}(n^{-1/2})\). It follows that \(\tilde{\sigma }_{v}^{2}=\sigma _{v0}^{2}+O_{p}(n^{-1/2})\). Similarly, \(\tilde{\beta }_{1c}=\beta _{10}+O_{p}(n^{-1/2})\) by the mean value theorem.
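To make the computational simplicity of the corrected 2SLS estimator concrete, here is a minimal sketch under the setup of this section. The variable names and the choice of instruments \(Q_{n}\) (e.g., \([X_{n},W_{n}X_{n}]\) with collinear columns dropped) are illustrative assumptions, not the paper's code:

```python
import numpy as np

def corrected_2sls(y, X, W, Q):
    """Corrected 2SLS for the SARSF model: a sketch, not the authors' code.

    y: (n,) outcomes; X: (n, kx) regressors whose first column is the
    intercept; W: (n, n) spatial weights; Q: (n, kq) instruments.
    """
    Z = np.column_stack([W @ y, X])        # Z_n = [W_n Y_n, X_n]
    P = Q @ np.linalg.solve(Q.T @ Q, Q.T)  # projection onto col(Q_n)
    kappa = np.linalg.solve(Z.T @ P @ Z, Z.T @ P @ y)  # estimates kappa_a
    resid = y - Z @ kappa                  # epsilon-tilde

    # sigma_u^3 = (pi/(pi-4)) sqrt(pi/2) * (1/n) sum resid^3, set to zero
    # when the third moment is nonnegative (note pi - 4 < 0).
    m3 = np.mean(resid**3)
    s_u3 = max(0.0, np.pi / (np.pi - 4.0) * np.sqrt(np.pi / 2.0) * m3)
    sigma_u2 = s_u3 ** (2.0 / 3.0)
    # Natural moment estimator implied by the display above; it carries
    # no guard against a negative value in small samples.
    sigma_v2 = np.mean(resid**2) - (np.pi - 2.0) / np.pi * sigma_u2
    # The 2SLS intercept estimates beta_10 - sigma_u0 * sqrt(2/pi);
    # add the correction back to recover beta_10.
    beta1_c = kappa[1] + np.sqrt(sigma_u2) * np.sqrt(2.0 / np.pi)
    return kappa, sigma_u2, sigma_v2, beta1_c
```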

If \(\sigma _{u0}=0\), then

$$\begin{aligned} \frac{1}{\sqrt{n}}\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}=\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\zeta _{ni}^{3}-\frac{3}{n}\sum _{i=1}^{n}\zeta _{ni}^{2}z_{ni}'(\frac{1}{n}Z_{n}'P_{n}Z_{n})^{-1}\frac{1}{\sqrt{n}}Z_{n}'P_{n}[\epsilon _{n}-{\text {E}}(\epsilon _{n})]+o_{p}(1)=O_{p}(1), \end{aligned}$$

since \({\text {E}}[(\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\zeta _{ni}^{3})^{2}]={\text {E}}(\zeta _{ni}^{6})<\infty \). When \(\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}\ge 0\), \(\tilde{\sigma }_{u}^{2}=0\), \(\tilde{\delta }=0\) and \(\tilde{\beta }_{1c}-\beta _{10}=O_{p}(n^{-1/2})\); when \(\sum _{i=1}^{n}\tilde{\epsilon }_{ni}^{3}<0\), \(\tilde{\sigma }_{u}^{2}=O_{p}(n^{-1/3})\), \(\tilde{\sigma }_{v}^{2}=\sigma _{v0}^{2}+O_{p}(n^{-1/3})\), \(\tilde{\delta }=O_{p}(n^{-1/6})\), and \(\tilde{\beta }_{1c}-\beta _{10}=O_{p}(n^{-1/6})\). \(\square \)


Keywords

  • Stochastic frontier
  • Spatial autoregression
  • Maximum likelihood
  • Asymptotic property
  • Test

JEL Classification

  • C12
  • C13
  • C21
  • C51
  • R32