
Categorical data in local maximum likelihood: theory and applications to productivity analysis


Abstract

In this paper we consider estimation of models popular in efficiency and productivity analysis (such as the stochastic frontier model and the truncated regression model) via the local maximum likelihood method, generalizing this method to allow for not only continuous but also discrete regressors. We provide asymptotic theory, evidence from simulations, and an empirical illustration of the method. Our methodology and theory can also be adapted to other models in which a likelihood of the unknown functions can be used to identify and estimate the underlying model. Simulation results indicate the flexibility of the approach and its good performance in various complex scenarios, even with moderate sample sizes.


Notes

  1. We also tried the least squares cross-validation (LSCV) method and the results were very similar. Interestingly, yet not so surprisingly, our simulations showed that LSCV appeared to be somewhat more robust for relatively small samples and much faster to optimize for large samples, while MLCV sometimes gave a better fit for relatively large samples. (A minimal illustration of the MLCV principle is sketched in the first code block after these notes.)

  2. Recall that the total conditional variance of \(\varepsilon \) is given by \({\mathrm{Var}}(v-u|x,z)=\sigma _{v}^{2}(x,z)+\sigma _{u}^{2}(x,z)(\pi -2)/\pi \). Note also that here we used local likelihood estimation with a linear approximation for \(r(x,z)\), \(\log \sigma _{u}^{2}(x,z)\), and \(\log \sigma _{v}^{2}(x,z)\). (A quick numerical check of this variance formula is sketched in the second code block after these notes.)

  3. For example, running MLCV to obtain the results presented in the NW, SW, NW and NE panels took about 0.2, 0.1, 4.5 and 1.5 hours, respectively, on a desktop with two Intel Xeon E5620 processors (2.40 GHz), while the parametric MLE took about a second.

  4. In the simulations, the MLCV-based optimal bandwidth for the continuous regressor was usually around \(0.15\) for \(n=1,000\) and around \(0.25\) for \(n=100\).
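
The following is a minimal sketch of the maximum likelihood cross-validation (MLCV) principle mentioned in Note 1, applied to the simplest possible setting: choosing the bandwidth of a univariate Gaussian-kernel density estimator by maximizing the leave-one-out log-likelihood. It is purely illustrative; in the paper, MLCV selects all smoothing parameters of the local likelihood estimator jointly, including the discrete-kernel weights, which this sketch does not attempt.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mlcv(h, x):
    """Leave-one-out log-likelihood of a Gaussian-kernel density estimate."""
    n = len(x)
    u = (x[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)
    np.fill_diagonal(K, 0.0)                  # leave observation i out
    f_loo = K.sum(axis=1) / (n - 1)           # estimated density at x_i without x_i
    return np.log(np.maximum(f_loo, 1e-300)).sum()

rng = np.random.default_rng(0)
x = rng.normal(size=500)
res = minimize_scalar(lambda h: -mlcv(h, x), bounds=(0.01, 2.0), method="bounded")
print("MLCV-optimal bandwidth:", res.x)
```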
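
And here is a quick Monte Carlo check of the variance formula in Note 2, assuming the standard specification with normal noise \(v\) and half-normal inefficiency \(u\) (so that \({\mathrm{Var}}(u)=\sigma _{u}^{2}(\pi -2)/\pi \)):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma_v, sigma_u, n = 0.5, 1.0, 1_000_000
v = sigma_v * rng.normal(size=n)              # two-sided noise term
u = sigma_u * np.abs(rng.normal(size=n))      # half-normal inefficiency term

empirical = np.var(v - u)
theoretical = sigma_v**2 + sigma_u**2 * (np.pi - 2) / np.pi
print(empirical, theoretical)                 # the two agree to ~3 decimal places
```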


Acknowledgments

The authors acknowledge financial support from the ARC Discovery Grant DP130101022 and the CEPA at the School of Economics of The University of Queensland (Australia); the “Interuniversity Attraction Pole”, Phase VII (No. P7/06) of the Belgian Science Policy (Belgium); INRA-GREMAQ, Toulouse (France); and the NRF Grant funded by the Korean government (MEST, No. 20100017437), listed here in alphabetical order. The authors also thank their colleagues and the audiences of the many conferences and seminars where this work was presented. The authors alone, and not the institutions or people mentioned above, are responsible for the views expressed.


Corresponding author

Correspondence to Valentin Zelenyuk.

Appendix: Technical details

Below, in the conditions and in the proof of Theorem 3.1, \(\Vert \mathbf{v}\Vert \) denotes the usual \(\ell _{2}\)-norm for a vector \(\mathbf{v}\), and the Frobenius (Hilbert–Schmidt) norm for a matrix \(\mathbf{v}\). Define \({\varvec{\psi }}(\mathbf{s}|\mathbf{x},\mathbf{z})=E\left[ \mathbf{g}_{1}(Y,{\varvec{\theta }}(\mathbf{x},\mathbf{z})+\mathbf{s}\,|\,\mathbf{X}=\mathbf{x},\mathbf{Z}=\mathbf{z})\right] \) for \(\mathbf{s}\in {\mathbb {R}}^{k}\). The conditions and the proof are given for a fixed point \((\mathbf{x},\mathbf{z})\) at which we want to estimate the value of \({\varvec{\theta }}=(\theta _{1},\ldots ,\theta _{k})^{\top }\).

1.1 Regularity conditions

  • (A1) For the vector of functions \(\mathbf{G}\) defined at (7.1), the equation \(\mathbf{G}({\varvec{\alpha }},\mathbf{A})=\mathbf{0}\) has the unique solution \({\varvec{\alpha }}=\mathbf{0}\) and \(\mathbf{A}=\mathbf{O}\), where \(\mathbf{0}\) is the zero vector and \(\mathbf{O}\) is the zero matrix. Also, \(E[\mathbf{g}_{1}(Y,{\varvec{\theta }}(\mathbf{X},\mathbf{Z}))|\mathbf{X},\mathbf{Z}]=\mathbf{0}\) almost surely.

  • (A2) For any compact set \(\mathcal {C}\), there exists a function \(U_{1}\) such that \(\sup _{{\varvec{\theta }}\in \mathcal {C}}\Vert \mathbf{g}_{1}(y,{\varvec{\theta }})\Vert \le U_{1}(y)\) and \(\sup _{\Vert \mathbf{u}-\mathbf{x}\Vert \le \epsilon }E[U_{1}(Y)^{2+\delta }|\mathbf{X}=\mathbf{u}]<\infty \) for some \(\epsilon ,\,\delta >0\). Also, \(\mathbf{g}_{2}(y,{\varvec{\theta }})\) is continuous in \({\varvec{\theta }}\) for each \(y\), and there exists a function \(U_{2}(y)\) such that \(\sup _{{\varvec{\theta }}\in \mathcal {C}}\Vert \mathbf{g}_{2}(y,{\varvec{\theta }})\Vert \le U_{2}(y)\) for any compact set \(\mathcal {C}\) and \(\sup _{\Vert \mathbf{u}-\mathbf{x}\Vert \le \epsilon }E[U_{2}(Y)^{2}|\mathbf{X}=\mathbf{u}]<\infty \) for some \(\epsilon >0\).

  • (A3) All entries of \({\varvec{\theta }}(\cdot ,\mathbf{v})\) are twice partially continuously differentiable at \(\mathbf{x}\) for all values of \(\mathbf{v}\) such that \(d(\mathbf{v},\mathbf{z})=0\) or \(1\). Also, there exists \(\epsilon >0\) such that for all \(1\le j\le k\)

    $$\begin{aligned} \sup _{\Vert \mathbf{u}-\mathbf{x}\Vert \le \epsilon ,\mathbf{v}\in \mathcal {D}}\Big \Vert \frac{\partial }{\partial \mathbf{u}}\theta _{j}(\mathbf{u},\mathbf{v})\Big \Vert <\infty . \end{aligned}$$
  • (A4) All entries of \({\varvec{\rho }}(\cdot ,\mathbf{v})\) are continuous at \(\mathbf{x}\) for all values \(\mathbf{v}\) such that \(d(\mathbf{v},\mathbf{z})=0\) or \(1\), and \({\varvec{\rho }}(\mathbf{x},\mathbf{z})\) is positive definite.

  • (A5) The density function \(f(\cdot ,\mathbf{v})\) is continuous at \(\mathbf{x}\) for all values \(\mathbf{v}\) such that \(d(\mathbf{v},\mathbf{z})=0\) or \(1\), and \(f(\mathbf{x},\mathbf{z})>0\).

  • (A6) All entries of \({\varvec{\tau }}(\cdot ,\mathbf{z})\) are continuous at \(\mathbf{x}\).

  • (A7) For any compact set \(\mathcal {C}\), it holds that \(\sup _{\mathbf{s}\in \mathcal {C}}\Vert {\varvec{\psi }}(\mathbf{s}|\mathbf{x}+\mathbf{u},\mathbf{z})-{\varvec{\psi }}(\mathbf{s}|\mathbf{x},\mathbf{z})\Vert \rightarrow 0\) as \(\Vert \mathbf{u}\Vert \rightarrow 0\).

The first part of assumption (A1) is essential for likelihood-based methods. It holds if the logarithm of the conditional density \(\log g(y,{\varvec{\theta }})\) is strictly convex in \({\varvec{\theta }}\), as is typically assumed for likelihood-based methods. The second part of (A1) is also standard: it is simply the first-order Bartlett identity. The two conditions in (A2) are used for a stochastic expansion and for the asymptotic normality of the estimator; the stochastic expansion only requires the first condition with \(\delta =0\) together with the second condition, while the asymptotic normality requires the higher moment condition on \(U_{1}\). The first part of (A3) is typical for nonparametric smoothing and is used for the bias expansion of the estimator; the second part of (A3) handles the terms involving \(w_{j}\) in that expansion. Assumptions (A4)–(A6) are used to obtain the leading bias and variance of the estimator. The last assumption, (A7), is required, along with (A2), for the stochastic expansion of the estimator.

1.2 Proof of Theorem 3.1

Hereafter, \({\varvec{\theta }}\) denotes the true function. We also let \({\varvec{\Theta }}\) denote the matrix of the partial derivatives of the true vector function, that is, \({\varvec{\Theta }}_{jl}(\mathbf{x},\mathbf{z})=\partial \theta _{j}(\mathbf{x},\mathbf{z})/\partial x_{l}\), where \(\theta _{j}\) is the \(j\)th component function of \({\varvec{\theta }}\) and \(x_{l}\) is the \(l\)th coordinate of \(\mathbf{x}\). Define, for a given \((\mathbf{x},\mathbf{z})\),

$$\begin{aligned} \tilde{{\varvec{\theta }}}(\mathbf{u},\mathbf{v})={\varvec{\theta }}(\mathbf{x},\mathbf{z})+{\varvec{\Theta }}(\mathbf{x},\mathbf{z})(\mathbf{u}-\mathbf{x}). \end{aligned}$$

The function \(\tilde{{\varvec{\theta }}}(\mathbf{u},\mathbf{v})\) is an approximation of \({\varvec{\theta }}(\mathbf{u},\mathbf{v})\) for \(\mathbf{u}\) near \(\mathbf{x}\) and for \(\mathbf{v}\) near \(\mathbf{z}\), which is linear in the direction of \(\mathbf{x}\), while constant in the direction of \(\mathbf{z}\). Define \(\mathbf{l}(\mathbf{u})=(1,\mathbf{u}^{\top })^{\top }\) for \(\mathbf{u}\in {\mathbb {R}}^{p}\), and

$$\begin{aligned} \mathbf{G}({\varvec{\alpha }},\mathbf{A})=f(\mathbf{x},\mathbf{z})\int \mathbf{l}(\mathbf{u})\otimes {\varvec{\psi }}({\varvec{\alpha }}+\mathbf{A}\mathbf{u}|\mathbf{x},\mathbf{z})K(\mathbf{u})\, d\mathbf{u}\end{aligned}$$
(7.1)

for \({\varvec{\alpha }}\in {\mathbb {R}}^{k}\) and \(\mathbf{A}\) being a \((k\times p)\)-matrix, where \(\otimes \) denotes the Kronecker product. Note that \(\mathbf{G}\) is a vector of \(k(p+1)\) multivariate functions. This is the population version of

$$\begin{aligned} \mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A}){:=}n^{-1}\sum _{i=1}^{n}\mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\otimes \mathbf{g}_{1}\left( Y^{i},\tilde{{\varvec{\theta }}}(\mathbf{X}^{i},\mathbf{Z}^{i})+{\varvec{\alpha }}+\mathbf{A}\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x})\right) K_{\mathbf{H}}(\mathbf{X}^{i}-\mathbf{x})\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z}). \end{aligned}$$

The function \(\mathbf{G}_{n}\) is obtained by differentiating \({\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A}){:=}L_{n}({\varvec{\theta }}(\mathbf{x},\mathbf{z})+{\varvec{\alpha }},{\varvec{\Theta }}(\mathbf{x},\mathbf{z})+\mathbf{A}\mathbf{H}^{-1})\) with respect to \({\varvec{\alpha }}\) and \(\mathbf{A}\), where \(L_{n}\) is defined at (2.1). Writing \({\varvec{\alpha }}=(\alpha _{1},\ldots ,\alpha _{k})^{\top }\) and \(\mathbf{A}=(A_{ij})\), the top \(k\) entries of \(\mathbf{G}_{n}\) are the partial derivatives \(\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial \alpha _{1},\ldots ,\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial \alpha _{k}\), the next \(k\) entries are \(\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{11},\ldots ,\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{k1}\), and so on column by column, with the last \(k\) entries being \(\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{1p},\ldots ,\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{kp}\).
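
To make the construction concrete, here is a minimal computational sketch of \(\mathbf{G}_{n}\) for a univariate continuous regressor (\(p=1\)), under the assumption that \(K\) is Gaussian and that \(\Lambda _{\mathbf{w}}\) is a product-type categorical kernel assigning weight \(w_{j}\) to each mismatched discrete coordinate. The score \(\mathbf{g}_{1}\) and the linearization \(\tilde{{\varvec{\theta }}}\) are user-supplied callables, since they depend on the likelihood being localized; this is a sketch, not the paper's implementation.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def lambda_w(Zi, z, w):
    """Product-type categorical kernel: weight w_j for each mismatched
    discrete coordinate, 1 for a match (one common choice of Lambda_w)."""
    Zi, z, w = map(np.asarray, (Zi, z, w))
    return float(np.prod(np.where(Zi != z, w, 1.0)))

def G_n(alpha, A, X, Z, Y, x, z, h, w, g1, theta_tilde):
    """Sample score G_n(alpha, A) for p = 1, following the display above.
    alpha and A are k-vectors; g1(y, theta) returns the k-vector score and
    theta_tilde(Xi, Zi) the linearized true parameter at observation i."""
    n, k = len(Y), len(alpha)
    total = np.zeros(2 * k)                   # l(u) = (1, u)', so k*(p+1) entries
    for i in range(n):
        u = (X[i] - x) / h                    # H^{-1}(X_i - x) with H = h
        theta_i = theta_tilde(X[i], Z[i]) + alpha + A * u
        l_u = np.array([1.0, u])
        weight = gaussian_kernel(u) / h * lambda_w(Z[i], z, w)  # K_H * Lambda_w
        total += np.kron(l_u, g1(Y[i], theta_i)) * weight
    return total / n
```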

$$\begin{aligned} \hat{{\varvec{\alpha }}}(\mathbf{x},\mathbf{z})=\hat{{\varvec{\theta }}}(\mathbf{x},\mathbf{z})-{\varvec{\theta }}(\mathbf{x},\mathbf{z}),\quad \hat{\mathbf{A}}(\mathbf{x},\mathbf{z})=\left[ \hat{{\varvec{\Theta }}}(\mathbf{x},\mathbf{z})-{\varvec{\Theta }}(\mathbf{x},\mathbf{z})\right] \mathbf{H}. \end{aligned}$$

Then, \((\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\) is the solution of the equation \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})=\mathbf{0}\).

We claim that, for any compact set \(\mathcal {C}\) of \(({\varvec{\alpha }},\mathbf{A})\), one has

$$\begin{aligned} \sup _{(\footnotesize {\varvec{\alpha }},\mathbf{A})\in \mathcal {C}}\Vert \mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})-E\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\Vert&= O_{p}(n^{-1/2}|\mathbf{H}|^{-1/2}(\log n)^{1/2})\end{aligned}$$
(7.2)
$$\begin{aligned} \sup _{(\footnotesize {\varvec{\alpha }},\mathbf{A})\in \mathcal {C}}\Vert E\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})-\mathbf{G}({\varvec{\alpha }},\mathbf{A})\Vert&= o(1). \end{aligned}$$
(7.3)

These two properties imply the uniform convergence of \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\) to \(\mathbf{G}({\varvec{\alpha }},\mathbf{A})\) in probability over any compact set \(\mathcal {C}\). Due to the first part of assumption (A1), we can conclude that all entries of \(\hat{{\varvec{\alpha }}}(\mathbf{x},\mathbf{z})\) and \(\hat{\mathbf{A}}(\mathbf{x},\mathbf{z})\) converge to zero in probability. This enables us to further expand \(\mathbf{G}_{n}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})=\mathbf{0}\) around the solution of \(\mathbf{G}({\varvec{\alpha }},\mathbf{A})=\mathbf{0}\), which is \(({\varvec{\alpha }},\mathbf{A})=(\mathbf{0},\mathbf{O})\). Define

$$\begin{aligned} \mathbf{J}_{n}({\varvec{\alpha }},\mathbf{A}) {:=} n^{-1}\sum _{i=1}^{n}\left\{\left[ \mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))^{\top }\right] \otimes \,\mathbf{g}_{2}\left( Y^{i},\tilde{{\varvec{\theta }}}(\mathbf{X}^{i},\mathbf{Z}^{i})+{\varvec{\alpha }}+\mathbf{A}\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x})\right) K_{\mathbf{H}}(\mathbf{X}^{i}-\mathbf{x})\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})\right\}. \end{aligned}$$

This is obtained by differentiating \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\) with respect to \({\varvec{\alpha }}\) and \(\mathbf{A}\). Let \({\varvec{\upsilon }}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\) denote a \(k(p+1)\)-vector obtained by concatenating the entries of \(\hat{{\varvec{\alpha }}}\) and \(\hat{\mathbf{A}}\). It is defined by \({\varvec{\upsilon }}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})^{\top }=(\hat{{\varvec{\alpha }}}^{\top },\hat{\mathbf{A}}_{1}^{\top },\ldots ,\hat{\mathbf{A}}_{p}^{\top })\), where \(\hat{\mathbf{A}}=[\hat{\mathbf{A}}_{1},\ldots ,\hat{\mathbf{A}}_{p}]\). Then, it follows that, for some \(({\varvec{\alpha }}^{*},\mathbf{A}^{*})\) such that \(\Vert ({\varvec{\alpha }}^{*},\mathbf{A}^{*})\Vert \le \Vert (\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\Vert \),

$$\begin{aligned} \mathbf{0}=\mathbf{G}_{n}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})=\mathbf{G}_{n}(\mathbf{0},\mathbf{O})+\mathbf{J}_{n}({\varvec{\alpha }}^{*},\mathbf{A}^{*}){\varvec{\upsilon }}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}}). \end{aligned}$$
(7.4)
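
Computationally, (7.4) is the first-order Taylor expansion underlying Newton's method: the estimator solves \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})=\mathbf{0}\), and rearranging (7.4) gives \({\varvec{\upsilon }}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\approx -\mathbf{J}_{n}^{-1}\mathbf{G}_{n}(\mathbf{0},\mathbf{O})\), i.e., a Newton step. Below is a minimal sketch of such a root-finder, with the callable G standing in for a wrapper around \(\mathbf{G}_{n}\) (an assumption for illustration, not the paper's code):

```python
import numpy as np

def solve_score(G, v0, tol=1e-10, max_iter=50, eps=1e-6):
    """Newton iteration for the stacked score equation G(v) = 0, where v
    stacks (alpha, vec A) as in upsilon(alpha, A); the Jacobian plays the
    role of J_n and is approximated here by forward differences."""
    v = np.asarray(v0, dtype=float).copy()
    for _ in range(max_iter):
        g = G(v)
        J = np.empty((g.size, v.size))
        for j in range(v.size):               # numerical Jacobian, column by column
            dv = np.zeros_like(v)
            dv[j] = eps
            J[:, j] = (G(v + dv) - g) / eps
        step = np.linalg.solve(J, g)
        v -= step
        if np.linalg.norm(step) < tol:        # converged to a root of the score
            break
    return v
```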

For \(\mathbf{J}_{n}({\varvec{\alpha }},\mathbf{A})\) we will show that, for any compact set \(\mathcal {C}\),

$$\begin{aligned} \sup _{(\footnotesize {\varvec{\alpha }},\mathbf{A})\in \mathcal {C}}\Vert \mathbf{J}_{n}({\varvec{\alpha }},\mathbf{A})-E\mathbf{J}_{n}({\varvec{\alpha }},\mathbf{A})\Vert =o_{p}(1). \end{aligned}$$
(7.5)

Together with the second part of assumption (A2), this entails

$$\begin{aligned} \mathbf{J}_{n}({\varvec{\alpha }}^{*},\mathbf{A}^{*})=E\mathbf{J}_{n}(\mathbf{0},\mathbf{O})+o_{p}(1). \end{aligned}$$
(7.6)

To see this, note that the second part of the assumption (A2) implies that for a given \(\delta >0\) there exists \(\varepsilon >0\) such that, for sufficiently large \(n\), \(\Vert E\mathbf{J}_{n}({\varvec{\alpha }},\mathbf{A})-E\mathbf{J}_{n}(\mathbf{0},\mathbf{O})\Vert \le \delta \) for all \(({\varvec{\alpha }},\mathbf{A})\) with \(\Vert ({\varvec{\alpha }},\mathbf{A})\Vert \le \varepsilon \). This and the consistency of \((\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\) together with (7.5) establish (7.6). Define a diagonal matrix \(\mathbf{M}\) of dimension \((p+1)\) in such a way that the first entry equals \(1\) and the rest are all \(\mu _{2}\). We claim

$$\begin{aligned} E\mathbf{J}_{n}(\mathbf{0},\mathbf{O})=-\left[ \mathbf{M}\otimes {\varvec{\rho }}(\mathbf{x},\mathbf{z})\right] f(\mathbf{x},\mathbf{z})+o(1). \end{aligned}$$
(7.7)

The expansions (7.4), (7.6) and (7.7) give

$$\begin{aligned} \hat{{\varvec{\alpha }}}(\mathbf{x},\mathbf{z})&= \left[ \mathbf{I}_{k},\mathbf{O},\ldots ,\mathbf{O}\right] {\varvec{\upsilon }}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\nonumber \\&= f(\mathbf{x},\mathbf{z})^{-1}\left[ \mathbf{1}_{p+1}^{\top }\otimes {\varvec{\rho }}(\mathbf{x},\mathbf{z})^{-1}\right] \mathbf{G}_{n}(\mathbf{0},\mathbf{O})[1+o_{p}(1)], \end{aligned}$$
(7.8)

where \(\mathbf{1}_{p+1}\) denotes the \((p+1)\)-dimensional vector \((1,0,\ldots ,0)^{\top }\), i.e., the first standard basis vector.

Now, we derive the first-order properties of \(\mathbf{G}_{n}(\mathbf{0},\mathbf{O})\). For \(\mathbf{Z}^{i}=\mathbf{z}\), \(\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})=1\). For \(\mathbf{Z}^{i}\) with \(d(\mathbf{Z}^{i},\mathbf{z})=1\), we have \(\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})=w_{j}\) for some \(j\). Those \(\mathbf{Z}^{i}\) with \(d(\mathbf{Z}^{i},\mathbf{z})\ge 2\) have a contribution of order \(O_{p}(w^{*2})\) to \(\mathbf{G}_{n}(\mathbf{0},\mathbf{O})\), where \(w^{*}=\max _{1\le j\le d}w_{j}\). Thus,

$$\begin{aligned} \mathbf{G}_{n}(\mathbf{0},\mathbf{O})&= n^{-1}\sum _{i=1}^{n}\mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\otimes \mathbf{g}_{1}(Y^{i},\tilde{{\varvec{\theta }}}(\mathbf{X}^{i},\mathbf{Z}^{i}))\, K_{\mathbf{H}}(\mathbf{X}^{i}-\mathbf{x})\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})\nonumber \\&= n^{-1}\sum _{i=1}^{n}\mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\otimes \mathbf{g}_{1}(Y^{i},\tilde{{\varvec{\theta }}}(\mathbf{X}^{i},\mathbf{Z}^{i}))\, K_{\mathbf{H}}(\mathbf{X}^{i}-\mathbf{x})\, I(\mathbf{Z}^{i}=\mathbf{z})\nonumber \\&\quad +n^{-1}\sum _{i=1}^{n}\mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\otimes \mathbf{g}_{1}(Y^{i},\tilde{{\varvec{\theta }}}(\mathbf{X}^{i},\mathbf{Z}^{i}))\, K_{\mathbf{H}}(\mathbf{X}^{i}-\mathbf{x})\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})\, I[d(\mathbf{Z}^{i},\mathbf{z})=1]+O_{p}(w^{*2})\end{aligned}$$
(7.9)
$$\begin{aligned} &\overset{{\mathrm{let}}}{=}\ \mathbf{T}_{1}+\mathbf{T}_{2}+O_{p}(w^{*2}). \end{aligned}$$
(7.10)
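
The decomposition above only uses how the product-type categorical kernel downweights observations by their Hamming distance to \(\mathbf{z}\). A tiny numerical illustration, again assuming \(\Lambda _{\mathbf{w}}(\mathbf{Z},\mathbf{z})=\prod _{j}w_{j}^{I(Z_{j}\ne z_{j})}\) (consistent with the behavior described above):

```python
import numpy as np

rng = np.random.default_rng(2)
w = np.array([0.05, 0.08])                 # illustrative per-coordinate weights w_j
z = np.array([0, 0])
Z = rng.integers(0, 2, size=(6, 2))        # six draws of two binary regressors

d = (Z != z).sum(axis=1)                   # Hamming distance d(Z_i, z)
lam = np.prod(np.where(Z != z, w, 1.0), axis=1)
for di, li in zip(d, lam):
    print(di, li)                          # d=0 -> 1, d=1 -> w_j, d=2 -> w_1*w_2 = O(w*^2)
```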

The expected value of the first term in (7.9) has the following expansion due to the second part of the assumption (A1) and the assumptions (A2)–(A5):

$$\begin{aligned} E(\mathbf{T}_{1})&= E\left[ \left( \mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}-\mathbf{x}))\otimes \mathbf{g}_{2}(Y,{\varvec{\theta }}(\mathbf{X},\mathbf{z}))\right) [\tilde{{\varvec{\theta }}}(\mathbf{X},\mathbf{z})-{\varvec{\theta }}(\mathbf{X},\mathbf{z})]\,K_{\mathbf{H}}(\mathbf{X}-\mathbf{x})\,I(\mathbf{Z}=\mathbf{z})\right] + o({\mathrm{tr}}(\mathbf{H}^{2}))\\&= E\left[ \left( \mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}-\mathbf{x}))\otimes {\varvec{\rho }}(\mathbf{X},\mathbf{z})\right) [{\varvec{\theta }}(\mathbf{X},\mathbf{z})-\tilde{{\varvec{\theta }}}(\mathbf{X},\mathbf{z})]\,K_{\mathbf{H}}(\mathbf{X}-\mathbf{x})\,I(\mathbf{Z}=\mathbf{z})\right] + o({\mathrm{tr}}(\mathbf{H}^{2}))\\&= \frac{1}{2}\, f(\mathbf{x},\mathbf{z})\int \left( \mathbf{l}(\mathbf{u})\otimes {\varvec{\rho }}(\mathbf{x},\mathbf{z})\right) \left( \mathbf{u}^{\top }\mathbf{H}{\varvec{\theta }}_{j}''(\mathbf{x},\mathbf{z})\mathbf{H}\mathbf{u}\right) _{1\le j\le k}K(\mathbf{u})\, d\mathbf{u}+o({\mathrm{tr}}(\mathbf{H}^{2})), \end{aligned}$$

where \(\left( \cdot \right) _{1\le j\le k}\) denotes the \(k\)-vector with the indicated \(j\)th entry.

By the properties of the multivariate kernel \(K\), we can further approximate \(E(\mathbf{T}_{1})\) by

$$\begin{aligned} E(\mathbf{T}_{1})=\frac{1}{2}\,\mu _{2}\, f(\mathbf{x},\mathbf{z})\left[ \mathbf{1}_{p+1}\otimes {\varvec{\rho }}(\mathbf{x},\mathbf{z})\right] {\varvec{\beta }}_{\mathbf{H}}(\mathbf{x},\mathbf{z})+o({\mathrm{tr}}(\mathbf{H}^{2})), \end{aligned}$$

where \({\varvec{\beta }}_{\mathbf{H}}(\mathbf{x},\mathbf{z})\) is the \(k\)-vector whose \(j\)th entry equals \({\mathrm{tr}}({\varvec{\theta }}_{j}''(\mathbf{x},\mathbf{z})\mathbf{H}^{2})\). One can similarly obtain an approximation of \({\mathrm{var}}(\mathbf{T}_{1})\). In fact,

$$\begin{aligned} {\mathrm{var}}(\mathbf{T}_{1})=n^{-1}|\mathbf{H}|^{-1}f(\mathbf{x},\mathbf{z})\cdot \mathbf{D}\otimes {\varvec{\tau }}(\mathbf{x},\mathbf{z})+o(n^{-1}|\mathbf{H}|^{-1}), \end{aligned}$$

where \(\mathbf{D}\) is a \((p+1)\)-dimensional diagonal matrix whose first diagonal entry equals \(\int K^{2}(\mathbf{u})\, d\mathbf{u}\) and the next \(p\) diagonal entries are \(\int u_{j}^{2}K^{2}(\mathbf{u})\, d\mathbf{u},\,1\le j\le p\).
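
For the concrete case of a univariate standard Gaussian kernel (an assumption made only for illustration; \(K\) in the paper is a generic multivariate kernel), the constants entering \(\mathbf{M}\) and \(\mathbf{D}\) can be checked numerically against their closed forms:

```python
import numpy as np
from scipy.integrate import quad

K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel

mu2, _ = quad(lambda u: u**2 * K(u), -np.inf, np.inf)      # mu_2, enters M
RK, _ = quad(lambda u: K(u)**2, -np.inf, np.inf)           # int K^2, first entry of D
m2K2, _ = quad(lambda u: u**2 * K(u)**2, -np.inf, np.inf)  # int u^2 K^2, other entries of D

print(mu2)                               # 1.0
print(RK, 1 / (2 * np.sqrt(np.pi)))      # 0.2821..., matches the closed form
print(m2K2, 1 / (4 * np.sqrt(np.pi)))    # 0.1410..., matches the closed form
```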

Next, we look into the term \(\mathbf{T}_{2}\). This term contributes only its expectation \(E(\mathbf{T}_{2})\) to the first-order properties of \(\mathbf{G}_{n}(\mathbf{0},\mathbf{O})\), since \({\mathrm{var}}(\mathbf{T}_{2})\) is negligible in comparison with \({\mathrm{var}}(\mathbf{T}_{1})\) owing to the additional factors \(w_{j}\), which go to zero as \(n\) tends to infinity. For an expansion of \(E(\mathbf{T}_{2})\), we note that the following approximation holds due to the second part of (A3): uniformly for \(\mathbf{v}\in \mathcal {D}\),

$$\begin{aligned} {\varvec{\theta }}(\mathbf{u},\mathbf{v})-\tilde{{\varvec{\theta }}}(\mathbf{u},\mathbf{v})=\left[ {\varvec{\theta }}(\mathbf{x},\mathbf{v})-{\varvec{\theta }}(\mathbf{x},\mathbf{z})\right] +O(\Vert \mathbf{u}-\mathbf{x}\Vert ) \end{aligned}$$

for \(\mathbf{u}\) near \(\mathbf{x}\). With this and using the assumptions (A4) and (A5) we get

$$\begin{aligned} E(\mathbf{T}_{2})&= E\left[ \left( \mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}-\mathbf{x}))\otimes {\varvec{\rho }}(\mathbf{X},\mathbf{Z})\right) [{\varvec{\theta }}(\mathbf{X},\mathbf{Z})-\tilde{{\varvec{\theta }}}(\mathbf{X},\mathbf{Z})]\,K_{\mathbf{H}}(\mathbf{X}-\mathbf{x})\Lambda _{\mathbf{w}}(\mathbf{Z},\mathbf{z})\, I[d(\mathbf{Z},\mathbf{z})=1]\right] +o(w^{*})\\&= E\left[ \left( \mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}-\mathbf{x}))\otimes {\varvec{\rho }}(\mathbf{X},\mathbf{Z})\right) [{\varvec{\theta }}(\mathbf{x},\mathbf{Z})-{\varvec{\theta }}(\mathbf{x},\mathbf{z})]\,K_{\mathbf{H}}(\mathbf{X}-\mathbf{x})\Lambda _{\mathbf{w}}(\mathbf{Z},\mathbf{z})\, I[d(\mathbf{Z},\mathbf{z})=1]\right] +o(w^{*})\\&= \sum _{\mathbf{z}':d(\mathbf{z}',\mathbf{z})=1}f(\mathbf{x},\mathbf{z}')\left( \mathbf{1}_{p+1}\otimes {\varvec{\rho }}(\mathbf{x},\mathbf{z}')\right) [{\varvec{\theta }}(\mathbf{x},\mathbf{z}')-{\varvec{\theta }}(\mathbf{x},\mathbf{z})]\,\Lambda _{\mathbf{w}}(\mathbf{z}',\mathbf{z})+o(w^{*})\\&= \sum _{j=1}^{d}w_{j}\sum _{z_{j}'\ne z_{j},\,z_{j}'\in \mathcal {D}_{j}}f(\mathbf{x},\mathbf{z}_{-j},z_{j}')\left( \mathbf{1}_{p+1}\otimes {\varvec{\rho }}(\mathbf{x},\mathbf{z}_{-j},z_{j}')\right) [{\varvec{\theta }}(\mathbf{x},\mathbf{z}_{-j},z_{j}')-{\varvec{\theta }}(\mathbf{x},\mathbf{z})]+\, o(w^{*}). \end{aligned}$$

Asymptotic normality of \(\mathbf{T}_{1}\) follows from a standard technique and the first part of assumption (A2). The theorem now follows from basic properties of Kronecker products. It remains to prove (7.2), (7.3), (7.5) and (7.7); among them, (7.7) can be proved in the same way as the expansion for \(E(\mathbf{T}_{1})\) above.

We prove (7.2) first. We write simply \(\mathbf{l}^{i}\) for \(\mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\), \(\mathbf{g}_{1}^{i}({\varvec{\alpha }},\mathbf{A})\) for \(\mathbf{g}_{1}(Y^{i},\tilde{{\varvec{\theta }}}(\mathbf{X}^{i},\mathbf{Z}^{i})+{\varvec{\alpha }}+\mathbf{A}\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\), \(K^{i}\) for \(K_{\mathbf{H}}(\mathbf{X}^{i}-\mathbf{x})\) and \(\Lambda ^{i}\) for \(\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})\). Define \({\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})=(\mathbf{l}^{i}\otimes \mathbf{g}_{1}^{i}({\varvec{\alpha }},\mathbf{A}))K^{i}\Lambda ^{i}\). Then, we can write \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})=n^{-1}\sum _{i=1}^{n}{\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})\). We want to get an exponential bound for a large deviation of the centered \(\sqrt{n|\mathbf{H}|/\log n}\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\) for each fixed \(({\varvec{\alpha }},\mathbf{A})\). Since \({\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})\) are not bounded, we employ a truncation technique. Since \(\Lambda ^{i}\le 1\) for all \(1\le i\le n\) and from the first part of the assumption (A2), we obtain that for any compact set \(\mathcal {C}\)

$$\begin{aligned} \sup _{({\varvec{\alpha }},\mathbf{A})\in \mathcal {C}}\Big \Vert n^{-1}\sum _{i=1}^{n}{\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})I(\Vert {\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})\Vert >\sqrt{n})\Big \Vert \le Cn^{-1}\sum _{i=1}^{n}U_{1}(Y^{i})I[U_{1}(Y^{i})>C\sqrt{n}\,]K^{i}. \end{aligned}$$
(7.11)

The right hand side of (7.11) has expectation of order \(O(n^{-1/2})\) because the conditional second moment of \(U_{1}(Y)\) given \(\mathbf{X}=\mathbf{u}\) is bounded locally uniformly for \(\mathbf{u}\) around \(\mathbf{x}\); see the first part of (A2). This implies that the left hand side of (7.11) is of order \(O_{p}(n^{-1/2})\). Similarly, we also get \(E[{\varvec{\xi }}({\varvec{\alpha }},\mathbf{A})I(\Vert {\varvec{\xi }}({\varvec{\alpha }},\mathbf{A})\Vert >\sqrt{n})]=O(n^{-1/2})\) uniformly for \(({\varvec{\alpha }},\mathbf{A})\) in any compact set. These considerations reduce the proof of (7.2) to that for the truncated version \(\mathbf{G}_{n}^{*}({\varvec{\alpha }},\mathbf{A}){:=}n^{-1}\sum _{i=1}^{n}{\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})I[\Vert {\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})\Vert \le \sqrt{n}]\). By a simple application of the Markov inequality, and since \(|\mathbf{H}|E\Vert {\varvec{\xi }}({\varvec{\alpha }},\mathbf{A})\Vert ^{2}\) is bounded, say by \(c\), by the first part of assumption (A2), we get

$$\begin{aligned} P\Big [\Vert \mathbf{G}_{n}^{*}({\varvec{\alpha }},\mathbf{A})-E\mathbf{G}_{n}^{*}({\varvec{\alpha }},\mathbf{A})\Vert >M\sqrt{(\log n)/(n|\mathbf{H}|)}\Big ]\le n^{c-M} \end{aligned}$$
(7.13)

for any fixed \({\varvec{\alpha }}\) and \(\mathbf{A}\). Since \(\mathbf{G}_{n}\) is Lipschitz continuous of order \(1\) with a Lipschitz constant \(O_{p}(1)\) by the second part of the assumption (A2), the exponential bound (7.13) concludes the proof of (7.2).

Next, we prove (7.3). By the assumption (A7), we obtain

$$\begin{aligned} E\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})&= E\left[ \left( \mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}-\mathbf{x}))\otimes {\varvec{\psi }}\big (\tilde{{\varvec{\theta }}}(\mathbf{X},\mathbf{Z})-{\varvec{\theta }}(\mathbf{X},\mathbf{Z})+{\varvec{\alpha }}+\mathbf{A}\mathbf{H}^{-1}(\mathbf{X}-\mathbf{x})\,\big |\,\mathbf{X},\mathbf{Z}\big )\right) K_{\mathbf{H}}(\mathbf{X}-\mathbf{x})\Lambda _{\mathbf{w}}(\mathbf{Z},\mathbf{z})\right] \\&= \int \left( \mathbf{l}(\mathbf{u})\otimes {\varvec{\psi }}\big (\tilde{{\varvec{\theta }}}(\mathbf{x}+\mathbf{H}\mathbf{u},\mathbf{z})-{\varvec{\theta }}(\mathbf{x}+\mathbf{H}\mathbf{u},\mathbf{z})+{\varvec{\alpha }}+\mathbf{A}\mathbf{u}\,\big |\,\mathbf{x}+\mathbf{H}\mathbf{u},\mathbf{z}\big )\right) f(\mathbf{x}+\mathbf{H}\mathbf{u},\mathbf{z})K(\mathbf{u})\, d\mathbf{u}+O(w^{*})\\&= \mathbf{G}({\varvec{\alpha }},\mathbf{A})+o(1) \end{aligned}$$

uniformly for \(({\varvec{\alpha }},\mathbf{A})\) in any compact set. This completes the proof of (7.3). The proof of (7.5) is similar to that of (7.2). For this proof, one may use continuity of \(\mathbf{g}_{2}(y,{\varvec{\theta }})\) in \({\varvec{\theta }}\) and the following exponential inequality for the truncated version of \(\mathbf{J}_{n}\), denoted by \(\mathbf{J}_{n}^{*}\), constructed in the same way as \(\mathbf{G}_{n}^{*}\): for any \(\varepsilon >0\) it holds that

$$\begin{aligned} P\Big [\Vert \mathbf{J}_{n}^{*}({\varvec{\alpha }},\mathbf{A})-E\mathbf{J}_{n}^{*}({\varvec{\alpha }},\mathbf{A})\Vert >\varepsilon \Big ]\le n^{c}e^{-\varepsilon \sqrt{n|\mathbf{H}|\log n}} \end{aligned}$$

for any fixed \({\varvec{\alpha }}\) and \(\mathbf{A}\), where \(c\) is the same positive constant as at (7.13).


Cite this article

Park, B.U., Simar, L. & Zelenyuk, V. Categorical data in local maximum likelihood: theory and applications to productivity analysis. J Prod Anal 43, 199–214 (2015). https://doi.org/10.1007/s11123-014-0394-y
