Abstract
In this paper we consider estimation of models popular in efficiency and productivity analysis (such as the stochastic frontier model, the truncated regression model, etc.) via the local maximum likelihood method, generalizing this method to allow for discrete as well as continuous regressors. We provide asymptotic theory and evidence from simulations, and illustrate the method with an empirical example. Our methodology and theory can also be adapted to other models where a likelihood of the unknown functions can be used to identify and estimate the underlying model. The simulation results indicate the flexibility of the approach and its good performance in various complex scenarios, even with moderate sample sizes.
Notes
We also tried the least squares cross-validation (LSCV) method and the results were very similar. Interestingly, yet not so surprisingly, our simulations showed that LSCV appeared to be somewhat more robust for relatively small samples and much faster to optimize for large samples, while the MLCV method sometimes gave a better fit for relatively large samples.
Recall that the total conditional variance of \(\varepsilon \) is given by \({\mathrm{Var}}(v-u|x,z)=\sigma _{v}^{2}(x,z)+\sigma _{u}^{2}(x,z)(\pi -2)/\pi \). Note also that here we used local likelihood estimation with linear approximation for \(r(x,z), \log \sigma _{u}^{2}(x,z)\), and \(\log \sigma _{v}^{2}(x,z)\).
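As a quick numerical sanity check of this variance formula (a minimal simulation sketch, not taken from the paper's code; the parameter values are illustrative), one can draw \(v\) and \(u\) directly and compare the sample variance of \(v-u\) with \(\sigma _{v}^{2}+\sigma _{u}^{2}(\pi -2)/\pi \):

```python
import numpy as np

# Monte Carlo check of Var(v - u) = s_v^2 + s_u^2 * (pi - 2) / pi
# for v ~ N(0, s_v^2) and half-normal u = |N(0, s_u^2)|.
rng = np.random.default_rng(0)
s_v, s_u = 0.5, 1.0          # illustrative values
n = 1_000_000
v = rng.normal(0.0, s_v, n)
u = np.abs(rng.normal(0.0, s_u, n))
eps = v - u                  # composed error

theoretical = s_v**2 + s_u**2 * (np.pi - 2) / np.pi
print(np.var(eps), theoretical)  # the two agree closely for large n
```

The \((\pi -2)/\pi \) factor appears because the half-normal \(u\) has mean \(\sigma _{u}\sqrt{2/\pi }\), which is subtracted out when computing its variance.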
E.g., the MLCV used to obtain the results presented in the NW, SW, NW and NE panels took about 0.2, 0.1, 4.5 and 1.5 hours, respectively, on a desktop with an Intel Xeon E5620 CPU @ 2.40 GHz (two processors), while the parametric MLE took about a second.
In the simulations, the MLCV-based optimal bandwidth for the continuous regressor was usually around \(0.15\) for \(n=1,000\) and around \(0.25\) for \(n=100\).
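The paper's MLCV criterion targets the full local likelihood; as a rough illustration of the same leave-one-out idea, here is a generic likelihood cross-validation for a kernel density bandwidth (a sketch only, not the paper's implementation; the function name, data, and grid are ours):

```python
import numpy as np

def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of a Gaussian-kernel density
    estimate with bandwidth h (generic MLCV-style criterion)."""
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * d**2) / (np.sqrt(2 * np.pi) * h)
    np.fill_diagonal(k, 0.0)          # leave observation i out
    f_loo = k.sum(axis=1) / (n - 1)   # \hat f_{-i}(x_i)
    return np.log(f_loo).sum()

rng = np.random.default_rng(1)
x = rng.normal(size=500)
grid = np.linspace(0.05, 1.0, 40)
scores = [loo_log_likelihood(x, h) for h in grid]
h_mlcv = grid[int(np.argmax(scores))]  # bandwidth maximizing the criterion
```

In the paper's setting the criterion is evaluated on the local likelihood of the frontier model rather than a density, and discrete-regressor bandwidths \(w_{j}\) are chosen jointly, but the leave-one-out structure is the same.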
Acknowledgments
All authors acknowledge financial support from the ARC Discovery Grant DP130101022 and the CEPA of the School of Economics of The University of Queensland (Australia); from the “Interuniversity Attraction Pole”, Phase VII (No. P7/06) of the Belgian Science Policy (Belgium); from the INRA-GREMAQ, Toulouse (France); and from the NRF Grant funded by the Korean government (MEST), No. 20100017437 (listed here in alphabetical order). The authors also thank their colleagues and the audiences of the many conferences and seminars where this work has been presented. The authors alone, and not the above-mentioned institutions or people, are responsible for the views expressed.
Appendix: Technical details
In the conditions below and in the proof of Theorem 3.1, \(\Vert \mathbf{v}\Vert \) denotes the usual \(\ell _{2}\)-norm when \(\mathbf{v}\) is a vector, and the Frobenius (Hilbert–Schmidt) norm when \(\mathbf{v}\) is a matrix. Define \({\varvec{\psi }}(\mathbf{s}|\mathbf{x},\mathbf{z})=E\left[ \mathbf{g}_{1}(Y,{\varvec{\theta }}(\mathbf{x},\mathbf{z})+\mathbf{s}\,|\,\mathbf{X}=\mathbf{x},\mathbf{Z}=\mathbf{z})\right] \) for \(\mathbf{s}\in {\mathbb {R}}^{k}\). The conditions and the proof are given for a fixed point \((\mathbf{x},\mathbf{z})\) at which we want to estimate the value of \({\varvec{\theta }}=(\theta _{1},\ldots ,\theta _{k})^{\top }\).
1.1 Regularity conditions
- (A1) For the vector of functions \(\mathbf{G}\) defined at (7.1), the equation \(\mathbf{G}({\varvec{\alpha }},\mathbf{A})=\mathbf{0}\) has the unique solution \({\varvec{\alpha }}=\mathbf{0}\) and \(\mathbf{A}=\mathbf{O}\), where \(\mathbf{0}\) is the zero vector and \(\mathbf{O}\) is the zero matrix. Also, \(E[\mathbf{g}_{1}(Y,{\varvec{\theta }}(\mathbf{X},\mathbf{Z}))|\mathbf{X},\mathbf{Z}]=\mathbf{0}\) almost surely.
- (A2) For any compact set \(\mathcal {C}\), there exists a function \(U_{1}\) such that \(\sup _{\footnotesize {\varvec{\theta }}\in \mathcal {C}}\Vert \mathbf{g}_{1}(y,{\varvec{\theta }})\Vert \le U_{1}(y)\) and \(\sup _{\Vert \mathbf{u}-\mathbf{x}\Vert \le \epsilon }E[U_{1}(Y)^{2+\delta }|\mathbf{X}=\mathbf{u}]<\infty \) for some \(\epsilon ,\,\delta >0\). Also, \(\mathbf{g}_{2}(y,{\varvec{\theta }})\) is continuous in \({\varvec{\theta }}\) for each \(y\), and there exists a function \(U_{2}(y)\) such that \(\sup _{\footnotesize {\varvec{\theta }}\in \mathcal {C}}\Vert \mathbf{g}_{2}(y,{\varvec{\theta }})\Vert \le U_{2}(y)\) for any compact set \(\mathcal {C}\) and \(\sup _{\Vert \mathbf{u}-\mathbf{x}\Vert \le \epsilon }E[U_{2}(Y)^{2}|\mathbf{X}=\mathbf{u}]<\infty \) for some \(\epsilon >0\).
- (A3) All entries of \({\varvec{\theta }}(\cdot ,\mathbf{v})\) are twice partially continuously differentiable at \(\mathbf{x}\) for all values of \(\mathbf{v}\) such that \(d(\mathbf{v},\mathbf{z})=0\) or \(1\). Also, there exists \(\epsilon >0\) such that for all \(1\le j\le k\)
$$\begin{aligned} \sup _{\Vert \mathbf{u}-\mathbf{x}\Vert \le \epsilon ,\mathbf{v}\in \mathcal {D}}\Big \Vert \frac{\partial }{\partial \mathbf{u}}\theta _{j}(\mathbf{u},\mathbf{v})\Big \Vert <\infty . \end{aligned}$$
- (A4) All entries of \({\varvec{\rho }}(\cdot ,\mathbf{v})\) are continuous at \(\mathbf{x}\) for all values of \(\mathbf{v}\) such that \(d(\mathbf{v},\mathbf{z})=0\) or \(1\), and \({\varvec{\rho }}(\mathbf{x},\mathbf{z})\) is positive definite.
- (A5) The density function \(f(\cdot ,\mathbf{v})\) is continuous at \(\mathbf{x}\) for all values of \(\mathbf{v}\) such that \(d(\mathbf{v},\mathbf{z})=0\) or \(1\), and \(f(\mathbf{x},\mathbf{z})>0\).
- (A6) All entries of \({\varvec{\tau }}(\cdot ,\mathbf{z})\) are continuous at \(\mathbf{x}\).
- (A7) For any compact set \(\mathcal {C}\), it holds that \(\sup _{\mathbf{s}\in \mathcal {C}}\Vert {\varvec{\psi }}(\mathbf{s}|\mathbf{x}+\mathbf{u},\mathbf{z})-{\varvec{\psi }}(\mathbf{s}|\mathbf{x},\mathbf{z})\Vert \rightarrow 0\) as \(\Vert \mathbf{u}\Vert \rightarrow 0\).
The first part of assumption (A1) is essential for likelihood-based methods: it holds if the logarithm of the conditional density, \(\log g(y,{\varvec{\theta }})\), is strictly convex in \({\varvec{\theta }}\), as is typically assumed for likelihood-based methods. The second part of (A1) is also standard; it is simply the first-order Bartlett identity. The two conditions of (A2) serve the stochastic expansion and the asymptotic normality of the estimator: the stochastic expansion requires only the first condition with \(\delta =0\) together with the second condition, while the asymptotic normality requires the higher moment condition on \(U_{1}\). The first part of (A3) is typical for nonparametric smoothing and yields the bias expansion of the estimator; the second part of (A3) handles the terms involving \(w_{j}\) in that expansion. Assumptions (A4)–(A6) are used to obtain the leading bias and variance of the estimator. The last assumption, (A7), is required, along with (A2), for the stochastic expansion of the estimator.
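The second part of (A1), the first-order Bartlett identity, can be obtained by the standard argument of differentiating \(\int g(y,{\varvec{\theta }})\,dy=1\) under the integral sign (an illustrative derivation, under the assumption that \(g\) denotes the conditional density and \(\mathbf{g}_{1}=\partial \log g/\partial {\varvec{\theta }}\) its score):

$$\begin{aligned} \mathbf{0}=\frac{\partial }{\partial {\varvec{\theta }}}\int g(y,{\varvec{\theta }})\,dy =\int \frac{\partial }{\partial {\varvec{\theta }}}g(y,{\varvec{\theta }})\,dy =\int \mathbf{g}_{1}(y,{\varvec{\theta }})\,g(y,{\varvec{\theta }})\,dy, \end{aligned}$$

which, evaluated at \({\varvec{\theta }}={\varvec{\theta }}(\mathbf{X},\mathbf{Z})\) and read conditionally on \((\mathbf{X},\mathbf{Z})\), gives \(E[\mathbf{g}_{1}(Y,{\varvec{\theta }}(\mathbf{X},\mathbf{Z}))|\mathbf{X},\mathbf{Z}]=\mathbf{0}\).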
1.2 Proof of Theorem 3.1
Hereafter, \({\varvec{\theta }}\) denotes the true function. We also let \({\varvec{\Theta }}\) denote the matrix of the partial derivatives of the true vector function, that is, \({\varvec{\Theta }}_{jl}(\mathbf{x},\mathbf{z})=\partial \theta _{j}(\mathbf{x},\mathbf{z})/\partial x_{l}\), where \(\theta _{j}\) is the \(j\)th component function of \({\varvec{\theta }}\) and \(x_{l}\) is the \(l\)th coordinate of \(\mathbf{x}\). Define, for a given \((\mathbf{x},\mathbf{z})\),
The function \(\tilde{{\varvec{\theta }}}(\mathbf{u},\mathbf{v})\) is an approximation of \({\varvec{\theta }}(\mathbf{u},\mathbf{v})\) for \(\mathbf{u}\) near \(\mathbf{x}\) and for \(\mathbf{v}\) near \(\mathbf{z}\), which is linear in the direction of \(\mathbf{x}\), while constant in the direction of \(\mathbf{z}\). Define \(\mathbf{l}(\mathbf{u})=(1,\mathbf{u}^{\top })^{\top }\) for \(\mathbf{u}\in {\mathbb {R}}^{p}\), and
for \({\varvec{\alpha }}\in {\mathbb {R}}^{k}\) and \(\mathbf{A}\) being a \((k\times p)\)-matrix, where \(\otimes \) denotes the Kronecker product. Note that \(\mathbf{G}\) is a vector of \(k(p+1)\) multivariate functions. This is the population version of
The function \(\mathbf{G}_{n}\) is obtained if we differentiate \({\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A}){:=}L_{n}({\varvec{\theta }}(\mathbf{x},\mathbf{z})+{\varvec{\alpha }},{\varvec{\Theta }}(\mathbf{x},\mathbf{z})+\mathbf{A}\mathbf{H}^{-1})\) with respect to \({\varvec{\alpha }}\) and \(\mathbf{A}\), where \(L_{n}\) is defined at (2.1). The top \(k\) entries of \(\mathbf{G}_{n}\) are the partial derivatives \(\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial \alpha _{1},\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial \alpha _{2},\ldots ,\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial \alpha _{k}\), and the next \(k\) entries are \(\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{11},\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{21},\ldots ,\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{k1}\), and the last \(k\) entries are \(\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{1p},\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{2p},\ldots ,\partial {\tilde{L}}_{n}({\varvec{\alpha }},\mathbf{A})/\partial A_{kp}\), where we write \({\varvec{\alpha }}=(\alpha _{1},\ldots ,\alpha _{k})^{\top }\) and \(\mathbf{A}=(A_{ij})\). Define
Then, \((\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\) is the solution of the equation \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})=\mathbf{0}\).
We claim that, for any compact set \(\mathcal {C}\) of \(({\varvec{\alpha }},\mathbf{A})\), one has
These two properties imply the uniform convergence in probability of \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\) to \(\mathbf{G}({\varvec{\alpha }},\mathbf{A})\) over any compact set \(\mathcal {C}\). By the first part of assumption (A1), we conclude that all entries of \(\hat{{\varvec{\alpha }}}(\mathbf{x},\mathbf{z})\) and \(\hat{\mathbf{A}}(\mathbf{x},\mathbf{z})\) converge to zero in probability. This allows us to further expand \(\mathbf{G}_{n}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})=\mathbf{0}\) around \(({\varvec{\alpha }},\mathbf{A})=(\mathbf{0},\mathbf{O})\), the unique solution of \(\mathbf{G}({\varvec{\alpha }},\mathbf{A})=\mathbf{0}\). Define
This is obtained by differentiating \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\) with respect to \({\varvec{\alpha }}\) and \(\mathbf{A}\). Let \({\varvec{\upsilon }}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\) denote the \(k(p+1)\)-vector obtained by concatenating the entries of \(\hat{{\varvec{\alpha }}}\) and \(\hat{\mathbf{A}}\), that is, \({\varvec{\upsilon }}(\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})^{\top }=(\hat{{\varvec{\alpha }}}^{\top },\hat{\mathbf{A}}_{1}^{\top },\ldots ,\hat{\mathbf{A}}_{p}^{\top })\), where \(\hat{\mathbf{A}}=[\hat{\mathbf{A}}_{1},\ldots ,\hat{\mathbf{A}}_{p}]\). Then it follows that, for some \(({\varvec{\alpha }}^{*},\mathbf{A}^{*})\) with \(\Vert ({\varvec{\alpha }}^{*},\mathbf{A}^{*})\Vert \le \Vert (\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\Vert \),
For \(\mathbf{J}_{n}({\varvec{\alpha }},\mathbf{A})\) we will show that, for any compact set \(\mathcal {C}\),
Combined with the second part of assumption (A2), this entails
To see this, note that the second part of assumption (A2) implies that, for a given \(\delta >0\), there exists \(\varepsilon >0\) such that, for sufficiently large \(n\), \(\Vert E\mathbf{J}_{n}({\varvec{\alpha }},\mathbf{A})-E\mathbf{J}_{n}(\mathbf{0},\mathbf{O})\Vert \le \delta \) for all \(({\varvec{\alpha }},\mathbf{A})\) with \(\Vert ({\varvec{\alpha }},\mathbf{A})\Vert \le \varepsilon \). This, the consistency of \((\hat{{\varvec{\alpha }}},\hat{\mathbf{A}})\), and (7.5) together establish (7.6). Define a diagonal matrix \(\mathbf{M}\) of dimension \((p+1)\) whose first entry equals \(1\) and whose remaining entries all equal \(\mu _{2}\). We claim
The expansions (7.4), (7.6) and (7.7) give
where \(\mathbf{1}_{p+1}\) denotes the \((p+1)\)-dimensional unit vector such that \(\mathbf{1}_{p+1}^{\top }=(1,0,\ldots ,0)\).
Now, we derive the first-order properties of \(\mathbf{G}_{n}(\mathbf{0},\mathbf{O})\). For \(\mathbf{Z}^{i}=\mathbf{z}\), \(\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})=1\). For \(\mathbf{Z}^{i}\) with \(d(\mathbf{Z}^{i},\mathbf{z})=1\), we have \(\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})=w_{j}\) for some \(j\). Those \(\mathbf{Z}^{i}\) with \(d(\mathbf{Z}^{i},\mathbf{z})\ge 2\) have a contribution of order \(O_{p}(w^{*2})\) to \(\mathbf{G}_{n}(\mathbf{0},\mathbf{O})\), where \(w^{*}=\max _{1\le j\le d}w_{j}\). Thus,
The expected value of the first term in (7.9) has the following expansion due to the second part of the assumption (A1) and the assumptions (A2)–(A5):
By the properties of the multivariate kernel \(K\), we can further approximate \(E(\mathbf{T}_{1})\) by
where \({\varvec{\beta }}_{\mathbf{H}}(\mathbf{x},\mathbf{z})\) is the \(k\)-vector whose \(j\)th entry equals \({\mathrm{tr}}({\varvec{\theta }}_{j}''(\mathbf{x},\mathbf{z})\mathbf{H}^{2})\). One can similarly obtain an approximation of \({\mathrm{var}}(\mathbf{T}_{1})\). In fact,
where \(\mathbf{D}\) is a \((p+1)\)-dimensional diagonal matrix whose first diagonal entry equals \(\int K^{2}(\mathbf{u})\, d\mathbf{u}\) and the next \(p\) diagonal entries are \(\int u_{j}^{2}K^{2}(\mathbf{u})\, d\mathbf{u},\,1\le j\le p\).
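The behavior of the discrete kernel noted above — \(\Lambda _{\mathbf{w}}=1\) when \(d=0\), \(\Lambda _{\mathbf{w}}=w_{j}\) when \(d(\mathbf{Z}^{i},\mathbf{z})=1\), and contributions of order \(O(w^{*2})\) when \(d\ge 2\) — can be illustrated with a product-form kernel (a sketch under the assumption that \(\Lambda _{\mathbf{w}}\) takes the usual product form; the helper name and values are ours):

```python
import numpy as np

# Sketch (assumed product form, consistent with the properties used in
# the proof): Lambda_w(z_i, z) = prod_j w_j^{1{z_ij != z_j}}, where
# d(z_i, z) counts the coordinates in which z_i and z differ.
def discrete_kernel(z_i, z, w):
    mismatch = np.asarray(z_i) != np.asarray(z)
    return float(np.prod(np.where(mismatch, w, 1.0)))

w = np.array([0.1, 0.2, 0.3])      # illustrative discrete bandwidths
z = (0, 1, 0)
print(discrete_kernel((0, 1, 0), z, w))  # d = 0  ->  1.0
print(discrete_kernel((1, 1, 0), z, w))  # d = 1  ->  w_1 = 0.1
print(discrete_kernel((1, 0, 0), z, w))  # d = 2  ->  w_1 * w_2, O(w*^2)
```

Observations differing from \(\mathbf{z}\) in two or more discrete coordinates thus receive weight that is quadratically small in \(w^{*}\), which is why they drop out of the first-order analysis.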
Next, we consider the term \(\mathbf{T}_{2}\). It contributes only its expectation \(E(\mathbf{T}_{2})\) to the first-order properties of \(\mathbf{G}_{n}(\mathbf{0},\mathbf{O})\): \({\mathrm{var}}(\mathbf{T}_{2})\) is negligible relative to \({\mathrm{var}}(\mathbf{T}_{1})\) because of the additional factors \(w_{j}\), which tend to zero as \(n\) tends to infinity. For an expansion of \(E(\mathbf{T}_{2})\), note that the following approximation holds, due to the second part of (A3), uniformly for \(\mathbf{v}\in \mathcal {D}\)
for \(\mathbf{u}\) near \(\mathbf{x}\). With this and using the assumptions (A4) and (A5) we get
Asymptotic normality of \(\mathbf{T}_{1}\) follows from standard techniques and the first part of assumption (A2). The theorem now follows from basic properties of Kronecker products. It remains to prove (7.2), (7.3), (7.5) and (7.7); among these, (7.7) can be proved along the lines of the expansion derived for \(E(\mathbf{T}_{1})\).
We prove (7.2) first. We write simply \(\mathbf{l}^{i}\) for \(\mathbf{l}(\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\), \(\mathbf{g}_{1}^{i}({\varvec{\alpha }},\mathbf{A})\) for \(\mathbf{g}_{1}(Y^{i},\tilde{{\varvec{\theta }}}(\mathbf{X}^{i},\mathbf{Z}^{i})+{\varvec{\alpha }}+\mathbf{A}\mathbf{H}^{-1}(\mathbf{X}^{i}-\mathbf{x}))\), \(K^{i}\) for \(K_{\mathbf{H}}(\mathbf{X}^{i}-\mathbf{x})\) and \(\Lambda ^{i}\) for \(\Lambda _{\mathbf{w}}(\mathbf{Z}^{i},\mathbf{z})\). Define \({\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})=(\mathbf{l}^{i}\otimes \mathbf{g}_{1}^{i}({\varvec{\alpha }},\mathbf{A}))K^{i}\Lambda ^{i}\). Then, we can write \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})=n^{-1}\sum _{i=1}^{n}{\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})\). We want to get an exponential bound for a large deviation of the centered \(\sqrt{n|\mathbf{H}|/\log n}\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\) for each fixed \(({\varvec{\alpha }},\mathbf{A})\). Since \({\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})\) are not bounded, we employ a truncation technique. Since \(\Lambda ^{i}\le 1\) for all \(1\le i\le n\) and from the first part of the assumption (A2), we obtain that for any compact set \(\mathcal {C}\)
The right-hand side of (7.11) has expectation of order \(O(n^{-1/2})\), since the conditional second moment of \(U_{1}(Y)\) given \(\mathbf{X}=\mathbf{u}\) is bounded locally uniformly for \(\mathbf{u}\) near \(\mathbf{x}\); see the first part of (A2). This implies that the left-hand side of (7.11) is of order \(O_{p}(n^{-1/2})\). Similarly, we get \(E[{\varvec{\xi }}({\varvec{\alpha }},\mathbf{A})I(\Vert {\varvec{\xi }}({\varvec{\alpha }},\mathbf{A})\Vert >\sqrt{n})]=O(n^{-1/2})\) uniformly for \(({\varvec{\alpha }},\mathbf{A})\) in any compact set. These considerations reduce the proof of (7.2) to that for the truncated version \(\mathbf{G}_{n}^{*}({\varvec{\alpha }},\mathbf{A}){:=}n^{-1}\sum _{i=1}^{n}\{{\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})I[\Vert {\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})\Vert \le \sqrt{n}]\}\). By a simple application of the Markov inequality, and since \(|\mathbf{H}|E\Vert {\varvec{\xi }}({\varvec{\alpha }},\mathbf{A})\Vert ^{2}\) is bounded, say by \(c\), by the first part of assumption (A2), we get
for any fixed \({\varvec{\alpha }}\) and \(\mathbf{A}\). Since \(\mathbf{G}_{n}\) is Lipschitz continuous of order \(1\) with a Lipschitz constant \(O_{p}(1)\) by the second part of the assumption (A2), the exponential bound (7.13) concludes the proof of (7.2).
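The building block \({\varvec{\xi }}^{i}({\varvec{\alpha }},\mathbf{A})=(\mathbf{l}^{i}\otimes \mathbf{g}_{1}^{i}({\varvec{\alpha }},\mathbf{A}))K^{i}\Lambda ^{i}\) used in this proof can be made concrete with a short numerical sketch (all numbers below are illustrative, not from the paper):

```python
import numpy as np

# Sketch: one summand xi^i = (l^i ⊗ g_1^i) K^i Lambda^i of G_n,
# for p = 2 continuous regressors and k = 3 parameter functions.
p, k = 2, 3
u = np.array([0.4, -0.2])            # H^{-1}(X^i - x), illustrative
l_i = np.concatenate(([1.0], u))     # l(u) = (1, u^T)^T, length p + 1
g1_i = np.array([0.5, -1.0, 2.0])    # score vector g_1, illustrative
K_i, Lam_i = 0.8, 0.1                # continuous and discrete kernel weights

xi_i = np.kron(l_i, g1_i) * K_i * Lam_i   # length k(p + 1) = 9
# The first k entries correspond to alpha; the remaining kp entries
# correspond to the columns of A, matching the ordering of G_n above.
print(xi_i.shape)  # (9,)
```

Averaging such summands over \(i=1,\ldots ,n\) gives \(\mathbf{G}_{n}({\varvec{\alpha }},\mathbf{A})\), whose zero is the local maximum likelihood estimator.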
Next, we prove (7.3). By the assumption (A7), we obtain
uniformly for \(({\varvec{\alpha }},\mathbf{A})\) in any compact set. This completes the proof of (7.3). The proof of (7.5) is similar to that of (7.2). For this proof, one may use continuity of \(\mathbf{g}_{2}(y,{\varvec{\theta }})\) in \({\varvec{\theta }}\) and the following exponential inequality for the truncated version of \(\mathbf{J}_{n}\), denoted by \(\mathbf{J}_{n}^{*}\), constructed in the same way as \(\mathbf{G}_{n}^{*}\): for any \(\varepsilon >0\) it holds that
for any fixed \({\varvec{\alpha }}\) and \(\mathbf{A}\), where \(c\) is the same positive constant as at (7.13).
Park, B.U., Simar, L. & Zelenyuk, V. Categorical data in local maximum likelihood: theory and applications to productivity analysis. J Prod Anal 43, 199–214 (2015). https://doi.org/10.1007/s11123-014-0394-y
Keywords
- Stochastic frontier models
- Truncated regression
- Local maximum likelihood
- Nonparametric smoothing
- Categorical variables