Abstract
The nested error regression (NER) model is a standard tool for analyzing unit-level data in small area estimation. In the standard NER model, both the random effects and the error terms are assumed to be normally distributed. However, when the data exhibit distributional asymmetry, the normality assumption is not appropriate. In this paper, we propose an NER model whose error terms follow skew-normal distributions. The Bayes estimator and the posterior variance are derived in simple forms. We also construct moment-based estimators of the model parameters. The resulting empirical Bayes (EB) estimator is assessed in terms of the conditional mean squared error, which can be estimated with second-order unbiasedness by parametric bootstrap methods. Through simulation and empirical studies, we compare the skew-normal model with the usual NER model and show that the proposed model yields a much more stable EB estimator when skewness is present.
References
Arellano-Valle, R. B., Bolfarine, H., & Lachos, V. H. (2005). Skew-normal linear mixed models. Journal of Data Science, 3, 415–438.
Arellano-Valle, R. B., Bolfarine, H., & Lachos, V. H. (2007). Bayesian inference for skew-normal linear mixed models. Journal of Applied Statistics, 34, 663–682.
Azzalini, A. (1985). A class of distributions which includes the normal ones. Scandinavian Journal of Statistics, 12, 171–178.
Azzalini, A. (1986). Further results on a class of distributions which includes the normal ones. Statistica, XLVI, 199–208.
Azzalini, A. (2013). The skew-normal and related families. Cambridge: Cambridge University Press.
Azzalini, A., & Capitanio, A. (1999). Statistical applications of the multivariate skew normal distribution. Journal of the Royal Statistical Society: Series B, 61, 579–602.
Battese, G. E., Harter, R. M., & Fuller, W. A. (1988). An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83, 28–36.
Booth, J. G., & Hobert, J. P. (1998). Standard errors of prediction in generalized linear mixed models. Journal of the American Statistical Association, 93, 262–272.
Butar, F. B., & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112, 63–76.
Diallo, M., & Rao, J. N. K. (2018). Small area estimation of complex parameters under unit-level models with skew-normal errors. Scandinavian Journal of Statistics, 45, 1092–1116.
Dunnett, C. W., & Sobel, M. (1955). Approximations to the probability integral and certain percentage points of a multivariate analogue of Student’s t-distribution. Biometrika, 42, 258–260.
Ferraz, V. R. S., & Moura, F. A. S. (2012). Small area estimation using skew normal models. Computational Statistics and Data Analysis, 56, 2864–2874.
Fuller, W. A., & Battese, G. E. (1973). Transformations for estimation of linear models with nested-error structure. Journal of the American Statistical Association, 68, 626–632.
Ghosh, M., & Rao, J. N. K. (1994). Small area estimation: an appraisal. Statistical Science, 9, 55–76.
Henze, N. (1986). A probabilistic representation of the “skew-normal” distribution. Scandinavian Journal of Statistics, 13, 271–275.
Pewsey, A. (2000). Problems of inference for Azzalini’s skew-normal distribution. Journal of Applied Statistics, 27, 859–870.
Pfeffermann, D. (2013). New important developments in small area estimation. Statistical Science, 28, 40–68.
Prasad, N. G. N., & Rao, J. N. K. (1990). The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85, 163–171.
Rao, J. N. K., & Molina, I. (2015). Small area estimation (2nd ed.). Hoboken: Wiley.
Tallis, G. M. (1961). The moment generating function of the truncated multi-normal distribution. Journal of the Royal Statistical Society: Series B, 23, 223–229.
Acknowledgements
We would like to thank the Associate Editor and the two reviewers for many valuable comments and helpful suggestions, which led to an improved version of this paper. The research of the second author was supported in part by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (#18K11188, #15H01943, and #26330036).
Appendix: Proofs
All the proofs of lemmas and theorems given in the paper are provided here.
Proof of expression (6)
We briefly explain the derivation of expression (6). The exponent in \(f(v_i,{{\varvec{u}}}_{1i}\mid {{\varvec{y}}}_i)\) is proportional to
which is rewritten as
The first part corresponds to the density \(\phi (v_i;\mu _{v_i}, {\sigma }_{v_i}^2)\). For simplicity, let \({{\varvec{J}}}_{n_i}=\mathbf{1 }_{n_i}\mathbf{1 }_{n_i}^\top \),
Then the second term can be expressed as \({\sigma }^2{{\varvec{u}}}_{1i}^\top {{\varvec{u}}}_{1i} - 2{{\varvec{c}}}_i^\top {{\varvec{u}}}_{1i}-d{{\varvec{u}}}_{1i}^\top {{\varvec{J}}}_{n_i}{{\varvec{u}}}_{1i}\). After completing the square, one gets \(\{{{\varvec{u}}}_{1i}-({\sigma }^2{{\varvec{I}}}_{n_i}-d{{\varvec{J}}}_{n_i})^{-1}{{\varvec{c}}}_i\}^\top ({\sigma }^2{{\varvec{I}}}_{n_i}-d{{\varvec{J}}}_{n_i})\{{{\varvec{u}}}_{1i}-({\sigma }^2{{\varvec{I}}}_{n_i}-d{{\varvec{J}}}_{n_i})^{-1}{{\varvec{c}}}_i\}\), which corresponds to the density \(\phi _{n_i}({{\varvec{u}}}_{1i};{\varvec{\mu }}_i,{\sigma }_{u_i}^2{{\varvec{R}}}_i)\). Thus, we have the expression given in (6). \(\square \)
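Since \({{\varvec{J}}}_{n_i}=\mathbf{1 }_{n_i}\mathbf{1 }_{n_i}^\top \) has rank one, the inverse \(({\sigma }^2{{\varvec{I}}}_{n_i}-d{{\varvec{J}}}_{n_i})^{-1}\) appearing in the completed square has a closed form via the Sherman–Morrison formula. The following sketch checks this identity numerically; the function name and parameter values are ours, chosen only for illustration (assuming \({\sigma }^2-n_id>0\) so that the matrix is positive definite).

```python
import numpy as np

def inv_sigma2I_minus_dJ(n, sigma2, d):
    # Sherman-Morrison: (sigma^2 I_n - d J_n)^{-1}
    #   = sigma^{-2} I_n + d / (sigma^2 * (sigma^2 - n*d)) J_n
    return np.eye(n) / sigma2 + d / (sigma2 * (sigma2 - n * d)) * np.ones((n, n))

n, sigma2, d = 5, 2.0, 0.3          # illustrative values with sigma2 - n*d > 0
A = sigma2 * np.eye(n) - d * np.ones((n, n))
closed_form = inv_sigma2I_minus_dJ(n, sigma2, d)
print(np.allclose(closed_form, np.linalg.inv(A)))   # True
```

This avoids forming and inverting the \(n_i\times n_i\) matrix explicitly, which is convenient when the within-area sample sizes differ across areas.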
Proof of Lemma 3.1
Following Tallis (1961), we have
where \({{\varvec{a}}}_{i(-k)}\) and \({{\varvec{w}}}_{i(-k)}\) are the \((n_i-1)\)-dimensional vectors obtained by dropping the kth elements of \({{\varvec{a}}}_i\) and \({{\varvec{w}}}_i\), respectively, \({{\varvec{w}}}_i^k=({{\varvec{w}}}_{i(-k)}+\rho _ia_{ik}\mathbf{1 }_{n_i-1})(1 - \rho _i^2)^{-1/2}\), and \({{\varvec{R}}}_i^k\) is the matrix of the partial correlation coefficients for \({{\varvec{w}}}_i\). Using results from Dunnett and Sobel (1955), we reduce the two multiple integrals in (19) to one-dimensional integrals. The denominator of the fraction in (19) is written as
where \({{\varvec{W}}}=(W_1,\ldots ,W_{n_i})^\top \sim \mathcal{N}_{n_i}(\mathbf{0 },{{\varvec{R}}}_i)\) with \(({{\varvec{R}}}_i)_{qr}=\rho _i\in [0,1)\) for \(q\ne r\). Thus \(W_j\) can be represented as
where for \(j=0,1,\ldots ,n_i\), \(\xi _j\)’s are mutually independently distributed as \(\mathcal{N}(0,1)\). This transformation gives
which corresponds to (8).
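This reduction to a one-dimensional integral can be illustrated numerically. The sketch below is our own illustration, not code from the paper: assuming the standard one-factor form \(W_j=\sqrt{\rho }\,\xi _0+\sqrt{1-\rho }\,\xi _j\) as in (20), it evaluates the equicorrelated multivariate normal CDF through a single integral and checks the result against the known trivariate orthant probability \(1/8+(3/4\pi )\arcsin \rho \).

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def equicorr_cdf(a, rho):
    """P(W_1 <= a_1, ..., W_n <= a_n) for W ~ N_n(0, R), R_qr = rho (q != r).

    Uses W_j = sqrt(rho)*xi_0 + sqrt(1-rho)*xi_j: conditionally on xi_0 = x
    the components are independent, so the n-fold integral collapses to a
    single integral over x.
    """
    a = np.asarray(a, dtype=float)
    s, t = np.sqrt(rho), np.sqrt(1.0 - rho)
    integrand = lambda x: norm.pdf(x) * np.prod(norm.cdf((a - s * x) / t))
    value, _ = quad(integrand, -8.0, 8.0)
    return value

rho = 0.5
exact = 0.125 + 3.0 / (4.0 * np.pi) * np.arcsin(rho)   # trivariate orthant probability
print(equicorr_cdf([0.0, 0.0, 0.0], rho), exact)
```

The same device, with the modified correlations \(\rho _i/(1+\rho _i)\) and \(\rho _i/(1+2\rho _i)\), handles the numerators in Lemmas 3.1 and 3.2.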
Similarly, we see that the numerator in (19) is
where \({{\varvec{W}}}^k=(W_1^k,\ldots ,W_{k-1}^k,W_{k+1}^k,\ldots ,W_{n_i}^k)^\top \sim \mathcal{N}_{n_i-1}(\mathbf{0 },{{\varvec{R}}}_i^k)\) with \(({{\varvec{R}}}_i^k)_{qr}=\rho _i/(1+\rho _i)\) for \(q\ne r\). Thus, analogously to (20), \(W_l^k\) can be expressed as
where \(\xi _l\)’s are mutually independent standard normal variables again. Then it follows that
which gives (9). \(\square \)
Proof of Theorem 3.1
It suffices to transform the second term of (7) into the desired form. Using Lemma 3.1 and the fact that \(({{\varvec{R}}}_i)_{jk}=\rho _i\) for \(j\ne k\), we have
Since
the formula (21) reduces to
which gives the desired expression. \(\square \)
Proof of Lemma 3.2
The outline is the same as in the proof of Lemma 3.1. Using the results of Tallis (1961) and Theorem 3.1, we have
where \({{\varvec{a}}}_{i(-q,r)}\) and \({{\varvec{w}}}_{i(-q,r)}\) are the \((n_i-2)\)-dimensional vectors obtained by dropping the qth and rth elements of \({{\varvec{a}}}_i\) and \({{\varvec{w}}}_i\), respectively,
and \({{\varvec{R}}}_i^{qr}\) is the matrix of the second-order partial correlation coefficients for \({{\varvec{w}}}_i\). The \((n_i-2)\)-fold integral in (23) is written as
where \({{\varvec{W}}}^{qr}\sim \mathcal{N}_{n_i-2}(\mathbf{0 },{{\varvec{R}}}_i^{qr})\) with \(({{\varvec{R}}}_i^{qr})_{st}=\rho _i/(1+2\rho _i)\) for \(s\ne t\). Here the definition of \({{\varvec{W}}}^{qr}\) is analogous to \({{\varvec{W}}}^q\) in the proof of Lemma 3.1. Then using the similar method to (20) with \(\rho _i/(1+2\rho _i)\) instead of \(\rho _i\), we have
where \(\xi _s\)’s are mutually independent standard normal random variables. Then we have
which corresponds to (12). \(\square \)
Proof of Theorem 3.2
It follows from Lemmas 3.1 and 3.2 that
The conditional variance of \(\sum _{j=1}^{n_i}w_{ij}\) given \({{\varvec{y}}}_i\) is
where \(\nu _i({\varvec{\omega }},{{\varvec{y}}}_i)\) is defined as (14). Then it follows from (11), (22) and (24) that
which proves Theorem 3.2. \(\square \)
Proof of Theorem 4.1
First, we derive the desired properties for the parameters \(\big ({\varvec{\beta }}_{{\varepsilon }}^\top ,{\sigma }^2,\tau ^2,{\lambda }\big )\). In general, consider the case that two estimators \({{\hat{{\theta }}}}_1\) of \({\theta }_1\) and \({{\hat{{\theta }}}}_2\) of \({\theta }_2\) have the forms
where \(h_{1i}({{\varvec{y}}}_i)\) and \(h_{2i}({{\varvec{y}}}_i)\) (written as \(h_{1i}\) and \(h_{2i}\) for simplicity) are functions of \({{\varvec{y}}}_i\) such that \(h_{ki} = O_p(1),\,E[h_{ki}] = O(1)\) for \(k = 1, 2\). Since \({{\varvec{y}}}_i\)’s are mutually independent, it is shown that
which means that it is enough to obtain the required results for the unconditional expectations. Note that all the moments of a skew-normal distribution exist. Since the methods of estimating \({\varvec{\beta }}_{\varepsilon }\) and \(m_2\) are the same as those of the usual NER model, it follows from the results of Fuller and Battese (1973) that
The last formula comes from the unbiasedness of \({{\widehat{m}}}_2\).
Next we treat \({{\widehat{m}}}_3\). Let \({{\widehat{{\varvec{\beta }}}}}_2^{FE}\) be the OLS estimator obtained by regressing \(y_{ij}\) on \({\tilde{{{\varvec{z}}}}}_{ij}\). Then, \({{\widehat{{\varvec{\beta }}}}}_2^{FE}\) is written as \({{\widehat{{\varvec{\beta }}}}}_2^{FE} = {\varvec{\beta }}_2 + (\sum _{i=1}^m\sum _{j=1}^{n_i}{\tilde{{{\varvec{z}}}}}_{ij}{\tilde{{{\varvec{z}}}}}_{ij}^\top )^{-1} \sum _{i=1}^m \sum _{j=1}^{n_i} {\tilde{{{\varvec{z}}}}}_{ij}{\tilde{{\varepsilon }}}_{ij}\). It follows from (RC) and \({{\widehat{{\varvec{\beta }}}}}_2^{FE} - {\varvec{\beta }}_2 = O_p(m^{-1/2})\) that
Since \({\tilde{{\varepsilon }}}_{ij}\)’s are independent for different i and \(E[{\tilde{{\varepsilon }}}_{ij}] = 0\), the bias of \({{\widehat{m}}}_3\) is
which is of order \(O(m^{-1})\). From the fact that \({{\widehat{m}}}_3 = \eta _1^{-1}\sum _{i=1}^m\sum _{j=1}^{n_i}{\tilde{{\varepsilon }}}_{ij}^3 + O_p(m^{-1})\) and \(\eta _1^{-1}\sum _{i=1}^m\sum _{j=1}^{n_i}{\tilde{{\varepsilon }}}_{ij}^3 = O_p(m^{-1/2})\), it follows that
The expectation in (25) is bounded under the condition (RC) and the existence of moments of \({\varepsilon }_{ij}\) up to the sixth order, which leads to \(E[({{\widehat{m}}}_3 - m_3)^2] = O(m^{-1})\). Then we have \({{\widehat{m}}}_2 - m_2 = O_p(m^{-1/2})\) and \({{\widehat{m}}}_3 - m_3 = O_p(m^{-1/2})\). Also, \(E[({{\widehat{m}}}_2 - m_2)({{\widehat{m}}}_3 - m_3)]\) can be bounded by Schwarz's inequality as
The inverse transformation of \((m_2({\sigma }^2,{\lambda }), m_3({\sigma }^2,{\lambda }))\) is derived from (1) and (2) as
Since \({\lambda }\ne 0\), we have \({\delta }\ne 0\) and \(m_3 \ne 0\), and it is easy to check that these functions are three times continuously differentiable. Thus, using the Taylor series expansion we have
which, together with the results obtained up to this point, gives
Concerning the truncated estimator \({{\widehat{{\delta }}}}=\max (-1+1/m, \min ({{\widetilde{{\delta }}}}, 1-1/m))\), we consider the case of \(0<{\delta }<1\). For large m, we have \(1-1/m-{\delta }>0\). Then, \(\Pr ({{\widetilde{{\delta }}}}>1-1/m)=\Pr ({{\widetilde{{\delta }}}}-{\delta }> 1-1/m-{\delta })\le E[({{\widetilde{{\delta }}}}-{\delta })^2]/(1-1/m-{\delta })^2\), so that \(\Pr ({{\widetilde{{\delta }}}}>1-1/m)=O(m^{-1})\). This shows the consistency of \({{\widehat{{\delta }}}}\). Using the same arguments as above, we can show that \(E[{{\widehat{{\delta }}}}-{\delta }]=O(m^{-1})\) and \(E[({{\widehat{{\delta }}}}-{\delta })^2]=O(m^{-1})\). These results lead to the asymptotic properties of \({{\widehat{{\lambda }}}}\).
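As a concrete illustration of this truncation, the following sketch (our own, with hypothetical inputs) maps a raw moment estimate \({{\widetilde{{\delta }}}}\), which may fall outside \((-1,1)\), to \({{\widehat{{\delta }}}}\), and then to \({{\widehat{{\lambda }}}}={{\widehat{{\delta }}}}/\sqrt{1-{{\widehat{{\delta }}}}^2}\), the inverse of \({\delta }={\lambda }/\sqrt{1+{\lambda }^2}\).

```python
import math

def truncate_delta(delta_tilde, m):
    # hat-delta = max(-1 + 1/m, min(tilde-delta, 1 - 1/m)):
    # keeps the estimate strictly inside (-1, 1), so lambda-hat
    # below is always well defined
    return max(-1.0 + 1.0 / m, min(delta_tilde, 1.0 - 1.0 / m))

def lambda_from_delta(delta):
    # inverse of delta = lambda / sqrt(1 + lambda^2)
    return delta / math.sqrt(1.0 - delta ** 2)

m = 50
for raw in (0.3, 1.2, -2.0):   # interior, and out of range on each side
    d = truncate_delta(raw, m)
    print(raw, d, lambda_from_delta(d))
```

The truncation boundary \(1-1/m\) shrinks toward 1 as m grows, so the truncation becomes inactive with probability tending to one whenever \(|{\delta }|<1\).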
Concerning \({{\hat{\tau }}}^2\), we have \(E[({\tilde{\tau }}^2 - \tau ^2)^2] = O(m^{-1})\), because \((v_i,\, {\varepsilon }_{ij})\)’s are independent for different i and all the moments of \(v_i\) and \({\varepsilon }_{ij}\) exist. Following Prasad and Rao (1990),
which is of order \(O(m^{-1})\). Then we have
which is of order \(O(m^{-1})\). Also, it follows from \(E[{\tilde{\tau }}^2 - \tau ^2] = 0\) that
As for the first term,
which is of order \( O(m^{-1})\). Thus, together with (27), we obtain \(E[{{\hat{\tau }}}^2 - \tau ^2] = O(m^{-1})\).
We have derived the desired properties in the unconditional case, so that, by the argument at the beginning of the proof, the statement of the theorem holds for \(\big ({\varvec{\beta }}_{\varepsilon }^\top ,{\sigma }^2,\tau ^2,{\lambda }\big )\). Lastly, we need to consider \({\beta }_0\) instead of \({\beta }_{0{\varepsilon }}\). Since \(\mu _{\varepsilon }= {\sigma }\sqrt{2/\pi }{\lambda }/\sqrt{1 + {\lambda }^2}\) is a three times continuously differentiable function of \(({\sigma }^2,{\lambda })\), it follows that
Using this expansion, the desired results can be easily obtained.
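The correction term \(\mu _{\varepsilon }\) is simple to compute in practice. The sketch below (the helper name is ours) evaluates \(\mu _{\varepsilon }={\sigma }\sqrt{2/\pi }\,{\lambda }/\sqrt{1+{\lambda }^2}\) for illustrative values and checks it against the skew-normal mean implemented in scipy.

```python
import numpy as np
from scipy.stats import skewnorm

def mu_eps(sigma2, lam):
    # mean of SN(0, sigma^2, lambda): sigma * sqrt(2/pi) * delta,
    # with delta = lambda / sqrt(1 + lambda^2)
    delta = lam / np.sqrt(1.0 + lam ** 2)
    return np.sqrt(sigma2) * np.sqrt(2.0 / np.pi) * delta

sigma2, lam = 2.0, 3.0   # illustrative values
print(mu_eps(sigma2, lam))
print(skewnorm.mean(a=lam, scale=np.sqrt(sigma2)))   # agrees
```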
It remains to show the expectations of cross terms are \(O(m^{-1})\). Analogously to (26), this can be achieved by Schwarz’s inequality, and the proof is complete. \(\square \)
Proof of Proposition 4.1
From Theorem 4.1, \(g_{1i}({{\widehat{{\varvec{\omega }}}}}, {{\varvec{y}}}_i)\) can be expanded as \(g_{1i}({{\widehat{{\varvec{\omega }}}}},{{\varvec{y}}}_i) = g_{1i}({\varvec{\omega }},{{\varvec{y}}}_i) + G_{1i}({{\widehat{{\varvec{\omega }}}}},{\varvec{\omega }},{{\varvec{y}}}_i) + O_p(m^{-3/2})\) where
Thus we have \(E[g_{1i}({{\widehat{{\varvec{\omega }}}}},{{\varvec{y}}}_i) {\,|\,} {{\varvec{y}}}_i] = g_{1i}({\varvec{\omega }},{{\varvec{y}}}_i) + E[G_{1i}({{\widehat{{\varvec{\omega }}}}},{\varvec{\omega }},{{\varvec{y}}}_i) {\,|\,} {{\varvec{y}}}_i] + o_p(m^{-1})\). It follows from Theorem 4.1 that \(E[G_{1i}({{\widehat{{\varvec{\omega }}}}},{\varvec{\omega }},{{\varvec{y}}}_i){\,|\,}{{\varvec{y}}}_i] = O_p(m^{-1})\), so that applying the same arguments as in Butar and Lahiri (2003) shows \(E[{{\hat{g}}}_{1i} {\,|\,} {{\varvec{y}}}_i] = g_{1i}({\varvec{\omega }},{{\varvec{y}}}_i) + o_p(m^{-1})\). Also, using Theorem 4.1 again, it can be seen that \(E[{{\hat{g}}}_{2i} {\,|\,} {{\varvec{y}}}_i] = g_{2i}({\varvec{\omega }},{{\varvec{y}}}_i) + o_p(m^{-1})\). Then the proposition can be immediately obtained. \(\square \)
Tsujino, T., Kubokawa, T. Empirical Bayes methods in nested error regression models with skew-normal errors. Jpn J Stat Data Sci 2, 375–403 (2019). https://doi.org/10.1007/s42081-019-00038-y