
Rank-Based Empirical Likelihood for Regression Models with Responses Missing at Random

Chapter in New Frontiers of Biostatistics and Bioinformatics, part of the ICSA Book Series in Statistics (ICSABSS).

Abstract

In this paper, a general regression model with responses missing at random is considered. A rank-based estimator is derived from an imputed rank-based objective function, and its asymptotic distribution is established under mild conditions. Inference based on the normal approximation approach can suffer from under- or over-coverage. To address this issue, we propose an empirical likelihood approach based on the rank-based objective function and establish the asymptotic distribution of the resulting likelihood ratio. Extensive Monte Carlo simulation experiments are conducted under different error distributions and different response probabilities. The simulation results show that the proposed approach outperforms both the normal approximation approach and its least-squares counterpart in inference for the regression parameters. Finally, a data example is provided to illustrate our method.


References

  • Bindele, H. F. (2015). The signed-rank estimator for nonlinear regression with responses missing at random. Electronic Journal of Statistics, 9(1), 1424–1448.
  • Bindele, H. F. (2017). Strong consistency of the general rank estimator. Communications in Statistics - Theory and Methods, 46(2), 532–539.
  • Bindele, H. F., & Abebe, A. (2012). Bounded influence nonlinear signed-rank regression. Canadian Journal of Statistics, 40(1), 172–189.
  • Bindele, H. F., & Abebe, A. (2015). Semi-parametric rank regression with missing responses. Journal of Multivariate Analysis, 142, 117–132.
  • Bindele, H. F., & Zhao, Y. (2016). Signed-rank regression inference via empirical likelihood. Journal of Statistical Computation and Simulation, 86(4), 729–739.
  • Bindele, H. F., & Zhao, Y. (in press). Rank-based estimating equation with non-ignorable missing responses via empirical likelihood. Statistica Sinica.
  • Brunner, E., & Denker, M. (1994). Rank statistics under dependent observations and applications to factorial designs. Journal of Statistical Planning and Inference, 42(3), 353–378.
  • Chen, S. X., Peng, L., & Qin, Y.-L. (2009). Effects of data dimension on empirical likelihood. Biometrika, 96(3), 711–722.
  • Cheng, P. E. (1994). Nonparametric estimation of mean functionals with data missing at random. Journal of the American Statistical Association, 89(425), 81–87.
  • Delecroix, M., Hristache, M., & Patilea, V. (2006). On semiparametric estimation in single-index regression. Journal of Statistical Planning and Inference, 136(3), 730–769.
  • Einmahl, U., & Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. The Annals of Statistics, 33(3), 1380–1403.
  • Gong, Y., Peng, L., & Qi, Y. (2010). Smoothed jackknife empirical likelihood method for ROC curve. Journal of Multivariate Analysis, 101(6), 1520–1531.
  • Healy, M., & Westmacott, M. (1956). Missing values in experiments analysed on automatic computers. Journal of the Royal Statistical Society, Series C (Applied Statistics), 5(3), 203–206.
  • Hettmansperger, T. P., & McKean, J. W. (2011). Robust nonparametric statistical methods. Monographs on Statistics and Applied Probability (Vol. 119, 2nd ed.). Boca Raton, FL: CRC Press.
  • Hjort, N. L., McKeague, I. W., & Van Keilegom, I. (2009). Extending the scope of empirical likelihood. The Annals of Statistics, 37(3), 1079–1111.
  • Jaeckel, L. A. (1972). Estimating regression coefficients by minimizing the dispersion of the residuals. Annals of Mathematical Statistics, 43, 1449–1458.
  • Jing, B.-Y., Yuan, J., & Zhou, W. (2009). Jackknife empirical likelihood. Journal of the American Statistical Association, 104(487), 1224–1232.
  • Lahiri, S. N., & Mukhopadhyay, S. (2012). A penalized empirical likelihood method in high dimensions. The Annals of Statistics, 40(5), 2511–2540.
  • Little, R. J. (1992). Regression with missing X's: A review. Journal of the American Statistical Association, 87(420), 1227–1237.
  • Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. Wiley Series in Probability and Mathematical Statistics. New York: Wiley.
  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley-Interscience.
  • Owen, A. B. (1988). Empirical likelihood ratio confidence intervals for a single functional. Biometrika, 75(2), 237–249.
  • Owen, A. B. (1990). Empirical likelihood ratio confidence regions. The Annals of Statistics, 18(1), 90–120.
  • Owen, A. B. (2001). Empirical likelihood. Chapman & Hall/CRC Monographs on Statistics & Applied Probability. Boca Raton, FL: CRC Press.
  • Qin, J., & Lawless, J. (1994). Empirical likelihood and general estimating equations. The Annals of Statistics, 22(1), 300–325.
  • Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
  • Rubin, D. B. (2004). Multiple imputation for nonresponse in surveys. Wiley Classics Library (Vol. 81). Hoboken, NJ: Wiley.
  • Silverman, B. W. (1986). Density estimation for statistics and data analysis (Vol. 26). Boca Raton, FL: CRC Press.
  • Tang, C. Y., & Leng, C. (2010). Penalized high-dimensional empirical likelihood. Biometrika, 97(4), 905–920.
  • Van Buuren, S. (2012). Flexible imputation of missing data. Chapman & Hall/CRC Interdisciplinary Statistics. Boca Raton, FL: Taylor & Francis.
  • Wang, C. Y., Wang, S., Zhao, L.-P., & Ou, S.-T. (1997). Weighted semiparametric estimation in regression analysis with missing covariate data. Journal of the American Statistical Association, 92(438), 512–525.
  • Wang, Q., Linton, O., & Härdle, W. (2004). Semiparametric regression analysis with missing response at random. Journal of the American Statistical Association, 99(466), 334–345.
  • Wang, Q., & Rao, J. N. K. (2002). Empirical likelihood-based inference under imputation for missing response data. The Annals of Statistics, 30(3), 896–924.
  • Wang, Q., & Sun, Z. (2007). Estimation in partially linear models with missing responses at random. Journal of Multivariate Analysis, 98(7), 1470–1493.
  • Yang, H., & Zhao, Y. (2012). New empirical likelihood inference for linear transformation models. Journal of Statistical Planning and Inference, 142(7), 1659–1668.
  • Yu, W., Sun, Y., & Zheng, M. (2011). Empirical likelihood method for linear transformation models. Annals of the Institute of Statistical Mathematics, 63(2), 331–346.
  • Zhang, Z., & Zhao, Y. (2013). Empirical likelihood for linear transformation models with interval-censored failure time data. Journal of Multivariate Analysis, 116, 398–409.
  • Zhao, L. P., Lipsitz, S., & Lew, D. (1996). Regression analysis with missing covariate data using estimating equations. Biometrics, 52(4), 1165–1182.

Acknowledgements

The authors would like to thank the two reviewers for their helpful comments. The research of Yichuan Zhao is supported by a National Security Agency grant.

Correspondence to Huybrechts F. Bindele.

Appendix

This Appendix contains the assumptions used in the development of the theoretical results as well as the proofs of the main results.

1.1 Assumptions

(I 1 ):

φ is a nondecreasing, bounded, and twice continuously differentiable score function defined on (0, 1), with bounded derivatives, satisfying

$$\displaystyle \begin{aligned}\quad \int_{0}^{1}\varphi(u)\,du=0\quad \text{and}\quad \int_{0}^{1}\varphi^{2}(u)\,du=1.\end{aligned}$$
(I 2 ):

The function g(x, β) has continuous derivatives with respect to β up to order 3, bounded by p-integrable functions of x that do not depend on β, for some p ≥ 1.

(I 3 ):

K(⋅) is a regular kernel of order r > 2, with bandwidth b_n satisfying \(\displaystyle {nb_{n}^{4r}\to 0}\) and \(C(\log {n}/n)^{\gamma }<b_{n}<h_{n}\) for any C > 0, where γ = 1 − 2∕p, p > 2, and h_n is a bandwidth such that \(C(\log {n}/n)^{\gamma }<h_{n}<1\) with h_n → 0 as n → ∞.

(I 4 ):

\(\sup_{\mathbf x}E[|Y|^{p}\mid \mathbf{X}=\mathbf{x}]<\infty\) for some p ≥ 1, and \(\inf_{\mathbf x}\varDelta(\mathbf{x})>0\).

(I 5 ):

For fixed n, \({\boldsymbol \beta }_{0,n}\in Int(\mathcal {B})\) is the unique minimizer of E[D_n(β)], and \(\lim_{n\to\infty}{\boldsymbol \beta}_{0,n}={\boldsymbol \beta}_{0}\).

(I 6 ):

The model error has a distribution with a finite Fisher information.

(I 7 ):

\(Var(\sqrt {n}S_{n}^{j}({\boldsymbol \beta }_0))\to \varSigma _{{\boldsymbol \beta }_{0}}^{j}\), where \(\varSigma _{{\boldsymbol \beta }_{0}}^{j}\) is positive definite.

(I 8 ):

Set H = B(B^τ B)^{−1}B^τ, where B = ∇_β g(x, β_0). H is the projection matrix onto the column space of B, which in this case represents the tangent space generated by B. Let h_{iin}, i = 1, ⋯, n, denote the leverage values, that is, the diagonal entries of H. We assume that \(\lim_{n\to\infty}\max_{1\leq i\leq n}h_{iin}=0\).

Assumptions (I 2)–(I 4) are necessary and sufficient to ensure the strong consistency of \(\widehat {\pi }({\mathbf x})\) used in the imputation process. Assumptions (I 1) and (I 5)–(I 8), together with the previous ones, are needed to establish the asymptotic properties (consistency and asymptotic normality) of the rank-based estimators of β_0. An elaborate discussion of these assumptions can be found in Hettmansperger and McKean (2011), Bindele and Abebe (2012), and Bindele (2017).
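As a concrete illustration (not part of the chapter), the Wilcoxon score φ(u) = √12 (u − 1∕2) is a standard choice satisfying (I 1): it is nondecreasing, bounded, and smooth on (0, 1). The sketch below verifies the two integral conditions numerically with a midpoint rule.

```python
import numpy as np

# Illustration (assumption: Wilcoxon score, a standard choice for (I1)):
# phi(u) = sqrt(12) * (u - 1/2) is nondecreasing and bounded on (0, 1),
# and should satisfy  int_0^1 phi(u) du = 0  and  int_0^1 phi(u)^2 du = 1.
def phi(u):
    return np.sqrt(12.0) * (u - 0.5)

m = 200_000
u = (np.arange(m) + 0.5) / m          # midpoints of a uniform grid on (0, 1)
int_phi = phi(u).mean()               # midpoint-rule approximation of int phi
int_phi2 = (phi(u) ** 2).mean()       # midpoint-rule approximation of int phi^2

print(int_phi, int_phi2)
```

The two printed values are numerically 0 and 1, matching the standardization in (I 1).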

By definition of \(S_{n}^{j}({\boldsymbol \beta })\), \(\widehat {{\boldsymbol \beta }}_{n}^{j}\) is a solution to the equation \(S_{n}^{j}({\boldsymbol \beta })=\mathbf {0}\). As in Brunner and Denker (1994), assume without loss of generality that \(\|{\boldsymbol \lambda}_{i}\|=1\) and define

$$\displaystyle \begin{aligned} J_{jn}(s) &=\frac{1}{n}\sum_{i=1}^{n}F_{ij}(s),\ \ \hat{J}_{jn}(s)=\frac{1}{n}\sum_{i=1}^{n}I(\nu_{ij}({\boldsymbol \beta}_{0})\leq s),\ \ F_{jn}(s)=\frac{1}{n}\sum_{i=1}^{n}{\boldsymbol \lambda}_{i}F_{ij}(s)\\ \hat{F}_{jn}(s)&=\frac{1}{n}\sum_{i=1}^{n}{\boldsymbol \lambda}_{i}I(\nu_{ij}({\boldsymbol \beta}_{0})\leq s),\quad T_{n}^{j}({\boldsymbol \beta}_{0})=S_{n}^{j}({\boldsymbol \beta}_{0})-E\big[S_{n}^{j}({\boldsymbol \beta}_{0}) \big]. \end{aligned} $$

The following lemma due to Brunner and Denker (1994) is a key for establishing asymptotic normality of the rank gradient function for dependent data.

Lemma 3.1

Let \(\varsigma_{jn}\) be the minimum eigenvalue of \({\mathbf W}_{jn}=Var(U_{jn})\) with \(U_{jn}\) given by

$$\displaystyle \begin{aligned} U_{jn}=\int\varphi(J_{jn}(s))(\hat{F}_{jn}-F_{jn})(ds)+\int\varphi'(J_{jn}(s))(\hat{J}_{jn}(s)-J_{jn}(s))F_{jn}(ds)\;. \end{aligned}$$

Suppose that \(\varsigma_{jn}\geq Cn^{a}\) for some constants \(C, a \in \mathbb {R}\), and that m(n) satisfies \(M_{0}n^{\gamma}\leq m(n)\leq M_{1}n^{\gamma}\) for some constants 0 < M_0 ≤ M_1 < ∞ and 0 < γ < (a + 1)∕2. Then \(\displaystyle {m(n){\mathbf W}_{jn}^{-1}T_{n}^{j}({\boldsymbol \beta }_{0})}\) is asymptotically standard multivariate normal, provided φ is twice continuously differentiable with bounded second derivative.

We provide a sketch of the proof of this lemma. A detailed proof can be found in Brunner and Denker (1994).

Proof

Set

$$\displaystyle \begin{aligned}B_{jn}=-\int(\hat{F}_{jn}-F_{jn})d\varphi(J_{jn})+\int(\hat{J}_{jn}-J_{jn})\frac{dF_{jn}}{dJ_{jn}}d\varphi(J_{jn}).\end{aligned}$$

Brunner and Denker (1994) showed that \({\mathbf W}_{jn}=n^{2}Var(B_{jn})\), since \(U_{jn}=nB_{jn}\). From its definition, \(S_{n}^{j}({\boldsymbol \beta })\) can be rewritten as

$$\displaystyle \begin{aligned}S_{n}^{j}({\boldsymbol \beta}_{0})=\frac{1}{n}\sum_{i=1}^{n}{\boldsymbol \lambda}_{i}\varphi\Big(\frac{R(\nu_{ij}({\boldsymbol \beta}_{0}))}{n+1}\Big)=\int\varphi\Big(\frac{n}{n+1}\hat{J}_{jn}\Big)dF_{jn}.\end{aligned}$$

By (I 5), since \({\boldsymbol \beta }_{0}=\displaystyle \lim _{n\to \infty } \operatorname *{\mathrm {Argmin}}_{{\boldsymbol \beta }\in \mathcal {B}}E\{D_{n}^{j}({\boldsymbol \beta })\}\), we have \(E\{S_{n}^{j}({\boldsymbol \beta }_{0})\}\to \mathbf {0}\) as n → ∞. From the fact that Var(ε|x) > 0, there exists a positive constant C such that \(\varsigma_{jn}\geq Cn^{2}\), so the assumptions of Lemma 3.1 hold with a = 2, M_0 = M_1 = 1, γ = 1 < (a + 1)∕2, and m(n) = n, as φ is twice continuously differentiable with bounded derivatives. Thus, for n large enough, \(\displaystyle {n{\mathbf W}_{jn}^{-1}T_{n}^{j}({\boldsymbol \beta }_{0})\approx n{\mathbf W}_{jn}^{-1}S_{n}^{j}({\boldsymbol \beta }_{0})}\), which converges to a multivariate standard normal by Lemma 3.1. Setting \(\varSigma _{jn}=n^{-1/2}{\mathbf W}_{jn}\) and applying Slutsky's lemma gives \(\displaystyle \sqrt {n}S_{n}^{j}({\boldsymbol \beta }_{0}) \ \xrightarrow {\mathcal {D}} \ N_{p}(\mathbf {0}, \varSigma _{{\boldsymbol \beta }_{0}}^{j})\), j = 1, 2, where \( \varSigma _{{\boldsymbol \beta }_{0}}^{j}=\displaystyle {\lim _{n\to \infty }\varSigma _{jn}\varSigma _{jn}^{\tau }}\).
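To make the rank gradient concrete, the following sketch (not from the chapter; it assumes a linear model g(x, β) = x^τ β, Wilcoxon scores, and fully observed responses) computes S_n(β) = n^{−1} Σ_i x_i φ(R(ν_i(β))∕(n + 1)) and checks that it is small at the true parameter and larger away from it.

```python
import numpy as np

def rank_gradient(X, y, beta):
    """Rank-based gradient S_n(beta) = (1/n) * sum_i x_i * phi(R_i / (n + 1)),
    where R_i is the rank of the residual y_i - x_i' beta and phi is the
    Wilcoxon score phi(u) = sqrt(12) * (u - 1/2). Illustrative sketch for a
    linear model; ties among residuals are assumed away."""
    n = len(y)
    resid = y - X @ beta
    ranks = np.argsort(np.argsort(resid)) + 1         # ranks 1, ..., n
    scores = np.sqrt(12.0) * (ranks / (n + 1) - 0.5)  # Wilcoxon scores
    return X.T @ scores / n

rng = np.random.default_rng(0)
n, p = 500, 2
X = rng.normal(size=(n, p))
beta0 = np.array([1.0, -2.0])
y = X @ beta0 + rng.normal(size=n)

near = np.abs(rank_gradient(X, y, beta0)).max()       # O(n^{-1/2}), small
far = np.abs(rank_gradient(X, y, beta0 + 1.0)).max()  # away from beta0, larger
print(near, far)
```

Solving S_n(β) = 0 in β (for example, by minimizing Jaeckel's dispersion) yields the rank-based estimator discussed in the chapter.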

Proof of Theorem 3.2. Let C be an arbitrary positive constant. Recall from Eq. (3.5) that the empirical log-likelihood ratio of \({\boldsymbol \beta}_{0}\) is given by

$$\displaystyle \begin{aligned}-2\log R_{n}^{j}({\boldsymbol \beta}_{0})=-2\log\prod_{i=1}^{n}\big(1+{\boldsymbol \xi}^{\tau}\eta_{ij}({\boldsymbol \beta}_{0})\big)^{-1}=2\sum_{i=1}^{n}\log\big(1+{\boldsymbol \xi}^{\tau}\eta_{ij}({\boldsymbol \beta}_{0})\big).\end{aligned} $$

Under (I 1) and (I 2), there exist a positive constant M and a function \(h\in L^{p}\), p ≥ 1, such that |φ(t)| ≤ M for all t ∈ (0, 1) and \(\|\nabla_{\boldsymbol \beta}g({\mathbf x}_{i},{\boldsymbol \beta}_{0})\|\leq h({\mathbf x}_{i})\), where ∥⋅∥ stands for the L_2-norm. From this, \(\max_{1\leq i\leq n}\|\nabla_{\boldsymbol \beta}g({\mathbf x}_{i},{\boldsymbol \beta}_{0})\|=o_{p}(n^{1/2})\) since \(E(|h({\mathbf x}_{i})|^{p})<\infty\), p ≥ 1. Also, since \(\varSigma _{jn}\varSigma _{jn}^{\tau }\to \varSigma _{{\boldsymbol \beta }_{0}}^{j}\;\; a.s.\), \(\varSigma_{jn}\) is almost surely bounded. Thus, \(\|\eta_{ij}({\boldsymbol \beta}_{0})\|\leq M\times\max_{1\leq i\leq n}h({\mathbf x}_{i})\), which implies that

$$\displaystyle \begin{aligned} \max_{1\leq i\leq n}\|\eta_{ij}({\boldsymbol \beta}_{0})\|=o_{p}(n^{1/2})\quad \text{and}\quad \frac{1}{n}\sum_{i=1}^{n}\|\eta_{ij}({\boldsymbol \beta}_{0})\|{}^{3}=o_{p}(n^{1/2}).\end{aligned} $$
(3.6)

Moreover, \({\boldsymbol \varLambda }_{nj}=Var(\sqrt {n}S_{n}^{j}({\boldsymbol \beta }_{0}))=n^{-1}\sum _{i=1}^{n}\eta _{ij}({\boldsymbol \beta }_{0})\eta _{ij}^{\tau }({\boldsymbol \beta }_{0})=\varSigma _{{\boldsymbol \beta }_{0}}^{j}+o_{p}(1)\) by assumption (I 7), under which \(\varSigma _{{\boldsymbol \beta }_{0}}^{j}\) is positive definite. Hence, following the proof of Lemma 3.1, we have \(\varSigma _{jn}\varSigma _{jn}^{\tau }-{\boldsymbol \varLambda }_{nj}\to {0}\; a.s.\) Since \(\sqrt {n}S_{n}^{j}({\boldsymbol \beta }_{0})\xrightarrow {\mathcal {D}}N(0,\varSigma _{{\boldsymbol \beta }_{0}}^{j})\), we have \(\|S_{n}^{j}({\boldsymbol \beta }_{0})\|=O_{p}(n^{-1/2})\). Now, from Eq. (3.4) and arguments similar to those in Owen (1990), it follows that \(\|{\boldsymbol \xi}\|=O_{p}(n^{-1/2})\). On the other hand, a Taylor expansion of the right-hand side of Eq. (3.5) gives

$$\displaystyle \begin{aligned}-2\log R_{n}^{j}({\boldsymbol \beta}_{0})=2\sum_{i=1}^{n}\Big[{\boldsymbol \xi}^{\tau}\eta_{ij}({\boldsymbol \beta}_{0})-\frac{1}{2}\big({\boldsymbol \xi}^{\tau}\eta_{ij}({\boldsymbol \beta}_{0})\big)^2\Big]+{\boldsymbol \gamma}_{n},\end{aligned} $$

where \(\displaystyle {{\boldsymbol \gamma }_{n}=O_{P}(1)\sum _{i=1}^{n}|{\boldsymbol \xi }^{\tau }\eta _{ij}({\boldsymbol \beta }_{0})|{ }^{3}}\). Now, using similar arguments as in Owen (2001), we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} -2\log R_{n}^{j}({\boldsymbol \beta}_{0})&\displaystyle =&\displaystyle \sum_{i=1}^{n}{\boldsymbol \xi}^{\tau}\eta_{ij}({\boldsymbol \beta}_{0})+o_{P}(1)\\ &\displaystyle =&\displaystyle \Big(\sum_{i=1}^{n}\eta_{ij}({\boldsymbol \beta}_{0})\Big)^{\tau}(n{\boldsymbol \varLambda}_{nj})^{-1}\Big(\sum_{i=1}^{n}\eta_{ij}({\boldsymbol \beta}_{0})\Big)+o_{p}(1)\\ &\displaystyle =&\displaystyle \Big(\sqrt{n}{\boldsymbol \varLambda}^{-1/2}_{nj}S_{n}^{j}({\boldsymbol \beta}_{0})\Big)^{\tau}\Big(\sqrt{n}{\boldsymbol \varLambda}^{-1/2}_{nj}S_{n}^{j}({\boldsymbol \beta}_{0})\Big)+ o_{p}(1). \end{array} \end{aligned} $$

Using Slutsky's lemma, we have \(\displaystyle {\sqrt {n}{\boldsymbol \varLambda }_{nj}^{-1/2}S_{n}^{j}({\boldsymbol \beta }_{0})\xrightarrow {\mathcal {D}} N_{p}(0, I_{p})}\) as n → ∞, and therefore \(-2\log R_{n}^{j}({\boldsymbol \beta }_{0})\xrightarrow {\mathcal {D}}\chi^{2}_{p}\), which completes the proof.
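The χ²-calibration above can be illustrated numerically. The sketch below (not from the chapter) implements an Owen-type empirical likelihood ratio for a generic mean-zero estimating function η: the multiplier ξ solving Σ_i η_i∕(1 + ξ^τ η_i) = 0 is found by damped Newton steps on the concave dual objective, and −2 log R = 2 Σ_i log(1 + ξ^τ η_i) is then approximately χ²_p under the null.

```python
import numpy as np

def neg2_log_el(eta, iters=50):
    """-2 log empirical likelihood ratio that E[eta] = 0 (Owen-type dual).
    Solves sum_i eta_i / (1 + xi' eta_i) = 0 for the multiplier xi by damped
    Newton steps; illustrative sketch without edge-case handling."""
    n, p = eta.shape
    xi = np.zeros(p)
    for _ in range(iters):
        w = 1.0 + eta @ xi                       # EL weights are 1/(n*w_i)
        grad = (eta / w[:, None]).sum(axis=0)    # sum_i eta_i / w_i
        hess = -(eta.T * (1.0 / w**2)) @ eta     # Jacobian of grad in xi
        step = np.linalg.solve(hess, -grad)
        t = 1.0                                  # damp so all w_i stay positive
        while np.any(1.0 + eta @ (xi + t * step) <= 1e-10):
            t /= 2.0
        xi = xi + t * step
    return 2.0 * np.log(1.0 + eta @ xi).sum()

rng = np.random.default_rng(1)
eta = rng.normal(size=(200, 2))   # mean-zero estimating functions, p = 2
stat = neg2_log_el(eta)
# Under the null, stat is approximately chi-square with p = 2 d.f.
print(round(stat, 3))
```

In the chapter's setting, η_ij(β_0) plays the role of eta, and confidence regions for β_0 are obtained by inverting the χ²_p calibration of this statistic.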


Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Bindele, H.F., Zhao, Y. (2018). Rank-Based Empirical Likelihood for Regression Models with Responses Missing at Random. In: Zhao, Y., Chen, DG. (eds) New Frontiers of Biostatistics and Bioinformatics. ICSA Book Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-99389-8_3
