Semiparametric likelihoods for regression models with missing at random data (Chen in J Am Stat Assoc 99:1176–1189, 2004; Zhang and Rockette in J Stat Comput Simul 77(2):163–173, 2007; Zhao et al. in Biom J 51:123–136, 2009; Zhao in Commun Stat Theory Methods 38:3736–3744, 2009) are robust because they use nonparametric models for the covariate distributions and do not require modeling the missing data probabilities. Furthermore, the EM algorithms based on the semiparametric likelihoods have closed form expressions for both the E-step and the M-step. As far as we know, the existing semiparametric likelihoods can only deal with the simple monotone missing data pattern. In this research we extend the semiparametric likelihood approach to regression models with arbitrary nonmonotone missing at random data. We propose a pseudo-likelihood model, which uses an empirical distribution to model the conditional distribution of missing covariates given observed covariates, separately for each missing data pattern. We show that an EM algorithm with closed form updating formulas can be used to compute maximum pseudo-likelihood estimates for regression models with nonmonotone missing data. We then propose estimating the asymptotic variance of the maximum pseudo-likelihood estimator through a profile log likelihood and the EM algorithm. We examine the finite sample performance of the new methods in simulation studies and further illustrate the methods with a real data example investigating high risk gambling behavior and its associated factors.
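To make the idea of closed form E-step and M-step updates concrete, the following is a minimal sketch for a deliberately simple case that is not taken from the article: a normal linear regression of y on one covariate x that is sometimes missing and one fully observed discrete covariate z, with a single missing data pattern. The distribution of x given z is modeled by point masses on the x values observed in the same level of z. The function name `em_linear_missing_x`, the linear model, and the stratification by z are all assumptions made for illustration; the article's pseudo-likelihood for nonmonotone patterns is more general.

```python
import numpy as np

def em_linear_missing_x(y, x, z, n_iter=500, tol=1e-8):
    """Illustrative EM for y = b0 + b1*x + b2*z + N(0, s2), with NaN marking missing x.

    Assumption (not from the article): x | z is a discrete distribution whose
    support is the set of x values of complete cases sharing the same z.
    Returns (beta, s2, p), where p[j] is the mass on complete case j.
    """
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    obs = ~np.isnan(x)
    yo, xo, zo = y[obs], x[obs], z[obs]      # complete cases
    ym, zm = y[~obs], z[~obs]                # cases with x missing
    n, n_obs, n_mis = len(y), len(yo), len(ym)

    # same[i, j] is True when support point j is admissible for incomplete case i
    same = zm[:, None] == zo[None, :]
    if not same.any(axis=1).all():
        raise ValueError("each incomplete case needs a complete case with the same z")

    n_z = np.array([np.sum(z == v) for v in zo])       # stratum size per support point
    p = 1.0 / np.array([np.sum(zo == v) for v in zo])  # start at complete-case masses

    # Augmented data: complete rows, then one pseudo-row per (incomplete i, support j)
    Xo = np.column_stack([np.ones(n_obs), xo, zo])
    Xp = np.column_stack([np.ones(n_mis * n_obs),
                          np.tile(xo, n_mis),
                          np.repeat(zm, n_obs)])
    XA = np.vstack([Xo, Xp])
    yA = np.concatenate([yo, np.repeat(ym, n_obs)])

    # Starting values from the complete-case least squares fit
    beta = np.linalg.lstsq(Xo, yo, rcond=None)[0]
    s2 = np.mean((yo - Xo @ beta) ** 2)

    for _ in range(n_iter):
        # E-step: w[i, j] proportional to p_j * N(y_i; mean(x_j, z_i), s2)
        mu = beta[0] + beta[1] * xo[None, :] + beta[2] * zm[:, None]
        logw = np.log(p)[None, :] - (ym[:, None] - mu) ** 2 / (2.0 * s2)
        logw = np.where(same, logw, -np.inf)
        logw -= logw.max(axis=1, keepdims=True)        # stabilise before exponentiating
        w = np.exp(logw)
        w /= w.sum(axis=1, keepdims=True)

        # M-step: weighted least squares, weighted variance, updated point masses
        wA = np.concatenate([np.ones(n_obs), w.ravel()])
        XtW = XA.T * wA
        beta_new = np.linalg.solve(XtW @ XA, XtW @ yA)
        s2 = np.sum(wA * (yA - XA @ beta_new) ** 2) / n
        p = (1.0 + w.sum(axis=0)) / n_z

        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, s2, p
```

In this sketch every update is available in closed form: the E-step weights are normalized products of a point mass and a normal density, and the M-step is a weighted least squares fit plus a reweighting of the support points. A logistic or other generalized linear model would need an iterative weighted fit for the regression parameters in the M-step instead.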
Chatterjee N, Chen Y, Breslow NE (2003) A pseudo-score estimator for regression problems with two-phase sampling. J Am Stat Assoc 98:158–168
Chen HY (2004) Nonparametric and semiparametric models for missing covariates in parametric regression. J Am Stat Assoc 99:1176–1189
Chen HY, Xie H, Qian Y (2011) Multiple imputation for missing values through conditional semiparametric odds ratio models. Biometrics 67:799–809
Gao X, Song PXK (2011) Composite likelihood EM algorithm with applications to multivariate hidden Markov model. Stat Sin 21:165–185
Huang Y (2009) Statistical analysis of gambling behaviors. Thesis at the University of Regina
Ibrahim JG (1990) Incomplete data in generalized linear models. J Am Stat Assoc 85:765–769
Ibrahim JG, Weisberg S (1992) Incomplete data in generalized linear models with continuous covariates. Aust N Z J Stat 34:461–470
Ibrahim JG, Chen MH, Lipsitz SR (1999) Monte Carlo EM for missing covariates in parametric regression models. Biometrics 55:591–596
Ibrahim JG, Chen MH, Lipsitz SR, Herring AH (2005) Missing-data methods for generalized linear models: a comparative review. J Am Stat Assoc 100:332–346
Lawless JF, Kalbfleisch JD, Wild CJ (1999) Semiparametric methods for response-selective and missing data problems in regression. J R Stat Soc B 61(2):413–438
Lindsay B (1988) Composite likelihood methods. Contemp Math 80:220–239
Little RJA, Rubin DB (2002) Statistical analysis with missing data. Wiley, New York
Little RJA, Schluchter MD (1985) Maximum likelihood estimation for mixed continuous and categorical data with missing values. Biometrika 72(3):497–512
Murphy SA, van der Vaart AW (2000) On profile likelihood. J Am Stat Assoc 95:449–465
Rubin DB (1976) Inference and missing data. Biometrika 63(3):581–592
Sinha S, Saha KK, Wang S (2014) Semiparametric approach for non-monotone missing covariates in a parametric regression model. Biometrics 70(2):299–311
Sun B, Tchetgen EJT (2018) On inverse probability weighting for nonmonotone missing at random data. J Am Stat Assoc 113:369–379
Varin C, Reid N, Firth D (2011) An overview of composite likelihood methods. Stat Sin 21:5–42
Zhang Z, Rockette HE (2007) An EM algorithm for regression analysis with incomplete covariate information. J Stat Comput Simul 77(2):163–173
Zhao Y (2009) Regression analysis with covariates missing at random: a piece-wise nonparametric model for missing covariates. Commun Stat Theory Methods 38:3736–3744
Zhao Y, Joe H (2005) Composite likelihood estimation in multivariate data analysis. Can J Stat 33:335–356
Zhao LP, Lipsitz S (1992) Designs and analysis of two-stage studies. Stat Med 11:769–782
Zhao Y, Lawless JF, McLeish DL (2009) Likelihood methods for regression models with expensive variables missing by design. Biom J 51:123–136
We thank the editors and the two anonymous reviewers for their helpful comments and suggestions. This research was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada (YZ).
About this article
Cite this article
Zhao, Y. Semiparametric model for regression analysis with nonmonotone missing data. Stat Methods Appl (2020). https://doi.org/10.1007/s10260-020-00530-w
- EM algorithm
- Nonmonotone missing data patterns
- Profile log likelihood
- Semiparametric likelihood