Large sample results for frequentist multiple imputation for Cox regression with missing covariate data

  • Frank Eriksson
  • Torben Martinussen
  • Søren Feodor Nielsen


Incomplete information on explanatory variables is commonly encountered in studies of possibly censored event times. A popular approach to dealing with partially observed covariates is multiple imputation: missing values are imputed from an appropriate distribution to produce a number of completed data sets, each of which can be analyzed by standard complete-data methods. We show how the combination of multiple imputations from a compatible model with suitably estimated parameters and the usual Cox regression estimators leads to consistent and asymptotically Gaussian estimators of both the finite-dimensional regression parameter and the infinite-dimensional cumulative baseline hazard parameter. We also derive a consistent estimator of the covariance operator. Simulation studies and an application to a study on survival after treatment for liver cirrhosis show that the estimators perform well with moderate sample sizes and indicate that iterating the multiple-imputation estimator increases the precision.
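The procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several simplifying assumptions that are not taken from the paper: a single binary covariate X ~ Bernoulli(p), censoring that does not depend on the missing covariate, no tied event times, and initial parameter estimates taken from the complete cases (reasonable when values are missing completely at random). Under these assumptions a compatible imputation model draws X with P(X = 1 | T = t, δ = d) proportional to p·exp(βd − Λ0(t)e^β), against (1 − p)·exp(−Λ0(t)) for X = 0. All function names (`cox_fit`, `breslow`, `prob_x1`, `mi_cox`, `simulate`) are hypothetical.

```python
import math
import random

def cox_fit(times, events, x, iters=20):
    """Newton-Raphson maximiser of the Cox partial likelihood (one covariate, no ties)."""
    order = sorted(range(len(times)), key=lambda i: -times[i])  # latest time first
    beta = 0.0
    for _ in range(iters):
        s0 = s1 = s2 = score = info = 0.0
        for i in order:  # running sums over the risk set {j: T_j >= T_i}
            w = math.exp(beta * x[i])
            s0, s1, s2 = s0 + w, s1 + w * x[i], s2 + w * x[i] ** 2
            if events[i]:
                mean = s1 / s0
                score += x[i] - mean
                info += s2 / s0 - mean ** 2
        beta += score / info
    return beta

def breslow(times, events, x, beta):
    """Breslow estimator of the cumulative baseline hazard, returned as a function of t."""
    order = sorted(range(len(times)), key=lambda i: -times[i])
    s0, jumps = 0.0, []
    for i in order:
        s0 += math.exp(beta * x[i])
        if events[i]:
            jumps.append((times[i], 1.0 / s0))
    return lambda t: sum(h for u, h in jumps if u <= t)

def prob_x1(t, d, beta, cumhaz, p):
    """P(X = 1 | T = t, delta = d) under the assumed Cox model with binary X."""
    big_l = cumhaz(t)
    w1 = p * math.exp(beta * d - big_l * math.exp(beta))
    w0 = (1.0 - p) * math.exp(-big_l)
    return w1 / (w1 + w0)

def mi_cox(times, events, x, M=10, n_iter=1, seed=1):
    """Average of M completed-data Cox estimates; n_iter > 1 re-imputes from updated estimates."""
    rng = random.Random(seed)
    n = len(x)
    cc = [i for i in range(n) if x[i] is not None]
    t_cc, d_cc, x_cc = ([times[i] for i in cc], [events[i] for i in cc],
                        [x[i] for i in cc])
    # Assumption: complete-case estimates serve as the initial "suitably estimated" parameters.
    beta = cox_fit(t_cc, d_cc, x_cc)
    cumhaz = breslow(t_cc, d_cc, x_cc, beta)
    p = sum(x_cc) / len(x_cc)
    for _ in range(n_iter):
        probs = [None if x[i] is not None
                 else prob_x1(times[i], events[i], beta, cumhaz, p) for i in range(n)]
        betas, hazards, means = [], [], []
        for _ in range(M):
            # Complete the data by drawing each missing X from its conditional distribution.
            xc = [x[i] if x[i] is not None else float(rng.random() < probs[i])
                  for i in range(n)]
            b = cox_fit(times, events, xc)
            betas.append(b)
            hazards.append(breslow(times, events, xc, b))
            means.append(sum(xc) / n)
        beta, p = sum(betas) / M, sum(means) / M
        cumhaz = lambda t, hs=hazards: sum(h(t) for h in hs) / len(hs)
    return beta

def simulate(n=500, beta=1.0, p=0.5, miss=0.3, seed=0):
    """Toy data: unit baseline hazard, exponential censoring, X missing completely at random."""
    rng = random.Random(seed)
    times, events, x = [], [], []
    for _ in range(n):
        xi = float(rng.random() < p)
        t, c = rng.expovariate(math.exp(beta * xi)), rng.expovariate(0.5)
        times.append(min(t, c))
        events.append(int(t <= c))
        x.append(None if rng.random() < miss else xi)
    return times, events, x
```

Calling `mi_cox(*simulate(), M=10, n_iter=2)` returns the averaged regression estimate for toy data generated with a true coefficient of 1. Setting `n_iter` above 1 repeats the imputation with the updated estimates, a simple stand-in for the iterated estimator mentioned in the abstract. The sketch does not attempt the variance estimator derived in the paper.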


Keywords: Asymptotic distribution, Coarsened data, Semiparametric, Survival, Variance estimator




Copyright information

© The Institute of Statistical Mathematics, Tokyo 2019

Authors and Affiliations

  • Frank Eriksson (1), corresponding author
  • Torben Martinussen (1)
  • Søren Feodor Nielsen (2)

  1. Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
  2. Center for Statistics, Department of Finance, Copenhagen Business School, Frederiksberg, Denmark
