Large sample results for frequentist multiple imputation for Cox regression with missing covariate data


Incomplete information on explanatory variables is commonly encountered in studies of possibly censored event times. A popular approach for dealing with partially observed covariates is multiple imputation, where a number of completed data sets, each of which can be analyzed by standard complete-data methods, are obtained by imputing missing values from an appropriate distribution. We show how the combination of multiple imputations from a compatible model with suitably estimated parameters and the usual Cox regression estimators leads to consistent and asymptotically Gaussian estimators of both the finite-dimensional regression parameter and the infinite-dimensional cumulative baseline hazard parameter. We also derive a consistent estimator of the covariance operator. Simulation studies and an application to a study on survival after treatment for liver cirrhosis show that the estimators perform well with moderate sample sizes and indicate that iterating the multiple-imputation estimator increases the precision.


  1. Andersen, P., Borgan, Ø., Gill, R., Keiding, N. (1992). Statistical models based on counting processes. New York: Springer.

  2. Bartlett, J., Seaman, S., White, I., Carpenter, J., The Alzheimer’s Disease Neuroimaging Initiative. (2015). Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical Methods in Medical Research, 24, 462–487.

  3. Chen, H. Y. (2002). Double-semiparametric method for missing covariates in Cox regression models. Journal of the American Statistical Association, 97, 565–576.

  4. Chen, H. Y., Little, R. (1999). Proportional hazards regression with missing covariates. Journal of the American Statistical Association, 94, 896–908.

  5. Cox, D. (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society, Series B, 34, 187–220.

  6. Dudley, R. (1984). A course on empirical processes, volume 1097 of Lecture Notes in Mathematics. Berlin: Springer.

  7. Herring, A., Ibrahim, J. (2001). Likelihood-based methods for missing covariates in the Cox proportional hazards model. Journal of the American Statistical Association, 96, 292–302.

  8. Jacobsen, M., Keiding, N. (1995). Coarsening at random in general sample spaces and random censoring in continuous time. The Annals of Statistics, 23(3), 774–786.

  9. Kosorok, M. (2008). Introduction to empirical processes and semiparametric inference. New York: Springer.

  10. Martinussen, T. (1999). Cox regression with incomplete covariate measurements using the EM-algorithm. Scandinavian Journal of Statistics, 26, 479–491.

  11. Nielsen, S. F. (2003). Proper and improper multiple imputation. International Statistical Review, 71(3), 593–607.

  12. Pugh, M., Robins, J., Lipsitz, S., Harrington, D. (1993). Inference in the Cox proportional hazards model with missing covariate data. Technical Report, Department of Biostatistics, Harvard School of Public Health.

  13. Qi, L., Wang, C., Prentice, R. (2005). Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association, 100, 1250–1263.

  14. Robins, J., Wang, N. (2000). Inference for imputation estimators. Biometrika, 87, 113–124.

  15. Rubin, D. (1996). Multiple imputation after 18+ years. Journal of the American Statistical Association, 91(434), 473–489.

  16. Schenker, N., Welsh, A. H. (1988). Asymptotic results for multiple imputation. The Annals of Statistics, 16(4), 1550–1566.

  17. Schlichting, P., Christensen, E., Andersen, P., Fauerholdt, L., Juhl, E., Poulsen, H., Tygstrup, N. (1983). Prognostic factors in cirrhosis identified by Cox’s regression model. Hepatology, 3, 889–895.

  18. Sterne, J., White, I., Carlin, J., Spratt, M., Royston, P., Kenward, M., Wood, A., Carpenter, J. (2009). Multiple imputation for missing data in epidemiological and clinical research: Potential and pitfalls. British Medical Journal, 339, b2393.

  19. Tsiatis, A. (2006). Semiparametric theory and missing data. New York: Springer.

  20. van der Vaart, A. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.

  21. van der Vaart, A., Wellner, J. (1996). Weak convergence and empirical processes. With applications to statistics. New York: Springer.

  22. Wang, N., Robins, J. (1998). Large-sample theory for parametric multiple imputation procedures. Biometrika, 85, 935–948.

  23. White, I., Royston, P. (2009). Imputing missing covariate values for the Cox model. Statistics in Medicine, 28, 1982–1998.


Author information



Corresponding author

Correspondence to Frank Eriksson.




Assumption 1

Assume that \((\beta _0,\theta _0)\in {\mathcal {B}}\times \Theta \) for known compact sets \({\mathcal {B}}\subset {\mathbb {R}}^p\) and \(\Theta \subset {\mathbb {R}}^q\), and that \(A_0(t)\) is strictly increasing and continuously differentiable with \(A_0(0)=0\).

Assumption 2

The covariates X are bounded almost surely.

Assumption 3

Data are missing at random, that is, \(pr({\mathcal {C}}=r|Z=z)=pr\{{\mathcal {C}}=r|G_{{\mathcal {C}}}(Z)=G_{r}(z)\}\).

Assumption 4

The full-data information matrix, \(I^F\), for \(\beta \) at the true parameter value is invertible.

Assumption 5

There is a finite maximum follow-up time \(\tau >0\) at which all individuals still at risk are censored, and \(pr\{Y(\tau )=1\}=pr(T=\tau )>0\).

Assumption 6

The censoring distribution depends neither on \(\phi _0\) nor on the potentially missing covariates, that is, \( \alpha _{U}(t|x)^{1-\delta }pr(U>t|x)=\alpha _{U}\{t|G_{X,r}(x)\}^{1-\delta }pr\{U>t|G_{X,r}(x)\}\).

Assumption 7

There exists a consistent (but possibly inefficient) asymptotically linear estimator \({\hat{\phi }^{I}}=\{{\hat{\beta }}^{I},{\hat{A}}^I(t),{\hat{\theta }}^{I}\}\) such that \(n^{1/2}({\hat{\phi }^{I}}-\phi _0)(t)=n^{-1/2}\sum _{i=1}^n q\{\mathcal C_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)+o_P(1)\), where the \(q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)\) are independent processes, and such that \(n^{1/2}({\hat{\phi }^{I}}-\phi _0)\) converges weakly to a tight Gaussian process in \({\mathbb {R}}^{p}\times \ell ^{\infty }[0,\tau ]\times {\mathbb {R}}^{q}\). Further, we assume that the variance \(\textit{var}[q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)]\) can be estimated consistently by \(n^{-1}\sum _{i=1}^n{\hat{q}}\{{\mathcal {C}}_i,G_{\mathcal C_i}(Z_i)\}(t){\hat{q}}\{{\mathcal {C}}_i,G_{\mathcal C_i}(Z_i)\}(t)^\top \) for some suitable \({\hat{q}}\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}(t)\).

Assumption 8

Assume that \(p_{X|{\mathcal {C}},G}(x|r,g,\phi )\), the conditional density of X given \({\mathcal {C}}\) and \(G_{{\mathcal {C}}}\) with respect to a reference measure \(\nu _X\), is a Lipschitz continuous function of \(\phi \) (with respect to the \(L_2\)-norm) in a neighborhood of \(\phi _0\), with an integrable Lipschitz constant \(h(x|r,g)\) such that \(\int h(x|r,g){\mathrm {d}}\nu _X(x)\) is a bounded function of \((r,g)\).


We first introduce some notation. The joint densities of the missingness pattern and the (potentially unobserved) full data \(z=(t,\delta ,x)\), and of the observed data \(\{r,g=(t,\delta ,g_{x})\}\), are

$$\begin{aligned} p_{{\mathcal {C}},Z}(r,z,\phi )&={pr({\mathcal {C}}=r|Z=z)}\alpha _{U}(t|x)^{1-\delta }pr(U>t|x)\\&\phantom {=}\;\times \alpha (t)^{\delta }\exp \left\{ \delta \beta ^{\top }x-A(t)\exp (\beta ^{\top }x)\right\} p_{X}(x,\theta )\\&={pr\{{\mathcal {C}}=r|G_{{\mathcal {C}}}(Z)=G_{r}(z)\}}\alpha _{U}\{t|G_{X,r}(x)\}^{1-\delta }\\&\phantom {=}\;\times pr\{U>t|G_{X,r}(x)\}\alpha (t)^{\delta }\exp \left\{ \delta \beta ^{\top }x-A(t)\exp (\beta ^{\top }x)\right\} p_{X}(x,\theta ),\\ p_{{\mathcal {C}},G}(r,g,\phi )&=\int _{\{G_r(z)=g\}}p_{{\mathcal {C}},Z}(r,z,\phi ){\mathrm {d}}\nu _Z(z)\\&={pr\{{\mathcal {C}}=r|G_{{\mathcal {C}}}(Z)=g\}}\alpha _{U}(t|g_{x})^{1-\delta }pr(U>t|g_{x})\\&\phantom {=}\;\times \alpha (t)^{\delta } \int _{\{G_{X,r}(x)=g_{x}\}}\exp \left\{ \delta \beta ^{\top }x-A(t)\exp (\beta ^{\top }x)\right\} p_{X}(x,\theta ){\mathrm {d}}\nu _X(x), \end{aligned}$$

where \(\nu _{\cdot }(\cdot )\) denotes a dominating measure with respect to which the densities of the random variables are defined. Recall the definition \(\tilde{p}_{Z}(z,\phi )=\exp \left\{ \delta \beta ^{\top }x-A(t)\exp \left( \beta ^{\top }x\right) \right\} p_{X}(x,\theta )\) and let \(\tilde{p}_{G}(g,\phi )=\int _{\{G_{r}(v)=g\}} \tilde{p}_{Z}(v,\phi ){\mathrm {d}}\nu _Z(v)\). Since the missingness probability, the censoring factors and \(\alpha (t)^{\delta }\) depend on z only through \(G_r(z)\), they cancel in the ratio, so that

$$\begin{aligned} \frac{p_{{\mathcal {C}},Z}(r,z,\phi )}{p_{{\mathcal {C}},G}\{r,G_{r}(z),\phi \}}=\frac{\tilde{p}_{Z}(z,\phi )}{\tilde{p}_{G}\{G_{r}(z),\phi \}}. \end{aligned}$$
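In practice, this identity means that the imputation distribution of a missing covariate is proportional to \(\tilde{p}_{Z}\) over the values compatible with the observed data, so the missingness and censoring mechanisms never need to be modeled. The sketch below illustrates this for a scalar covariate with finite support. The support, the probabilities \(p_X\), and the values of \(\beta \) and \(A(t)\) are hypothetical. The argument `nuisance` stands in for any factor that does not depend on x.

```python
import math


def imputation_probs(delta, beta, cum_hazard, support, p_x, nuisance=1.0):
    """Conditional distribution of a missing scalar covariate given (t, delta).

    Weights are proportional to tilde-p_Z(z, phi) =
    exp{delta * beta * x - A(t) * exp(beta * x)} * p_X(x, theta), where
    cum_hazard plays the role of A(t). `nuisance` stands in for any factor
    (missingness, censoring, alpha(t)**delta) that does not depend on x.
    """
    weights = [
        nuisance * math.exp(delta * beta * x - cum_hazard * math.exp(beta * x)) * p
        for x, p in zip(support, p_x)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: binary covariate with p_X = (0.6, 0.4), an observed event.
probs = imputation_probs(delta=1, beta=0.7, cum_hazard=0.5,
                         support=[0, 1], p_x=[0.6, 0.4])
probs_scaled = imputation_probs(delta=1, beta=0.7, cum_hazard=0.5,
                                support=[0, 1], p_x=[0.6, 0.4], nuisance=3.7)
print(probs, probs_scaled)  # identical up to rounding: the nuisance factor cancels
```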

The following lemma, which builds on Wang and Robins (1998) and Robins and Wang (2000) (see also Tsiatis 2006, Lemma 14.2), will be used repeatedly.

Lemma 1

For a function \(f(t,Z)\) that is continuous in \(t\in [0,\tau ]\) and bounded with probability one,

$$\begin{aligned}&n^{1/2}E\left[ f\{t,Z(\phi )\}-f\{t,Z(\phi _0)\}\right] _{|\phi ={\hat{\phi }^{I}}}\\&\quad =E\left( f(t,Z)\left[ {\mathcal {S}}_{\phi _0}(Z)-{\mathcal {S}}_{\phi _0}\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}\right] \right) n^{1/2}({\hat{\phi }^{I}}-\phi _0)+o_P(1) \end{aligned}$$

where the remainder term is uniform in t.


Proof. Following Tsiatis (2006, pp. 350–352), we write

$$\begin{aligned} E\left[ f\{t,Z(\phi )\}\right]&=E\left( E\left[ \left. f\{t,Z(\phi )\}\right| {\mathcal {C}},G_{{\mathcal {C}}}(Z),\phi \right] \right) \\&=\int f(t,z)\frac{p_{{\mathcal {C}},Z}(r,z,\phi )}{p_{{\mathcal {C}},G}\{r,G_r(z),\phi \}}p_{{\mathcal {C}},G}\{r,G_r(z),\phi _0\}\mathrm{d}\nu _{{\mathcal {C}},Z}(r,z)\\&=\int f(t,z)\frac{\tilde{p}_{Z}(z,\phi )}{\tilde{p}_{G}\{G_r(z),\phi \}}p_{{\mathcal {C}},G}\{r,G_r(z),\phi _0\}\mathrm{d}\nu _{{\mathcal {C}},Z}(r,z) \end{aligned}$$

so that

$$\begin{aligned}&E\left[ f\{t,Z(\phi )\}-f\{t,Z(\phi _0)\}\right] _{|\phi ={\hat{\phi }^{I}}}\\&\quad =\int f(t,z)\left[ \frac{\tilde{p}_{Z}(z,{\hat{\phi }^{I}})}{\tilde{p}_{G}\{G_r(z),{\hat{\phi }}^{I}\}}-\frac{\tilde{p}_{Z}(z,\phi _0)}{\tilde{p}_{G}\{G_r(z),\phi _0\}}\right] p_{{\mathcal {C}},G}\{r,G_r(z),\phi _0\}\mathrm{d}\nu _{{\mathcal {C}},Z}(r,z)\\&\quad =\int f(t,z)\frac{\tilde{p}_{Z}(z,\phi _0)}{\tilde{p}_{G}\{G_r(z),\phi _0\}}\left[ {\mathcal {S}}_{\phi _0}(z)-{\mathcal {S}}_{\phi _0}\{r,G_r(z)\}\right] ({\hat{\phi }^{I}}-\phi _0)\\&\qquad \phantom {=}\;\times p_{{\mathcal {C}},G}\{r,G_r(z),\phi _0\}\mathrm{d}\nu _{{\mathcal {C}},Z}(r,z)+o_P(n^{-1/2})\\&\quad =\int f(t,z)\left[ {\mathcal {S}}_{\phi _0}(z)-{\mathcal {S}}_{\phi _0}\{r,G_r(z)\}\right] ({\hat{\phi }^{I}}-\phi _0)p_{{\mathcal {C}},Z}(r,z,\phi _0)\mathrm{d}\nu _{{\mathcal {C}},Z}(r,z)\\&\qquad +o_P(n^{-1/2})\\&\quad =E\left( f(t,Z)\left[ {\mathcal {S}}_{\phi _0}(Z)-{\mathcal {S}}_{\phi _0}\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}\right] \right) ({\hat{\phi }^{I}}-\phi _0)+o_P(n^{-1/2}). \end{aligned}$$

\(\square \)
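Lemma 1 can be checked numerically in a toy parametric case. In the sketch below, θ plays the role of φ. The hypothetical covariate X is Binomial(2, expit(θ)), and the coarsening records only whether X = 0 or X ≥ 1, which plays the role of \(G_{{\mathcal {C}}}(Z)\). There is no Cox component. The observed group is generated under \(\theta _0\), and the missing value is imputed under θ. The central finite difference of \(E[f\{Z(\theta )\}]\) at \(\theta _0\) should match \(E(f(Z)[{\mathcal {S}}(Z)-{\mathcal {S}}\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}])\).

```python
import math


def expit(theta):
    return 1.0 / (1.0 + math.exp(-theta))


def pmf(theta):
    """p_X(x, theta) for X ~ Binomial(2, expit(theta)), x = 0, 1, 2."""
    p = expit(theta)
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def score(x, theta):
    """Full-data score: d/dtheta log p_X(x, theta) = x - 2 * expit(theta)."""
    return x - 2 * expit(theta)


# Hypothetical coarsening: only whether X = 0 or X >= 1 is observed.
GROUPS = [[0], [1, 2]]


def imputed_mean(f, theta, theta0):
    """E[f{X(theta)}]: the observed group is drawn under theta0 and the missing
    value is imputed from p_X(x | group, theta)."""
    p0, p = pmf(theta0), pmf(theta)
    total = 0.0
    for g in GROUPS:
        prob_g0 = sum(p0[x] for x in g)  # P_theta0(G = g)
        prob_g = sum(p[x] for x in g)    # P_theta(G = g)
        total += prob_g0 * sum(f(x) * p[x] / prob_g for x in g)
    return total


def lemma1_slope(f, theta0):
    """E( f(X) [S(X) - E{S(X) | G}] ) at theta0: the right-hand side of Lemma 1."""
    p0 = pmf(theta0)
    total = 0.0
    for g in GROUPS:
        prob_g = sum(p0[x] for x in g)
        s_obs = sum(score(x, theta0) * p0[x] for x in g) / prob_g
        total += sum(f(x) * (score(x, theta0) - s_obs) * p0[x] for x in g)
    return total


def f(x):
    return math.exp(0.5 * x)  # a bounded function of the covariate


theta0, h = 0.3, 1e-5
numeric = (imputed_mean(f, theta0 + h, theta0)
           - imputed_mean(f, theta0 - h, theta0)) / (2 * h)
print(numeric, lemma1_slope(f, theta0))  # the two numbers agree closely
```

The group {X = 0} contributes nothing to the slope because its observed-data score equals its full-data score. This reflects the fact that only the lost information drives the first-order effect of the imputation parameter.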

Lemma 2

Let \(f[\{X_{ij}(\phi )\}_{j=1,\ldots ,m},{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]\) be a bounded function. Then the logarithm of the \(\epsilon \)-bracketing number of the class

$$\begin{aligned}&\{(r,g)\mapsto E(f[\{X_{ij}(\phi )\}_{j=1,\ldots ,m},{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]|{\mathcal {C}}_i=r,G_{{\mathcal {C}}_i}(Z_i)=g)\nonumber \\&\quad :\Vert \phi -\phi _0\Vert _{L_2}\le \delta \} \end{aligned}\tag{8}$$

is bounded by a constant times \(1/\epsilon \).


Proof. Let \(F_i(\phi )=E(f[\{X_{ij}(\phi )\}_{j=1,\ldots ,m},{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))\). Then

$$\begin{aligned}&|F_i(\phi )-F_i(\phi _0)|\\&\quad \le \int |f\{x,{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}||p_{X|{\mathcal {C}},G}\{x|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i),\phi \}\\&\qquad \qquad -p_{X|{\mathcal {C}},G}\{x|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i),\phi _0\}|d\nu _X(x)\\&\quad \le \textit{constant}\times \Vert \phi -\phi _0\Vert _{L_2} \end{aligned}$$

by Assumption 8. It follows that the bracketing number of the class (8) is bounded by the bracketing number of \(\{\phi : \Vert \phi -\phi _0\Vert _{L_2}\le \delta \}\), with \(\epsilon \) rescaled by a constant. This, in turn, is dominated by the bracketing number of the integrated baseline hazard, which by van der Vaart and Wellner (1996, Theorem 2.7.5) is smaller than \(\exp (K/\epsilon )\) for a constant K.\(\square \)

It follows that for a bounded function f, the process

$$\begin{aligned} {n^{-1/2}}\sum _{i=1}^n\left\{ E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m}]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))-E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m}])\right\} \end{aligned}$$

is stochastically equicontinuous near \(\phi _0\), and that

$$\begin{aligned} n^{-1}\sum _{i=1}^nE(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m}]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)) \end{aligned}$$

converges almost surely to its expectation \(E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m}])\), uniformly in a neighborhood of \(\phi _0\). The process

$$\begin{aligned} {n^{-1/2}}\sum _{i=1}^n\left\{ f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m}]-E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m}])\right\} \end{aligned}$$

is not stochastically equicontinuous in general. A proof of this is included at the end of this appendix.

We will need some results for averages of functions of the imputations and the unknown parameter.

Lemma 3

Let \(f[ \{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},\phi ]\) be a bounded function which is Lipschitz continuous as a function of \(\phi \) in a neighborhood of \(\phi _0\) with a bounded Lipschitz constant. Then

$$\begin{aligned} n^{-1}\sum _{i=1}^n\left\{ f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},{\tilde{\phi }}]-E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m},\phi _0]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))_{|\phi ={\hat{\phi }^{I}}}\right\} \end{aligned}$$

converges in probability to 0 for any consistent estimator \({\tilde{\phi }}\) of \(\phi _0\).


Proof. As \(|f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},{\tilde{\phi }}]-f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},\phi _0]|\le \textit{constant}\times \Vert {\tilde{\phi }}-\phi _0\Vert _{L_2}\), we only need to consider the case where \({\tilde{\phi }}=\phi _0\). Letting

$$\begin{aligned} F_i=f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},\phi _0]-E(f[ \{Z_{ij}(\phi )\}_{j=1,\ldots ,m},\phi _0]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))_{|\phi ={\hat{\phi }^{I}}} \end{aligned}$$

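we see that, conditionally on the observed data \(\{{\mathcal {C}}_k,G_{{\mathcal {C}}_k}(Z_k)\}_{k=1,\ldots ,n}\), which determine \({\hat{\phi }^{I}}\), the \(F_i\) are independent with mean zero, so that

$$\begin{aligned} \textit{var}\left( n^{-1}\sum _{i=1}^nF_i\right)&=E\left[ n^{-2}\sum _{i=1}^n\textit{var}\left\{ \left. F_i\right| \{{\mathcal {C}}_k,G_{{\mathcal {C}}_k}(Z_k)\}_{k=1,\ldots ,n}\right\} \right] =O(1/n) \end{aligned}$$

as \(F_i\) is bounded by assumption. Since \(n^{-1}\sum _{i=1}^nF_i\) has mean zero, the result follows from Chebyshev's inequality.\(\square \)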
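The variance argument can be mimicked in a small simulation. In this sketch, the binary covariate and the uniformly drawn imputation probability are hypothetical stand-ins for the conditional distribution \(E(\cdot |{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))_{|\phi ={\hat{\phi }^{I}}}\). Each \(F_i\) is the average of the m imputed values minus its conditional mean. Given the observed data, the \(F_i\) are independent with mean zero and bounded variance, so their average should shrink roughly like \(n^{-1/2}\).

```python
import random


def lemma3_gap(n, m, seed=0):
    """n^{-1} sum_i F_i, where F_i is the average of m imputed binary covariate
    values minus its conditional mean pi_i given subject i's observed data."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        # Hypothetical stand-in for pr{X_i = 1 | C_i, G_{C_i}(Z_i); phi_hat}.
        pi_i = rng.uniform(0.2, 0.8)
        draws = [1.0 if rng.random() < pi_i else 0.0 for _ in range(m)]
        total += sum(draws) / m - pi_i  # F_i has conditional mean zero
    return total / n


for n in (100, 10_000):
    print(n, lemma3_gap(n, m=5))  # shrinks roughly like n ** -0.5
```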

Corollary 1

Let \(f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},\phi ]\) be a bounded function, which is Lipschitz continuous as a function of \(\phi \) in a neighborhood of \(\phi _0\) with a bounded Lipschitz constant. Suppose further that \(E(f[\{Z_{ij}(\phi ')\}_{j=1,\ldots ,m},\phi _0])\) is a continuous function of \(\phi '\). Then

$$\begin{aligned} n^{-1}\sum _{i=1}^nf[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},{\tilde{\phi }}]\rightarrow E(f[\{Z_{1j}\}_{j=1,\ldots ,m},\phi _0]) \end{aligned}$$

in probability for any consistent estimator \({\tilde{\phi }}\) of \(\phi _0\).


Proof. The average \(n^{-1}\sum _{i=1}^nf[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},{\tilde{\phi }}]\) may be split into a sum of

$$\begin{aligned} n^{-1}\sum _{i=1}^n\left\{ f[\{Z_{ij}({\hat{\phi }^{I}})\}_{j=1,\ldots ,m},{\tilde{\phi }}]- E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m},\phi _0]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))_{|\phi ={\hat{\phi }^{I}}}\right\} \end{aligned}$$

which is \(o_P(1)\) by Lemma 3, and \(n^{-1}\sum _{i=1}^n E(f[\{Z_{ij}(\phi )\}_{j=1,\ldots ,m},\phi _0]|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i))_{|\phi ={\hat{\phi }^{I}}}\), which converges in probability to \(E(f[\{Z_{1j}\}_{j=1,\ldots ,m},\phi _0])\) by Lemma 2 and the uniform law of large numbers, combined with the consistency of \({\hat{\phi }^{I}}\) and the assumed continuity of \(E(f[\{Z_{ij}(\phi ')\}_{j=1,\ldots ,m},\phi _0])\) in \(\phi '\).\(\square \)

Lemma 4

If \({\tilde{\beta }}\rightarrow \beta _0\) in probability, then

$$\begin{aligned} n^{-1} \sum _{i=1}^n S_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),{\tilde{\beta }}\}\rightarrow s_k(t)\qquad (k=0,1,2, j=1,\ldots ,m) \end{aligned}$$

in probability, uniformly in \(t\in [0,\tau ]\).


Proof. It suffices to consider the case where X is one-dimensional. Clearly, by differentiability and boundedness,

$$\begin{aligned} \sup _{t\in [0,\tau ]}\left| n^{-1}\sum _{i=1}^n S_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),{\tilde{\beta }}\}-n^{-1}\sum _{i=1}^nS_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}\right| \le \textit{constant}\,\times |{\tilde{\beta }}-\beta _0| \end{aligned}$$

so we may replace \({\tilde{\beta }}\) by \(\beta _0\). Furthermore, by corollary 1, \(n^{-1}\sum _{i=1}^nS_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}-s_k(t)=o_P(1)\) for any t. Assume for simplicity \(X_1\ge 0\) with probability 1. Choose finitely many \(0=t_0< t_1<\cdots < t_L= \tau \) such that for any t there is an \(\ell \) such that \(E\{Y_1(t)-Y_1(t_\ell )\}, E\{Y_1(t_{\ell -1})-Y_1(t)\}\le \epsilon /c_k\), where \(c_k\) is an upper bound on \(X_1^k\exp (\beta _0^\top X_1)\). Then

$$\begin{aligned}&n^{-1}\sum _{i=1}^nS_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}-s_k(t)\\&\quad \le n^{-1}\sum _{i=1}^nS_{k}\{t_{\ell -1},Z_{ij}({\hat{\phi }^{I}}),\beta _0\}-s_k(t_{\ell -1})+s_k(t_{\ell -1})-s_k(t) \le o_P(1)+\epsilon \end{aligned}$$

where the \(o_P(1)\)-term does not depend on t. Combined with a similar lower bound, this yields the desired uniform convergence. If \(pr(X_1<0)>0\) we may split (when \(k=1\)) \(X_{ij}({\hat{\phi }^{I}})\) into a sum of \(X_{ij}({\hat{\phi }^{I}})-\min X_1\) and \(\min X_1\), where \(\min X_1\) denotes the lower bound for the support of \(X_1\) (the essential infimum). Thus, \(n^{-1}\sum _{i=1}^nS_{k}\{t,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}\) may be split into a sum of two terms, each of which may be handled as indicated above.\(\square \)

Proof of Theorem 1: Regression parameters

The multiple-imputation estimator of \(\beta _0\) is \({\hat{\beta }}=m^{-1}\sum _{j=1}^m\hat{\beta }_j\), where the jth imputation estimator \(\hat{\beta }_j\) is the solution to \(U_j(\hat{\beta }_j,\hat{\phi }^I)=0\), with

$$\begin{aligned} U_j(\beta ,\hat{\phi }^I)&=\sum _{i=1}^n \left[ X_{ij}(\hat{\phi }^I)-\frac{\sum _{l=1}^nS_{1}\{T_i,Z_{lj}({\hat{\phi }^{I}}),\beta \}}{\sum _{l=1}^nS_{0}\{T_i,Z_{lj}({\hat{\phi }^{I}}),\beta \}}\right] \Delta _i. \end{aligned}$$
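The estimator can be sketched for a scalar covariate. The sketch assumes the usual counting-process convention \(S_k\{t,Z,\beta \}=Y(t)X^{k}\exp (\beta X)\) with \(Y(t)=1\{T\ge t\}\); this convention is not restated in this part of the text. The m completed covariate vectors are taken as given rather than drawn from a fitted model. Each \(U_j(\beta )=0\) is solved by bisection, a choice made here for robustness, which works because \(U_j\) is non-increasing in \(\beta \). The estimates are then averaged. The data are hypothetical.

```python
import math


def cox_score(beta, times, events, x):
    """U_j(beta) = sum_i [x_i - S1(T_i)/S0(T_i)] Delta_i for a scalar covariate,
    with S_k(t) = sum_l Y_l(t) x_l**k exp(beta x_l) and Y_l(t) = 1{T_l >= t}."""
    u = 0.0
    for t_i, d_i, x_i in zip(times, events, x):
        if d_i:
            s0 = s1 = 0.0
            for t_l, x_l in zip(times, x):
                if t_l >= t_i:  # subject l is at risk at T_i
                    w = math.exp(beta * x_l)
                    s0 += w
                    s1 += w * x_l
            u += x_i - s1 / s0
    return u


def solve_score(times, events, x, lo=-20.0, hi=20.0, iters=200):
    """Root of U_j by bisection; U_j is non-increasing in beta."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cox_score(mid, times, events, x) > 0:
            lo = mid  # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mi_cox(times, events, imputed_x):
    """beta_hat = m^{-1} sum_j beta_hat_j over m completed covariate vectors."""
    betas = [solve_score(times, events, x_j) for x_j in imputed_x]
    return sum(betas) / len(betas), betas


# Hypothetical data: 6 subjects, the fifth censored, m = 2 imputations.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
imputed_x = [[0, 1, 0, 1, 1, 0], [1, 1, 0, 0, 1, 0]]
beta_hat, betas = mi_cox(times, events, imputed_x)
print(beta_hat, betas)
```

The bracket [-20, 20] assumes that each completed data set has a finite maximum partial likelihood estimate inside it. With monotone likelihood, the root does not exist.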

Following standard arguments and using lemma 4, \({\hat{\beta }_{j}}\) may be shown to be consistent and \(n^{1/2}({\hat{\beta }_{j}}-\beta _0)=n^{-1/2}\left( I^F\right) ^{-1}U_j(\beta _0,\hat{\phi }^I)+o_P(1)\), where \(I^F\) is the full-data information matrix for \(\beta \). Averaging the m estimators we get

$$\begin{aligned} n^{1/2}({\hat{\beta }}-\beta _0)&=n^{-1/2}\left( I^F\right) ^{-1}m^{-1}\sum _{j=1}^mU_j(\beta _0,\hat{\phi }^I)+o_P(1). \end{aligned}$$

As the imputations depend on the initial estimator, \(\hat{\phi }^I\), which involves information from all subjects, this is not a sum of independent and identically distributed terms. We can write


The second term on the right-hand side above converges to zero in probability by Lemma 4 and Kosorok (2008, Lemma 4.2). To show that the third term also converges to zero in probability, it suffices (by Kosorok 2008, Lemma 4.2) to show that the second factor in the integrand of (10),

$$\begin{aligned}&n^{-1/2}\sum _{i=1}^nY_i(u)\left[ \exp \left\{ \beta _0^{\top }X_{ij}({\hat{\phi }^{I}})\right\} -\exp \left\{ \beta _0^{\top }X_{ij}(\phi _0)\right\} \right] \nonumber \\&\quad =n^{-1/2}\sum _{i=1}^nY_i(u)\left( \exp \left\{ \beta _0^{\top }X_{ij}({\hat{\phi }^{I}})\right\} \!-\!E\left[ \left. \exp \left\{ \beta _0^{\top }X_{ij}(\phi )\right\} \right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) \nonumber \\&\qquad -n^{-1/2}\sum _{i=1}^nY_i(u)\left( \exp \left\{ \beta _0^{\top }X_{ij}(\phi _0)\right\} -E\left[ \left. \exp \left\{ \beta _0^{\top }X_{ij}(\phi _0)\right\} \right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] \right) \nonumber \\&\qquad +n^{-1/2}\sum _{i=1}^nY_i(u)\left( E\left[ \left. \exp \{\beta _0^{\top }X_{ij}(\phi )\}\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right. \nonumber \\&\qquad \left. \phantom {E\left\{ \right\} _{|\phi ={\hat{\phi }^{I}}}}-E\left[ \left. \exp \{\beta _0^{\top }X_{ij}(\phi _0)\}\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] \right) \end{aligned}$$

is bounded in probability. The first two terms have mean zero and finite variance and are thus bounded in probability. By stochastic equicontinuity, continuity of the mean and \(n^{1/2}\)-consistency of the initial estimator, the third term is also bounded in probability. Thus,

$$\begin{aligned}&n^{-1/2}m^{-1}\sum _{j=1}^m U_j(\beta _0,{\hat{\phi }^{I}})\nonumber \\&\quad =n^{-1/2}\sum _{i=1}^nm^{-1}\sum _{j=1}^m S_{\mathrm {eff}}^F\{Z_{ij}({\hat{\phi }^{I}})\}+o_P(1)\nonumber \\&\quad =n^{-1/2}\sum _{i=1}^nm^{-1}\sum _{j=1}^m\left( S_{\mathrm {eff}}^F\{Z_{ij}({\hat{\phi }^{I}})\}-E[S_{\mathrm {eff}}^F\{Z_{ij}(\phi )\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]_{|\phi ={\hat{\phi }^{I}}}\right) \end{aligned}\tag{12}$$
$$\begin{aligned}&\qquad + n^{-1/2}\sum _{i=1}^n\left( E[S_{\mathrm {eff}}^F\{Z_{i1}(\phi )\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]_{|\phi ={\hat{\phi }^{I}}}-E[S_{\mathrm {eff}}^F\{Z_{i1}(\phi )\}]_{|\phi ={\hat{\phi }^{I}}}\right) \nonumber \\&\qquad -n^{-1/2}\sum _{i=1}^n\left( E[S_{\mathrm {eff}}^F\{Z_{i1}(\phi _0)\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]-E[S_{\mathrm {eff}}^F\{Z_{i1}(\phi _0)\}]\right) \end{aligned}\tag{13}$$
$$\begin{aligned}&\qquad +n^{-1/2}\sum _{i=1}^nE[S_{\mathrm {eff}}^F\{Z_{i1}(\phi _0)\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)] \end{aligned}\tag{14}$$
$$\begin{aligned}&\qquad + n^{1/2}\left( E[S_{\mathrm {eff}}^F\{Z_{11}(\phi )\}]_{|\phi ={\hat{\phi }^{I}}}-E[S_{\mathrm {eff}}^F\{Z_{11}(\phi _0)\}]\right) +o_P(1), \end{aligned}\tag{15}$$

where \(E[S_{\mathrm {eff}}^F\{Z_{i1}(\phi _0)\}]\) equals zero but has been included for clarity. Using lemma 1 we may write

$$\begin{aligned}&n^{1/2}\left( E[S_{\mathrm {eff}}^F\{Z_{11}(\phi )\}]_{|\phi ={\hat{\phi }^{I}}}-E[S_{\mathrm {eff}}^F\{Z_{11}(\phi _0)\}]\right) \\&\quad =D_{\mathrm {eff}}(\phi _0){n^{1/2}}({\hat{\phi }^{I}}-\phi _0)+o_P(1)\\&\quad =n^{-1/2}\sum _{i=1}^nD_{\mathrm {eff}}(\phi _0) q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}+o_P(1) \end{aligned}$$

where \(D_{\mathrm {eff}}(\phi _0)=E(S_{\mathrm {eff}}^F(Z)[{\mathcal {S}}_{\phi _0}(Z)-{\mathcal {S}}_{\phi _0}\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}])\). Thus, the last three terms—(13), (14), (15)—may be written as

$$\begin{aligned} n^{-1/2}\sum _{i=1}^n\left( E[S_{\mathrm {eff}}^F\{Z_{i1}(\phi _0)\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]+ D_{\mathrm {eff}}(\phi _0) q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}\right) +o_P(1) \end{aligned}$$

as (13) is \(o_P(1)\) by the stochastic equicontinuity implied by lemma 2.

Lemma 2 (with a straightforward extension) also implies that

$$\begin{aligned} n^{-1}\sum _{i=1}^n\textit{var}[S_{\mathrm {eff}}^F\{Z_{i1}(\phi )\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)] \rightarrow E\left( \textit{var}[S_{\mathrm {eff}}^F\{Z(\phi )\}|{{\mathcal {C}},G_{{\mathcal {C}}}(Z)}]\right) \end{aligned}$$

almost surely, uniformly in a neighborhood of \(\phi _0\). Assume for now (for simplicity) that \({\hat{\phi }^{I}}\) is strongly consistent. Then, conditionally on the observed data, for almost every realization,

$$\begin{aligned} \begin{aligned}&n^{-1/2}\sum _{i=1}^nm^{-1}\sum _{j=1}^m\left( S_{\mathrm {eff}}^F\{Z_{ij}({\hat{\phi }^{I}})\}-E[S_{\mathrm {eff}}^F\{Z_{ij}(\phi )\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]_{|\phi ={\hat{\phi }^{I}}}\right) \\&\quad \rightarrow N\left\{ 0, m^{-1}E\left( \textit{var}[S_{\mathrm {eff}}^F\{Z(\phi _0)\}|{{\mathcal {C}},G_{{\mathcal {C}}}(Z)}]\right) \right\} \end{aligned} \end{aligned}\tag{16}$$

in distribution by the Lindeberg–Feller central limit theorem (van der Vaart 1998, Proposition 2.27). Using Schenker and Welsh (1988, Lemma 1) or Nielsen (2003, Lemma 1), it follows that (16) also holds unconditionally and that (12) is asymptotically independent of the observed data. Without strong consistency, every subsequence has a further subsequence along which \({\hat{\phi }^{I}}\) converges almost surely to \(\phi _0\), and hence along which (16) holds conditionally on the observed data. Consequently, along such sub-subsequences, the conditional characteristic function of the left-hand side of (16) converges almost surely to the characteristic function of the right-hand side of (16). This implies that the conditional characteristic functions converge in probability along the original sequence. Since characteristic functions are bounded, (16) then holds unconditionally by dominated convergence. The asymptotic distribution of \({\hat{\beta }}\) now follows.
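The last step can be summarized as a worked equation. This is a sketch. The symbols \(V_{\mathrm {obs}}\) and \(V_{\mathrm {imp}}\) are introduced here for convenience and are not notation used elsewhere in the text. Term (13) is negligible. Terms (14) and (15) combine into a normalized sum of independent terms determined by the observed data. Term (12) is asymptotically normal and asymptotically independent of the observed data. Since \(I^F\) is symmetric, combining these facts suggests that, in distribution,

$$\begin{aligned} n^{1/2}({\hat{\beta }}-\beta _0)&\rightarrow N\left[ 0,\left( I^F\right) ^{-1}\left\{ V_{\mathrm {obs}}+m^{-1}V_{\mathrm {imp}}\right\} \left( I^F\right) ^{-1}\right] ,\\ V_{\mathrm {obs}}&=\textit{var}\left( E[S_{\mathrm {eff}}^F\{Z(\phi _0)\}|{\mathcal {C}},G_{{\mathcal {C}}}(Z)]+D_{\mathrm {eff}}(\phi _0)q\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}\right) ,\\ V_{\mathrm {imp}}&=E\left( \textit{var}[S_{\mathrm {eff}}^F\{Z(\phi _0)\}|{\mathcal {C}},G_{{\mathcal {C}}}(Z)]\right) . \end{aligned}$$

The imputation noise enters only through \(m^{-1}V_{\mathrm {imp}}\), so it shrinks as the number of imputations grows, whereas \(V_{\mathrm {obs}}\) does not depend on m.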

Proof of Theorem 1: Cumulative baseline hazard

The multiple-imputation estimator of the cumulative baseline hazard function is \({\hat{A}}(t)=m^{-1}\sum _{j=1}^m{\hat{A}}_{j}(t,{\hat{\beta }_{j}})\), where

$$\begin{aligned} {\hat{A}}_{j}(t,\beta )=\int _0^t\frac{1}{\sum _{i=1}^n S_{0}\{u,Z_{ij}({\hat{\phi }}^{I}),\beta \}}dN_{\cdot }(u) \end{aligned}$$

is the estimator from the jth imputation where \(N_{\cdot }(t)=\sum _{i=1}^nN_i(t)\). Let

$$\begin{aligned} \mathrm{d}M(t,Z_{i})=\mathrm{d}N_{i}(t)-E\{Y_{i}(t)\exp (\beta _0^\top X_{i})|{\mathcal {C}}_{i},G_{{\mathcal {C}}_{i}}(Z_{i})\}\alpha _0(t)\mathrm{d}t. \end{aligned}$$

Then, \(M_{\cdot }(t)=\sum _{i=1}^nM(t,Z_i)\) is a zero mean square-integrable martingale with respect to the observed filtration.

We may write \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}=n^{1/2}\{{\hat{A}}(t)-{\hat{A}}_0(t)\}+n^{1/2}\{{\hat{A}}_0(t)-A_0(t)\}\), where

$$\begin{aligned} {\hat{A}}_0(t)=m^{-1}\sum _{j=1}^m\int _0^t\frac{1}{\sum _{i=1}^nS_{0}\{u,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}}\mathrm{d}N_{\cdot }(u). \end{aligned}$$

Using Lemma 4 and Kosorok (2008, Lemma 4.2), and recalling that \({\hat{\beta }}=m^{-1}\sum _{j=1}^m{\hat{\beta }_{j}}\), we have

$$\begin{aligned}&n^{1/2}\{{\hat{A}}(t)-{\hat{A}}_0(t)\}\\&\quad =-m^{-1}\sum _{j=1}^m\int _0^t\frac{n^{-1}\sum _{i=1}^nS_{1}\{u,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}}{[n^{-1}\sum _{i=1}^nS_{0}\{u,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}]^2}n^{-1}{\mathrm {d}}N_{\cdot }(u)n^{1/2}({\hat{\beta }_{j}}\!-\!\beta _0)\!+\!o_P(1)\\&\quad =-\int _0^t\frac{s_1(u)}{s_0(u)}\alpha _0(u){\mathrm {d}}u\, n^{1/2}({\hat{\beta }}-\beta _0)+o_P(1). \end{aligned}$$
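The per-imputation Breslow estimator \({\hat{A}}_{j}(t,\beta )\) and the \(\beta \)-derivative \(-\int _0^t S_1/S_0^2\,{\mathrm {d}}N_{\cdot }\) behind the first-order expansion above can be sketched for a scalar covariate. The sketch assumes \(S_k\{u,Z,\beta \}=Y(u)X^{k}\exp (\beta X)\) with \(Y(u)=1\{T\ge u\}\). The data are hypothetical. A central finite difference is used to confirm the derivative formula.

```python
import math


def at_risk_sums(u, beta, times, x):
    """(S0, S1) with S_k = sum_l Y_l(u) x_l**k exp(beta x_l), Y_l(u) = 1{T_l >= u}."""
    s0 = s1 = 0.0
    for t_l, x_l in zip(times, x):
        if t_l >= u:
            w = math.exp(beta * x_l)
            s0 += w
            s1 += w * x_l
    return s0, s1


def breslow(t, beta, times, events, x):
    """A_j(t, beta) = int_0^t dN(u) / sum_i S0{u, Z_ij, beta}; each event adds a jump."""
    return sum(1.0 / at_risk_sums(t_i, beta, times, x)[0]
               for t_i, d_i in zip(times, events) if d_i and t_i <= t)


def breslow_beta_derivative(t, beta, times, events, x):
    """d/d beta A_j(t, beta) = -int_0^t S1 / S0**2 dN(u)."""
    total = 0.0
    for t_i, d_i in zip(times, events):
        if d_i and t_i <= t:
            s0, s1 = at_risk_sums(t_i, beta, times, x)
            total -= s1 / s0 ** 2
    return total


def mi_breslow(t, betas, times, events, imputed_x):
    """A_hat(t) = m^{-1} sum_j A_j(t, beta_hat_j)."""
    return sum(breslow(t, b_j, times, events, x_j)
               for b_j, x_j in zip(betas, imputed_x)) / len(betas)


times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
x = [0, 1, 0, 1, 1, 0]
h = 1e-6
numeric = (breslow(6, 0.3 + h, times, events, x)
           - breslow(6, 0.3 - h, times, events, x)) / (2 * h)
print(numeric, breslow_beta_derivative(6, 0.3, times, events, x))
print(breslow(6, 0.0, times, events, x))  # Nelson-Aalen value 117/60, up to rounding
```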


For the second term in the decomposition of \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}\), we have

$$\begin{aligned}&n^{1/2}\{{\hat{A}}_0(t)-A_0(t)\}\nonumber \\&\quad =n^{1/2}\left( m^{-1}\sum _{j=1}^m\int _0^t\frac{1}{\sum _{i=1}^nS_{0}\{u,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}}{\mathrm {d}}N_{\cdot }(u)-\int _0^t\alpha _0(u)\mathrm{d}u\right) \nonumber \\&\quad =m^{-1}\sum _{j=1}^m\int _0^t\left[ \frac{1}{\sum _{i=1}^nS_{0}\{u,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}}\right. \nonumber \\&\qquad \left. -\frac{1}{\sum _{i=1}^nE\{Y_i(u){\exp (\beta _0^\top X_{i})}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}}\right] n^{1/2}\mathrm{d}N_{\cdot }(u)\nonumber \\&\qquad +n^{1/2}\left[ \int _0^t\frac{1}{\sum _{i=1}^nE\{Y_i(u){\exp (\beta _0^\top X_{i})}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}}\mathrm{d}N_{\cdot }(u)-\int _0^t\alpha _0(u)\mathrm{d}u\right] . \end{aligned}\tag{17}$$

The second term of (17) may be rewritten as:

$$\begin{aligned}&\int _0^t\frac{1}{n^{-1}\sum _{i=1}^nE\{Y_i(u){\exp (\beta _0^\top X_{i})}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}} n^{-1/2}\mathrm{d}M_{\cdot }(u)+o_P(1)\\&\quad =\int _0^t\frac{1}{s_0(u)}n^{-1/2}\mathrm{d}M_{\cdot }(u)+o_P(1) \end{aligned}$$

which converges to a Gaussian martingale. Before turning to the first term of (17), we note that

$$\begin{aligned}&n^{-1/2}\sum _{i=1}^nY_i(u)\left( \exp \left\{ \beta _0^\top X_{ij}(\hat{\phi }^I)\right\} -E\left[ \exp \left( \beta _0^\top X_{i}\right) |\mathcal {C}_i,G_{\mathcal {C}_i}(Z_i)\right] \right) \nonumber \\&\quad =n^{-1/2}\sum _{i=1}^nY_i(u)\left( \exp \left\{ \beta _0^\top X_{ij}(\hat{\phi }^I)\right\} -E\left[ \exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} |\mathcal {C}_i,G_{\mathcal {C}_i}(Z_i)\right] _{|\phi =\hat{\phi }^I}\right) \nonumber \\&\qquad + n^{-1/2}\sum _{i=1}^nY_i(u)\left( E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| \mathcal {C}_i,G_{\mathcal {C}_i}(Z_i)\right] _{|\phi =\hat{\phi }^I}\right. \nonumber \\&\qquad \left. -E\left\{ {\exp \left( \beta _0^\top X_{i}\right) }|\mathcal {C}_i,G_{\mathcal {C}_i}(Z_i)\right\} \right) . \end{aligned}\tag{18}$$

The second term of (18) is asymptotically equivalent to

$$\begin{aligned} n^{1/2}\left( E[S_0\{u,Z(\phi ),\beta _0\}]_{|\phi ={\hat{\phi }^{I}}}\!-E\{S_0(u,Z,\beta _0)\}\right) =D_0(u,\phi _0)n^{1/2}({\hat{\phi }^{I}}-\phi _0)+o_P(1) \end{aligned}$$

by Lemma 1, where \(D_0(u,\phi _0)=E(S_0(u,Z,\beta _0)[{\mathcal {S}}_{\phi _0}(Z)-{\mathcal {S}}_{\phi _0}\{{\mathcal {C}},G_{{\mathcal {C}}}(Z)\}])\). Thus, uniformly in u, we may write the integrand of the first term of (17) as

$$\begin{aligned}&n^{1/2}\left( \frac{1}{\sum _{i=1}^nS_{0}\{u,Z_{ij}({\hat{\phi }^{I}}),\beta _0\}}-\frac{1}{\sum _{i=1}^nE\left[ Y_i(u){\exp \left\{ \beta _0^\top X_{i1}(\phi _0)\right\} }|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] }\right) \\&\quad =-\frac{n^{-3/2}\sum _{i=1}^nY_i(u)\left( {\exp \left\{ \beta _0^\top X_{ij}({\hat{\phi }^{I}})\right\} }-E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) }{s_0(u)^2}\\&\qquad -n^{-1}D_0(u,\phi _0)\frac{n^{1/2}({\hat{\phi }^{I}}-\phi _0)}{s_0(u)^2}+o_P(n^{-1}) \end{aligned}$$

and hence the first term of (17) as

$$\begin{aligned}&-\int _0^t n^{-1}\sum _{i=1}^nY_i(u)\left( m^{-1}\sum _{j=1}^m{\exp \left\{ \beta _0^\top X_{ij}({\hat{\phi }^{I}})\right\} }\right. \\&\quad \left. -E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) \frac{n^{-1/2}{\mathrm {d}}M_{\cdot }(u)}{s_0(u)^2}\\&\quad -\int _0^tn^{-1/2}\sum _{i=1}^nY_i(u)\left( m^{-1}\sum _{j=1}^m{\exp \left\{ \beta _0^\top X_{ij}({\hat{\phi }^{I}})\right\} }\right. \\&\quad \left. -E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) \frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\\&\quad -\int _0^tD_0(u,\phi _0)\frac{1}{s_0(u)^2}n^{-1}{\mathrm {d}}M_{\cdot }(u)n^{1/2}({\hat{\phi }^{I}}-\phi _0)\\&\quad -\int _0^tD_0(u,\phi _0)\frac{\alpha _0(u)}{s_0(u)}{\mathrm {d}}u\, n^{1/2}({\hat{\phi }^{I}}-\phi _0)+o_P(1) \end{aligned}$$

where the first and third terms are both \(o_P(1)\) (Kosorok 2008, Lemma 4.2). Thus

$$\begin{aligned}&n^{1/2}\{{\hat{A}}(t)-A_0(t)\}\nonumber \\&\quad =-\int _0^t n^{-1/2}\sum _{i=1}^nY_i(u)\left( m^{-1}\sum _{j=1}^m{\exp \left\{ \beta _0^\top X_{ij}({\hat{\phi }^{I}})\right\} }\right. \nonumber \\&\qquad \left. -E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) \frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\nonumber \\&\qquad -\int _0^tD_0(u,\phi _0)\,\frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\,n^{1/2}({\hat{\phi }^{I}}-\phi _0)+\int _0^t\frac{1}{s_0(u)}n^{-1/2}{\mathrm {d}}M_{\cdot }(u)\nonumber \\&\qquad -\int _0^t\frac{s_1(u)}{s_0(u)}\alpha _0(u)\mathrm{d}u\, n^{1/2}({\hat{\beta }}-\beta _0)+o_P(1) \end{aligned}\tag{19}$$

where the last three terms converge in distribution as processes. To show tightness of the first term, let \(w(s,t)\) denote

$$\begin{aligned}&-\int _s^t n^{-1/2}\sum _{i=1}^nY_i(u)\left( m^{-1}\sum _{j=1}^m{\exp \left\{ \beta _0^\top X_{ij}({\hat{\phi }^{I}})\right\} }\right. \\&\qquad \left. -E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) \frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\\&\quad =-n^{-1/2}\sum _{i=1}^n\int _s^tY_i(u)\frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\,\\&\qquad \times m^{-1}\sum _{j=1}^m\left( {\exp \left\{ \beta _0^\top X_{ij}({\hat{\phi }^{I}})\right\} }-E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) . \end{aligned}$$

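Then, since \(Y_i(u)\) depends only on the observed data and, conditional on the observed data, the imputations are drawn independently from the conditional distribution indexed by \({\hat{\phi }^{I}}\), we have \(E\{w(s,t)\}=E(E[w(s,t)|\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}_{i=1,\ldots ,n}])=0\), so that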

$$\begin{aligned} E\{w(s,t)^2\}&=E(\textit{var}\,[w(s,t)|\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}_{i=1,\ldots ,n}])\\&=n^{-1}\sum _{i=1}^nE\left( \left\{ \int _s^tY_i(u)\frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\right\} ^2\right. \\&\quad \left. \times m^{-1}\textit{var}\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi _0)\right\} }\right| \{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}_{i=1,\ldots ,n}\right] \right) \\&=O\{(t-s)^2\} \end{aligned}$$

implying (van der Vaart and Wellner 1996, Section 2.2.3) that the first term of (19) is also tight. Finally, we may write \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}\) as a sum of

$$\begin{aligned} \begin{aligned}&n^{-1/2}\sum _{i=1}^n\left\{ \int _0^t\frac{1}{s_0(u)}{\mathrm {d}}M_i(u)-\int _0^tD_0(u,\phi _0)\frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\, q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}\right. \\&\quad \left. -\int _0^t\frac{s_1(u)}{s_0(u)}\alpha _0(u)\mathrm{d}u\,(I^F)^{-1}\right. \\&\quad \left. \times \left( E[S_{\mathrm {eff}}^F\{Z_{ij}(\phi _0)\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)] +D_{\mathrm {eff}}(\phi _0)q\{{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\}\right) \right\} \end{aligned} \end{aligned}\tag{20}$$


$$\begin{aligned} \begin{aligned}&-n^{-1/2}\sum _{i=1}^n\left\{ \int _0^t\frac{s_1(u)}{s_0(u)}\alpha _0(u)\mathrm{d}u\,(I^F)^{-1}\right. \\&\quad \left. \times m^{-1}\sum _{j=1}^m\left( S_{\mathrm {eff}}^F\{Z_{ij}({\hat{\phi }^{I}})\}-E[S_{\mathrm {eff}}^F\{Z_{ij}(\phi )\}|{\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)]_{|\phi ={\hat{\phi }^{I}}}\right) \right. \\&\phantom {-n^{-1/2}\sum _{i=1}^n\{}\;\left. +\int _0^tY_i(u)\frac{\alpha _0(u)}{s_0(u)}\mathrm{d}u\,\right. \\&\quad \left. \times m^{-1}\sum _{j=1}^m\left( {\exp \left\{ \beta _0^\top X_{ij}({\hat{\phi }^{I}})\right\} }-E\left[ \left. {\exp \left\{ \beta _0^\top X_{i1}(\phi )\right\} }\right| {\mathcal {C}}_i,G_{{\mathcal {C}}_i}(Z_i)\right] _{|\phi ={\hat{\phi }^{I}}}\right) \right\} \end{aligned} \end{aligned}\tag{21}$$

plus \(o_P(1)\)-terms. Proceeding as in the proof of asymptotic normality of the regression parameters, we can show that the terms in (21) are asymptotically independent of the terms in (20) and converge in distribution to a normal distribution. The terms in (20) are also asymptotically normal. Together with the tightness established above, this shows that \(n^{1/2}\{{\hat{A}}(t)-A_{0}(t)\}\) converges to a mean-zero Gaussian process.
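The conditional-variance calculation behind the tightness bound for \(w(s,t)\) can be checked numerically in a toy case. The sketch below is an illustration under assumptions of our own, not part of the argument: \(\alpha _0(u)/s_0(u)\) is replaced by the constant 1, the covariate is a single binary variable, and the imputation probabilities \(p_i\) are held fixed, which plays the role of conditioning on the observed data. It compares a Monte Carlo estimate of \(E\{w(s,t)^2\}\) with \(n^{-1}\sum _i c_i^2 m^{-1}\textit{var}\{\exp (\beta X_i)\}\), where \(c_i=\int _s^tY_i(u)\mathrm{d}u\), and with the bound \((t-s)^2(e^{\beta }-1)^2/(4m)\). The function name `w_second_moment` is hypothetical.

```python
import math
import random


def w_second_moment(n=20, m=3, beta=0.7, s=0.2, t=0.6, reps=10000, seed=1):
    """Monte Carlo estimate of E{w(s,t)^2} versus the conditional-variance
    formula, in a toy model where alpha_0(u)/s_0(u) is replaced by 1."""
    rng = random.Random(seed)
    times = [rng.random() for _ in range(n)]             # observed T_i
    probs = [rng.uniform(0.2, 0.8) for _ in range(n)]    # P(X_i = 1 | observed data)
    # c_i = int_s^t Y_i(u) du with Y_i(u) = I(T_i >= u)
    c = [max(0.0, min(t, ti) - s) for ti in times]
    eb = math.exp(beta)
    # var{exp(beta X)} = (e^beta - 1)^2 p (1 - p) for X ~ Bernoulli(p)
    theory = sum(ci ** 2 * (eb - 1.0) ** 2 * p * (1.0 - p) / m
                 for ci, p in zip(c, probs)) / n
    # c_i <= t - s and p(1 - p) <= 1/4 give the O{(t - s)^2} bound
    bound = (t - s) ** 2 * (eb - 1.0) ** 2 / (4.0 * m)
    total = 0.0
    for _ in range(reps):
        w = 0.0
        for ci, p in zip(c, probs):
            cond_mean = p * eb + (1.0 - p)               # E[exp(beta X) | data]
            avg = sum(eb if rng.random() < p else 1.0 for _ in range(m)) / m
            w -= ci * (avg - cond_mean)
        total += (w / math.sqrt(n)) ** 2
    return total / reps, theory, bound


mc, theory, bound = w_second_moment()
print(mc, theory, bound)
```

Because the weights \(c_i\) are at most \(t-s\), halving the interval reduces the bound by a factor of four, which is the \(O\{(t-s)^2\}\) behaviour used in the tightness criterion.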

Proof of Theorem 1: Joint convergence

To see that \(n^{1/2}({\hat{\beta }}-\beta _0)\) and \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}_{t\in [0,\tau ]}\) converge jointly in distribution, note that we have written both as a sum of three kinds of terms: terms that depend on the imputations but are asymptotically independent of the observed data, (12) and (21); terms that depend only on the observed data, (14), (15) and (20); and terms that are asymptotically negligible. Joint convergence follows by noting that linear combinations of the “imputation terms”, (12) and (21), are asymptotically independent of the observed data and converge to a normal distribution, while the same linear combinations of the “observed data terms”, (14), (15) and (20), also converge to a normal distribution. Hence, \(n^{1/2}({\hat{\beta }}-\beta _0)\) and \(n^{1/2}\{{\hat{A}}(t)-A_0(t)\}_{t\in [0,\tau ]}\) converge jointly in distribution to a Gaussian process.
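A sketch of the conditioning step behind this argument, under stated assumptions. Let \(U_n\) and \(V_n\) denote a fixed linear combination of the imputation terms and of the observed-data terms, respectively, and let \({\mathcal {O}}_n\) (notation introduced only for this sketch) denote the observed data, so that \(V_n\) is \({\mathcal {O}}_n\)-measurable. Suppose that \(E\{\exp (isU_n)|{\mathcal {O}}_n\}\rightarrow \exp (-s^2\sigma _U^2/2)\) in probability for each real s, and that \(V_n\) converges in distribution to \(N(0,\sigma _V^2)\). Since the conditional characteristic function is bounded by 1, bounded convergence gives

$$\begin{aligned} E[\exp \{is(U_n+V_n)\}]&=E[\exp (isV_n)\,E\{\exp (isU_n)|{\mathcal {O}}_n\}]\\&=E\{\exp (isV_n)\}\exp (-s^2\sigma _U^2/2)+o(1)\\&\rightarrow \exp \{-s^2(\sigma _U^2+\sigma _V^2)/2\}, \end{aligned}$$

so that \(U_n+V_n\) is asymptotically \(N(0,\sigma _U^2+\sigma _V^2)\). Applying this to every linear combination (the Cramér–Wold device) yields joint normality of the finite-dimensional distributions, which, combined with tightness, gives the joint process limit.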

Iterating the estimation process

In order to establish asymptotic results for the iterated multiple-imputation estimator, we extend the arguments in the previous parts of the appendix to the case where the “initial estimator” is a multiple-imputation estimator of the type we are considering. We let \({\hat{\phi }}^{(1)}\) denote the multiple-imputation estimator based on the initial imputations and let \(Z_{ij}^{(2)}({\hat{\phi }}^{(1)})\) denote the second iteration imputations, i.e., imputations generated using \({\hat{\phi }}^{(1)}\) as the true parameter. We focus on the asymptotic distribution of \({\hat{\beta }}^{(2)}\), the multiple-imputation estimator of \(\beta _0\) based on the second iteration imputations and outline the changes we need to make to the expansion of the score function given in Eqs. (12)–(15).

Consider first the term (12). Conditional on the observed data and the first iteration imputations, the mean of \(S_{\mathrm {eff}}^F\{Z_{ij}^{(2)}({\hat{\phi }}^{(1)})\}\) equals \(E[S_{\mathrm {eff}}^F\{Z_{ij}^{(2)}(\phi )\}|{\mathcal {C}}_i, G_{{\mathcal {C}}_i}(Z_i)]_{|\phi ={\hat{\phi }}^{(1)}}\), as the second iteration imputations depend on the first iteration imputations only through the first iteration estimator \({\hat{\phi }}^{(1)}\). It follows as before that (12) is asymptotically normal and asymptotically independent of the observed data (and the first iteration imputations).

The terms (13) and (14) are unchanged. Finally, the term (15) may be rewritten as \(D_{\mathrm {eff}}(\phi _0){n^{1/2}}({\hat{\phi }}^{(1)}-\phi _0)\). We plug in the asymptotic expression for \({n^{1/2}}({\hat{\phi }}^{(1)}-\phi _0)\) derived above and split it into the part depending on the first iteration imputations, corresponding to (12) and (21), and the rest. This yields three kinds of terms: a term (12) depending on the second iteration imputations, which is asymptotically independent of the first iteration imputations; terms depending on the first iteration imputations and the observed data, which are asymptotically independent of the observed data; and terms depending only on the observed data. It now follows that the Cox partial score function is asymptotically normal, and it is straightforward to verify that it has the same asymptotic distribution as (5) with \(q_i\) replaced by \(\rho _i=(I^F)^{-1}\xi _i\).

The second iteration estimator of the integrated baseline hazard may be shown to be asymptotically Gaussian by a similar argument, splitting (21) into a sum of terms depending on the second iteration imputations and terms depending on the first iteration imputations, and conditioning as above. Joint convergence follows as for the original multiple-imputation estimator. Further iterations may be handled by splitting the “imputation terms” into additional terms and conditioning repeatedly.
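To make the iteration scheme concrete, here is a toy sketch for intuition only; it is not the authors' implementation, and every modelling choice in it is an assumption. There is a single binary covariate with constant prevalence \(p(\theta )=\theta \), missing completely at random. Missing values are imputed from the conditional success probability \(p\{{\mathcal {C}},G_{{\mathcal {C}}}(Z),\phi \}\) displayed in the subsection on stochastic equicontinuity. The m imputations are combined by fitting one Cox model (Breslow ties) to the stacked completed data sets, chosen because the stacked risk-set sums are proportional to the averages \(m^{-1}\sum _j\exp \{\beta ^\top X_{ij}\}\) in the expansions above. \(\theta \) is re-estimated as the average of observed and imputed covariates, and a complete-case fit serves as the initial estimator \({\hat{\phi }}^{I}\). All function names are hypothetical.

```python
import bisect
import math
import random


def _risk_sums(times, events, x, beta):
    """For each distinct event time u: (u, d, sum of x over events, S0, S1, S2),
    with sums over the risk set {k: T_k >= u} and weights exp(beta * x_k)."""
    order = sorted(range(len(times)), key=lambda i: times[i], reverse=True)
    out, s0, s1, s2, k = [], 0.0, 0.0, 0.0, 0
    while k < len(order):
        u, d, dx = times[order[k]], 0, 0.0
        while k < len(order) and times[order[k]] == u:  # add all ties at u
            i = order[k]
            e = math.exp(beta * x[i])
            s0, s1, s2 = s0 + e, s1 + e * x[i], s2 + e * x[i] ** 2
            if events[i]:
                d, dx = d + 1, dx + x[i]
            k += 1
        if d:
            out.append((u, d, dx, s0, s1, s2))
    return out


def cox_breslow(times, events, x, iters=50):
    """Newton-Raphson for a one-covariate Cox model (Breslow ties), then the
    Breslow estimator. Returns (beta, [(u, A(u)), ...]) in increasing u."""
    beta = 0.0
    for _ in range(iters):
        sums = _risk_sums(times, events, x, beta)
        score = sum(dx - d * s1 / s0 for _, d, dx, s0, s1, _ in sums)
        info = sum(d * (s2 / s0 - (s1 / s0) ** 2) for _, d, _, s0, s1, s2 in sums)
        step = score / info
        beta += step
        if abs(step) < 1e-10:
            break
    jumps = sorted((u, d / s0) for u, d, _, s0, _, _ in _risk_sums(times, events, x, beta))
    cum, acc = [], 0.0
    for u, jump in jumps:
        acc += jump
        cum.append((u, acc))
    return beta, cum


def impute_and_refit(times, events, x_obs, beta, cum, theta, m, rng):
    """Impute each missing x (None) m times from P(X = 1 | T, Delta) at the
    current (beta, A, theta), stack the m completed data sets and refit."""
    grid = [u for u, _ in cum]
    T, D, X = [], [], []
    for _ in range(m):
        for ti, di, xi in zip(times, events, x_obs):
            if xi is None:
                k = bisect.bisect_right(grid, ti)          # A(T_i), right-continuous
                a_t = cum[k - 1][1] if k else 0.0
                lp = di * beta - a_t * (math.exp(beta) - 1.0) + math.log(theta / (1.0 - theta))
                xi = 1 if rng.random() < 1.0 / (1.0 + math.exp(-lp)) else 0
            T.append(ti)
            D.append(di)
            X.append(xi)
    beta_new, cum_new = cox_breslow(T, D, X)
    return beta_new, cum_new, sum(X) / len(X)   # theta re-estimated as mean of X


def iterated_mi(times, events, x_obs, m=5, n_iter=3, seed=0):
    """Complete-case start, then n_iter rounds of imputation using the
    previous round's estimates. Returns (beta history, A, theta)."""
    rng = random.Random(seed)
    cc = [i for i, xi in enumerate(x_obs) if xi is not None]
    beta, cum = cox_breslow([times[i] for i in cc], [events[i] for i in cc],
                            [x_obs[i] for i in cc])
    theta = sum(x_obs[i] for i in cc) / len(cc)
    history = [beta]
    for _ in range(n_iter):
        beta, cum, theta = impute_and_refit(times, events, x_obs, beta, cum, theta, m, rng)
        history.append(beta)
    return history, cum, theta


def simulate(n=300, beta=0.5, theta=0.4, miss=0.3, seed=42):
    """Toy data: X ~ Bernoulli(theta), baseline hazard 1, exponential censoring
    (rate 0.5), X missing completely at random with probability miss."""
    rng = random.Random(seed)
    times, events, x_obs = [], [], []
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        t, c = rng.expovariate(math.exp(beta * x)), rng.expovariate(0.5)
        times.append(min(t, c))
        events.append(int(t <= c))
        x_obs.append(None if rng.random() < miss else x)
    return times, events, x_obs


history, cum, theta = iterated_mi(*simulate())
print(history, theta)
```

With no missing values, stacking m identical copies multiplies the score and information by m, so every entry of the history equals the complete-data fit; this gives a simple consistency check on the sketch.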

Stochastic equicontinuity

Whereas stochastic equicontinuity of the empirical process based on \(m^{-1}\sum _{j=1}^mS_{\mathrm {eff}}^F\{Z_{ij}(\phi _0)\}\) is straightforward to verify when imputing a large class of continuous covariates, we claim that for discrete covariates the combination of the unknown baseline hazard and the inherent discontinuity of the covariate rules out stochastic equicontinuity. To see this, we prove the following lemma:

Lemma 5

The collection of sets

$$\begin{aligned} \bigg \{\,\{(x,t)\in {\mathcal {X}}\times {\mathbb {R}}: x\le a(t)\}\;\bigg |\; a:{\mathbb {R}}\rightarrow {\mathbb {R}}\text { increasing}\,\bigg \} \end{aligned}$$

with \({\mathcal {X}}\subset {\mathbb {R}}\) is a Vapnik–Chervonenkis (VC) class if and only if \({\mathcal {X}}\) is a finite set.

Proof

Consider a set \(A=\{(x_1,t_1),\ldots ,(x_n,t_n)\}\). If \({\mathcal {X}}\) is finite, then any set of \(n>|{\mathcal {X}}|\) points contains at least two points \((x_{i},t_{i})\) and \((x_{j},t_{j})\) such that \(x_{i}=x_{j}\) and (without loss of generality) \(t_i\le t_j\). Clearly, we cannot pick out a subset of A containing \((x_{i},t_{i})\) but not \((x_{j},t_{j})\): if \(a(t_{i})\ge x_i\), then \(a(t_j)\ge a(t_i)\ge x_i=x_j\). Thus, no set of more than \(|{\mathcal {X}}|\) points is shattered, and the collection of sets is a VC class. If \({\mathcal {X}}\) is not finite, then for every n we may choose A such that \(x_1<x_2<\cdots <x_n\) and \(t_1<t_2<\cdots <t_n\), and any subset of A may be picked out: for a subset \(B\subseteq A\), let a start below \(x_1\), jump to \(x_i\) just before \(t_i\) for each i such that \((x_i,t_i)\in B\), and be constant otherwise. Then \(a(t_i)\) is the x-value of the last point of B at or before \(t_i\) (or lies below \(x_1\)), so \(x_i\le a(t_i)\) exactly when \((x_i,t_i)\in B\). As A can be shattered for every n, the collection of sets is not a VC class.\(\square \)
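Both directions of Lemma 5 can be checked by brute force on small point sets. The sketch below (function names hypothetical) assumes distinct t-values and relies on one observation not made in the proof: which subset is picked out depends only on how each \(a(t_k)\) compares with the x-values, so it suffices to let \(a(t_k)\) range over the x-values plus one level below all of them, in nondecreasing order of \(t_k\).

```python
from itertools import combinations_with_replacement


def achievable_subsets(points):
    """Subsets of `points` (pairs (x, t) with distinct t) picked out by
    {(x, t): x <= a(t)} for some nondecreasing a."""
    pts = sorted(points, key=lambda p: p[1])       # order by t
    xs = sorted({x for x, _ in pts})
    levels = [xs[0] - 1.0] + xs                    # one level below every x
    found = set()
    # combinations_with_replacement of a sorted list yields exactly the
    # nondecreasing sequences a(t_1) <= ... <= a(t_n) over the levels
    for seq in combinations_with_replacement(levels, len(pts)):
        found.add(frozenset(p for p, a in zip(pts, seq) if p[0] <= a))
    return found


def is_shattered(points):
    return len(achievable_subsets(points)) == 2 ** len(points)


# Infinite X: x and t both increasing, as in the proof, so A is shattered.
print(is_shattered([(float(i), float(i)) for i in range(5)]))   # True
# Finite X = {0, 1}: three points force a repeated x-value.
print(is_shattered([(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]))        # False
```

In the second example the subset containing \((0,0)\) but not \((0,2)\) is the one that cannot be picked out, exactly as in the finite case of the proof.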

Consider imputing a single binary explanatory variable, X, with conditional probability of success given by

$$\begin{aligned} p\{{\mathcal {C}}, G_{{\mathcal {C}}}(Z),\phi \}=\frac{\exp \{\Delta \beta -A(T)\exp (\beta )\}p(\theta )}{\exp \{\Delta \beta -A(T)\exp (\beta )\}p(\theta )+\exp \{-A(T)\}\{1-p(\theta )\}}. \end{aligned}$$

Then a simple way of simulating X is

$$\begin{aligned} X(\phi )=I[{\tilde{U}}\le \Delta \beta -A(T)\{\exp (\beta )-1\}+\text {logit}\{p(\theta )\}], \end{aligned}$$

with \({\tilde{U}}=\text {logit}(U)\), where U is uniformly distributed on (0, 1). Lemma 5 shows that even if we fix \(\beta \) and \(\theta \), these indicator functions, as the baseline hazard A varies, are not indicators of a VC class of sets. A fortiori, the class obtained by also letting \(\beta \) and \(\theta \) vary is not VC. Dudley (1984, Theorem 11.4.1) shows that when a class of indicator functions is not based on a VC class, the corresponding empirical process is not pregaussian. This essentially rules out stochastic equicontinuity.
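The threshold representation can be checked directly. Taking the logit of the displayed success probability gives \(\Delta \beta -A(T)\{\exp (\beta )-1\}+\text {logit}\{p(\theta )\}\), and since \(P\{\text {logit}(U)\le c\}=\exp (c)/\{1+\exp (c)\}\), thresholding \({\tilde{U}}\) at this value reproduces the displayed probability. For fixed \(\Delta \), \(\beta \) and \(\theta \), the threshold is monotone in T because A is a cumulative hazard, which is what links the indicators to Lemma 5. The sketch below uses hypothetical function names and takes \(A(T)\) as a given number `a_t`.

```python
import math
import random


def success_prob(delta, a_t, beta, p_theta):
    """P(X = 1 | T, Delta) from the displayed formula, with a_t = A(T)."""
    num = math.exp(delta * beta - a_t * math.exp(beta)) * p_theta
    return num / (num + math.exp(-a_t) * (1.0 - p_theta))


def threshold(delta, a_t, beta, p_theta):
    """logit of success_prob written out: Delta*beta - A(T)(e^beta - 1) + logit p."""
    return delta * beta - a_t * (math.exp(beta) - 1.0) + math.log(p_theta / (1.0 - p_theta))


def simulate_x(delta, a_t, beta, p_theta, rng):
    """X = I{logit(U) <= threshold} with U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:          # random() can return 0.0; logit needs 0 < u < 1
        u = rng.random()
    return int(math.log(u / (1.0 - u)) <= threshold(delta, a_t, beta, p_theta))


rng = random.Random(3)
draws = [simulate_x(1, 0.8, 0.5, 0.4, rng) for _ in range(100_000)]
print(sum(draws) / len(draws), success_prob(1, 0.8, 0.5, 0.4))
```

The empirical frequency of \(X=1\) should agree with `success_prob` up to Monte Carlo error.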

This argument shows that the efficient score process with imputed data is not stochastically equicontinuous in general. It does not rule out, though we find it unlikely, that another simulation scheme for a discrete covariate might be constructed that is sufficiently “smooth” to make the process stochastically equicontinuous.

About this article

Cite this article

Eriksson, F., Martinussen, T. & Nielsen, S.F. Large sample results for frequentist multiple imputation for Cox regression with missing covariate data. Ann Inst Stat Math 72, 969–996 (2020).

Keywords

  • Asymptotic distribution
  • Coarsened data
  • Semiparametric
  • Survival
  • Variance estimator