Abstract
We address the testing problem of proportional hazards in the two-sample survival setting allowing right censoring, i.e., we check whether the famous Cox model is underlying. Although there are many test proposals for this problem, only a few papers suggest how to improve the performance for small sample sizes. In this paper, we do exactly this by carrying out our test as a permutation as well as a wild bootstrap test. The asymptotic properties of our test, namely asymptotic exactness under the null and consistency, can be transferred to both resampling versions. Various simulations for small sample sizes reveal an actual improvement of the empirical size and a reasonable power performance when using the resampling versions. Moreover, the resampling tests perform better than the existing tests of Gill and Schumacher and Grambsch and Therneau . The tests’ practical applicability is illustrated by discussing real data examples.
Similar content being viewed by others
References
Andersen PK (1983) Comparing survival distributions via hazard ratio estimates. Scand J Stat 10:77–85
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer, New York
Begun JM, Reid N (1983) Estimating te relative risk with censored data. J Am Stat Assoc 78:337–341
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependence. Ann Stat 29:1165–1188
Beyersmann J, Di Termini S, Pauly M (2013) Weak convergence of the wild bootstrap for the Aalen–Johansen estimator of the cumulative incidence function of a competing risk. Scand J Stat 40:387–402
Bluhmki T, Dobler D, Beyersmann J, Pauly M (2019) The wild bootstrap for multivariate Nelson–Aalen estimators. Lifetime Data Anal 25:97–127
Collett D (2015) Modelling survival data in medical research. Wiley, New York
Cox DR (1972) Regression models and life-tables. J R Stat Soc Ser B 34:187–220
Chen W, Wang D, Li Y (2015) A class of tests of proportional hazards assumption for left-truncated and right-censored data. J Appl Stat 42:2307–2320
Chung EY, Romano JP (2013) Exact and asymptotically robust permutation tests. Ann Stat 41:484–507
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Stat 7:1–26
Fleming TR, O’Fallon JR, O’Brien PC, Harrington DP (1980) Modified Kolmogorov–Smirnov test procedures with application to arbitrarily right-censored data. Biometrics 36:607–625
Gill RD (1980) Censoring and stochastic integrals (Ph.D. thesis). Mathematical Centre Tracts 124. Mathematisch Centrum, Amsterdam, V
Gill RD, Schumacher M (1987) A simple test of proportional hazards assumption. Biometrika 74:289–300
Grambsch PM, Therneau TM (1994) Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 81:515–526
Hájek J, Šidák Z, Sen PK (1999) Theory of rank tests, 2nd edn. Academic Press Inc, San Diego
Hall P, Heyde CC (1980) Martingale limit theory and its application. Academic Press, New York
Hall WJ, Wellner JA (1980) Confidence bands for a survival curve from censored data. Biometrika 67:133–143
Harrington DP, Fleming TR (1982) A class of rank test procedures for censored survival data. Biometrika 69:553–565
Hess KR (1995) Graphical methods for assessing violations of the proportional hazards assumption in Cox regression. Stat Med 14:1707–1723
Heller G, Venkatraman ES (1996) Resampling procedures to compare two survival distributions in the presence of right-censored data. Biometrics 52:1204–1213
Hothorn T, Hornik K, van de Wiel MA, Zeileis A (2006) A Lego system for conditional inference. Am Stat 60:257–263
Jacod J, Shiryaev AN (2003) Limit theorems for stochastic processes, 2nd edn. Springer, Berlin
Janssen A (1997) Studentized permutation tests for non-iid hypotheses and the generalized Behrens–Fisher problem. Stat Probab Lett 36:9–21
Janssen A (2005) Resampling student’s t-type statistics. Ann Inst Stat Math 57:507–529
Janssen A, Mayer C-D (2001) Conditional studentized survival tests for randomly censored models. Scand J Stat 28:283–293
Janssen A, Neuhaus G (1997) Two-sample rank tests for censored data with non-predictable weights. J Stat Plan Inference 60:45–59
Janssen A, Pauls T (2003) How do bootstrap and permutation tests work? Ann Stat 31:768–806
Janssen A, Werft W (2004) A survey about the efficiency of two-sample survival tests for randomly censored data. Mitt Math Semin Gießen 254:1–47
Kirk AP, Jain S, Pocock S, Thomas HC, Sherlock S (1980) Late results of the Royal Free Hospital prospective controlled trial of prednisolone therapy in hepatitis B surface antigen negative chronic active hepatitis. Gut 21:78–83
Konietschke F, Pauly M (2012) A studentized permutation test for the nonparametric Behrens–Fisher problem in paired data. Electron J Stat 6:1358–1372
Kraus D (2009) Checking proportional rates in the two-sample transformation model. Kybernetika 2:261–278
Lin DY (1991) Goodness-of-fit analysis for the Cox regression model based on a class of parameter estimators. J Am Stat Assoc 86:725–728
Lin D (1997) Non-parametric inference for cumulative incidence functions in competing risks studies. Stat Med 16:901–910
Lin DY, Wei LJ, Ying Z (1993) Checking the Cox model with cumulative sums of martingale-based residuals. Biometrika 80:557–572
Liu RY (1988) Bootstrap procedures under some non-i.i.d. models. Ann Stat 16:1696–1708
Mammen E (1992) When does bootstrap work? Asymptotic results and simulations. Springer, New York
Mantel N (1966) Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemoth Rep 50:163–170
Neuhaus G (1993) Conditional rank tests for two-sample problem under random censorship. Ann Stat 21:1760–1779
Omelka M, Pauly M (2012) Testing equality of correlation coefficients in two populations via permutation methods. J Stat Plann Inf 142:1396–1406
Pauly M (2011) Discussion about the quality of F-ratio resampling tests for comparing variances. TEST 20:163–179
Pauly M (2011) Weighted resampling of martingale difference arrays with applications. Electron J Stat 5:41–52
Pauly M, Brunner E, Konietschke F (2015) Asymptotic permutation tests in general factorial designs. J R Stat Soc B 77:461–473
Peto R, Peto J (1972) Asymptotically efficient rank invariant test procedures (with discussion). J R Stat Soc A 135:185–206
Prentice RL (1978) Linear rank tests with right censored data. Biometrika 65:167–179
R Core Team (2019) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/. Accessed 14 June 2019
Scheike TH, Martinussen T (2004) On estimation and tests of time-varying effects in the proportional hazards model. Scand J Stat 31:51–62
Schoenfeld D (1982) Partial residuals for the proportional hazards regression model. Biometrika 69:239–241
Schumacher M (1984) Two-sample tests of Cramér–von Mises- and Kolmogorov–Smirnov-type for randomly censored data. Int Stat Rev/Revue Internationale de Statistique 52:263–281
Sengupta D, Bhattacharjee A, Rajeev B (2004) Testing for the proportionality of hazards in two samples against the increasing cumulative hazard ratio alternative. Scand J Stat 31:51–62
Wei LJ (1984) Goodness of fit for proportional hazards model with censored observations. J Am Stat Assoc 79:649–652
Wu C-FJ (1986) Jackknife, bootstrap and other resampling methods in regression analysis. Ann Stat 14:1261–1350
Acknowledgements
The authors thank two referees and an associate editor for increasing the paper’s quality by their helpful comments. Funding was provided by Deutsche Forschungsgemeinschaft (Grant No. PA-2409 5-1).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Appendix: Proofs
Appendix: Proofs
In the following we give all the proofs. Considering appropriate subsequences we can assume without loss of generality that \(n_1/n\rightarrow \kappa _1\in (0,1)\) and \(n_2/n\rightarrow \kappa _2=1-\kappa _1\). Note that the final statements of all our theorems do not depend on \(\kappa _1\) and \(\kappa _2\) as well as the considered subsequence. For the readers’ convenience we present some technical, known results before giving the actual proofs.
1.1 Preliminaries
One basic tool for our proofs are discrete martingale theorems, see Hall and Heyde (1980) and references therein. For our purposes, a simplified version of Theorem 8.3.33 from Jacod and Shiryaev (2003) is sufficient, see also their Theorem 2.4.36.
Proposition 1
[c.f. Jacod and Shiryaev (2003)] For each \(n\in { {\mathbb {N}} }\) let \((\xi _{n,i})_{1\le i \le n}\) be a martingale difference scheme with respect to some filtration \(({\mathscr {F}}_{n,i})_{0\le i \le n}\), i.e., \({ E }(\xi _{n,i}\vert {\mathscr {F}}_{n,i-1})=0\) for every \(1\le i \le n\). Assume that the conditional Lindeberg condition, \(\sum _{i=1}^{n} { E }(\xi _{n,i}^2{\mathbf {1}}\{|\xi _{n,i}|\ge \varepsilon \}\vert \mathscr {F}_{n,i-1})\rightarrow 0\) in probability for all \(\varepsilon >0\), is fulfilled. Define \(M_n\) by \(M_n(t)=\sum _{i=1}^{[nt]}\xi _{n,i}\)\((t\in [0,1])\). Suppose that the predictable quadratic variation process \(\langle M_n\rangle \) given by \(\langle M_n\rangle (t)=\sum _{i=1}^{[nt]} { E }(\xi _{n,i}^2\vert \mathscr {F}_{n,i-1})\)\((t\in [0,1])\) converges in probability pointwisely to a continuous function \(\sigma ^2:[0,1]\rightarrow [0,\infty )\). Then \(M_n\) converges in distribution on the Skorohod space D[0, 1] to the rescaled Brownian motion \(B\circ \sigma ^2\), where B denotes a classical Brownian motion.
Proposition 2
[c.f. Janssen and Mayer (2001)] Assume that all assumptions of our paper are underlying. Let \({{\widetilde{\delta }}}_n=({\widetilde{\delta }}_{n,1},\ldots ,{{\widetilde{\delta }}}_{n,n})\in \{0,1\}^n\) and \(w_n=(w_{n,1},\ldots ,w_{n,n})\in { {\mathbb {R}} }^n\) such that \(\lim _{n\rightarrow \infty }n^{-1}\sum _{i=1}^n{{\widetilde{\delta }}}_{n,i}w_{n,i}^2\in (0,\infty )\) and \(\max \{|w_{n,i}|:i=1,\ldots ,n\}\le M\in (0,\infty )\) for all \(n\in { {\mathbb {N}} }\). Define for all \(t\in [0,1]\)
where for the latter we set \(\inf \emptyset = 1\). Then \(V_n(\alpha _n(t))/V_n(1)\) converges in probability to t for all \(t\in [0,1]\) and \(V_n(1)^{-1/2}W_n\circ \alpha _n\) tends in distribution to a Brownian motion B on the Skorohod space D[0, 1]. Moreover, the sequences \((V_n(1))_{n\in { {\mathbb {N}} }}\) and \((W_n(1))_{n\in { {\mathbb {N}} }}\) are tight, i.e., we have \(\lim _{t\rightarrow \infty }\limsup _{n\rightarrow \infty } P ( |W_n(1)| \ge t)+ P ( |V_n(1)| \ge t)=0\).
Proof
It is easy to check that due to the assumptions on \(w_n\) and \({{\widetilde{\delta }}}_n\) we have \(0<\liminf _{n\rightarrow \infty }\beta _n^2(1)\le \limsup _{n\rightarrow \infty }\beta _n^2(1)<\infty \) and, thus, condition (15) of Janssen and Mayer (2001) is fulfilled. Moreover, the conditions for the regression coefficients in the paper of Janssen and Mayer (2001) hold for our (rescaled) coefficients \({{\widetilde{c}}}_{(i)}= (n_1n_2/n)^{-1/2} c_{(i)}\). Consequently, we can apply their Theorem 1 and Lemma 3. Note that \(\beta _n^2(1)=1\) is assumed in their Lemma 3 as well as in the proof of their Theorem 1, but this can always be ensured by rescaling the weight coefficients \({{\widetilde{w}}}_{n,i}= w_{n,i}/\beta _n(1)\). Now, the desired distributional convergence of \(V_n(1)^{-1/2}W_n\circ \alpha _n\) follows from their Theorem 1. In the proof of Lemma 3 Janssen and Mayer (2001) showed \(\beta _n^2(\alpha _n(t))/\beta _n^2(1)\rightarrow t\) for all \(t\in [0,1]\) and by their (33) we have \([1/\beta _n^{2}(1)]\sup _{t\in [0,1]}|\beta _n^2(t)-V_n(t)|\rightarrow 0\) in probability. Combining both we obtain the convergence in probability of \(V_n(\alpha _n(t))/V_n(1)\). Moreover, their Theorem 1 implies that \(W_n(1)V_n(1)^{-1/2}\) converges in distribution to a standard normal distributed random variable. Finally, the tightness of \((V_n(1))_{n\in { {\mathbb {N}} }}\) and \((W_n(1))_{n\in { {\mathbb {N}} }}\), respectively, follows from \(\limsup _{n\rightarrow \infty }\beta _n^2(1)<\infty \). \(\square \)
The subsequent result concerning linear rank tests is well-known and can be found, for example, in the book of Hájek et al. (1999). Combining their Theorem 3 in Section 3.3.1 and Chebyshev’s inequality we obtain:
Proposition 3
[c.f. Hájek et al. (1999)] Let \(w_{n,1},\ldots ,w_{n,n}\) be real-valued constants. Introduce the linear rank statistic \(\xi _n^\pi =\sum _{i=1}^{n}w_{n,i}(c_{n,i}^\pi -{{\bar{c}}})\), where \(\bar{c}=n^{-1}\sum _{i=1}^nc_{n,i}^\pi =n_2/n\). Then \({ E }(\xi _n^\pi )=0\) and
In particular, if \( \max _{1\le i \le n}|w_{n,i}|\le M/n\) for all \(n\in { {\mathbb {N}} }\) and some fixed \(M>0\) then \( \text {Var} (\xi _n^\pi )\rightarrow 0\) and \(\xi _n^\pi \) converges in probability to 0.
1.2 Proof of Theorem 1
The following lemma and the corresponding proof are extensions of Lemma 5.3 and its proof of Janssen and Werft (2004).
Lemma 1
Suppose that \(A_2=\vartheta A_1\) for some \(\vartheta >0\). Then \((\xi _{n,i})_{1\le i \le n}\) given by
is a martingale difference scheme with respect to the filtration \(({\mathscr {F}}_{n,i})_{0\le i \le n}\) defined in (6). Moreover, the predictable quadratic variation process \(\langle M_n\rangle \) of the martingale \(M_n\) given by \(M_n(t)=S_n(t)=\sum _{i=1}^{[nt]}\xi _{n,i}\)\((t\in [0,1])\) equals \({{\widehat{\sigma }}}^2\) from (5).
Proof
Fix \(1\le i\le n\). First, observe that \(\sum _{m=i}^nc_{(m)}\) and \(\delta _{(i)}\) are predictable, i.e., \(\mathscr {F}_{n,i-1}\)-measurable. Since \(c_{(i)}\) equals either 0 or 1 it is easy to see that all postulated statements follow from
Clearly, (10) is true in the case \(\delta _{(i)}=0\). Hence, it is sufficient for the proof of (10) to consider events \(A\in {\mathscr {F}}_{n,i-1}\) of the form \(A= \{ \delta _{(i)}=1,\,\delta _{(m)}= \delta _m, \,d_{(m)}= d_m;\;m\le i-1\}\) with constants \( \delta _1, \ldots ,\delta _{i-1}\in \{0,1\}\) and pairwise different \(d_1,\ldots ,d_{i-1}\in \{(j,k): 1\le k \le n_j;\,j=1,2\}\). Introduce \(Z_{j,k}={\mathbf {1}}\{d_{(i)}=(j,k)\}\)\((1\le k \le n_j;\, j=1,2)\). Clearly, \(1-c_{(i)}= \sum _{k=1}^{n_1} Z_{1,k}\) and \(c_{(i)}= \sum _{k=1}^{n_2} Z_{2,k}\). That is why we analyse \(Z_{j,k}\) in the next step. For this purpose we define the set \(B_x=\bigl \{ X_{ d_{1}}<\cdots<X_{d_{i-1}}< x\,\, ,\,\, \delta _{d_{(m)}} = \delta _{m}\,\,\text {for}\,\,m\le i-1\,\bigr \}\quad (x>0)\). From Fubini’s theorem and \(\mathrm dF_j / \mathrm dA_j = 1- F_j\) we obtain for all \((j,k)\in J=\{(j,k): 1\le k \le n_j;\,j=1,2\}{\setminus } \{ d_1,\ldots , d_{i-1}\}\) that
for some \(C^*\ge 0\). Obviously, \( P ( \{ Z_{j,k} = 1 \} \cap A )=0\) for \((j,k)\notin J\). Remind in the following that \(n_1-\sum _{m=1}^{i-1}(1-c_{d_m})\) and \(n_2-\sum _{m=1}^{i-1}c_{d_m}\) elements of J belong to the first and second group, respectively. Thus, we can deduce from summing up all probabilities \( P ( \{ Z_{j,k} = 1 \} \cap A )\) that
Finally, inserting \(C^*\) into (11) and recalling \(c_{(i)}= \sum _{k=1}^{n_2} Z_{2,k}\) proves (10) since \(n_2-\sum _{m=1}^{i-1}c_{d_m}\) equals \(\sum _{m=i}^nc_{(m)}\) under A. \(\square \)
It is easy to see that there is a sequence \((b_n)_{n\in { {\mathbb {N}} }}\) converging to 0 such that \(\max _{1\le i \le n}|\xi _{n,i}|\le b_n\). This implies immediately the conditional Lindeberg condition. Thus, to apply Proposition 1 it remains to show convergence of the corresponding predictable quadratic variation \({{\widehat{\sigma }}}^2\). For this purpose we prefer the following slight modification of its representation in (3)
Clearly, the integrand is bounded. As well known, we can deduce from the extended Glivenko–Cantelli theorem that \(\sup \{|Y_{j}(t)/n-y_j(t)|:t\in [0,\infty )\}\rightarrow 0\) as well as \(\sup \{|w_{n}(t)-w(t)|:t\in [0,s]\}\rightarrow 0\) for every \(s<\tau \), both in probability, where \(y_j=\kappa _j(1-F_j)(1-G_j)\) and \(w=(\kappa _1\kappa _2)^{-1}y_1y_2/( y_1 + \vartheta y_2)^2\). Set \(y=y_1+y_2\). Moreover, by the law of large numbers \(N_j(t)/n\) converges in probability to \(L_j(t)=\kappa _j P (X_{j,1}\le t, \Delta _{j,1}=1)=\kappa _j\int _{[0,t]}(1-G_j)\,\mathrm { d }F_j\)\((t\ge 0; \,j=1,2)\) and, thus, \(N(t)/n\rightarrow L(t)=L_1(t)+L_2(t)\). Since \(|w_{n}|\) is uniformly bounded and \(\mathrm dF_j / \mathrm dA_j = 1- F_j\) we obtain that \(\sigma _n^2(t,\vartheta )\) converges in probability to
Applying Proposition 1 yields that \(S_n(\cdot ,\vartheta )\) converges in distribution to \(B\circ \sigma ^2(\cdot , \vartheta )\) on the Skorohod space D[0, 1], where B is a Brownian motion.
In the next step, we plug-in the estimator \({{\widehat{\vartheta }}}\) of the proportionality factor \(\vartheta \). The statement of Theorem 1 holds for general estimators \({{\widehat{\vartheta }}}\) of the proportionality factor \(\vartheta \) fulfilling the subsequent Assumption I. As already mentioned in Sect. 2, by Gill (1980) \({{\widehat{\vartheta }}}_K\) with \(K=(w\circ {{\widehat{F}}})K_L\) obeys a central limit theorem and, in particular, fulfills the subsequent Assumption I.
Assumption I Let \({{\widehat{\vartheta }}}\) be a positive random variable which is bounded with probability one, i.e., \( P ({{\widehat{\vartheta }}} \le \eta _\vartheta )\rightarrow 1\) for some \(\eta _\vartheta \). Suppose that \(n^{1/2}({{\widehat{\vartheta }}} - \vartheta )\) converges in distribution to a real-valued random variable \(Z_\vartheta \).
Obviously, \({{\widehat{\vartheta }}}\) is a consistent estimator for \(\vartheta \) under Assumption I, i.e., \({{\widehat{\vartheta }}}\) tends in probability to \(\vartheta \). Analogously to the convergence of \({{\widehat{\sigma }}}^2(t,\vartheta )\), we can deduce from the consistency that \({{\widehat{\sigma }}}^2(t,{{\widehat{\vartheta }}})\) converges in probability to \(\sigma ^2(t,\vartheta )\) for every \(t\in [0,1]\). It is easy to check that \(S_n(t,\vartheta ) - S_n(t,{{\widehat{\vartheta }}})\) equals \(Z_{n,\vartheta }{{\widetilde{\sigma }}}_n^2(t,\vartheta )\), where
Clearly, \(Z_{n,\vartheta }\) converges in distribution to \({{\widetilde{Z}}}_\vartheta = (\kappa _1\kappa _2)^{1/2} Z_\vartheta /\vartheta \). Analogously to the argumentation above, \({{\widetilde{\sigma }}}^2(t,\vartheta )\) converges in probability to \(\sigma _n^2(t,\vartheta )\) for all \(t\in [0,1]\). Hence, for every \(t\in [0,1]\)
in probability. Consequently, \((S_n(\cdot ,\vartheta ), {{\widehat{\sigma }}}^2(\cdot ,{{\widehat{\vartheta }}}), R_n(\cdot ,\vartheta ))\) converges in distribution to \((B\circ \sigma ^2(\cdot ,\vartheta ),\sigma ^2 (\cdot ,\vartheta ), 0)\) on \(D[0,1]\times D[0,1]\times D[0,1]\). By the continuous mapping theorem, respecting that \(t\mapsto \sigma ^2(t,\vartheta )\) is continuous and nondecreasing,
Note that the assumptions ensure \(\sigma ^2(1,\vartheta )>0\). Let \(B_0\) be a Brownian bridge on [0, 1]. Since \(B(c^2\cdot )\) and cB for every \(c>0\) as well as \(t\mapsto B(t)-tB(1)\)\((t\in [0,1])\) and \(B_0\) have the same distribution, respectively, we obtain that the distribution of \({{\widetilde{T}}}\) equals the one of T.
1.3 Proof of Theorem 2
Let \({{\widehat{\vartheta }}}={{\widehat{\vartheta }}}_K\) with \(K=(w\circ {{\widehat{F}}})K_L\) for some \(w\in {\mathscr {W}}\). Following the argumentation of the convergence of \({\widehat{\sigma }}^2(t,{{\widehat{\vartheta }}})\) in the proof of Theorem 1 we obtain under \(H_1\) that in probability
where \(y_1,y_2,y,L\) are defined as in the proof of Theorem 1. Similarly, we can deduce that \(n^{-1/2}T_n\) converges in probability under \(H_1\) to
where
Clearly, the consistency of our test follows if \({{\widetilde{T}}}>0\). Contrary to this, suppose that \({{\widetilde{T}}} =0\). Then there is some \(\eta _0\ne 0\) such that \(S(t,\vartheta _0)=\eta _0 \sigma ^2(t,\vartheta _0)\) for all \(t\in [0,1]\). It can easily be seen that
follows for every \(0\le s<t\le 1\) and some \(\eta _1\ne 0\). Assume that \(\eta _1>0\). The case \(\eta _1<0\) can be treated analogously and, thus, we omit it to the reader. Since the right hand side of Eq. (15) is nonnegative and \((y_1+\vartheta _0 y_2){{\widetilde{K}}}/(y_1y_2)\ge 0\) we can deduce
for all \(0\le s<t\le 1\). Note that due to the definition of \(\vartheta _0\), see (14), we have equality in (16) for \(s=0\) and \(t=1\). Hence, we get equality in (16) for all \(0\le s<t\le 1\). But this implies \(A_2(x)=\vartheta _0 A_1(x)\) for all \(x {>} 0\) with \({{\widetilde{K}}}(x)>0\) or, equivalently, for all \(x\in (0,\tau )\).
1.4 Proof of Theorem 3
To distinguish between the processes depending on the original data and the permuted data, respectively, we add a superscript \(\pi \) to these processes if the permutation versions are meant, namely \({{\widehat{A}}}_j^\pi \), \(Y_j^\pi \), \(N_j^\pi \), \(K_L^\pi \), \(K^\pi \), \(S_n^\pi \), \({\widehat{\sigma }}^{2,\pi }\)\((1\le j\le 2)\). Note that the processes N and Y as well as the pooled Kaplan–Meier estimator \({{\widehat{F}}}\) do not depend on the group membership vector.
Assumption II Let \({{\widehat{\vartheta }}}={{\widehat{\vartheta }}}(c^{(n)},\delta ^{(n)})\) be an estimator fulfilling Assumption I, which depends only on the group memberships \(c^{(n)}\) and the censoring status \(\delta ^{(n)}\) of the ordered values. Let \({{\widehat{\vartheta }}}^\pi ={{\widehat{\vartheta }}}(c_n^\pi ,\delta ^{(n)})\) be the corresponding permutation version of the estimator. Moreover, suppose that a certain conditioned tightness assumption is fulfilled for \((n^{1/2}({{\widehat{\vartheta }}}^\pi -1))_{n\in { {\mathbb {N}} }}\), namely that for every subsequence there is a further sub-subsequence such that along this sub-subsequence we have with probability one
At the proof’s end, we verify that Assumption II holds for the estimators considered in the paper.
Lemma 2
Let \(w\in {\mathscr {W}}\). Then the estimator \({{\widehat{\vartheta }}}={{\widehat{\vartheta }}}_K\) with \(K=(w\circ {{\widehat{F}}})K_L\) fulfills Assumption II.
As in the proof of Theorem 1, we obtain that \(n^{-1}\sum _{i=1}^{n}\delta _{(i)}\rightarrow \int \mathrm { d }L\ge L(\tau )>0\) in probability, where L and y were defined there. Now, we start with an arbitrary subsequence of \({ {\mathbb {N}} }\). Since we are interested in the conditional distributional convergence we treat the censoring status \(\delta _{(1)},\ldots ,\delta _{(n)}\) as constants. Considering an appropriate sub-subsequence of the pre-chosen subsequence we can assume that \(\lim _{n\rightarrow \infty }n^{-1}\sum _{i=1}^{n}\delta _{(i)}>0\) and that by Assumption II the sequence \((\sqrt{n}({{\widehat{\vartheta }}}^\pi -1))_{n\in { {\mathbb {N}} }}\) is tight. The latter implies, in particular, \({\widehat{\vartheta }}^\pi \rightarrow 1\) in probability. The proof’s rest goes along the argumentation of the proof of Theorem 1. Nevertheless, we carry out the important steps.
Observe that \(S_n^\pi (t,1)=W_n(t,\delta ^{(n)},w_n)\) and \({\widehat{\sigma }}^{2,\pi }(t,1)=V_n(t,\delta ^{(n)},w_n)\) for \(w_n=(1,\ldots ,1)\). By Proposition 2\({\widehat{\sigma }}^{2,\pi }(1,1)^{-1/2}S_n^\pi \circ \alpha _n\) converge in distribution to a Brownian motion B and \({\widehat{\sigma }}^{2,\pi }(\alpha _n(t),1)/{\widehat{\sigma }}^{2,\pi }(1,1)\) tends in probability to t for all \(t\in [0,1]\), where \(\alpha _n\) is defined as in Proposition 2. Moreover, \(({{\widehat{\sigma }}}^{2,\pi }(1,1))_{n\in { {\mathbb {N}} }}\) is tight. Let us now discuss what happens when we plug-in the estimator \({\widehat{\vartheta }}^\pi \). First, for all \(t\in [0,1]\)
Thus, \({\widehat{\sigma }}^{2,\pi }(\alpha _n(t),{\widehat{\vartheta }}^\pi )/{\widehat{\sigma }}^{2,\pi }(1,1)\) as well as \({\widehat{\sigma }}^{2,\pi }(\alpha _n(t),{\widehat{\vartheta }}^\pi )/{\widehat{\sigma }}^{2,\pi }(1,{\widehat{\vartheta }}^\pi )\) tend to t for all \(t\in [0,1]\). It remains to study \(S_n^\pi (t,1)-S_n^\pi (t,{\widehat{\vartheta }}^\pi )\). This difference equals, compare to the proof of Theorem 1, \(Z_{n}^\pi {\widehat{\sigma }}^{2,\pi }(t)\)\((t\in [0,1])\), where
Analogously to (17), we get \({\widetilde{\sigma }}^{2,\pi }(\alpha _n(t))/{\widehat{\sigma }}^{2,\pi }(1,1)\rightarrow t\) for all \(t\in [0,1]\). Since \((Z_n^\pi )_{n\in { {\mathbb {N}} }}\) and \(({\widehat{\sigma }}^{2,\pi }(1,1))_{n\in { {\mathbb {N}} }}\) are tight we obtain
in probability. Finally, the statement follows from the continuous mapping theorem, compare to the proof of Theorem 1.
Proof (Lemma 2)
As already mentioned in the proof of Theorem 1, \({{\widehat{\vartheta }}}\) fulfills Assumption I. Now, define \(w_{n,i}=w( {{\widehat{F}}}(X_{(i)}))\) and set \(w_n=(w_{n,1},\ldots ,w_{n,n})\). Analogously to the proof of Theorem 1, we obtain that in probability
where y and L are defined in the proof of Theorem 1 as well as F is given by (13). It is well known that for every subsequence of \({ {\mathbb {N}} }\) there exists a further sub-subsequence such that the convergences in (18) and (19) hold with probability one along this sub-subsequence. Since we are interested in the probability conditioned under \(\delta ^{(n)}\) we can treat the censoring status \(\delta _{(1)},\ldots ,\delta _{(n)}\) as constants and, hence, \(w_{n,1},\ldots ,w_{n,n}\) as constants with \(\lim _{n\rightarrow \infty }\sum _{i=1}^nw_{n,i}\delta _{(i)}\in (0,\infty )\) as well as \(\lim _{n\rightarrow \infty }n^{-1}\sum _{i=1}^{[n_2/2]} w_{n,i} \delta _{(i)}\in (0,\infty )\) along an appropriate subsequence. Thus, we can apply Proposition 2 to the numerator of
Note that we always have \(Y^\pi _2(X_{(i)})/Y(X_{(i)})\ge (n_2-1)/(2n)\) for \(i=1,\ldots ,[n_2/2]\). Hence, we can bound the denominator \(D_n\) from below as follows:
where \({{\bar{c}}} = n^{-1}\sum _{i=1}^n c_{(i)}=n_2/n\). Applying Proposition 3 for \({{\widetilde{w}}}_{n,i}= (w_{n,i}\delta _{n,i}/n) {\mathbf {1}}\{ i \le [n_2/2]\}\) yields that the second summand converges in probability to 0. The first summand converges to \(M>0\), say. To sum up, \( P (D_n\ge M/2)\rightarrow 1\). Combining this and the tightness result for the numerator according to Proposition 2 gives us the desired tightness of \((n^{1/2}({{\widehat{\vartheta }}}^\pi -1))_{n\in { {\mathbb {N}} }}\). \(\square \)
1.5 Proof of Theorems 4 and 5
For simplicity we restrict here to estimators \({{\widehat{\vartheta }}}={{\widehat{\vartheta }}}\) of the form \({{\widehat{\vartheta }}}_K\) with \(K=(w\circ {{\widehat{F}}})K_L\) for \(w\in {\mathscr {W}}\). We give the proof of Theorems 4 and 5 simultaneously. First, we verify that (7) holds for some real-valued random variable \(T_0\), say, under \(H_0\) as well as under \(H_1\), respectively, where the distribution of \( T_0\) depends on the underlying distributions \(F_1,F_2,G_1,G_2\). From this convergence we can immediately deduce the bootstrap test’s consistency (i.e., Theorem 5) since we already know that \(T_n\) converges in probability to \(\infty \) under \(H_1\), compare to Theorem 7 of Janssen and Pauls (2003). In the second step, we show that \(T_0\) has the same distribution as T under \(H_0\) and, thus, Theorem 4 follows.
Recall from the proof of Theorem 1 that \({{\widehat{\sigma }}}^2(t,{{\widehat{\vartheta }}})\rightarrow \sigma ^2(t,\vartheta )\) (\(t\in [0,1]\)), \(\sup \{|Y_{j}(x)/n-y_j(x)|: x\in [0,\infty )\}\rightarrow 0\)\((j=1,2)\) and \(N_j(s)/n\rightarrow L_j(s)\) (\( s\ge 0;\,j=1,2\)), all in probability under \(H_0\). It is easy to check that the same is valid under \(H_1\). Restricting to t and s coming from dense subsets of [0, 1] and \([0,\infty )\), respectively, for every subsequence we can construct a further sub-subsequence such that all the convergences mentioned above hold simultaneously with probability one under \(H_0\) as well as under \(H_1\), respectively. Due to the monotonicity and the continuity of the limits the convergences carry over to the whole intervals [0, 1] and \([0,\infty )\), respectively. Regarding (14) \({\widehat{\vartheta }}\) converges in probability to some \(\vartheta >0\) under \(H_0\) as well as under \(H_1\), where \(\vartheta \) equals the proportionality factor under \(H_0\). Since we are interested in the conditional distribution of \(T_n^G\) given the whole data we can treat \({{\widehat{\sigma }}}^2(\cdot ,{{\widehat{\vartheta }}})\), \(Y_j\), \(N_j\) as nonrandom functions and \({{\widehat{\vartheta }}}\) as a constant. Starting with an arbitrary subsequence we can always construct a further sub-subsequence, compare to the explanation above, that along this sub-subsequence \({{\widehat{\sigma }}}^2(t,{{\widehat{\vartheta }}})\rightarrow \sigma ^2(t,\vartheta )\) (\(t\in [0,1]\)), \(\sup \{|Y_{j}(x)/n-y_j(x)|:x \in [0,\infty )\}\rightarrow 0\)\((r=1,2)\) and \(N_j(s)/n\rightarrow L_j(s)\) (\(s\ge 0\)) as well as \({{\widehat{\vartheta }}}\rightarrow \vartheta \). All following considerations are along this sub-subsequence.
Let \(G_{(i)}\) be the multiplier corresponding to \(X_{(i)}\). Clearly, \(G_{(1)},\ldots ,G_{(n)}\) are (given the data) still independent and identical distributed with the same distribution as \(G_{1,1}\). We can now rewrite the statistic \(S_n^G\) as a linear rank statistic:
To obtain the asymptotic behavior of the process \(t\mapsto S_n^G(t,{{\widehat{\vartheta }}})\) (\(t\in [0,1]\)) we apply Proposition 1. In contrast to the two previous proofs, we have already replaced \(\vartheta \) by \({{\widehat{\vartheta }}}\) here. That is why we do not need to discuss the difference \(S_n^G(t,\vartheta )-S_n^G(t,{{\widehat{\vartheta }}})\) at the proof’s end. Let us now introduce the natural filtration \(\mathscr {G}_{n,i}=\sigma (G_{(1)},\ldots ,G_{(i)})\) (\(0\le i \le n\)). Denoting the summands in (20) by \(\xi _{n,i}\)\((1\le i \le n)\) we, clearly, have \({ E }(\xi _{n,i}\vert \mathscr {G}_{n,i-1})={ E }(\xi _{n,i})=0\). To verify the conditional Lindeberg condition, first note that there is a constant \(\eta >0\) independent of i and n such that \(|\xi _{n,i}|\le \eta n^{-1/2}|G_i|\). Hence, it is sufficient to show for all \(\varepsilon >0\) that
tends to 0. This follows immediately from Lebesgue’s dominated limit theorem and \({ E }(G_{(1)}^2)=1\). In the final step we discuss the asymptotic behavior of the predictable quadratic variation process \(t\mapsto \sigma _n^{2,G}(t,{{\widehat{\vartheta }}})\) (\(t\in [0,1]\)) given by
Rewriting \(\sigma _n^{2,G}(t,{{\widehat{\vartheta }}})\) in terms of the (nonrandom) functions \(Y_j\), \(N_j\) we obtain for \(t\in [0,1]\)
Applying Proposition 1 yields distributional convergence of \(S_n^G(\cdot ,{{\widehat{\vartheta }}})\) to the rescaled Brownian motion \(B(\sigma ^{2,G}(\cdot ,\vartheta ))\) on the Skorohod space D[0, 1]. Similarly to the proofs of Theorems 1 and 3, we can conclude from the continuous mapping theorem that
Finally, it remains to show \(\sigma ^{2,G}(t,{{\widehat{\vartheta }}})= \sigma ^{2}(t,\vartheta )\) under \(H_0\). By inserting \(L_j(t)=\int y_j(1+(\vartheta -1){\mathbf {1}}\{j=2\})\,\mathrm { d }A_1\) in the formulas of \(\sigma ^{2,G}(t,{{\widehat{\vartheta }}})\) and \(\sigma ^{2}(t,\vartheta )\) the equality follows immediately.
Rights and permissions
About this article
Cite this article
Ditzhaus, M., Janssen, A. Bootstrap and permutation rank tests for proportional hazards under right censoring. Lifetime Data Anal 26, 493–517 (2020). https://doi.org/10.1007/s10985-019-09487-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10985-019-09487-9