Abstract
The hazard ratio derived from the Cox model is a commonly used summary statistic to quantify a treatment effect with a time-to-event outcome. The proportional hazards assumption of the Cox model, however, is frequently violated in practice and many alternative models have been proposed in the statistical literature. Unfortunately, the regression coefficients obtained from different models are often not directly comparable. To overcome this problem, we propose a family of weighted hazard ratio measures that are based on the marginal survival curves or marginal hazard functions, and can be estimated using readily available output from various modeling approaches. The proposed transformation family includes the transformations considered by Schemper et al. (Statist Med 28:2473–2489, 2009) as special cases. In addition, we propose a novel estimate of the weighted hazard ratio based on the maximum departure from the null hypothesis within the transformation family, and develop a Kolmogorov–Smirnov type of test statistic based on this estimate. Simulation studies show that when the hazard functions of two groups either converge or diverge, this new estimate yields a more powerful test than tests based on the individual transformations recommended in Schemper et al. (Statist Med 28:2473–2489, 2009), with a similar magnitude of power loss when the hazards cross. The proposed estimates and test statistics are applied to a colorectal cancer clinical trial.
References
Abadi A, Amanpour F, Bajdik C, Yavari P (2012) Breast cancer survival analysis: applying the generalized gamma distribution under different conditions of the proportional hazards and accelerated failure time assumptions. Int J Prev Med 3(9):644
Abrahamowicz M, Mackenzie T, Esdaile JM (1996) Time-dependent hazard ratio: modeling and hypothesis testing with application in lupus nephritis. J Am Statist Assoc 91(436):1432–1439
Banerjee T, Chen MH, Dey DK, Kim S (2007) Bayesian analysis of generalized odds-rate hazards models for survival data. Lifetime Data Anal 13(2):241–260
Cox D (1972) Regression models and life-tables (with Discussion). J Royal Statist Soc Ser B 34:187–220
Fan J, Yao Q (2003) Nonlinear time series. Springer, Berlin
Gill RD, van der Vaart AW (1993) Non- and semi-parametric maximum likelihood estimators and the von Mises method: II. Scand J Statist 20(4):271–288
Gray RJ (1992) Flexible methods for analyzing survival data using splines, with applications to breast cancer prognosis. J Am Statist Assoc 87(420):942–951
Hastie T, Tibshirani R (1993) Varying-coefficient models. J Royal Statist Soc Ser B (Methodological) 55(4):757–796
Hess KR (1994) Assessing time-by-covariate interactions in proportional hazards regression models using cubic spline functions. Statist Med 13(10):1045–1062
Kalbfleisch JD, Prentice RL (1981) Estimation of the average hazard ratio. Biometrika 68:105–112
Kooperberg C, Stone CJ, Truong YK (1995) Hazard regression. J Am Statist Assoc 90(429):78–94
Kosorok MR (2007) Introduction to empirical processes and semiparametric inference. Springer, Berlin
Lininger L, Gail MH, Green SB, Byar DP (1979) Comparison of four tests for equality of survival curves in the presence of stratification and censoring. Biometrika 66:417–428
Moeschberger ML, Klein JP (2003) Survival analysis: techniques for censored and truncated data. Springer, Berlin
Pepe MS, Fleming TR (1989) Weighted Kaplan–Meier statistics: a class of distance tests for censored survival data. Biometrics 45(2):497–507
Robins J, Tsiatis A (1992) Semiparametric estimation of an accelerated failure time model with time-dependent covariates. Biometrika 79:311–20
Royston P, Parmar M (2002) Flexible parametric proportional-hazards and proportional-odds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Statist Med 21:2175–2197
Schemper M, Wakounig S, Heinze G (2009) The estimation of average hazard ratios by weighted Cox regression. Statist Med 28:2473–2489
Shen Y, Fleming TR (1997) Weighted mean survival test statistics: a class of distance tests for censored survival data. J Royal Statist Soc 59(1):269–280
Therneau T, Grambsch P (2000) Modeling survival data: extending the Cox model. Springer, New York
Wei L (1992) The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statist Med 11:1871–1879
Xu R, O’Quigley J (2000) Estimating average regression effect under non-proportional hazards. Biostatistics 1:423–439
Zeng D, Chen Q, Chen MH, Ibrahim J, Group AR (2012) Estimating treatment effects with treatment crossovers via semi-competing risks models: an application to a colorectal cancer study. Biometrika 99:167–184
Appendix
We state here the conditions needed to establish consistency of \(\hat{\theta }_G\) and to derive its asymptotic distribution.
(C.1) \(G\) is thrice-continuously differentiable and is strictly increasing. Additionally, \(\varOmega (t;s,v)\) is twice-continuously differentiable in \((t,s,v)\).
(C.2) \(\sqrt{n}(\hat{S}_1-S_{10},\hat{S}_0-S_{00})\) converges in distribution to a bivariate mean-zero Gaussian process, denoted by \((\mathcal{G}_1, \mathcal{G}_0)\), in \(BV[0,\tau ]\times BV[0,\tau ]\), where \(BV[0,\tau ]\) denotes the space of functions that have finite total variation in \([0,\tau ]\) and \(S_{k0}\) is the true survival function in treatment arm \(k\). Here, \(\tau \) is the study duration.
(C.3) \(S_{k0}\) is strictly decreasing and thrice-continuously differentiable in \([0,\tau ]\). Moreover, \(S_{k0}(\tau )>0\).
(C.4) The kernel function \(K(x)\) is differentiable, symmetric with respect to 0, and has compact support on \([-1,1]\).
(C.5) The bandwidth \(a_n\) satisfies \(n a_n^2\rightarrow \infty \) and \(na_n^4\rightarrow 0\).
(C.6) Conditional on the data, \(\sqrt{n} (\hat{S}_1^*-\hat{S}_1, \hat{S}_0^*-\hat{S}_0)\) converges in distribution to \((\mathcal{G}_1, \mathcal{G}_0)\), where \((\hat{S}_1^*, \hat{S}_0^*)\) are resampled statistics for \((\hat{S}_1, \hat{S}_0)\).
Condition (C.6) is an assumption regarding the consistency and asymptotic distribution of the estimates generated by the bootstrap procedure. Chapter 20 of Kosorok (2007) validates this assumption for several survival modeling techniques, including the Cox proportional hazards model.
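To make the resampled statistics in (C.6) concrete, the following is a minimal sketch of a nonparametric bootstrap for one treatment arm. It assumes that \(\hat{S}_k\) is the Kaplan–Meier estimator and that subjects are resampled with replacement; both are illustrative assumptions, since the survival estimates could equally come from another fitted model. The function names `km_survival` and `bootstrap_survival` are hypothetical.

```python
import numpy as np

def km_survival(times, events, t_grid):
    """Kaplan-Meier survival evaluated on t_grid.

    Assumes continuous follow-up times, so the only ties are duplicated
    subjects in a bootstrap sample; for those, the sequential product below
    agrees with the usual tie-handling formula.
    """
    order = np.argsort(times, kind="stable")
    t = np.asarray(times, dtype=float)[order]
    d = np.asarray(events, dtype=bool)[order]
    at_risk = len(t) - np.arange(len(t))             # subjects still at risk
    factors = np.where(d, 1.0 - 1.0 / at_risk, 1.0)  # censored times leave S unchanged
    surv = np.concatenate(([1.0], np.cumprod(factors)))
    # Number of observed times <= each grid point selects the right step.
    return surv[np.searchsorted(t, t_grid, side="right")]

def bootstrap_survival(times, events, t_grid, n_boot, rng):
    """Resample subjects with replacement and recompute the survival curve."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    curves = np.empty((n_boot, len(t_grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        curves[b] = km_survival(times[idx], events[idx], t_grid)
    return curves

# Hypothetical data: exponential event times with independent censoring.
rng = np.random.default_rng(1)
event_t = rng.exponential(1.0, 500)
cens_t = rng.exponential(2.0, 500)
obs_t, delta = np.minimum(event_t, cens_t), event_t <= cens_t
grid = np.array([0.25, 0.5, 1.0])
boot_curves = bootstrap_survival(obs_t, delta, grid, 200, rng)
print(boot_curves.std(axis=0))  # bootstrap standard errors of S(t) on the grid
```

Each row of `boot_curves` plays the role of \(\hat{S}_k^*\); repeating the procedure in both arms gives the pair \((\hat{S}_1^*, \hat{S}_0^*)\).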
Lemma 1
Under Conditions (C.1)–(C.5), \(\sup _{t\in [0,\tau ]}\vert \hat{h}_k(t)-h_{k0}(t)\vert \rightarrow _p 0, \ \ k=0,1,\) where \(h_{k0}\) denotes the true hazard function in treatment arm \(k\).
Proof (of Lemma 1)
First, we note that \(\hat{h}_k(t)=-a_n^{-1}\int K(x) d\log \hat{S}_k(t+a_n x).\) By carrying out an integration by parts, in which the boundary terms vanish because \(K\) vanishes at \(\pm 1\), we can rewrite \(\hat{h}_k(t)\) as \(\hat{h}_k(t)=\int a_n^{-1}\log \hat{S}_k(t+a_n x) K'(x)dx.\) Moreover, we can continuously extend the definitions of \(\hat{S}_k\) and \(S_{k0}\) to \([-a_n,\tau +a_n]\) so that (C.2) still holds. Thus, (C.2) implies \(\sup _{t\in [-a_n, \tau +a_n]} \vert \hat{S}_k(t)- S_{k0}(t)\vert =O_p(n^{-1/2}).\) Condition (C.3), which bounds \(S_{k0}\) away from zero, further gives \(\sup _{t\in [-a_n, \tau +a_n]} \vert \log \hat{S}_k(t)-\log S_{k0}(t)\vert =O_p(n^{-1/2}).\) Therefore,
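\[\sup _{t\in [0,\tau ]}\vert \hat{h}_k(t)-h_{k0}(t)\vert \le \sup _{t\in [0,\tau ]}\left| a_n^{-1}\int \{\log \hat{S}_k(t+a_n x)-\log S_{k0}(t+a_n x)\} K'(x)dx\right| +\sup _{t\in [0,\tau ]}\left| a_n^{-1}\int \log S_{k0}(t+a_n x) K'(x)dx-h_{k0}(t)\right| =O_p(n^{-1/2}a_n^{-1})+O(a_n^2)\]
goes to 0 by condition (C.5), since \(na_n^2\rightarrow \infty \) and \(a_n\rightarrow 0\). This proves the lemma. \(\square \)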
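The kernel estimator \(\hat{h}_k(t)=-a_n^{-1}\int K(x) d\log \hat{S}_k(t+a_n x)\) used in Lemma 1 can be sketched numerically as follows. The sketch assumes that \(\hat{S}_k\) is a Kaplan–Meier step function without tied times, so the integral reduces to a sum over the jumps of \(\log \hat{S}_k\), and it uses the biweight kernel as one choice that satisfies (C.4). The function names and the simulated data are hypothetical.

```python
import numpy as np

def biweight(x):
    """Biweight kernel: differentiable, symmetric, supported on [-1, 1]."""
    return np.where(np.abs(x) <= 1.0, 15.0 / 16.0 * (1.0 - x**2) ** 2, 0.0)

def km_log_jumps(times, events):
    """Jump times and jump sizes of log Kaplan-Meier (assumes no tied times)."""
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    d = np.asarray(events, dtype=bool)[order]
    at_risk = len(t) - np.arange(len(t))
    keep = d & (at_risk > 1)  # skip a final jump that would send S to 0
    return t[keep], np.log1p(-1.0 / at_risk[keep])

def kernel_hazard(t_grid, jump_times, log_jumps, bandwidth):
    """h(t) = -(1/a) * sum_j K((t_j - t)/a) * (jump of log S at t_j)."""
    u = (jump_times[None, :] - np.atleast_1d(t_grid)[:, None]) / bandwidth
    return -(biweight(u) * log_jumps[None, :]).sum(axis=1) / bandwidth

# Hypothetical uncensored sample with constant hazard 0.5.
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=5000)
jump_t, jumps = km_log_jumps(sample, np.ones(sample.size, dtype=bool))
print(kernel_hazard(np.array([0.5, 1.0, 1.5]), jump_t, jumps, bandwidth=0.3))
```

Evaluation points within one bandwidth of 0 or \(\tau \) are subject to boundary effects, which is why the proof extends \(\hat{S}_k\) and \(S_{k0}\) to \([-a_n,\tau +a_n]\).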
Proof (of Theorem 1)
Using Lemma 1 and noting that \(\inf _{t\in [0,\tau ]} h_{00}(t)>0\), we obtain that, uniformly in \(t\in [0,\tau ]\) as \(n \rightarrow \infty \), \(G\left\{ \frac{\hat{h}_1(t)}{\hat{h}_0(t)}\right\} \rightarrow _p G\left\{ \frac{h_{10}(t)}{h_{00}(t)}\right\} \) and \(\varOmega \{t; \hat{S}_0(t), \hat{S}_1(t)\}\rightarrow _p \varOmega \{t;S_{00}(t), S_{10}(t)\}.\) Since the integrand converges uniformly and \(G^{-1}\) is continuous, it follows that \(\hat{\theta }_G \rightarrow _p \theta _G\) as \(n \rightarrow \infty \). This establishes consistency.
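As an illustration of the plug-in quantity whose consistency was just shown, the sketch below evaluates \(G^{-1}\{\int G(h_1(t)/h_0(t))\,\varOmega \,dt\}\) on a time grid. It assumes that the hazards and the weight are already available as arrays on that grid; in the text the weight \(\varOmega \{t;S_0(t),S_1(t)\}\) depends on the survival curves, so the caller would compute it first. The choices \(G=\log \) and \(G=\) identity are illustrative only, and all names and numerical values are hypothetical.

```python
import numpy as np

def trapezoid(y, t):
    """Trapezoidal rule on a (possibly uneven) grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def weighted_hazard_ratio(h1, h0, weight, t_grid, G, G_inv):
    """Plug-in value of G^{-1}( integral of G(h1/h0) * weight dt )."""
    w = weight / trapezoid(weight, t_grid)  # normalise the weight to integrate to 1
    return float(G_inv(trapezoid(G(h1 / h0) * w, t_grid)))

# Hypothetical converging hazards: the ratio rises from 0.5 to 1.0 over [0, 5].
t = np.linspace(0.0, 5.0, 501)
h0 = np.full_like(t, 0.4)
h1 = h0 * (0.5 + 0.1 * t)
w = np.exp(-0.4 * t)  # hypothetical weight that emphasises early follow-up

theta_log = weighted_hazard_ratio(h1, h0, w, t, np.log, np.exp)
theta_id = weighted_hazard_ratio(h1, h0, w, t, lambda x: x, lambda x: x)
print(theta_log, theta_id)
```

Under proportional hazards every choice of \(G\) returns the common hazard ratio; the two values differ here only because the ratio varies over time.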
To derive the asymptotic distribution of \(\hat{\theta }_G\), by the mean-value theorem, we obtain
Using the mean-value theorem again, we have
where \(o_p(1)\) is a random element converging in probability to zero uniformly in \(t\in [0,\tau ]\).
For convenience, we denote \(H=(G^{-1})'\{\int Q_0(t)Q_1(t)dt\}\), \(Q_0(t)= G\left\{ \frac{h_{10}(t)}{h_{00}(t)}\right\} , Q_1(t)= \varOmega \{t; S_{00}(t), S_{10}(t)\}\), \(Q_2(t)=\frac{\partial }{\partial S_0(t)} \varOmega \{t; S_{00}(t), S_{10}(t)\}\), and \(Q_3(t)=\frac{\partial }{\partial S_1(t)} \varOmega \{t; S_{00}(t), S_{10}(t)\}\). Then, after combining the above results, we obtain
The last two terms on the right-hand side both take the form \(n^{1/2} \int [A(t)+o_p(1)](\hat{S}_k(t)-S_{k0}(t))dt\) for some bounded function \(A(t)\) so that they converge to a normal distribution by condition (C.2). Thus, we only focus on the first two terms, namely (I) and (II), on the right-hand side. Using the definition of \(\hat{h}_1(t)\), we can rewrite the first term (I) as
Since
and \(na_n^4\rightarrow 0\), (I) becomes
which is also equal to
However, since
it follows that
Similarly, we obtain
Thus, we have
where \(A_0(t)=H G'\left\{ \frac{h_{10}(t)}{h_{00}(t)}\right\} Q_1(t)h_{10}(t)/h_{00}(t)^2\), \(A_1(t)=-H G'\left\{ \frac{h_{10}(t)}{h_{00}(t)}\right\} Q_1(t)/h_{00}(t)\), \(B_0(t) =H Q_0(t)Q_2(t)\), and \(B_1(t)=H Q_0(t)Q_3(t)\). Hence, Theorem 1 follows from condition (C.2). \(\square \)
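For orientation, the linearization behind this proof can be sketched from the definitions of \(H\), \(Q_0,\ldots ,Q_3\), \(A_k\), and \(B_k\) alone. The sketch assumes \(\theta _G=G^{-1}\{\int Q_0(t)Q_1(t)dt\}\), as the definition of \(H\) implies, and treats all remainder terms formally as \(o_p(1)\); it is a reconstruction under these assumptions and not necessarily the exact form of equation (15).

```latex
\begin{align*}
\sqrt{n}\,(\hat{\theta}_G-\theta_G)
  &= \{H+o_p(1)\}\,\sqrt{n}\left[\int G\left\{\frac{\hat{h}_1(t)}{\hat{h}_0(t)}\right\}
     \varOmega\{t;\hat{S}_0(t),\hat{S}_1(t)\}\,dt-\int Q_0(t)Q_1(t)\,dt\right] \\
  &= -\sqrt{n}\int A_1(t)\{\hat{h}_1(t)-h_{10}(t)\}\,dt
     -\sqrt{n}\int A_0(t)\{\hat{h}_0(t)-h_{00}(t)\}\,dt \\
  &\quad +\sqrt{n}\int B_1(t)\{\hat{S}_1(t)-S_{10}(t)\}\,dt
     +\sqrt{n}\int B_0(t)\{\hat{S}_0(t)-S_{00}(t)\}\,dt+o_p(1).
\end{align*}
```

The signs follow from expanding \(G(\hat{h}_1/\hat{h}_0)\) around \(h_{10}/h_{00}\), which gives the factor \(G'\{h_{10}(t)/h_{00}(t)\}[\{\hat{h}_1(t)-h_{10}(t)\}/h_{00}(t)-h_{10}(t)\{\hat{h}_0(t)-h_{00}(t)\}/h_{00}(t)^2]\). Because \(h_k(t)=-d\log S_k(t)/dt\), the hazard terms can be rewritten, up to kernel smoothing, as integrals with respect to \(d\{\log \hat{S}_k(t)-\log S_{k0}(t)\}\), so every term is driven by \(\sqrt{n}(\hat{S}_k-S_{k0})\) and condition (C.2) applies.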
Proof (of Theorem 2)
Let the weight function \(\varOmega (t)\) be independent of \(S_0(t)\) and \(S_1(t)\) and satisfy \(\int \varOmega (t) dt=1\). We write the hazard ratio as \(h_1(t)/h_0(t)=1+\epsilon \lambda (t)\), where \(\lambda (t)\) is a function of \(t\). When \(h_1(t)\) is in a neighborhood of the null hypothesis \(h_1(t)=h_0(t)\), i.e., \(\epsilon \) is close to zero, the Taylor series expansion of \(\theta _a\) is given by
Moreover, according to (15), the variance of \(\hat{\theta }_a\) around the null hypothesis can be written as \((G_a'(t))^2 B\), where \(B\) is a positive value independent of \(a\). Therefore, the local power of \(\theta _a\) can be written as
For the transformation family in (2) with \(a \in [-1,1]\), we can optimize the local power by maximizing \(|(G_a^{-1})'\{G_a(1)\}|=|(1+a)^{1+a}|\) for \(a \in [-1,1]\), resulting in an optimal value at \(a=1\).
When \(\int \varOmega (t) \lambda (t) dt=0\), as in the crossing hazards case, we need a higher-order Taylor series expansion, given by
In this case, the local power of \(\theta _a\) is given by
where \(G_a''(x)=-\left( \frac{a+1}{a+x} \right) G_a'(x)\). Again, this local power is maximized at \(a=1\). \(\square \)
Proof (of Theorem 3)
The proof is based on the same linearization given in equation (15), except that on the right-hand side of equation (15) the expressions \(A_1(t), A_0(t), B_1(t)\), and \(B_0(t)\) are indexed by \(a \in [0,1]\). Additionally, \(o_p(1)\) on the right-hand side of (15) converges in probability to zero uniformly in \(a\). It is easy to check from the explicit expressions of \(A_1, A_0, B_1, B_0\) that they all belong to a bounded set in \(BV[0,\tau ]\) for any \(a \in [0,1]\). Thus, condition (C.2) and the results in Gill and van der Vaart (1993) yield that \(n^{1/2} (\hat{\theta }_a-\theta _a)\), as a stochastic process indexed by \(a \in [0,1]\), converges weakly to a Gaussian process. Theorem 3 thus follows from the continuity theorem. \(\square \)
Cite this article
Chen, Q., Zeng, D., Ibrahim, J.G. et al. Quantifying the average of the time-varying hazard ratio via a class of transformations. Lifetime Data Anal 21, 259–279 (2015). https://doi.org/10.1007/s10985-014-9301-0
DOI: https://doi.org/10.1007/s10985-014-9301-0