Estimation of a Concordance Probability for Doubly Censored Time-to-Event Data

Statistics in Biosciences

Abstract

Evaluating the relationship between a response variable and explanatory variables is important to establish better statistical models. Concordance probability is one measure of this relationship and is often used in biomedical research. Concordance probability can be seen as an extension of the area under the receiver operating characteristic curve. In this study, we propose estimators of concordance probability for time-to-event data subject to double censoring. A doubly censored time-to-event response is observed when either left or right censoring may occur. In the presence of double censoring, existing estimators of concordance probability lack desirable properties such as consistency and asymptotic normality. The proposed estimators consist of estimators of the left-censoring and the right-censoring distributions as a weight for each pair of cases, and reduce to the existing estimators in special cases. We show the statistical properties of the proposed estimators and evaluate their performance via numerical experiments.

References

  1. Chang MN (1990) Weak convergence of a self-consistent estimator of the survival function with doubly censored data. Ann Stat 18:391–404

  2. Chang MN, Yang GL (1987) Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Ann Stat 15:1536–1547

  3. Cook NR (2007) Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 115:928–935

  4. DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837–845

  5. Gönen M, Heller G (2005) Concordance probability and discriminatory power in proportional hazards regression. Biometrika 92:965–970

  6. Hara M, Sakata Y, Nakatani D, Suna S, Nishino M, Sato H, Kitamura T, Nanto S, Hamasaki T, Hori M, Komuro I (2016) Subclinical elevation of high-sensitive troponin T levels at the convalescent stage is associated with increased 5-year mortality after ST-elevation myocardial infarction. J Cardiol 67:314–320

  7. Harrell FE, Lee KL, Mark DB (1996) Tutorial in biostatistics: multivariate prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15:361–387

  8. Hilden J, Gerds TT (2014) A note on the evaluation of novel biomarkers: do not rely on integrated discrimination improvement and net reclassification index. Stat Med 33:3405–3414

  9. Ji S, Peng L, Cheng Y, HuiChuan L (2012) Quantile regression for doubly censored data. Biometrics 68:101–112

  10. Julià O, Gómez G (2011) Simultaneous marginal survival estimators when doubly censored data is present. Lifetime Data Anal 17:347–372

  11. Kyle RT, Rajkumar TV, Offord J, Larson D, Plevak M, Melton LJ III (2002) A long-term study of prognosis in monoclonal gammopathy of undetermined significance. New Engl J Med 346:564–569

  12. Kim Y, Kim B, Jang W (2010) Asymptotic properties of the maximum likelihood estimator for the proportional hazards model with doubly censored data. J Multivar Anal 101:1339–1351

  13. Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data. Springer, New York

  14. Mantel N (1967) Ranking procedures for arbitrarily restricted observation. Biometrics 23:65–78

  15. Nolan D, Pollard D (1987) \(U\)-processes: rates of convergence. Ann Stat 15:780–799

  16. Nolan D, Pollard D (1988) Functional limit theorems for \(U\)-processes. Ann Stat 16:1291–1298

  17. Pencina MJ, D’Agostino RB (2004) Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Stat Med 23:2109–2123

  18. Pepe MS (2003) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, New York

  19. Pepe MS, Janes H, Li CI (2014) Net risk reclassification p values: valid or misleading? J Natl Cancer Inst 106:dju041

  20. Pepe MS, Thompson LT (2000) Combining diagnostic test results to increase accuracy. Biostatistics 1:123–140

  21. Peto R (1973) Experimental survival curves for interval-censored data. J R Stat Soc C 22:86–91

  22. R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/

  23. Tsai WY, Crowley J (1985) A large sample study of generalized maximum likelihood estimators from incomplete data via self-consistency. Ann Stat 13:1317–1334

  24. Turnbull BW (1974) Nonparametric estimation of a survivorship function with doubly censored data. J Am Stat Assoc 69:169–173

  25. Turnbull BW (1976) The empirical distribution function with arbitrarily grouped censored and truncated data. J R Stat Soc B 38:290–295

  26. Uno H, Cai H, Pencina MJ, D’Agostino RB, Wei LJ (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30:1105–1117

  27. Zhang C-H, Li X (1996) Linear regression with doubly censored data. Ann Stat 24:2720–2743

Acknowledgements

The authors are grateful to Drs. Yasuhiko Sakata, Daisaku Nakatani, and Yasushi Sakata for allowing us to utilize the dataset used in [6]. The authors would like to acknowledge the associate editor and anonymous reviewers for very useful comments and suggestions that improved the presentation of the paper. K. Hayashi is supported by JSPS KAKENHI (Grant-in-Aid for Scientific Research) Grant Number 15K15950. S. Shimizu is supported by JSPS KAKENHI (Grant-in-Aid for Scientific Research) Grant Number 70423085.

Author information

Corresponding author

Correspondence to Kenichi Hayashi.

Appendix

We shall show the consistency and asymptotic normality of \(\widehat{C}_{\mathrm{D}}^{(1)}(\hat{h})\) in this section. Since we can also show these properties for \(\widehat{C}_{\mathrm{D}}^{(2)}(h)\) in the same way, the corresponding proofs are omitted.

To show the weak consistency, note that

$$\begin{aligned}&\mathrm{E}\left[ \varGamma _i\varDelta _i\varGamma _j\left\{ F(X_i)\bar{G}(X_i)^2F(X_j)\right\} ^{-1}\mathbb {I}_{{\varvec{\tau }}}\left\{ X_i<X_j\right\} \right] \\&\quad = \mathrm{E}\left[ \mathbb {I}_{{\varvec{\tau }}}\left\{ L_i<T_i\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<R_i\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ L_j<T_j\right\} \left\{ F(X_i)\bar{G}(X_i)^2F(X_j)\right\} ^{-1}\mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<\widetilde{X}_j\right\} \right] \\&\quad = \mathrm{E}\left[ \mathbb {I}_{{\varvec{\tau }}}\left\{ L_i<T_i\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<R_i\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ L_j<\widetilde{X}_j\right\} \left\{ F(T_i)\bar{G}(T_i)^2F(\widetilde{X}_j)\right\} ^{-1}\mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<\widetilde{X}_j\right\} \right] \\&\quad = \mathrm{E}\Big [\left\{ F(T_i)\bar{G}(T_i)^2F(\widetilde{X}_j)\right\} ^{-1}\mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<\widetilde{X}_j\right\} \\&\qquad \times \mathrm{E}\left[ \mathbb {I}_{{\varvec{\tau }}}\left\{ L_i<T_i\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<R_i\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ L_j<\widetilde{X}_j\right\} \Big |T_i,\widetilde{X}_j\right] \Big ]\\&\quad = \mathrm{E}\left[ \left\{ F(T_i)\bar{G}(T_i)^2F(\widetilde{X}_j)\right\} ^{-1}\mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<\widetilde{X}_j\right\} F(T_i)\bar{G}(T_i)F(\widetilde{X}_j)\right] \ \ (\text{by independence})\\&\quad = \mathrm{E}\left[ \bar{G}(T_i)^{-1}\bar{G}(T_i)\mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<T_j\right\} \right] = \mathrm{E}\left[ \mathbb {I}_{{\varvec{\tau }}}\left\{ T_i<T_j\right\} \right] , \end{aligned}$$

where \(\widetilde{X}_j=\min (T_j,R_j)\), so that integrating out \(R_j\) in the last line yields the factor \(\bar{G}(T_i)\). Then, the denominator of \(\widehat{C}_{\mathrm{D}}^{(1)}(h)\),

\(\displaystyle \frac{1}{n(n-1)}\sum _{i,j}\varGamma _i\varDelta _i\varGamma _j\mathbb {I}_{{\varvec{\tau }}}\left\{ X_i<X_j\right\} \left( \widehat{F}(X_i)\widehat{\bar{G}}(X_i)^2\widehat{F}(X_j)\right) ^{-1}\), converges to \(\mathrm{E}\, [ \mathbb {I}_{{\varvec{\tau }}}\{T_1<T_2\}]\) in probability by the uniform consistency of \(\widehat{F}(\cdot )\) and \(\widehat{\bar{G}}(\cdot )\) [2] and a law of large numbers for U-processes [15]. Similarly, it can also be shown that

$$\begin{aligned}&\frac{1}{n(n-1)}\sum _{i,j}\varGamma _i\varDelta _i\varGamma _j\mathbb {I}\left\{ \hat{h}({\varvec{Z}}_i)>\hat{h}({\varvec{Z}}_j)\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ X_i<X_j\right\} \left( \widehat{F}(X_i)\widehat{\bar{G}}(X_i)^2\widehat{F}(X_j)\right) ^{-1} \\&\quad \overset{p}{\rightarrow }\mathrm{E}\left[ \mathbb {I}\left\{ h^*({\varvec{Z}}_1)>h^*({\varvec{Z}}_2)\right\} \mathbb {I}_{{\varvec{\tau }}}\left\{ T_1<T_2\right\} \right] . \end{aligned}$$

Then, the consistency of \(\widehat{C}_{\mathrm{D}}^{(1)}(\hat{h})\) follows from Slutsky’s lemma.
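The unbiasedness identity behind the consistency argument can be checked numerically. The following is a minimal Monte Carlo sketch, not the authors' simulation design: the distributions of \(T\), \(L\), and \(R\) are toy choices made here for illustration, the observation scheme \(X=\max\{L,\min(T,R)\}\) with \(\varGamma=\mathbb{I}\{L<T\}\) and \(\varDelta=\mathbb{I}\{T\le R\}\) is an assumption, the true \(F\) and \(\bar{G}\) are used in place of their estimates, and the truncation by \({\varvec{\tau}}\) is omitted because the toy weights are bounded. Under these assumptions the weighted pair average should be close to \(\mathrm{P}[T_1<T_2]=1/2\).

```python
import math
import random


def F(t):
    """P(L < t) for the toy left-censoring law: L = 0 w.p. 1/2, else Uniform(0, 1)."""
    if t <= 0.0:
        return 0.0
    return 0.5 + 0.5 * min(t, 1.0)


def G_bar(t):
    """P(R > t) for the toy right-censoring law R = 1 + Exp(1)."""
    return 1.0 if t <= 1.0 else math.exp(-(t - 1.0))


def draw_subject(rng):
    """Draw one doubly censored observation (X, Gamma, Delta) from the toy model."""
    t = rng.uniform(0.0, 2.0)                             # event time T
    l = 0.0 if rng.random() < 0.5 else rng.uniform(0.0, 1.0)  # left-censoring time L
    r = 1.0 + rng.expovariate(1.0)                        # right-censoring time R (> L a.s.)
    x = max(l, min(t, r))                                 # assumed observation scheme
    gamma = 1 if l < t else 0                             # not left-censored
    delta = 1 if t <= r else 0                            # not right-censored
    return x, gamma, delta


def weighted_pair_mean(n_pairs, seed=0):
    """Average of Gamma_i * Delta_i * Gamma_j * I{X_i < X_j} * W_ij over independent pairs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        xi, gi, di = draw_subject(rng)
        xj, gj, _ = draw_subject(rng)
        if gi and di and gj and xi < xj:
            total += 1.0 / (F(xi) * G_bar(xi) ** 2 * F(xj))
    return total / n_pairs
```

Calling `weighted_pair_mean(200_000)` returns a value that fluctuates around 0.5; dropping the weight (adding 1.0 instead) gives a visibly smaller value, which illustrates why the unweighted pair count is biased under double censoring.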

To show the asymptotic normality of \(\widehat{C}_{\mathrm{D}}^{(1)}(\hat{h})\), we first consider the asymptotic behavior of the estimator with fixed \(h=h(\cdot ;{\varvec{\beta }})\). For notational brevity, the following symbols are used hereafter:

$$\begin{aligned}&\varLambda _{ij}=\varGamma _i\varDelta _i\varGamma _j,\ J_{ij}(h)=\mathbb {I}\left\{ h({\varvec{Z}}_i)>h({\varvec{Z}}_j)\right\} ,\ I_{ij}^{{\varvec{\tau }}}=\mathbb {I}_{{\varvec{\tau }}}\left\{ X_i<X_j\right\} ,\\&W_{ij}=\left\{ F(X_i)\bar{G}(X_i)^2F(X_j)\right\} ^{-1}, \widehat{W}_{ij}=\left\{ \widehat{F}(X_i)\widehat{\bar{G}}(X_i)^2\widehat{F}(X_j)\right\} ^{-1}. \end{aligned}$$
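In code, this notation maps onto a plain double loop over ordered pairs. The sketch below computes the ratio \(\sum \varLambda_{ij}I_{ij}^{{\varvec{\tau}}}J_{ij}(h)\widehat{W}_{ij}\big/\sum \varLambda_{ij}I_{ij}^{{\varvec{\tau}}}\widehat{W}_{ij}\) under stated assumptions: the function name `concordance_dc` and its arguments are hypothetical, \(I_{ij}^{{\varvec{\tau}}}\) is read as \(\mathbb{I}\{X_i<X_j,\ X_i<\tau_{\mathrm{U}},\ \tau_{\mathrm{L}}<X_j\}\) (inferred from the definition of \(C^*(h)\), not stated explicitly here), and `F_hat` and `G_bar_hat` are user-supplied callables returning the estimated left-censoring distribution function and right-censoring survival function; how they are estimated is outside this sketch.

```python
def concordance_dc(x, delta, gamma, score, F_hat, G_bar_hat, tau_L, tau_U):
    """Weighted concordance for doubly censored data (illustrative sketch).

    x      : observed times X_i
    delta  : 1 if subject i is not right-censored, else 0
    gamma  : 1 if subject i is not left-censored, else 0
    score  : risk scores h(Z_i); a higher score is taken to predict an earlier event
    F_hat, G_bar_hat : callables giving the estimated censoring functions
    tau_L, tau_U     : truncation points (assumed reading of the indicator I_tau)
    """
    n = len(x)
    num = 0.0
    den = 0.0
    for i in range(n):
        # Lambda_ij requires subject i to be exactly observed (Gamma_i * Delta_i = 1).
        if not (gamma[i] and delta[i]) or x[i] >= tau_U:
            continue
        for j in range(n):
            # Lambda_ij requires subject j not to be left-censored (Gamma_j = 1).
            if j == i or not gamma[j]:
                continue
            if not (x[i] < x[j] and tau_L < x[j]):
                continue
            w = 1.0 / (F_hat(x[i]) * G_bar_hat(x[i]) ** 2 * F_hat(x[j]))  # W_hat_ij
            den += w
            if score[i] > score[j]:                                      # J_ij(h)
                num += w
    if den == 0.0:
        raise ValueError("no usable pairs for the given data and truncation points")
    return num / den
```

With constant weights (`F_hat` and `G_bar_hat` both returning 1) and no censoring, the function reduces to the usual fraction of concordant pairs, in line with the statement that the proposed estimators reduce to existing ones in special cases.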

For \(C^*(h)=\mathrm{P}\left[ h({\varvec{Z}}_1)>h({\varvec{Z}}_2)|T_1<T_2, T_1<\tau _{\mathrm{U}}, \tau _{\mathrm{L}}<T_2\right] \), note that

$$\begin{aligned} \mathcal {W}(h):= & {} \sqrt{n}\left( \widehat{C}_{\mathrm{D}}^{(1)}(h)-C^*(h)\right) \\= & {} \sqrt{n}\left( \frac{\sum _{i,j} \varLambda _{ij}I_{ij}^{{\varvec{\tau }}}\{J_{ij}(h)-C^*(h)\} W_{ij}}{\sum _{i,j}\varLambda _{ij}I_{ij}^{{\varvec{\tau }}}\widehat{W}_{ij}}\right) \\&+ \,\sqrt{n}\left( \frac{\sum _{i,j} \varLambda _{ij}I_{ij}^{{\varvec{\tau }}}\{J_{ij}(h)-C^*(h)\} (\widehat{W}_{ij}-W_{ij})}{\sum _{i,j}\varLambda _{ij}I_{ij}^{{\varvec{\tau }}}\widehat{W}_{ij}}\right) \\=: & {} \mathcal {W}_1(h) + \mathcal {W}_2(h). \end{aligned}$$

Note that \(\mathcal {W}_1(h)\) captures the variability of the U-statistic computed with the true weights \(W_{ij}\), whereas \(\mathcal {W}_2(h)\) captures the variability due to estimating the weights by \(\widehat{W}_{ij}\). It follows from the uniform consistency of \(\widehat{F}\) and \(\widehat{\bar{G}}\) and a functional limit theorem for U-processes [16] that

$$\begin{aligned} \mathcal {W}_1(h) = n^{-3/2}\frac{\sum _{i,j}\varLambda _{ij}I_{ij}^{{\varvec{\tau }}}(J_{ij}(h)-C^*(h))W_{ij}}{p({\varvec{\tau }})} + o_p(1),\quad n\rightarrow \infty , \end{aligned} \tag{4}$$

where \(p({\varvec{\tau }})=\mathrm{P}\left[ T_1<T_2,T_1<\tau _{\mathrm{U}}, \tau _{\mathrm{L}}<T_2\right] \). Moreover note that

$$\begin{aligned} \mathcal {W}_2(h) = \int _{0}^{\tau _{\mathrm{U}}}\int _{\tau _{\mathrm{L}}}^{M}\sqrt{n}\left( \frac{\widehat{W}(s,t)}{W(s,t)}-1\right) \mathrm{d}{\hat{\gamma }}(s,t,h), \end{aligned} \tag{5}$$

where \(W(s,t)=\left( F(s)\bar{G}(s)^2F(t)\right) ^{-1}\), \(\widehat{W}(s,t)=\left( \widehat{F}(s)\widehat{\bar{G}}(s)^2\widehat{F}(t)\right) ^{-1}\), and

$$\begin{aligned} {\hat{\gamma }}(s,t,h)=\sum _{i,j}\varLambda _{ij}I_{ij}^{{\varvec{\tau }}}(J_{ij}(h)-C^*(h))W_{ij}\mathbb {I}\left\{ X_i\le s,X_j\le t\right\} \Big /\sum _{i,j}\varLambda _{ij}I_{ij}^{{\varvec{\tau }}}\widehat{W}_{ij}. \end{aligned}$$

By a uniform law of large numbers for U-processes [15] and the uniform consistency of \(\widehat{F}\) and \(\widehat{\bar{G}}\), we obtain that

$$\begin{aligned} \sup \{ |{\hat{\gamma }}(s,t,h)-\gamma (s,t,h)|\,:\, s\in [0,\tau _{\mathrm{U}}], t\in [\tau _{\mathrm{L}},M], {\varvec{\beta }}\}\overset{p}{\rightarrow }0, \end{aligned} \tag{6}$$

where \(\gamma (s,t,h)=p((s,t)')(\widehat{C}_{\mathrm{D}}^{(1)}(h)-C^*(h))/p({\varvec{\tau }})\).

Therefore, the asymptotic distribution for \(\widehat{C}_{\mathrm{D}}^{(1)}(h)\) is obtained if we show that \(\sqrt{n}\left( \frac{\widehat{W}(s,t)}{W(s,t)}-1\right) \) converges in distribution to a zero-mean Gaussian process (indexed by s and t).

Note that

$$\begin{aligned} \sqrt{n}\left( \frac{\widehat{W}(s,t)}{W(s,t)}-1\right)= & {} -\sqrt{n}\left( \frac{\widehat{F}(s)-F(s)}{F(s)}\right) \frac{F(s)\bar{G}(s)^2F(t)}{\widehat{F}(s)\widehat{\bar{G}}(s)^2\widehat{F}(t)}\\&-\,\sqrt{n}\left( \frac{\widehat{\bar{G}}(s)^2-\bar{G}(s)^2}{\bar{G}(s)^2}\right) \frac{\bar{G}(s)^2F(t)}{\widehat{\bar{G}}(s)^2\widehat{F}(t)}\\&-\,\sqrt{n}\left( \frac{\widehat{F}(t)-F(t)}{F(t)}\right) \frac{F(t)}{\widehat{F}(t)}\\= & {} -\sqrt{n}\left( \frac{\widehat{F}(s)-F(s)}{F(s)} +2\frac{\widehat{\bar{G}}(s)-\bar{G}(s)}{\bar{G}(s)} +\frac{\widehat{F}(t)-F(t)}{F(t)}\right) \\&+\,\, o_p(1). \end{aligned}$$
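The factor 2 and the \(o_p(1)\) remainder in the last expression can be traced to one elementary step, sketched here under the assumption that \(\bar{G}\) is bounded away from zero on \([0,\tau _{\mathrm{U}}]\) and that \(\sqrt{n}(\widehat{\bar{G}}-\bar{G})\) is bounded in probability, as the weak convergence invoked in this proof implies. Expanding the square gives

$$\begin{aligned} \frac{\widehat{\bar{G}}(s)^2-\bar{G}(s)^2}{\bar{G}(s)^2} = 2\,\frac{\widehat{\bar{G}}(s)-\bar{G}(s)}{\bar{G}(s)} +\left( \frac{\widehat{\bar{G}}(s)-\bar{G}(s)}{\bar{G}(s)}\right) ^2 . \end{aligned}$$

After multiplication by \(\sqrt{n}\), the squared term is of order \(O_p(n^{-1/2})\) and is therefore \(o_p(1)\). The ratio factors such as \(F(t)/\widehat{F}(t)\) converge to 1 in probability by uniform consistency, so by Slutsky's lemma replacing them by 1 also changes the expression only by \(o_p(1)\).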

As in Remark, \(\sqrt{n}(\widehat{F}-F,\widehat{\bar{G}}-\bar{G})\) jointly converges in distribution to a two-dimensional zero-mean Gaussian process. Hence every finite-dimensional distribution of \(\sqrt{n}(\widehat{F}-F,\widehat{\bar{G}}-\bar{G})\) converges to the corresponding multivariate normal distribution. To complete the proof, we consider the asymptotic expansion of \(\mathcal {W}(\hat{h})\) with respect to \({\hat{\beta }}\):

$$\begin{aligned} \mathcal {W}(\hat{h})=\mathcal {W}(h^*)+\nabla C^*(h^*)^\top \sqrt{n}(\hat{{\varvec{\beta }}}-{\varvec{\beta }}^*)+o_p(1), \end{aligned}$$

where \(\nabla C^*(h^*)\) is the partial derivative of \(C^*(h)\) with respect to \({\varvec{\beta }}\) evaluated at \({\varvec{\beta }}={\varvec{\beta }}^*\). By the same argument as in the Appendix of [26] (see especially (A1) and (A2) there), we conclude the asymptotic normality of \({{\mathcal {W}}}(\hat{h})\).

Cite this article

Hayashi, K., Shimizu, Y. Estimation of a Concordance Probability for Doubly Censored Time-to-Event Data. Stat Biosci 10, 546–567 (2018). https://doi.org/10.1007/s12561-018-9216-5

  • DOI: https://doi.org/10.1007/s12561-018-9216-5
