Abstract
Evaluating the relationship between a response variable and explanatory variables is important for building better statistical models. Concordance probability is one measure of this relationship and is often used in biomedical research. It can be seen as an extension of the area under the receiver operating characteristic curve. In this study, we propose estimators of concordance probability for time-to-event data subject to double censoring. A time-to-event response is doubly censored when either left or right censoring may occur. In the presence of double censoring, existing estimators of concordance probability lack desirable properties such as consistency and asymptotic normality. The proposed estimators use estimators of the left-censoring and right-censoring distributions as weights for each pair of cases, and reduce to the existing estimators in special cases. We establish the statistical properties of the proposed estimators and evaluate their performance via numerical experiments.
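As a point of reference for the uncensored case, concordance probability can be computed directly by counting pairs. The following is a minimal sketch, not the estimator proposed in this paper: the function name `concordance_uncensored` is hypothetical, and it assumes the convention that a larger score corresponds to an earlier event.

```python
from itertools import permutations


def concordance_uncensored(times, scores):
    """Share of ordered pairs (i, j) with times[i] < times[j]
    in which the earlier case also has the larger score."""
    usable = 0
    concordant = 0
    for i, j in permutations(range(len(times)), 2):
        if times[i] < times[j]:
            usable += 1
            if scores[i] > scores[j]:
                concordant += 1
    if usable == 0:
        raise ValueError("no comparable pairs")
    return concordant / usable
```

With censored data, some pairs can no longer be ordered, which is why pair weights based on the censoring distributions are needed.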
References
Chang MN (1990) Weak convergence of a self-consistent estimator of the survival function with doubly censored data. Ann Stat 18:391–404
Chang MN, Yang GL (1987) Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Ann Stat 15:1536–1547
Cook NR (2007) Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 115:928–935
DeLong ER, DeLong DM, Clarke-Pearson DL (1988) Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44:837–845
Gönen M, Heller G (2005) Concordance probability and discriminatory power in proportional hazards regression. Biometrika 92:965–970
Hara M, Sakata Y, Nakatani D, Suna S, Nishino M, Sato H, Kitamura T, Nanto S, Hamasaki T, Hori M, Komuro I (2016) Subclinical elevation of high-sensitive troponin T levels at the convalescent stage is associated with increased 5-year mortality after ST-elevation myocardial infarction. J Cardiol 67:314–320
Harrell FE, Lee KL, Mark DB (1996) Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Stat Med 15:361–387
Hilden J, Gerds TT (2014) A note on the evaluation of novel biomarkers: do not rely on integrated discrimination improvement and net reclassification index. Stat Med 33:3405–3414
Ji S, Peng L, Cheng Y, Lai H (2012) Quantile regression for doubly censored data. Biometrics 68:101–112
Julià O, Gómez G (2011) Simultaneous marginal survival estimators when doubly censored data is present. Lifetime Data Anal 17:347–372
Kyle RT, Rajkumar TV, Offord J, Larson D, Plevak M, Melton LJ III (2002) A long-term study of prognosis in monoclonal gammopathy of undetermined significance. New Engl J Med 346:564–569
Kim Y, Kim B, Jang W (2010) Asymptotic properties of the maximum likelihood estimator for the proportional hazards model with doubly censored data. J Multivar Anal 101:1339–1351
Klein JP, Moeschberger ML (2003) Survival analysis: techniques for censored and truncated data. Springer, New York
Mantel N (1967) Ranking procedures for arbitrarily restricted observation. Biometrics 23:65–78
Nolan D, Pollard D (1987) \(U\)-processes: rates of convergence. Ann Stat 15:780–799
Nolan D, Pollard D (1988) Functional limit theorems for \(U\)-processes. Ann Stat 16:1291–1298
Pencina MJ, D’Agostino RB (2004) Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Stat Med 23:2109–2123
Pepe MS (2003) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, New York
Pepe MS, Janes H, Li CI (2014) Net risk reclassification p values: valid or misleading? J Natl Cancer Inst 106:dju041
Pepe MS, Thompson LT (2000) Combining diagnostic test results to increase accuracy. Biostatistics 1:123–140
Peto R (1973) Experimental survival curves for interval-censored data. J R Stat Soc C 22:86–91
R Core Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. http://www.R-project.org/
Tsai WY, Crowley J (1985) A large sample study of generalized maximum likelihood estimators from incomplete data via self-consistency. Ann Stat 13:1317–1334
Turnbull BW (1974) Nonparametric estimation of a survivorship function with doubly censored data. J Am Stat Assoc 69:169–173
Turnbull BW (1976) The empirical distribution function with arbitrarily grouped censored and truncated data. J R Stat Soc B 38:290–295
Uno H, Cai T, Pencina MJ, D’Agostino RB, Wei LJ (2011) On the C-statistics for evaluating overall adequacy of risk prediction procedures with censored survival data. Stat Med 30:1105–1117
Zhang C-H, Li X (1996) Linear regression with doubly censored data. Ann Stat 24:2720–2743
Acknowledgements
The authors are grateful to Drs. Yasuhiko Sakata, Daisaku Nakatani, and Yasushi Sakata for allowing us to use the dataset analyzed in [6]. The authors also thank the associate editor and the anonymous reviewers for their very useful comments and suggestions, which improved the presentation of the paper. K. Hayashi is supported by JSPS KAKENHI (Grant-in-Aid for Scientific Research) Grant Number 15K15950. S. Shimizu is supported by JSPS KAKENHI (Grant-in-Aid for Scientific Research) Grant Number 70423085.
Appendix
We shall show the consistency and asymptotic normality of \(\widehat{C}_{\mathrm{D}}^{(1)}(\hat{h})\) in this section. Since we can also show these properties for \(\widehat{C}_{\mathrm{D}}^{(2)}(h)\) in the same way, the corresponding proofs are omitted.
To show the weak consistency, note that
where \(\widetilde{X}_j=\max (T_j,R_j)\). Then, the denominator of \(\widehat{C}_{\mathrm{D}}^{(1)}(h)\),
\(\displaystyle \frac{1}{n(n-1)}\sum _{i,j}\varGamma _i\varDelta _i\varGamma _j\mathbb {I}_{{\varvec{\tau }}}\left\{ X_i<X_j\right\} \left( \widehat{F}(X_i)\widehat{\bar{G}}(X_i)^2\widehat{F}(X_j)\right) ^{-1}\), converges to \(\mathrm{E}\, [ \mathbb {I}_{{\varvec{\tau }}}\{T_1<T_2\}]\) in probability by the uniform consistency of \(\widehat{F}(\cdot )\) and \(\widehat{\bar{G}}(\cdot )\) [2] and a law of large numbers for U-processes [15]. Similarly, it can also be shown that
Then, the consistency of \(\widehat{C}_{\mathrm{D}}^{(1)}(\hat{h})\) follows from Slutsky’s lemma.
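The ratio structure used in this consistency argument can be sketched numerically. The code below is an illustration under stated assumptions, not the authors' implementation: the function name and argument names are hypothetical; `gamma` and `delta` are taken to be 0/1 indicators matching \(\varGamma _i\) and \(\varDelta _i\); `F_hat` and `Gbar_hat` are user-supplied callables returning \(\widehat{F}(x)\) and \(\widehat{\bar{G}}(x)\); the indicator \(\mathbb {I}_{{\varvec{\tau }}}\{X_i<X_j\}\) is assumed to mean \(X_i<X_j\), \(X_i<\tau _{\mathrm{U}}\), \(\tau _{\mathrm{L}}<X_j\); and the numerator is assumed to be the denominator with the extra factor \(\mathbb {I}\{h({\varvec{Z}}_i)>h({\varvec{Z}}_j)\}\).

```python
import numpy as np


def weighted_concordance(x, gamma, delta, score, F_hat, Gbar_hat, tau_L, tau_U):
    """Ratio of weighted pair sums; the common factor 1/(n(n-1)) cancels."""
    x = np.asarray(x, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    delta = np.asarray(delta, dtype=float)
    score = np.asarray(score, dtype=float)

    # Evaluate the (assumed) censoring-distribution estimates at each X_i.
    Fx = np.array([F_hat(v) for v in x])
    Gx = np.array([Gbar_hat(v) for v in x])

    # Pair indicator: X_i < X_j, X_i < tau_U, tau_L < X_j (assumed meaning).
    xi, xj = x[:, None], x[None, :]
    in_window = (xi < xj) & (xi < tau_U) & (xj > tau_L)

    # Gamma_i * Delta_i * Gamma_j selects the pairs that carry weight.
    selected = ((gamma * delta)[:, None] * gamma[None, :]) > 0

    # Inverse weight (F(X_i) * Gbar(X_i)^2 * F(X_j))^{-1} for each pair.
    with np.errstate(divide="ignore"):
        w = 1.0 / ((Fx * Gx**2)[:, None] * Fx[None, :])
    pair_weight = np.where(in_window & selected, w, 0.0)

    denom = pair_weight.sum()
    if denom == 0:
        raise ValueError("no usable pairs")
    numer = (pair_weight * (score[:, None] > score[None, :])).sum()
    return numer / denom
```

When both callables return 1 and every indicator equals 1, all usable pairs receive weight 1 and the function returns the unweighted pairwise concordance.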
To show the asymptotic normality of \(\widehat{C}_{\mathrm{D}}^{(1)}(\hat{h})\), we first consider the asymptotic behavior of the estimator with fixed \(h=h(\cdot ;{\varvec{\beta }})\). For notational brevity, the following symbols are used hereafter.
For \(C^*(h)=\mathrm{P}\left[ h({\varvec{Z}}_1)>h({\varvec{Z}}_2)|T_1<T_2, T_1<\tau _{\mathrm{U}}, \tau _{\mathrm{L}}<T_2\right] \), note that
Note that \(\mathcal {W}_1(h)\) and \(\mathcal {W}_2(h)\) correspond to the variance components of the U-statistic and of the weights \(\widehat{W}_{ij}\), respectively. It follows from the uniform consistency of \(\widehat{F}\) and \(\widehat{\bar{G}}\) and a functional limit theorem for U-processes [16] that
where \(p({\varvec{\tau }})=\mathrm{P}\left[ T_1<T_2,T_1<\tau _{\mathrm{U}}, \tau _{\mathrm{L}}<T_2\right] \). Moreover note that
where \(W(s,t)=\left( F(s)\bar{G}(s)^2F(t)\right) ^{-1}\), \(\widehat{W}(s,t)=\left( \widehat{F}(s)\widehat{\bar{G}}(s)^2\widehat{F}(t)\right) ^{-1}\), and
By a uniform law of large numbers for U-processes [15] and the uniform consistency of \(\widehat{F}\) and \(\widehat{\bar{G}}\), we obtain that
where \(\gamma (s,t,h)=p((s,t)')(\widehat{C}_{\mathrm{D}}^{(1)}(h)-C^*(h))/p({\varvec{\tau }})\).
Therefore, the asymptotic distribution for \(\widehat{C}_{\mathrm{D}}^{(1)}(h)\) is obtained if we show that \(\sqrt{n}\left( \frac{\widehat{W}(s,t)}{W(s,t)}-1\right) \) converges in distribution to a zero-mean Gaussian process (indexed by s and t).
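A first-order expansion indicates why this reduces to the joint behavior of \(\widehat{F}\) and \(\widehat{\bar{G}}\). The display below is a sketch obtained from the definitions of \(W\) and \(\widehat{W}\) above, assuming that \(F\) and \(\bar{G}\) are bounded away from zero on the range of \((s,t)\) considered and that the remainder is negligible at the \(\sqrt{n}\) rate; it is not quoted from the original derivation.

\[
\frac{\widehat{W}(s,t)}{W(s,t)}-1
=\frac{F(s)\bar{G}(s)^2F(t)}{\widehat{F}(s)\widehat{\bar{G}}(s)^2\widehat{F}(t)}-1
=-\frac{\widehat{F}(s)-F(s)}{F(s)}
-2\,\frac{\widehat{\bar{G}}(s)-\bar{G}(s)}{\bar{G}(s)}
-\frac{\widehat{F}(t)-F(t)}{F(t)}
+o_p\left( n^{-1/2}\right) .
\]

Under these assumptions, \(\sqrt{n}\) times the left-hand side is, to first order, a linear functional of \(\sqrt{n}(\widehat{F}-F,\widehat{\bar{G}}-\bar{G})\).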
Note that
As in Remark, \(\sqrt{n}(\widehat{F}-F,\widehat{\bar{G}}-\bar{G})\) jointly converges in distribution to a two-dimensional zero-mean Gaussian process. Hence every finite-dimensional distribution of \(\sqrt{n}(\widehat{F}-F,\widehat{\bar{G}}-\bar{G})\) converges to the corresponding multivariate normal distribution. To complete the proof, we consider the asymptotic expansion of \(\mathcal {W}(\hat{h})\) with respect to \({\hat{\beta }}\):
where \(\nabla C^*(h^*)\) is the partial derivative of \(C^*(h)\) with respect to \({\varvec{\beta }}\) evaluated at \({\varvec{\beta }}={\varvec{\beta }}^*\). By the same argument as in the Appendix of [26] (see especially (A1) and (A2) there), we conclude the asymptotic normality of \({{\mathcal {W}}}(\hat{h})\).
Hayashi, K., Shimizu, Y. Estimation of a Concordance Probability for Doubly Censored Time-to-Event Data. Stat Biosci 10, 546–567 (2018). https://doi.org/10.1007/s12561-018-9216-5
DOI: https://doi.org/10.1007/s12561-018-9216-5