Abstract
The volume under the receiver operating characteristic surface (VUS) is useful for measuring the overall accuracy of a diagnostic test when the possible disease status belongs to one of three ordered categories. In medical studies, the VUS of a new test is typically estimated from measurements obtained on a suitable sample of patients. However, in many cases, only a subset of such patients has the true disease status assessed by a gold standard test. In this paper, for a continuous-scale diagnostic test, we propose four estimators of the VUS which accommodate nonignorable missingness of the disease status. The estimators are based on a parametric model which jointly describes both the disease and the verification process. Identifiability of the model is discussed. Consistency and asymptotic normality of the proposed estimators are shown, and variance estimation is discussed. The finite-sample behavior is investigated by means of simulation experiments. An illustration is provided.
References
Baker SG (1995) Evaluating multiple diagnostic tests with partial verification. Biometrics 51(1):330–337
Chi YY, Zhou XH (2008) Receiver operating characteristic surfaces in the presence of verification bias. J R Stat Soc Ser C (Appl Stat) 57(1):1–23
Fluss R, Reiser B, Faraggi D, Rotnitzky A (2009) Estimation of the ROC curve under verification bias. Biom J 51(3):475–490
Fluss R, Reiser B, Faraggi D (2012) Adjusting ROC curve for covariates in the presence of verification bias. J Stat Plan Inference 142(1):1–11
Kang L, Tian L (2013) Estimation of the volume under the ROC surface with three ordinal diagnostic categories. Comput Stat Data Anal 62:39–51
Li J, Zhou XH (2009) Nonparametric and semiparametric estimation of the three way receiver operating characteristic surface. J Stat Plan Inference 139(12):4133–4142
Little RJ, Rubin DB (2002) Statistical analysis with missing data. Wiley, New York
Liu D, Zhou XH (2010) A model for adjusting for nonignorable verification bias in estimation of the ROC curve and its area with likelihood-based approach. Biometrics 66(4):1119–1128
Nakas CT, Yiannoutsos CT (2004) Ordered multiple-class ROC analysis with continuous measurements. Stat Med 23(22):3437–3449
Rotnitzky A, Faraggi D, Schisterman E (2006) Doubly robust estimation of the area under the receiver-operating characteristic curve in the presence of verification bias. J Am Stat Assoc 101(475):1276–1288
Scurfield BK (1996) Multiple-event forced-choice tasks in the theory of signal detectability. J Math Psychol 40(3):253–269
To Duc K (2017) bcROCsurface: an R package for correcting verification bias in estimation of the ROC surface and its volume for continuous diagnostic tests. BMC Bioinform 18(1):503
To Duc K, Chiogna M, Adimari G (2016) Bias-corrected methods for estimating the receiver operating characteristic surface of continuous diagnostic tests. Electron J Stat 10(2):3063–3113
van der Vaart AW (2000) Asymptotic statistics. Cambridge University Press, Cambridge
Xiong C, van Belle G, Miller JP, Morris JC (2006) Measuring and estimating diagnostic accuracy when there are three ordinal diagnostic groups. Stat Med 25(7):1251–1273
Zhang Y, Alonzo TA (2018) Estimation of the volume under the receiver-operating characteristic surface adjusting for non-ignorable verification bias. Stat Methods Med Res 27(3):715–739
Zhou XH, Castelluccio P (2003) Nonparametric analysis for the ROC areas of two diagnostic tests in the presence of nonignorable verification bias. J Stat Plan Inference 115(1):193–213
Zhou XH, Castelluccio P (2004) Adjusting for non-ignorable verification bias in clinical studies for Alzheimer’s disease. Stat Med 23(2):221–230
Zhou XH, Rodenberg CA (1998) Estimating an ROC curve in the presence of nonignorable verification bias. Commun Stat 27(3):273–285
Acknowledgements
The authors thank the Alzheimer’s Disease Neuroimaging Initiative research group for kindly permitting access to the data analyzed in this paper. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense Award Number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
Appendices
Appendix 1
Proofs
Proof of Theorem 1
We can show that \({\mathbb {E}}\{G_{i\ell r,*}(\mu _0,{\varvec{\xi }}_0)\} = 0\) (see Appendix 2). Then \(e_*(\mu _0,{\varvec{\xi }}_0) = 0\), and, by condition (C2) and an application of the implicit function theorem, there exists a neighborhood of \({\varvec{\xi }}_0\) in which a continuously differentiable function, \(m({\varvec{\xi }})\), is uniquely defined such that \(m({\varvec{\xi }}_0) = \mu _0\) and \(e_*(m({\varvec{\xi }}),{\varvec{\xi }}) = 0\). Since the maximum likelihood estimator \(\hat{{\varvec{\xi }}}\) is consistent, i.e., \(\hat{{\varvec{\xi }}} {\mathop {\rightarrow }\limits ^{p}} {\varvec{\xi }}_0\), we have that \({{\tilde{\mu }}}_* = m(\hat{{\varvec{\xi }}}){\mathop {\rightarrow }\limits ^{p}} \mu _0\). On the other hand, \(G_*({{\hat{\mu }}}_*, \hat{{\varvec{\xi }}}) = 0\) and condition (C3) imply that \(e_*({{\hat{\mu }}}_*,\hat{{\varvec{\xi }}}){\mathop {\rightarrow }\limits ^{p}} 0\). Thus, \({\hat{\mu }}_* - {{\tilde{\mu }}}_* {\mathop {\rightarrow }\limits ^{p}} 0\), and hence \({\hat{\mu }}_* {\mathop {\rightarrow }\limits ^{p}} \mu _0\). \(\square \)
Proof of Theorem 2
We have
Since \(e_*(\mu _0,{\varvec{\xi }}_0) = 0\), we get
Condition (C1) implies that the first term on the right-hand side of the last identity is \(o_p(1)\). Using a Taylor expansion, we have
It is straightforward to show that
By standard results on the limit distribution of U-statistics (van der Vaart 2000, Theorem 12.3, Chap. 12),
where \(\sqrt{n}{\tilde{G}}_{*}(\mu ,{\varvec{\xi }})\) is the projection of \(U_{n,*}\) onto the set of all statistics of the form
for \(\ell \ne i\) and \(r \ne \ell , r \ne i\). For the maximum likelihood estimator \(\hat{{\varvec{\xi }}}\), we can write
Hence, from (18),
Since the observed data \(O_i\) are i.i.d., the \(Q_{i,*}(\mu _0,{\varvec{\xi }}_0)\) are also i.i.d. In addition, it is easy to show that
Therefore, \({\mathbb {E}}\{Q_{i,*} (\mu _0,{\varvec{\xi }}_0)\} = 0\), and \(\frac{1}{\sqrt{n}} Q_* (\mu _0,{\varvec{\xi }}_0) {\mathop {\rightarrow }\limits ^{d}} {\mathcal {N}}(0, {\mathbb {V}}\mathrm {ar}\left\{ Q_{i,*} (\mu _0,{\varvec{\xi }}_0)\right\} )\) by the Central Limit Theorem. It follows that
where
\(\square \)
Variance estimation
Under condition (C3), a consistent estimator of \(\varLambda _*\) can be obtained as
where \({\hat{\theta }}_{k,*}\) are the estimates of the disease probabilities, \(\theta _{k}\) for \(k = 1,2,3\). Specifically, \({\hat{\theta }}_{k,\mathrm {FI}} = \frac{1}{n}\sum \nolimits _{i=1}^{n} {\hat{\rho }}_{ki}\), \({\hat{\theta }}_{k,\mathrm {MSI}} = \frac{1}{n}\sum \nolimits _{i=1}^{n} {\tilde{D}}_{ki,\mathrm {MSI}}\), \({\hat{\theta }}_{k,\mathrm {PDR}} = \frac{1}{n}\sum \nolimits _{i=1}^{n} {\tilde{D}}_{ki,\mathrm {PDR}}\) and \({\hat{\theta }}_{k,\mathrm {IPW}} = \sum \nolimits _{i=1}^{n} V_i D_{ki}{\hat{\pi }}_i^{-1} \bigg /\sum \nolimits _{i=1}^{n} V_i{\hat{\pi }}_i^{-1}\). According to (19), we have that
In addition, for fixed i, we also have that
Therefore,
The quantity \(\sum \nolimits _{i=1}^{n} \frac{\partial {\mathcal {S}}_i({\varvec{\xi }})}{\partial {\varvec{\xi }}^\top }\bigg |_{{\varvec{\xi }} = \hat{{\varvec{\xi }}}}\) could be obtained as the Hessian matrix of the log-likelihood function at \(\hat{{\varvec{\xi }}}\). In order to compute \(\frac{\partial G_{i\ell r,*}({\hat{\mu }}_{*},{\varvec{\xi }})}{\partial {\varvec{\xi }}^\top }\bigg |_{{\varvec{\xi }} = \hat{{\varvec{\xi }}}}\), we have to get the derivatives \(\frac{\partial }{\partial {\varvec{\xi }}^\top } \rho _{ki}({\varvec{\tau }}_{0\rho _k})\), \(\frac{\partial }{\partial {\varvec{\xi }}^\top } \rho _{k(0)i}({\varvec{\xi }})\), \(\frac{\partial }{\partial {\varvec{\xi }}^\top } \pi ^{-1}_{i}({\varvec{\lambda }}, {\varvec{\tau }}_\pi )\), \(\frac{\partial }{\partial {\varvec{\xi }}^\top } \pi _{10i}({\varvec{\lambda }}, {\varvec{\tau }}_\pi )\), \(\frac{\partial }{\partial {\varvec{\xi }}^\top } \pi _{01i}({\varvec{\lambda }}, {\varvec{\tau }}_\pi )\) and \(\frac{\partial }{\partial {\varvec{\xi }}^\top } \pi _{00i}({\varvec{\lambda }}, {\varvec{\tau }}_\pi )\).
In Sect. 2.3, we obtain
and
where \((d_1, d_2)\) belongs to the set \(\{(1,0), (0,1), (0,0)\}\). Also, we have
Moreover,
with \(s = 1, 2\). Then, recall that
After some algebra, we get
Finally, we set \(z = (1 - \pi _{10i})\rho _{1i} + (1 - \pi _{01i})\rho _{2i} + (1 - \pi _{00i})\rho _{3i}\), and get
The derivative \(\dfrac{\partial }{\partial {\varvec{\xi }}^\top } \rho _{3(0)i}({\varvec{\xi }})\) can be computed by using the fact that \(\rho _{3(0)i} = 1 - \rho _{1(0)i} - \rho _{2(0)i}\).
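As a rough numerical companion to the quantity \(z\) above, the sketch below computes the disease probabilities among unverified subjects by Bayes' rule, \(\rho _{k(0)} = (1 - \pi _k)\rho _k / z\). It assumes that \(\pi _{10i}\), \(\pi _{01i}\) and \(\pi _{00i}\) are the verification probabilities for a subject in disease class 1, 2 and 3, respectively. The function name `rho_unverified` and the numeric inputs are hypothetical and serve only as an illustration.

```python
import numpy as np

def rho_unverified(rho, pi):
    """Bayes-rule sketch: Pr(class k | V = 0) from Pr(class k) and Pr(V = 1 | class k).

    rho : array (..., 3), disease probabilities rho_1, rho_2, rho_3 (summing to 1)
    pi  : array (..., 3), verification probabilities by class
          (assumed to correspond to pi_10, pi_01, pi_00)
    """
    rho = np.asarray(rho, dtype=float)
    pi = np.asarray(pi, dtype=float)
    num = (1.0 - pi) * rho                   # (1 - pi_k) * rho_k
    z = num.sum(axis=-1, keepdims=True)      # z = sum_k (1 - pi_k) * rho_k = Pr(V = 0)
    return num / z

# Hypothetical values for one subject
rho0 = rho_unverified([0.5, 0.3, 0.2], [0.4, 0.6, 0.9])
print(rho0, rho0.sum())
```

By construction the three returned probabilities sum to one, which mirrors the identity \(\rho _{3(0)i} = 1 - \rho _{1(0)i} - \rho _{2(0)i}\) used above.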
Appendix 2
Here, we show that the estimating functions \(G_{i\ell r,*}\) are unbiased under the working disease and verification models. Recall that \({\varvec{\xi }} = ({\varvec{\lambda }}^\top , {\varvec{\tau }}^\top _\pi , {\varvec{\tau }}^\top _\rho )^\top \).
FI estimator
We have
$$\begin{aligned} {\mathbb {E}}\left\{ G_{i\ell r,\mathrm {FI}}(\mu _0, {\varvec{\xi }}_0)\right\}= & {} {\mathbb {E}}\left\{ \rho _{1i}({\varvec{\tau }}_{0\rho }) \rho _{2\ell }({\varvec{\tau }}_{0\rho }) \rho _{3r}({\varvec{\tau }}_{0\rho }) (I_{i\ell r} - \mu _0) \right\} \\= & {} {\mathbb {E}}\left\{ \rho _{1i}\rho _{2\ell }\rho _{3r}(I_{i\ell r} - \mu _0) \right\} . \end{aligned}$$
Hence, \({\mathbb {E}}\left\{ G_{i\ell r,\mathrm {FI}}(\mu _0, {\varvec{\xi }}_0)\right\} = 0\) from (13).
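Setting the sum of \(G_{i\ell r,\mathrm {FI}}\) over triples of distinct indices to zero and solving for \(\mu \) gives a weighted proportion of correctly ordered triples. The following is a minimal sketch of that calculation, assuming that \(I_{i\ell r}\) is the indicator of \(T_i< T_\ell < T_r\) (ties are ignored, as for a continuous test) and that fitted disease probabilities are already available. The function name `vus_fi` and the brute-force \(O(n^3)\) layout are illustrative choices, not the authors' implementation.

```python
import numpy as np

def vus_fi(t, rho):
    """Weighted proportion of ordered triples (sketch of an FI-type VUS estimate).

    t   : array (n,), test results
    rho : array (n, 3), fitted probabilities of disease classes 1, 2, 3
    """
    t = np.asarray(t, dtype=float)
    rho = np.asarray(rho, dtype=float)
    idx = np.arange(t.size)
    i, l, r = idx[:, None, None], idx[None, :, None], idx[None, None, :]
    distinct = (i != l) & (l != r) & (i != r)     # only triples of distinct subjects
    ordered = (t[i] < t[l]) & (t[l] < t[r])       # I_{ilr}, assuming no ties
    w = rho[i, 0] * rho[l, 1] * rho[r, 2] * distinct
    return float((w * ordered).sum() / w.sum())

# Hard labels with perfectly ordered test values
vus_perfect = vus_fi([0.0, 1.0, 2.0], np.eye(3))
# Uninformative probabilities: 4 increasing triples out of 24 ordered triples
vus_uniform = vus_fi([1.0, 2.0, 3.0, 4.0], np.full((4, 3), 1.0 / 3.0))
print(vus_perfect, vus_uniform)
```

The three-dimensional arrays make the double exclusion \(\ell \ne i\), \(r \ne \ell \), \(r \ne i\) explicit, at the cost of memory that grows as \(n^3\); this is only practical for small samples.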
MSI estimator
Consider \({\mathbb {E}}\left\{ D_{ki,\mathrm {MSI}}({\varvec{\xi }}_0)|T_i, {\varvec{A}}_i\right\} \). We have
$$\begin{aligned}&{\mathbb {E}}\left\{ D_{ki,\mathrm {MSI}}({\varvec{\xi }}_0)|T_i, {\varvec{A}}_i\right\} \\&\quad = {\mathbb {E}}\left\{ V_i D_{ki} + (1 - V_i)\rho _{k(0)i}({\varvec{\xi }}_0)|T_i, {\varvec{A}}_i\right\} \\&\quad = {\mathbb {E}}\left[ {\mathbb {E}}\left\{ V_i D_{ki} + (1 - V_i)\rho _{k(0)i}({\varvec{\xi }}_0)|T_i, {\varvec{A}}_i, V_i \right\} | T_i, {\varvec{A}}_i \right] \\&\quad = \mathrm {Pr}(V_i = 1|T_i, {\varvec{A}}_i){\mathbb {E}}\left( D_{ki}|V_i = 1, T_i, {\varvec{A}}_i\right) \\&\qquad + \, \mathrm {Pr}(V_i = 0|T_i, {\varvec{A}}_i){\mathbb {E}}\left( \rho _{k(0)i}({\varvec{\xi }}_0)|V_i = 0, T_i, {\varvec{A}}_i \right) \\&\quad = \mathrm {Pr}(V_i = 1|T_i, {\varvec{A}}_i)\mathrm {Pr}(D_{ki} = 1|V_i = 1, T_i, {\varvec{A}}_i) \\&\qquad + \, \mathrm {Pr}(V_i = 0|T_i, {\varvec{A}}_i)\mathrm {Pr}(D_{ki} = 1|V_i = 0, T_i, {\varvec{A}}_i) \\&\quad = \mathrm {Pr}(D_{ki} = 1|T_i, {\varvec{A}}_i) = \rho _{ki}. \end{aligned}$$
Therefore,
$$\begin{aligned}&{\mathbb {E}}\left\{ G_{i\ell r, \mathrm {MSI}}(\mu _0,{\varvec{\xi }}_0) \right\} \\&\quad = {\mathbb {E}}\left\{ D_{1i,\mathrm {MSI}}({\varvec{\xi }}_0) D_{2\ell , \mathrm {MSI}}({\varvec{\xi }}_0) D_{3r, \mathrm {MSI}}({\varvec{\xi }}_0) \left( I_{i\ell r} - \mu _0 \right) \right\} \\&\quad = {\mathbb {E}}\Big [ \left( I_{i\ell r} - \mu _0 \right) {\mathbb {E}}\left\{ D_{1i,\mathrm {MSI}}({\varvec{\xi }}_0) | T_i, {\varvec{A}}_i \right\} {\mathbb {E}}\left\{ D_{2\ell ,\mathrm {MSI}}({\varvec{\xi }}_0) | T_\ell , {\varvec{A}}_\ell \right\} \\&\qquad \times \, {\mathbb {E}}\left\{ D_{3r,\mathrm {MSI}}({\varvec{\xi }}_0) | T_r, {\varvec{A}}_r \right\} \Big ] \\&\quad = {\mathbb {E}}\left\{ \rho _{1i}\rho _{2\ell }\rho _{3r}(I_{i\ell r} - \mu _0) \right\} . \end{aligned}$$
IPW estimator
In this case,
$$\begin{aligned} {\mathbb {E}}\left( \frac{V_i D_{ki}}{\pi _i({\varvec{\xi }}_0)} \bigg |T_i, {\varvec{A}}_i \right)= & {} \frac{{\mathbb {E}}\left( V_i D_{ki}|T_i, {\varvec{A}}_i\right) }{\pi _i({\varvec{\xi }}_0)} \\= & {} \frac{{\mathbb {E}}\left\{ D_{ki} {\mathbb {E}}\left( V_i |D_{1i}, D_{2i}, T_i, {\varvec{A}}_i\right) \big | T_i, {\varvec{A}}_i\right\} }{\pi _i({\varvec{\xi }}_0)} \\= & {} \frac{{\mathbb {E}}\left( \pi _i D_{ki}|T_i, {\varvec{A}}_i\right) }{\pi _i} = \rho _{ki}. \end{aligned}$$
Thus,
$$\begin{aligned}&{\mathbb {E}}\left\{ G_{i\ell r, \mathrm {IPW}}(\mu _0, {\varvec{\xi }}_0)\right\} \\&\quad = {\mathbb {E}}\left\{ \frac{V_i V_\ell V_r D_{1i} D_{2\ell } D_{3r}}{\pi _i({\varvec{\xi }}_0) \pi _\ell ({\varvec{\xi }}_0) \pi _r({\varvec{\xi }}_0)} \left( I_{i\ell r} - \mu _0\right) \right\} \\&\quad = {\mathbb {E}}\Bigg \{ \left( I_{i\ell r} - \mu _0\right) {\mathbb {E}}\left( \frac{V_i D_{1i}}{\pi _i({\varvec{\xi }}_0)} \bigg | T_i, {\varvec{A}}_i\right) {\mathbb {E}}\left( \frac{V_\ell D_{2\ell }}{\pi _\ell ({\varvec{\xi }}_0)} \bigg |T_\ell , {\varvec{A}}_\ell \right) \\&\qquad \times \, {\mathbb {E}}\left( \frac{V_r D_{3r}}{\pi _r({\varvec{\xi }}_0)} \bigg | T_r, {\varvec{A}}_r\right) \Bigg \} \\&\quad = {\mathbb {E}}\left\{ \rho _{1i} \rho _{2\ell } \rho _{3r}(I_{i\ell r} - \mu _0) \right\} . \end{aligned}$$
PDR estimator
$$\begin{aligned}&{\mathbb {E}}\left\{ D_{ki, \mathrm {PDR}}({\varvec{\xi }}_0)|T_i, {\varvec{A}}_i\right\} \\&\quad = {\mathbb {E}}\left[ {\mathbb {E}}\left\{ \frac{V_i D_{ki}}{\pi _i({\varvec{\xi }}_0)} - \rho _{k(0)i}({\varvec{\xi }}_0)\left( \frac{V_i}{\pi _i({\varvec{\xi }}_0)} - 1\right) \bigg | D_{1i}, D_{2i}, T_i, {\varvec{A}}_i\right\} \bigg | T_i, {\varvec{A}}_i\right] \\&\quad = {\mathbb {E}}\Bigg \{D_{ki} {\mathbb {E}}\left( \frac{V_i}{\pi _i({\varvec{\xi }}_0)} \bigg | D_{1i}, D_{2i}, T_i, {\varvec{A}}_i\right) \\&\qquad - \, \rho _{k(0)i}({\varvec{\xi }}_0) {\mathbb {E}}\left( \frac{V_i}{\pi _i({\varvec{\xi }}_0)} - 1 \bigg | D_{1i}, D_{2i}, T_i, {\varvec{A}}_i\right) \bigg | T_i, {\varvec{A}}_i \Bigg \} \\&\quad = {\mathbb {E}}(D_{ki} | T_i, {\varvec{A}}_i) = \rho _{ki}. \end{aligned}$$
Hence,
$$\begin{aligned}&{\mathbb {E}}\left\{ G_{i\ell r, \mathrm {PDR}}(\mu _0,{\varvec{\xi }}_0)\right\} \\&\quad = {\mathbb {E}}\left\{ D_{1i,\mathrm {PDR}}({\varvec{\xi }}_0) D_{2\ell , \mathrm {PDR}}({\varvec{\xi }}_0) D_{3r, \mathrm {PDR}}({\varvec{\xi }}_0) \left( I_{i\ell r} - \mu _0 \right) \right\} \\&\quad = {\mathbb {E}}\Big [ \left( I_{i\ell r} - \mu _0 \right) {\mathbb {E}}\left\{ D_{1i,\mathrm {PDR}}({\varvec{\xi }}_0) | T_i, {\varvec{A}}_i \right\} {\mathbb {E}}\left\{ D_{2\ell ,\mathrm {PDR}}({\varvec{\xi }}_0) | T_\ell , {\varvec{A}}_\ell \right\} \\&\qquad \times \, {\mathbb {E}}\left\{ D_{3r,\mathrm {PDR}}({\varvec{\xi }}_0) | T_r, {\varvec{A}}_r \right\} \Big ] \\&\quad = {\mathbb {E}}\left\{ \rho _{1i}\rho _{2\ell }\rho _{3r}(I_{i\ell r} - \mu _0) \right\} . \end{aligned}$$
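The conditional-expectation identities used for the MSI, IPW and PDR terms can be checked numerically. The sketch below fixes one covariate pattern, draws the disease class and a verification indicator whose probability depends on the class (nonignorable verification), and compares Monte Carlo means with \(\rho _k\). All numeric values are hypothetical, and the true \(\rho \), \(\pi \) and \(\rho _{k(0)}\) are plugged in, so this illustrates the identities only, not the behavior of the fitted estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
rho = np.array([0.5, 0.3, 0.2])   # hypothetical Pr(D_k = 1 | T, A)
pi = np.array([0.4, 0.6, 0.9])    # hypothetical Pr(V = 1 | class k, T, A)

cls = rng.choice(3, size=n, p=rho)            # disease class of each replicate
D = np.eye(3)[cls]                            # indicators D_1, D_2, D_3
pi_i = pi[cls][:, None]                       # verification probability of each replicate
V = (rng.random(n)[:, None] < pi_i).astype(float)

# Pr(D_k = 1 | V = 0) by Bayes' rule
rho0 = (1 - pi) * rho / ((1 - pi) * rho).sum()

d_msi = V * D + (1 - V) * rho0                # MSI-type term
d_ipw = V * D / pi_i                          # IPW-type term
d_pdr = d_ipw - rho0 * (V / pi_i - 1)         # PDR-type term

mean_msi = d_msi.mean(axis=0)
mean_ipw = d_ipw.mean(axis=0)
mean_pdr = d_pdr.mean(axis=0)
print(mean_msi, mean_ipw, mean_pdr)           # each should be close to rho
```

Each of the three averages should agree with `rho` up to Monte Carlo error, which is the property that makes the corresponding estimating functions unbiased when the working models are correct.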
Appendix 3
Here, we present the results of an additional simulation study, which covers two cases: (i) the missing at random (MAR) assumption for the missingness of the disease status; (ii) model misspecification in the estimation process.
In the study, the diagnostic test T, covariate A and the disease status \({\mathcal {D}}\) are generated as in scenario I of Sect. 4 of the paper. Moreover, the verification status V is:
- (i) generated as in scenario I with \(h(T,A;{\varvec{\tau }}_\pi )=-1 + T - 1.2A\) and \(\lambda _1 = \lambda _2 = 0\), i.e., under the MAR assumption (verification rate roughly equal to 0.57);
- (ii) generated as in scenario I, but the models for the verification and disease processes used in the fitting procedure are misspecified, because the estimated verification model uses as predictors \(T^{1/3}\) and \(\log |A|\) instead of T and A, respectively, and the estimated disease model uses \(A^{1/3}\) instead of A.
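To make the MAR setting in (i) concrete, the sketch below evaluates a verification probability with linear predictor \(h(T,A;{\varvec{\tau }}_\pi ) + \lambda _1 D_1 + \lambda _2 D_2\). The logistic link is an assumption made here for illustration, as are the function name `verification_prob`, the test point and the nonzero \(\lambda \) values in the second call. With \(\lambda _1 = \lambda _2 = 0\) the probability does not depend on the disease status, which is what MAR means in this context.

```python
import numpy as np

def verification_prob(t, a, d1, d2, lam1=0.0, lam2=0.0):
    """Pr(V = 1 | D, T, A) under an assumed logistic link.

    h(T, A) = -1 + T - 1.2 * A is taken from case (i);
    lam1 and lam2 are the nonignorability parameters.
    """
    eta = -1.0 + t - 1.2 * a + lam1 * d1 + lam2 * d2
    return 1.0 / (1.0 + np.exp(-eta))

classes = [(1, 0), (0, 1), (0, 0)]            # (D_1, D_2) for disease classes 1, 2, 3

# MAR: lambda_1 = lambda_2 = 0, so the three probabilities coincide
p_mar = [verification_prob(0.5, 0.2, d1, d2) for d1, d2 in classes]

# Nonignorable: hypothetical nonzero lambdas, so the probabilities differ by class
p_nonign = [verification_prob(0.5, 0.2, d1, d2, lam1=1.5, lam2=0.5) for d1, d2 in classes]
print(p_mar, p_nonign)
```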
In both (i) and (ii), the true VUS is 0.791. We consider three different values of sample size, i.e., 250, 500 and 1500. The number of replications in each simulation experiment is set to 1000.
Simulation results are given in Tables 4 and 5 for case (i), and in Tables 6 and 7 for case (ii). As expected, in case (i) the proposed VUS estimators show some bias when compared to the SPE estimator, which is appropriate here. However, the bias decreases as the sample size increases. In case (ii), all estimators appear to be biased, even when the sample size is large. Moreover, although in the case considered the bias seems to remain at acceptable levels, we expect that, given the nature of the estimators, it could be much larger under other kinds of misspecification.
Cite this article
To Duc, K., Chiogna, M., Adimari, G. et al. Estimation of the volume under the ROC surface in presence of nonignorable verification bias. Stat Methods Appl 28, 695–722 (2019). https://doi.org/10.1007/s10260-019-00451-3
DOI: https://doi.org/10.1007/s10260-019-00451-3