The area under the receiver operating characteristic curve (AUC) is the most commonly reported measure of discrimination for prediction models with binary outcomes. Recently, however, it has been criticized for its inability to increase when important risk factors are added to a baseline model that already discriminates well. This has led to the claim that reliance on the AUC as a measure of discrimination may miss important improvements in the clinical performance of risk prediction rules derived from a baseline model. In this paper we investigate this claim by relating the AUC to measures of clinical performance based on sensitivity and specificity under the assumption of multivariate normality. The behavior of the AUC is contrasted with that of the discrimination slope. We show that, unless rules with very good specificity are desired, the change in the AUC does an adequate job as a predictor of the change in measures of clinical performance. However, stronger or more numerous predictors are needed to achieve the same increment in the AUC for baseline models with good discrimination than for those with poor discrimination. When excellent specificity is desired, our results suggest that the discrimination slope might be a better measure of model improvement than the AUC. The theoretical results are illustrated using a Framingham Heart Study example of a model for predicting the 10-year incidence of atrial fibrillation.
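The dependence of the AUC increment on baseline discrimination can be illustrated with a minimal sketch. It is not the paper's own derivation. It relies on the standard result that, under multivariate normality with a common covariance matrix in events and non-events, the optimal linear score has AUC equal to Φ(√(M²/2)), where M² is the squared Mahalanobis distance between the two group means, and a maximal Youden index of 2Φ(M/2) − 1. The function names and the assumption that a new predictor simply adds a fixed amount to M² (as it would if it were conditionally uncorrelated with the existing predictors) are illustrative choices, not taken from the source.

```python
from math import erf, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def auc_from_mahalanobis(m2):
    """AUC of the optimal linear score under equal-covariance normality.

    m2 is the squared Mahalanobis distance between event and non-event means.
    """
    return norm_cdf(sqrt(m2 / 2.0))


def youden_from_mahalanobis(m2):
    """Maximal Youden index (sensitivity + specificity - 1) under the same assumptions."""
    return 2.0 * norm_cdf(sqrt(m2) / 2.0) - 1.0


def auc_gain(baseline_m2, added_m2):
    """Change in AUC when a predictor adds added_m2 to the squared distance.

    Assumption (illustrative): the contribution of the new predictor is additive in M^2.
    """
    return auc_from_mahalanobis(baseline_m2 + added_m2) - auc_from_mahalanobis(baseline_m2)


if __name__ == "__main__":
    # The same added predictor strength applied to a weak and a strong baseline model.
    for baseline in (0.5, 4.0):
        print(
            f"baseline M^2={baseline}: "
            f"AUC={auc_from_mahalanobis(baseline):.3f}, "
            f"gain={auc_gain(baseline, 0.5):.3f}"
        )
```

Under these assumptions, the same addition to M² produces a visibly smaller AUC gain for the stronger baseline model, which is the pattern the abstract describes.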
Risk prediction; Discrimination; AUC; IDI; Youden index; Relative utility
This research has been supported by National Heart, Lung, and Blood Institute’s Framingham Heart Study; contract/Grant Number: N01-HC-25195. Dr. Pencina has been additionally supported by NIH/ARRA Risk Prediction of Atrial Fibrillation; Grant Number: RC1HL101056.