In addition to performance measures such as discrimination and calibration, we may want to know whether a prediction model is clinically useful: Is the model beneficial in clinical practice for guiding diagnostic work-up or decisions on therapy? For such decisions, we need a cutoff for the predicted probability (“decision threshold” or “classification cutoff,” see Chap. 2). Patients with predictions above the cutoff are classified as positive; those below the cutoff as negative. We will use the term clinical usefulness for a model's ability to make such classifications better than a default policy without the prediction model.
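As a minimal sketch of the classification step described above, the following Python fragment labels patients as positive or negative by comparing predicted probabilities with a cutoff. The probabilities, the cutoff of 0.20, and the function name `classify` are hypothetical and chosen only for illustration; treating a prediction exactly equal to the cutoff as negative is also an assumption, since the text only distinguishes predictions above and below the cutoff.

```python
def classify(predicted_probs, cutoff):
    """Return True (positive) for each prediction strictly above the cutoff.

    Assumption: a prediction exactly equal to the cutoff counts as negative.
    """
    return [p > cutoff for p in predicted_probs]


# Hypothetical predicted probabilities for five patients (illustrative only).
probs = [0.05, 0.18, 0.32, 0.61, 0.90]

# Hypothetical decision threshold, not a value taken from the case study.
cutoff = 0.20

labels = classify(probs, cutoff)
n_positive = sum(labels)                # patients classified as positive
n_negative = len(labels) - n_positive   # patients classified as negative
```

Whether these classifications are better than a default policy, such as treating every patient or treating none, is the question that measures of clinical usefulness are meant to answer.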
We consider performance measures for classification from a decision-analytic perspective, and relate them to the performance measures discussed in the previous chapter. Finally, we discuss study designs for measuring the actual impact of decision rules in clinical practice. We will illustrate the use of clinical usefulness measures in the testicular cancer case study, with model development in 544 patients and external validation in 273 patients from another centre.