Estimating Improvement in Prediction with Matched Case-Control Designs

  • Conference paper
Risk Assessment and Evaluation of Predictions

Part of the book series: Lecture Notes in Statistics (LNSP, volume 215)

Abstract

When an existing risk prediction model is not sufficiently predictive, additional variables are sought for inclusion in the model. This paper addresses study designs for evaluating the improvement in prediction performance gained by adding a new predictor to a risk prediction model. We consider studies that measure the new predictor in a case-control subset of the study cohort, a practice that is common in biomarker research. We ask whether matching controls to cases with regard to baseline predictors improves efficiency. A variety of measures of prediction performance are studied. We find through simulation studies that matching improves the efficiency with which most measures are estimated, but can reduce efficiency for some. Efficiency gains are smaller when more controls per case are included in the study. A method that models the distribution of the new predictor in controls appears to improve estimation efficiency considerably.
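The matched design described in the abstract can be illustrated with a minimal sketch. It assumes a cohort in which a single baseline predictor has been categorized into strata, and it frequency-matches a fixed number of controls per case within each stratum. The function names, the record layout, and the simulated risks are hypothetical and chosen only for illustration; the sampling scheme and the estimators used in the paper itself are not reproduced here.

```python
import random
from collections import defaultdict


def simulate_cohort(n=2000, seed=1):
    """Simulate a toy cohort (illustrative risks, not taken from the paper)."""
    rng = random.Random(seed)
    risk_by_stratum = {"low": 0.02, "medium": 0.05, "high": 0.15}
    cohort = []
    for i in range(n):
        stratum = rng.choice(sorted(risk_by_stratum))
        is_case = rng.random() < risk_by_stratum[stratum]
        cohort.append({"id": i, "stratum": stratum, "case": is_case})
    return cohort


def draw_matched_subset(cohort, controls_per_case=1, seed=0):
    """Frequency-match controls to cases within strata of a baseline predictor.

    All cases are kept. Within each stratum, controls_per_case controls are
    sampled per case, capped by the number of controls available.
    """
    rng = random.Random(seed)
    cases_by_stratum = defaultdict(list)
    controls_by_stratum = defaultdict(list)
    for person in cohort:
        target = cases_by_stratum if person["case"] else controls_by_stratum
        target[person["stratum"]].append(person)

    subset = []
    for stratum, cases in cases_by_stratum.items():
        pool = controls_by_stratum.get(stratum, [])
        n_controls = min(len(pool), controls_per_case * len(cases))
        subset.extend(cases)
        subset.extend(rng.sample(pool, n_controls))
    return subset


cohort = simulate_cohort()
subset = draw_matched_subset(cohort, controls_per_case=2)
```

In such a design the new predictor would be measured only for the people in `subset`. Because matching distorts the distribution of the baseline predictor among the sampled controls, any analysis of the subset has to account for the matching, which is the kind of issue the paper studies.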

This paper appeared in volume 19 (2013) of Lifetime Data Analysis.

Acknowledgements

Support for this research was provided by RO1-GM-54438 and PO1-CA-053996. The authors thank Mr. Jing Fan for his contribution to the simulation studies.

Author information

Correspondence to Aasthaa Bansal.

Appendix

Table 8. Estimators of performance measures: nonparametric estimators using the baseline risk model and cohort data.
Table 9. Estimators of performance measures: nonparametric estimators using the enhanced risk model and case-control subset data.
Table 10. Estimators of performance measures: semiparametric estimators using the enhanced risk model and both cohort and case-control subset data. We let superscripts ‘cohort’ and ‘cc’ denote data from the cohort and the case-control subset, respectively.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Bansal, A., Pepe, M.S. (2013). Estimating Improvement in Prediction with Matched Case-Control Designs. In: Lee, ML., Gail, M., Pfeiffer, R., Satten, G., Cai, T., Gandy, A. (eds) Risk Assessment and Evaluation of Predictions. Lecture Notes in Statistics, vol 215. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8981-8_8
