A graphical method for assessing risk factor threshold values using the generalized additive model: the multi-ethnic study of atherosclerosis

  • Claude Messan Setodji
  • Maren Scheuner
  • James S. Pankow
  • Roger S. Blumenthal
  • Haiying Chen
  • Emmett Keeler


Dichotomizing continuous variables is a popular technique for estimating the effect of risk factors on health outcomes in multivariate regression settings. Researchers follow this practice to simplify data analysis, which it unquestionably does. However, the thresholds used to dichotomize these variables are usually ad hoc, based on expert opinion or on mean, median, or quantile splits. Such thresholds can bias the estimated effect of a risk factor on a specific outcome and can underestimate that effect. In this paper, we propose a semi-parametric method, combined with visualization, that improves threshold selection for variable dichotomization while accounting for mixture distributions in the outcome of interest and adjusting for covariates. For clinicians, these empirically based risk factor thresholds, if they exist, could be informative: they identify the highest or lowest value of a risk factor beyond which no additional impact on the outcome should be expected.
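The abstract combines several ingredients: a smooth (semi-parametric) fit of the outcome on the risk factor, covariate adjustment, a mixture outcome, smearing estimates, and recycled predictions. The Python sketch below shows one way these pieces could fit together, under assumptions that are ours rather than the paper's. The mixture is taken to be a point mass at zero plus a positive, right-skewed part, handled by a two-part model. A ridge-penalized truncated-linear spline stands in for a full generalized additive model fit. Duan's smearing factor retransforms the log-scale mean. Recycled predictions average each participant's predicted outcome after setting everyone's risk factor to a common value. All function names (`fit_two_part`, `recycled_curve`, `plateau_threshold`, `simulate`), the knots, the penalty `lam`, and the tolerance `tol` are hypothetical choices for illustration. `plateau_threshold` is a crude numeric stand-in for reading the threshold off the plotted curve. This is not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch, not the authors' implementation: a two-part model with a
# penalized spline in the risk factor x, linear adjustment for covariates z,
# Duan smearing, and recycled predictions over a grid of x values.

def spline_basis(x, knots):
    """Truncated-linear spline basis in x: [x, (x - k1)+, (x - k2)+, ...]."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([x] + [np.clip(x - k, 0.0, None) for k in knots])

def design(x, z, knots):
    """Intercept, smooth term in the risk factor, and linear covariates."""
    n = len(x)
    z = np.asarray(z, dtype=float).reshape(n, -1)
    return np.column_stack([np.ones(n), spline_basis(x, knots), z])

def penalty(n_cols, n_knots, lam):
    """Ridge penalty on the knot coefficients only (columns 2 .. 1 + n_knots)."""
    d = np.zeros(n_cols)
    d[2:2 + n_knots] = lam
    return np.diag(d)

def fit_logistic(X, y, P, iters=50):
    """Penalized logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) - P @ beta
        hess = X.T @ (X * (p * (1.0 - p))[:, None]) + P
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def smearing_factor(residuals):
    """Duan's smearing estimate: mean of exp(residual) from the log-scale fit."""
    return float(np.mean(np.exp(residuals)))

def fit_two_part(x, z, y, knots, lam=1.0):
    """Part 1: P(Y > 0 | x, z). Part 2: E[log Y | Y > 0, x, z] by penalized least squares."""
    X = design(x, z, knots)
    P = penalty(X.shape[1], len(knots), lam)
    pos = y > 0
    b_any = fit_logistic(X, pos.astype(float), P)
    Xp, logy = X[pos], np.log(y[pos])
    b_log = np.linalg.solve(Xp.T @ Xp + P, Xp.T @ logy)
    return b_any, b_log, smearing_factor(logy - Xp @ b_log)

def recycled_curve(grid, z, knots, b_any, b_log, smear):
    """For each x0, give every participant x = x0, keep their own z, average E[Y]."""
    n = len(z)
    curve = []
    for x0 in grid:
        X0 = design(np.full(n, x0), z, knots)
        p_any = 1.0 / (1.0 + np.exp(-(X0 @ b_any)))
        curve.append(np.mean(p_any * np.exp(X0 @ b_log) * smear))
    return np.array(curve)

def plateau_threshold(grid, curve, tol=0.05):
    """Hypothetical rule: first x0 after which the curve varies by <= tol * its range."""
    span = curve.max() - curve.min()
    for i, x0 in enumerate(grid):
        if curve[i:].max() - curve[i:].min() <= tol * span:
            return float(x0)
    return None

def simulate(n, seed=0):
    """Hypothetical data in which the risk factor's effect plateaus at x = 5."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 10.0, n)
    z = rng.normal(size=n)
    eff = np.minimum(x, 5.0)
    any_pos = rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.6 * eff + 0.3 * z)))
    log_y = 1.0 + 0.3 * eff + 0.2 * z + rng.normal(0.0, 0.5, n)
    return x, z, np.where(any_pos, np.exp(log_y), 0.0)

if __name__ == "__main__":
    x, z, y = simulate(2000)
    knots = [2.5, 5.0, 7.5]
    b_any, b_log, smear = fit_two_part(x, z, y, knots)
    grid = np.arange(0.0, 10.5, 1.0)
    curve = recycled_curve(grid, z, knots, b_any, b_log, smear)
    print("threshold:", plateau_threshold(grid, curve, tol=0.1))
```

Plotting `curve` against `grid` gives the kind of picture a graphical threshold method relies on. Because `simulate` builds a plateau at x = 5 into the data, the detected threshold should land near that value. With real data, the knots, penalty, and tolerance would all need more careful choice.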


Keywords: Generalized additive model; Smearing estimates; Threshold detection; Recycled prediction



The authors would like to thank the MESA investigators and staff for their flexibility regarding the use of their data for this work, and the participants of the MESA study for their valuable contributions. This work was supported by National Heart, Lung, and Blood Institute Grant 1 R21 HL081175-01A1. MESA was supported by contracts N01-HC-95159 through N01-HC-95165 and N01-HC-95169 from the National Heart, Lung, and Blood Institute.



Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Claude Messan Setodji (1)
  • Maren Scheuner (2)
  • James S. Pankow (3)
  • Roger S. Blumenthal (4)
  • Haiying Chen (5)
  • Emmett Keeler (2)

  1. RAND, Pittsburgh, USA
  2. RAND, Santa Monica, USA
  3. University of Minnesota, Minneapolis, USA
  4. Johns Hopkins University, Baltimore, USA
  5. Wake Forest University, Winston-Salem, USA
