Towards more Accessible Precision Medicine: Building a more Transferable Machine Learning Model to Support Prognostic Decisions for Micro- and Macrovascular Complications of Type 2 Diabetes Mellitus

  • Era Kim
  • Pedro J. Caraballo
  • M. Regina Castro
  • David S. Pieczkiewicz
  • Gyorgy J. Simon
Part of the following topical collections:
  1. Patient Facing Systems


Although machine learning (ML) models are increasingly being developed for clinical decision support for patients with type 2 diabetes, the adoption of these models into clinical practice remains limited. Currently, ML models are constructed on local healthcare systems and validated internally, with no expectation that they would validate externally, and thus are rarely transferable to a different healthcare system. In this work, we aim to (1) demonstrate that even a complex ML model built on a national cohort can be transferred to two local healthcare systems, (2) show that a model constructed on a local healthcare system’s cohort is difficult to transfer, (3) examine the impact of training cohort size on transferability, and (4) discuss criteria for external validity. We built a model using our previously published Multi-Task Learning-based methodology on a national cohort extracted from the OptumLabs® Data Warehouse and transferred the model to two local healthcare systems (i.e., University of Minnesota Medical Center and Mayo Clinic) for external evaluation. The model remained valid when applied to the local patient populations and performed as well as locally constructed models (concordance: .73–.92), demonstrating transferability. The performance of the locally constructed models decreased substantially when they were applied to each other’s healthcare systems (concordance: .62–.90). We believe that our modeling approach, in which a model is learned from a national cohort and is externally validated, produces a transferable model, allowing patients at smaller healthcare systems to benefit from precision medicine.
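The concordance values reported above measure how often a model ranks a patient with an earlier event as higher risk than a patient who was followed for longer. As a minimal sketch of how such a value could be computed on an external cohort, the following uses Harrell's concordance index for right-censored data. This is an assumption made for illustration: the abstract does not state which concordance statistic the authors computed, and the function name `concordance_index` and the toy cohort are hypothetical.

```python
from typing import Sequence


def concordance_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's C for right-censored data (illustrative sketch).

    A pair (i, j) is comparable when patient i had an observed event
    and patient j was followed strictly longer. The pair is concordant
    when the model gave patient i the higher risk score. Tied scores
    count as half. Pairs with tied times are ignored for simplicity.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in this cohort")
    return concordant / comparable


# Hypothetical external cohort: follow-up time, event indicator
# (1 = complication observed, 0 = censored), and the risk score that a
# previously trained model assigns to each patient.
external_times = [5.0, 8.0, 12.0, 3.0, 9.0]
external_events = [1, 0, 1, 1, 0]
external_scores = [0.80, 0.85, 0.20, 0.90, 0.40]

c_external = concordance_index(external_times, external_events, external_scores)
print(f"concordance on external cohort: {c_external:.2f}")
```

Under this sketch, transferability would be assessed by comparing the concordance of the transferred model on the external cohort with that of a model trained locally on the same cohort.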


Keywords: Machine learning; Large national data; External validation; Transferable model; Complications of type 2 diabetes; Precision medicine



This work was supported by NIH award R01 LM011972 and NSF award IIS 1602198. The views expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

Compliance with ethical standards

Conflict of interest

Access to the claims and EHR data from the OptumLabs Data Warehouse (OLDW) was made possible through the use of an OptumLabs research credit. Author Era Kim owns stock in UnitedHealth Group.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

10916_2019_1321_MOESM1_ESM.pdf (15 kb)
ESM 1 (PDF 15 kb)
10916_2019_1321_MOESM2_ESM.pdf (15 kb)
ESM 2 (PDF 15 kb)



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Era Kim (1, 2), corresponding author
  • Pedro J. Caraballo (3, 4)
  • M. Regina Castro (5)
  • David S. Pieczkiewicz (1)
  • Gyorgy J. Simon (1, 6)

  1. Institute for Health Informatics, University of Minnesota, Minneapolis, USA
  2. OptumLabs Visiting Fellow, Cambridge, USA
  3. Division of General Internal Medicine, Department of Medicine, Mayo Clinic, Rochester, USA
  4. Center for Translational Informatics and Knowledge Management, Mayo Clinic, Rochester, USA
  5. Division of Endocrinology and Metabolism, Department of Medicine, Mayo Clinic, Rochester, USA
  6. Department of Medicine, University of Minnesota, Minneapolis, USA
