Big Data Cohort Extraction for Personalized Statin Treatment and Machine Learning

  • Terrence J. Adam
  • Chih-Lin Chi
Part of the Methods in Molecular Biology book series (MIMB, volume 1939)


Creating big clinical data cohorts for machine learning and data analysis requires a number of steps from start to successful completion. As with data set preprocessing in other fields, data quality must first be evaluated; however, with large heterogeneous clinical data sets, it is also important to standardize the data in order to facilitate dimensionality reduction. Standardization is particularly important for clinical data sets that include medications as a core data component, because coded medication data are complex. Data integration at the individual subject level is essential for medication-related machine learning applications, since drug exposures, therapeutic effects, and adverse drug events can be difficult to identify accurately without high-quality integration of insurance, medication, and medical data. Successful data integration and standardization efforts can substantially improve the ability to identify and replicate personalized treatment pathways that optimize drug therapy.
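As a minimal sketch of subject-level integration combined with medication code standardization, the Python example below joins three simplified sources (insurance enrollment, pharmacy claims, and medical claims) into one record per subject. Everything here is an illustrative assumption rather than the authors' pipeline: the record layouts, the placeholder product codes (`PROD-001` and so on, which are not real NDCs), the hypothetical `PRODUCT_TO_INGREDIENT` lookup, and the rule that only events inside an enrollment period are kept. A production pipeline would typically derive the product-to-ingredient mapping from a drug terminology such as RxNorm.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup table: the keys are placeholder product codes, not real NDCs.
# Several package-level products collapse to one ingredient, which reduces the
# number of distinct medication features a model has to handle.
PRODUCT_TO_INGREDIENT = {
    "PROD-001": "atorvastatin",  # one strength/package
    "PROD-002": "atorvastatin",  # another strength/package, same ingredient
    "PROD-003": "simvastatin",
}


@dataclass(frozen=True)
class Enrollment:
    """One insurance coverage period for a subject."""
    patient_id: str
    start: date
    end: date


@dataclass(frozen=True)
class PharmacyClaim:
    patient_id: str
    product_code: str
    fill_date: date


@dataclass(frozen=True)
class MedicalClaim:
    patient_id: str
    diagnosis_code: str  # e.g. an ICD-10-CM code
    service_date: date


def is_enrolled(enrollments, patient_id, day):
    """Return True if the subject has insurance coverage on the given day."""
    return any(
        e.patient_id == patient_id and e.start <= day <= e.end for e in enrollments
    )


def integrate(enrollments, pharmacy_claims, medical_claims):
    """Build one record per subject with standardized drug exposures and diagnoses.

    Assumption of this sketch: only events inside an enrollment period are kept,
    so every retained event comes from a window in which the subject is observable.
    """
    cohort = {}

    def record_for(patient_id):
        return cohort.setdefault(patient_id, {"drugs": set(), "diagnoses": set()})

    for claim in pharmacy_claims:
        if not is_enrolled(enrollments, claim.patient_id, claim.fill_date):
            continue
        ingredient = PRODUCT_TO_INGREDIENT.get(claim.product_code)
        if ingredient is None:
            continue  # unmapped code: in practice, route it to data quality review
        record_for(claim.patient_id)["drugs"].add(ingredient)

    for claim in medical_claims:
        if is_enrolled(enrollments, claim.patient_id, claim.service_date):
            record_for(claim.patient_id)["diagnoses"].add(claim.diagnosis_code)

    return cohort


# Small synthetic example for one subject.
enrollments = [Enrollment("p1", date(2015, 1, 1), date(2015, 12, 31))]
pharmacy = [
    PharmacyClaim("p1", "PROD-001", date(2015, 3, 1)),
    PharmacyClaim("p1", "PROD-002", date(2015, 6, 1)),  # same ingredient as above
    PharmacyClaim("p1", "PROD-003", date(2016, 2, 1)),  # after coverage ends: dropped
]
medical = [MedicalClaim("p1", "E78.5", date(2015, 2, 15))]  # hyperlipidemia, unspecified
print(integrate(enrollments, pharmacy, medical))
# {'p1': {'drugs': {'atorvastatin'}, 'diagnoses': {'E78.5'}}}
```

Rolling the two atorvastatin products up to a single ingredient is one simple form of the dimensionality reduction described above, and the enrollment check keeps the 2016 simvastatin fill, which falls after coverage ends, out of the cohort.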

Key words

Medication safety; Clinical data integration; Clinical comorbidity evaluation; Personalized medication therapy


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Pharmaceutical Care and Health Systems, Health Informatics, Social and Administrative Pharmacy, University of Minnesota College of Pharmacy, Minneapolis, USA
  2. University of Minnesota School of Nursing, Minneapolis, USA
