
A Big Data Application of Machine Learning-Based Framework to Identify Type 2 Diabetes Through Electronic Health Records

  • Conference paper
Knowledge Management in Organizations (KMO 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 731)

Abstract

Studies of genotype-phenotype associations for diseases such as type 2 diabetes mellitus (T2DM) have become increasingly popular in recent years. Commonly used methods are the genome-wide association study (GWAS) and the phenome-wide association study (PheWAS). To perform these analyses, T2DM cases and controls must first be identified from electronic health records (EHR). However, existing expert-based identification algorithms often have low recall: under conservative filtering standards they miss a large number of valuable samples. As a pilot study, this paper proposes a semi-automated framework based on machine learning. We aim to optimize the filtering criteria to improve recall while keeping the false-positive rate low. We validate the proposed framework on an EHR database with ten years of records and show its effectiveness.
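The goal stated above, raising recall while keeping the false-positive rate low, can be illustrated with a small sketch. This is not the authors' framework, whose features and models are not described here. It assumes a hypothetical classifier that gives each patient a score, plus case/control labels from chart review. Recall is TP / (TP + FN) and the false-positive rate is FP / (FP + TN). The helper names `recall_and_fpr` and `pick_threshold` and the toy data are illustrative assumptions.

```python
from typing import Optional, Sequence, Tuple


def recall_and_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (recall, false-positive rate) for binary labels (1 = T2DM case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


def pick_threshold(
    y_true: Sequence[int], scores: Sequence[float], max_fpr: float
) -> Optional[Tuple[float, float, float]]:
    """Hypothetical helper: choose the score threshold with the highest recall
    subject to FPR <= max_fpr. Returns (threshold, recall, fpr), or None."""
    best = None
    # Scan from strict (high) to lenient (low) thresholds; on a recall tie the
    # stricter threshold found first is kept, since it cannot have a higher FPR.
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        recall, fpr = recall_and_fpr(y_true, y_pred)
        if fpr <= max_fpr and (best is None or recall > best[1]):
            best = (thr, recall, fpr)
    return best


# Toy, made-up labels and scores (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1, 0.3, 0.35]
result = pick_threshold(y_true, scores, max_fpr=0.25)
print(result)  # (0.4, 0.75, 0.25)
```

In this toy case, a conservative cut-off of 0.8 plays the role of a strict expert rule and reaches a recall of only 0.5. Relaxing the cut-off to 0.4 raises recall to 0.75 while the false-positive rate stays within the 0.25 bound.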



Acknowledgments

We thank the members of the Changning regional distributed EHR network, Shanghai, China, for sharing the de-identified EHR data evaluated in this paper. We would also like to thank the members of the Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA, and the Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Author information

Corresponding author

Correspondence to Tao Zheng.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zheng, T., Zhang, Y. (2017). A Big Data Application of Machine Learning-Based Framework to Identify Type 2 Diabetes Through Electronic Health Records. In: Uden, L., Lu, W., Ting, IH. (eds) Knowledge Management in Organizations. KMO 2017. Communications in Computer and Information Science, vol 731. Springer, Cham. https://doi.org/10.1007/978-3-319-62698-7_37


  • DOI: https://doi.org/10.1007/978-3-319-62698-7_37

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-62697-0

  • Online ISBN: 978-3-319-62698-7

  • eBook Packages: Computer Science; Computer Science (R0)
