A Big Data Application of Machine Learning-Based Framework to Identify Type 2 Diabetes Through Electronic Health Records

Zheng, Tao; Zhang, Ya

doi:10.1007/978-3-319-62698-7_37

Tao Zheng & Ya Zhang

Part of the book series: Communications in Computer and Information Science (CCIS, volume 731)

Included in the following conference series:

International Conference on Knowledge Management in Organizations

Abstract

Studies of genotype-phenotype associations for diseases such as type 2 diabetes mellitus (T2DM) have become increasingly popular in recent years. Commonly used methods are the genome-wide association study (GWAS) and the phenome-wide association study (PheWAS). To perform these analyses, it is necessary to identify T2DM cases and controls from electronic health records (EHR). However, existing expert-based identification algorithms often have low recall and, under conservative filtering standards, miss a large number of valuable samples. As a pilot study, this paper proposes a semi-automated framework based on machine learning. We aim to optimize the filtering criteria to improve recall while keeping the false-positive rate low. We validate the proposed framework on an EHR database with ten years of records and show its effectiveness.
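
The trade-off named in the abstract, higher recall at a bounded false-positive rate, can be illustrated with a minimal sketch. This is not the authors' framework: the function names, the toy labels and scores, and the simple threshold sweep are all assumptions made for illustration. It assumes some classifier has already assigned each patient a score, and it picks the score cutoff with the highest recall among those whose false-positive rate stays within a chosen limit.

```python
from typing import List, Optional, Tuple


def recall_and_fpr(labels: List[int], preds: List[int]) -> Tuple[float, float]:
    """Return (recall, false-positive rate) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


def pick_threshold(
    labels: List[int], scores: List[float], max_fpr: float
) -> Optional[Tuple[float, float, float]]:
    """Hypothetical helper: sweep every observed score as a cutoff and keep
    the one with the highest recall whose false-positive rate is <= max_fpr.
    Returns (threshold, recall, fpr), or None if no cutoff qualifies."""
    best = None
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        recall, fpr = recall_and_fpr(labels, preds)
        if fpr <= max_fpr and (best is None or recall > best[1]):
            best = (t, recall, fpr)
    return best


# Toy data (invented for illustration): 1 = T2DM case, 0 = control.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

print(pick_threshold(labels, scores, max_fpr=0.25))
print(pick_threshold(labels, scores, max_fpr=0.0))
```

On this toy data, allowing no false positives caps recall at 0.5, while tolerating a false-positive rate of 0.25 raises it to 0.75. That is the kind of gain a less conservative filtering criterion is meant to capture.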

Acknowledgments

We thank the members of the Changning regional distributed EHR network, Shanghai, China, for sharing the de-identified EHR data evaluated in this paper. We also thank the members of the Department of Electrical Engineering and Computer Science and the Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.

Author information

Authors and Affiliations

Institute of Image Communication and Networking, Shanghai Jiao Tong University, Shanghai, China
Tao Zheng & Ya Zhang
Tongren Hospital, Shanghai Jiao Tong University, Shanghai, China
Tao Zheng

Corresponding author

Correspondence to Tao Zheng.

Editor information

Editors and Affiliations

University of Staffordshire, Stoke-on-Trent, Staffordshire, United Kingdom
Lorna Uden
Beijing Jiaotong University, Beijing, China
Wei Lu
Department of Information Management, University of Kaohsiung, Kaohsiung, Taiwan
I-Hsien Ting

Cite this paper

Zheng, T., Zhang, Y. (2017). A Big Data Application of Machine Learning-Based Framework to Identify Type 2 Diabetes Through Electronic Health Records. In: Uden, L., Lu, W., Ting, IH. (eds) Knowledge Management in Organizations. KMO 2017. Communications in Computer and Information Science, vol 731. Springer, Cham. https://doi.org/10.1007/978-3-319-62698-7_37

DOI: https://doi.org/10.1007/978-3-319-62698-7_37
Published: 12 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62697-0
Online ISBN: 978-3-319-62698-7
eBook Packages: Computer Science, Computer Science (R0)