Abstract
Studies of genotype-phenotype associations for diseases such as type 2 diabetes mellitus (T2DM) have become increasingly popular in recent years. Commonly used methods are the genome-wide association study (GWAS) and the phenome-wide association study (PheWAS). To perform these analyses, it is necessary to identify T2DM cases and controls from electronic health records (EHR). However, existing expert-based identification algorithms often have low recall and, under conservative filtering standards, miss a large number of valuable samples. As a pilot study, this paper proposes a semi-automated framework based on machine learning. We aim to optimize the filtering criteria to improve recall while keeping the false-positive rate low. We validate the proposed framework on an EHR database with ten years of records and show its effectiveness.
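The trade-off between recall and false-positive rate mentioned in the abstract can be illustrated with a minimal sketch. It uses only the standard definitions of the two metrics. The function name, the label encoding (1 = T2DM case, 0 = control), and the toy label vectors are assumptions made for illustration and are not taken from the paper.

```python
def recall_and_fpr(y_true, y_pred):
    """Return (recall, false_positive_rate) for binary labels.

    Assumed encoding: 1 = T2DM case, 0 = control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls correctly left out

    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, fpr


# Hypothetical toy data: four true cases followed by four true controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# A conservative filter flags few subjects: no false positives, but most cases are missed.
strict_filter = [1, 0, 0, 0, 0, 0, 0, 0]
# A relaxed filter recovers more cases at the cost of one false positive.
relaxed_filter = [1, 1, 1, 0, 1, 0, 0, 0]

print(recall_and_fpr(y_true, strict_filter))
print(recall_and_fpr(y_true, relaxed_filter))
```

In this toy setting the conservative filter keeps the false-positive rate at zero but recovers only a quarter of the cases, which is the kind of loss the proposed framework is meant to reduce.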
Acknowledgments
We thank the members of the Changning regional distributed EHR network, Shanghai, China, for sharing the de-identified EHR data evaluated in this paper. We also thank the members of the Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA, and the Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Zheng, T., Zhang, Y. (2017). A Big Data Application of Machine Learning-Based Framework to Identify Type 2 Diabetes Through Electronic Health Records. In: Uden, L., Lu, W., Ting, IH. (eds) Knowledge Management in Organizations. KMO 2017. Communications in Computer and Information Science, vol 731. Springer, Cham. https://doi.org/10.1007/978-3-319-62698-7_37
DOI: https://doi.org/10.1007/978-3-319-62698-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62697-0
Online ISBN: 978-3-319-62698-7
eBook Packages: Computer Science (R0)