Abstract
Methicillin-resistant Staphylococcus aureus (MRSA), an antibiotic-resistant bacterium, is a common cause of one of the more devastating hospital-acquired infections (HAIs) in the United States. In this work, we study the practicality of leveraging machine learning methods for early detection of MRSA infections based on the rich variety of patient information commonly available in modern Electronic Health Records (EHRs). We explore heterogeneous types of EHR data, including on-admission demographics, throughout-stay time series, and free-form clinical notes. On-admission data capture non-clinical information (e.g., age, marital status), while throughout-stay data include vital signs, medications, laboratory studies, and other clinical assessments. Clinical notes, free-form text documents created by medical professionals, contain expert observations about patients. Our proposed system extracts features from each data type to generate dense patient-level representations. It then generates a score for each patient indicating their risk of acquiring MRSA. We evaluate the prediction performance achieved by core machine learning methods, namely Logistic Regression, Support Vector Machine, and Random Forest, when mining these different types of EHR data retrospectively to detect patterns predictive of MRSA infection. We evaluate classification performance using MIMIC-III, a critical care data set comprising 12 years of patient records from the Beth Israel Deaconess Medical Center Intensive Care Unit in Boston, MA. Our experiments show that while all types of data contain predictive signals, fusing all data sources yields the highest prediction accuracy.
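The fusion idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy patients, the field names, the four-word vocabulary, the hand-picked features, and the small stochastic-gradient logistic regression are all hypothetical and are not the authors' features, data, or implementation. The sketch only shows one plausible shape of the pipeline: build one vector per data type, concatenate the vectors, and map the fused vector to a risk score between 0 and 1.

```python
import math

# Hypothetical toy vocabulary for the clinical-note representation.
VOCAB = ["fever", "wound", "catheter", "stable"]

def demographic_features(patient):
    # On-admission data: age scaled to roughly [0, 1] plus a binary flag.
    return [patient["age"] / 100.0, 1.0 if patient["married"] else 0.0]

def time_series_features(patient):
    # Throughout-stay data: summarize a variable-length temperature series
    # by its mean and maximum, both centered at 37 degrees Celsius.
    temps = patient["temperature"]
    return [sum(temps) / len(temps) - 37.0, max(temps) - 37.0]

def note_features(patient):
    # Clinical notes: bag-of-words counts over the toy vocabulary.
    tokens = patient["note"].lower().split()
    return [float(tokens.count(word)) for word in VOCAB]

def fuse(patient):
    # Fusion by concatenating the per-data-type vectors.
    return (demographic_features(patient)
            + time_series_features(patient)
            + note_features(patient))

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    # Plain stochastic gradient descent on the logistic loss.
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(X, y):
            z = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(z) - label
            weights = [w - lr * error * f for w, f in zip(weights, features)]
            bias -= lr * error
    return weights, bias

def risk_score(model, patient):
    # Risk of MRSA for one patient, as a value in [0, 1].
    weights, bias = model
    z = sum(w * f for w, f in zip(weights, fuse(patient))) + bias
    return sigmoid(z)

# Synthetic patients, invented for this example only.
patients = [
    {"age": 71, "married": True, "temperature": [38.4, 39.1, 38.8],
     "note": "fever and wound drainage near catheter", "mrsa": 1},
    {"age": 34, "married": False, "temperature": [36.8, 37.0],
     "note": "patient stable", "mrsa": 0},
    {"age": 65, "married": False, "temperature": [38.0, 38.6, 38.9, 38.2],
     "note": "persistent fever catheter site red", "mrsa": 1},
    {"age": 50, "married": True, "temperature": [36.9, 37.1, 37.0],
     "note": "stable overnight no fever", "mrsa": 0},
]

X = [fuse(p) for p in patients]
y = [p["mrsa"] for p in patients]
model = train_logistic_regression(X, y)
scores = [risk_score(model, p) for p in patients]

for patient, score in zip(patients, scores):
    print(f"label={patient['mrsa']} risk={score:.3f}")
```

Concatenation is only one way to combine data types; training one model per data type and combining their scores is another, and the abstract does not say which the authors used. The same fused vectors could also be passed to a Support Vector Machine or Random Forest from a standard library in place of the hand-written logistic regression.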
Acknowledgements
Thomas Hartvigsen thanks the US Department of Education for supporting his PhD studies via the grant P200A150306 on “GAANN Fellowships to Support Data-Driven Computing Research”, while Cansu Sen thanks WPI for granting her the Arvid Anderson Fellowship (2015–2016) to pursue her PhD studies. We also thank the DSRG and Data Science Community at WPI for their continued support and feedback.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hartvigsen, T., Sen, C., Rundensteiner, E.A. (2019). Detecting MRSA Infections by Fusing Structured and Unstructured Electronic Health Record Data. In: Cliquet Jr., A., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2018. Communications in Computer and Information Science, vol 1024. Springer, Cham. https://doi.org/10.1007/978-3-030-29196-9_21
DOI: https://doi.org/10.1007/978-3-030-29196-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29195-2
Online ISBN: 978-3-030-29196-9
eBook Packages: Computer Science, Computer Science (R0)