Detecting MRSA Infections by Fusing Structured and Unstructured Electronic Health Record Data

  • Conference paper
Biomedical Engineering Systems and Technologies (BIOSTEC 2018)

Abstract

Methicillin-resistant Staphylococcus aureus (MRSA), an antibiotic-resistant bacterium, commonly causes one of the more devastating hospital-acquired infections (HAIs) in the United States. In this work, we study the practicality of applying machine learning methods to the early detection of MRSA infections, based on the rich variety of patient information commonly available in modern Electronic Health Records (EHRs). We explore heterogeneous types of EHR data, including on-admission demographics, throughout-stay time series, and free-form clinical notes. On-admission data capture non-clinical information (e.g., age, marital status), while throughout-stay data include vital signs, medications, laboratory studies, and other clinical assessments. Clinical notes, free-form text documents created by medical professionals, contain expert observations about patients. Our proposed system extracts features from each data type to generate dense patient-level representations. It then generates a score for each patient indicating their risk of acquiring MRSA. We evaluate the prediction performance of three core machine learning methods, namely Logistic Regression, Support Vector Machine, and Random Forest, when mining these different types of EHR data retrospectively to detect patterns predictive of MRSA infection. We evaluate classification performance using MIMIC III, a critical care data set comprising 12 years of patient records from the Beth Israel Deaconess Medical Center Intensive Care Unit in Boston, MA. Our experiments show that while all types of data contain predictive signals, fusing all sources of data yields the most accurate predictions.
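
The fusion idea described in the abstract can be illustrated with a minimal sketch: build one feature vector per data type, concatenate them into a single patient-level vector, and map that vector to a risk score. This is not the authors' pipeline. The category list, note vocabulary, field names, summary statistics, and the logistic scoring function below are all assumptions chosen for illustration; the abstract does not specify how each representation is built, and a term-count vector is only a simple stand-in for a dense note representation.

```python
import math
from statistics import mean

# Hypothetical category list and note vocabulary (not taken from the paper).
MARITAL_STATUSES = ["single", "married", "widowed"]
NOTE_TERMS = ["fever", "wound", "catheter"]


def admission_features(age, marital_status):
    """On-admission data: scaled age plus a one-hot marital status."""
    one_hot = [1.0 if marital_status == s else 0.0 for s in MARITAL_STATUSES]
    return [age / 100.0] + one_hot


def stay_features(series):
    """Throughout-stay time series: summarize one signal as mean, min, max."""
    return [mean(series), min(series), max(series)]


def note_features(note):
    """Clinical note: count whitespace-separated tokens matching each term."""
    tokens = note.lower().split()
    return [float(tokens.count(term)) for term in NOTE_TERMS]


def fuse(patient):
    """Concatenate the per-type vectors into one patient-level vector."""
    return (
        admission_features(patient["age"], patient["marital_status"])
        + stay_features(patient["temperature_c"])
        + note_features(patient["note"])
    )


def risk_score(features, weights, bias):
    """Logistic-regression-style score in (0, 1); weights would be learned."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical patient record, invented for this example.
patient = {
    "age": 70,
    "marital_status": "married",
    "temperature_c": [37.0, 38.5, 39.0],
    "note": "Fever noted; wound at catheter site. Fever persists.",
}
fused = fuse(patient)

# Placeholder weights: with all zeros the score is exactly 0.5.
untrained = risk_score(fused, [0.0] * len(fused), 0.0)
```

In practice the weights would be fitted on labeled patient records, and the same fused vector could be passed to any of the three classifiers named in the abstract. Training on each per-type vector alone, and then on the fused vector, is one way to compare the predictive value of individual data types against their combination.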

Acknowledgements

Thomas Hartvigsen thanks the US Department of Education for supporting his PhD studies via the grant P200A150306 on “GAANN Fellowships to Support Data-Driven Computing Research”, while Cansu Sen thanks WPI for granting her the Arvid Anderson Fellowship (2015–2016) to pursue her PhD studies. We also thank the DSRG and Data Science Community at WPI for their continued support and feedback.

Author information

Corresponding author

Correspondence to Thomas Hartvigsen.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hartvigsen, T., Sen, C., Rundensteiner, E.A. (2019). Detecting MRSA Infections by Fusing Structured and Unstructured Electronic Health Record Data. In: Cliquet Jr., A., et al. Biomedical Engineering Systems and Technologies. BIOSTEC 2018. Communications in Computer and Information Science, vol 1024. Springer, Cham. https://doi.org/10.1007/978-3-030-29196-9_21

  • DOI: https://doi.org/10.1007/978-3-030-29196-9_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29195-2

  • Online ISBN: 978-3-030-29196-9

  • eBook Packages: Computer Science, Computer Science (R0)
