Injecting Domain Knowledge in Electronic Medical Records to Improve Hospitalization Prediction

  • Conference paper
The Semantic Web (ESWC 2019)

Abstract

Electronic medical records (EMRs) contain key information about the different symptomatic episodes that a patient went through. They have great potential to improve patient well-being and are therefore a valuable input for artificial intelligence approaches. However, the explicit knowledge directly available in these records remains limited: the features extracted for machine learning algorithms do not contain all the implicit knowledge of medical experts. To evaluate the impact of domain knowledge when processing EMRs, we augment the features extracted from EMRs with ontological resources before turning them into vectors used by machine learning algorithms. We evaluate these augmentations with several machine learning algorithms to predict hospitalization. We tested our approach on data from the PRIMEGE PACA database, which contains more than 350,000 consultations carried out by 16 general practitioners (GPs).
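The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the whitespace tokenizer, the two sample records, the `ATC_ANCESTORS` lookup table and the `ATC:` feature prefix are all assumptions introduced here for illustration. The idea shown is only that concepts attached to a term in an ontology (here, the classes above a drug in the ATC hierarchy) are added as extra features before the record is turned into a count vector.

```python
from collections import Counter

# Hypothetical lookup: drug name -> ATC classes above it in the hierarchy.
# In a real setting this would come from querying an ontology, not a dict.
ATC_ANCESTORS = {
    "metformin": ["A10BA", "A10B", "A10", "A"],
    "atorvastatin": ["C10AA", "C10A", "C10", "C"],
}


def tokenize(text):
    """Naive whitespace tokenizer (an assumption, for illustration only)."""
    return text.lower().split()


def extract_features(text, ancestors=None):
    """Count tokens; if a lookup is given, also count the ontology
    concepts attached to each token that appears in the lookup."""
    counts = Counter(tokenize(text))
    if ancestors:
        for token in list(counts):  # copy keys: we add new keys below
            for code in ancestors.get(token, []):
                counts["ATC:" + code] += counts[token]
    return counts


def vectorize(feature_counts, vocabulary):
    """Turn a feature-count mapping into a fixed-order count vector."""
    return [feature_counts.get(term, 0) for term in vocabulary]


# Two made-up consultation notes (not taken from PRIMEGE PACA).
records = [
    "patient on metformin reports fatigue",
    "atorvastatin renewed patient stable",
]

baseline = [extract_features(r) for r in records]
augmented = [extract_features(r, ATC_ANCESTORS) for r in records]

vocab_base = sorted(set().union(*baseline))
vocab_aug = sorted(set().union(*augmented))

X_base = [vectorize(f, vocab_base) for f in baseline]
X_aug = [vectorize(f, vocab_aug) for f in augmented]

print(len(vocab_base), "baseline features;", len(vocab_aug), "after augmentation")
```

The augmented vectors keep every baseline feature and add one column per ontology concept, so a classifier can pick up on a drug class even when individual drug names are rare in the data.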

Notes

  1. http://fr.dbpedia.org/sparql

  2. https://www.wikidata.org

  3. https://query.wikidata.org/sparql

  4. Anatomical Therapeutic Chemical Classification, https://bioportal.bioontology.org/ontologies/ATC

  5. National Drug File - Reference Terminology, https://bioportal.bioontology.org/ontologies/NDF-RT

  6. International Primary Care Classification, http://bioportal.lirmm.fr/ontologies/CISP-2

  7. https://www.w3.org/TR/sparql11-query/

  8. http://corese.inria.fr

Acknowledgement

This work is partly funded by the French government labelled PIA program under its IDEX UCAJEDI project (ANR-15-IDEX-0001).

Author information

Corresponding author

Correspondence to Raphaël Gazzotti.

Appendix

Of the three kernels tested for ‘SVC’ (RBF, linear, and polynomial), nested cross-validation selected the RBF and linear kernels equally often. For logistic regression, nested cross-validation overwhelmingly chose ridge (L2) regularization.
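How nested cross-validation can report which kernel was selected may be sketched as follows with scikit-learn. This is an illustration under stated assumptions, not the experimental setup of the paper: the synthetic data set, the fold counts, the `C` grid, the accuracy metric and the use of an exhaustive grid search are all choices made here. The inner loop picks the hyper-parameters, the outer loop scores that choice on held-out data, and the kernels retained in each outer fold are then counted.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 200 samples, binary target.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Candidate hyper-parameters; the three kernels match those named in the
# text, while the C values are arbitrary illustrative choices.
param_grid = {
    "svc__kernel": ["rbf", "linear", "poly"],
    "svc__C": [0.1, 1.0, 10.0],
}

# Inner folds select hyper-parameters; outer folds estimate performance.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid,
    cv=inner_cv,
    scoring="accuracy",  # assumption: metric chosen for the sketch only
)

# Each outer fold refits the whole search on its training part, so model
# selection never sees the data used to score it.
results = cross_validate(
    search, X, y, cv=outer_cv, scoring="accuracy", return_estimator=True
)

# Count which kernel the inner search retained in each outer fold.
kernel_counts = Counter(
    est.best_params_["svc__kernel"] for est in results["estimator"]
)
outer_scores = results["test_score"]

print("kernels selected per outer fold:", dict(kernel_counts))
print("mean outer accuracy:", outer_scores.mean())
```

The same pattern would apply to logistic regression by swapping the estimator and searching over its penalty settings; the counts per outer fold are what statements such as "selected equally often" or "overwhelmingly chosen" summarize.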

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Gazzotti, R., Faron-Zucker, C., Gandon, F., Lacroix-Hugues, V., Darmon, D. (2019). Injecting Domain Knowledge in Electronic Medical Records to Improve Hospitalization Prediction. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_8

  • DOI: https://doi.org/10.1007/978-3-030-21348-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21347-3

  • Online ISBN: 978-3-030-21348-0

  • eBook Packages: Computer Science, Computer Science (R0)
