Abstract
Electronic medical records (EMRs) contain key information about the different symptomatic episodes a patient has gone through. They have great potential to improve patients' well-being and are therefore a very valuable input for artificial intelligence approaches. However, the explicit knowledge directly available in these records remains limited: the features extracted for machine learning algorithms do not capture all the implicit knowledge of medical experts. To evaluate the impact of domain knowledge when processing EMRs, we augment the features extracted from EMRs with ontological resources before turning them into vectors used by machine learning algorithms. We evaluate these augmentations with several machine learning algorithms to predict hospitalization. We tested our approach on data from the PRIMEGE PACA database, which contains more than 350,000 consultations carried out by 16 general practitioners (GPs).
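The abstract does not detail how the ontological augmentation is performed. One plausible way it could look is a minimal sketch, assuming drug prescriptions are coded with the hierarchical ATC (Anatomical Therapeutic Chemical) classification: each code is expanded with its coarser ancestor classes before vectorization, so two patients on different analgesics share features that a flat bag of codes would miss. The helpers `atc_ancestors`, `augment`, and `vectorize` are hypothetical and are not the authors' implementation.

```python
from collections import Counter

# ATC code levels: 1 char = anatomical main group, 3 = therapeutic subgroup,
# 4 = pharmacological subgroup, 5 = chemical subgroup, 7 = chemical substance.
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def atc_ancestors(code):
    """Return every ATC class the code belongs to, coarsest first, ending with the code itself."""
    code = code.strip().upper()
    return [code[:n] for n in ATC_PREFIX_LENGTHS if n <= len(code)]


def augment(consultation_codes):
    """Hypothetical augmentation: a bag of concepts in which each drug code
    also counts towards all of its ancestor classes in the ATC hierarchy."""
    features = Counter()
    for code in consultation_codes:
        features.update(atc_ancestors(code))
    return features


def vectorize(bags, vocabulary=None):
    """Turn bags of concepts into fixed-length count vectors over a sorted vocabulary."""
    if vocabulary is None:
        vocabulary = sorted({f for bag in bags for f in bag})
    return vocabulary, [[bag.get(f, 0) for f in vocabulary] for bag in bags]


if __name__ == "__main__":
    # N02BE01 (paracetamol) and N02BA01 (acetylsalicylic acid) share no code,
    # but after augmentation both carry N, N02 and N02B (analgesics).
    vocab, X = vectorize([augment(["N02BE01"]), augment(["N02BA01"])])
    print(vocab)
    print(X)
```

The resulting vectors can be fed to any standard classifier. Counting ancestors, rather than replacing codes with them, keeps the fine-grained signal while adding the shared, coarser knowledge.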
Notes
- 4. Anatomical Therapeutic Chemical Classification, https://bioportal.bioontology.org/ontologies/ATC.
- 5. National Drug File - Reference Terminology, https://bioportal.bioontology.org/ontologies/NDF-RT.
- 6. International Classification of Primary Care (ICPC-2), http://bioportal.lirmm.fr/ontologies/CISP-2.
Acknowledgement
This work is partly funded by the French government under the PIA program, through its IDEX UCAJEDI project (ANR-15-IDEX-0001).
Appendix
Among the three kernels tested for ‘SVC’ (RBF, linear, and polynomial), nested cross-validation selected the RBF and linear kernels equally often. For logistic regression, nested cross-validation overwhelmingly selected ridge (L2) regularization.
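The appendix reports only which hyper-parameters won. A minimal sketch of how nested cross-validation can record the inner-loop choice on each outer fold is shown here, using scikit-learn's `GridSearchCV` and `StratifiedKFold`. The fold counts, the ROC AUC scoring metric, and the helper name `nested_cv_choices` are assumptions rather than the paper's settings, and synthetic data from `make_classification` stands in for the PRIMEGE features.

```python
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC


def nested_cv_choices(estimator, param_grid, X, y, track, outer_folds=3, inner_folds=3, seed=0):
    """Hypothetical helper: run nested CV and return the outer-fold scores plus
    a tally of the value of `track` chosen by the inner search on each outer fold."""
    outer = StratifiedKFold(n_splits=outer_folds, shuffle=True, random_state=seed)
    scores, chosen = [], []
    for train_idx, test_idx in outer.split(X, y):
        inner = StratifiedKFold(n_splits=inner_folds, shuffle=True, random_state=seed)
        search = GridSearchCV(estimator, param_grid, cv=inner, scoring="roc_auc")
        # Hyper-parameter selection sees only the outer training split.
        search.fit(X[train_idx], y[train_idx])
        # The held-out outer split gives a score not biased by that selection.
        scores.append(search.score(X[test_idx], y[test_idx]))
        chosen.append(search.best_params_[track])
    return scores, Counter(chosen)


if __name__ == "__main__":
    X, y = make_classification(n_samples=150, n_features=10, random_state=0)
    scores, picks = nested_cv_choices(
        SVC(), {"kernel": ["rbf", "linear", "poly"]}, X, y, track="kernel"
    )
    print(scores)
    print(picks)  # how often each kernel won across outer folds
```

A similar call with `LogisticRegression(solver="liblinear")`, a grid such as `{"penalty": ["l1", "l2"]}`, and `track="penalty"` would tally the regularization choice in the same way. In practice, features would usually be standardized inside a `Pipeline` so that scaling is also fitted only on each training split.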
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Gazzotti, R., Faron-Zucker, C., Gandon, F., Lacroix-Hugues, V., Darmon, D. (2019). Injecting Domain Knowledge in Electronic Medical Records to Improve Hospitalization Prediction. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_8
DOI: https://doi.org/10.1007/978-3-030-21348-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21347-3
Online ISBN: 978-3-030-21348-0
eBook Packages: Computer Science, Computer Science (R0)