Abstract
Personal data must be anonymized before they can legally be used in a research project. The ALADIN research project, which is developing a tool for detecting hospital-acquired infections, required a corpus of 2000 medical documents. To help annotators anonymize this corpus, an anonymization tool based on Natural Language Processing techniques was developed. Evaluated against a gold standard consisting of the manual anonymization of the documents, the automatic phase of the anonymizer achieved a recall of 79.7%, a precision of 85.2% and an F-score of 82.4%. The performance of the automatic anonymization can still be improved, but the tool already helps considerably, both by saving time and by improving the quality of anonymization (including the accuracy of the labels assigned to anonymized terms and the computation of time durations).
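The three reported figures are related by the usual definition of the balanced F-score as the harmonic mean of precision and recall. The short Python sketch below is only an illustration of that arithmetic, assuming the chapter uses the balanced (F1) variant; the function names and the toy counts in the comments are hypothetical and do not come from the ALADIN evaluation.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from counts of true positives, false positives
    and false negatives (here: terms that should be anonymized)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Consistency check on the figures reported in the abstract
# (assumption: the reported F-score is the balanced F1 variant).
reported_f = f_score(precision=0.852, recall=0.797)
print(f"F-score: {100 * reported_f:.1f}%")

# Toy counts, purely illustrative: 8 terms correctly anonymized,
# 2 anonymized by mistake, 2 missed.
toy_p, toy_r = precision_recall(tp=8, fp=2, fn=2)
```

With a precision of 85.2% and a recall of 79.7%, the harmonic mean rounds to 82.4%, which matches the F-score given in the abstract.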
© 2011 Springer-Verlag France
Gicquel, Q. et al. (2011). Évaluation d’un outil d’aide à l’anonymisation des documents médicaux basé sur le traitement automatique du langage naturel. In: Staccini, P.M., Harmel, A., Darmoni, S.J., Gouider, R. (eds) Systèmes d’information pour l’amélioration de la qualité en santé. Informatique et Santé, vol 1. Springer, Paris. https://doi.org/10.1007/978-2-8178-0285-5_15
DOI: https://doi.org/10.1007/978-2-8178-0285-5_15
Publisher Name: Springer, Paris
Print ISBN: 978-2-8178-0284-8
Online ISBN: 978-2-8178-0285-5