Évaluation d’un outil d’aide à l’anonymisation des documents médicaux basé sur le traitement automatique du langage naturel

Gicquel, Quentin; Proux, Denys; Marchal, Pierre; Hagège, Caroline; Berrouane, Yasmina; Darmoni, Stéfan J.; Pereira, Suzanne; Segond, Frédérique; Metzger, Marie-Hélène

doi:10.1007/978-2-8178-0285-5_15

Part of the book series: Informatique et Santé (INFORMATIQUE, volume 1)

Abstract

Anonymization of personal data is a legal requirement for its use in a research project. The ALADIN research project, which is developing a tool for detecting hospital-acquired infections, required a corpus of 2000 medical documents. To help annotators anonymize this corpus, an anonymization tool based on Natural Language Processing techniques was developed. Evaluated against a gold standard consisting of the manual anonymization of the documents, the automatic phase of the anonymizer achieved a recall of 79.7%, a precision of 85.2% and an F-score of 82.4%. The performance of the automatic anonymization can still be improved, but the tool already provides considerable help, both in time saved and in the quality of anonymization (including the accuracy of labeling anonymized terms and the computation of time durations).
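
The reported F-score is consistent with the usual definition as the harmonic mean of precision and recall. The following is a minimal Python sketch of that check, assuming the balanced F1 variant (the abstract does not state which weighting was used); the function name `f_score` and its `beta` parameter are illustrative and not taken from the chapter.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta=1 gives the harmonic mean of precision and recall."""
    # Guard against division by zero when either metric is zero.
    if precision <= 0 or recall <= 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract, in percent.
reported_recall = 79.7
reported_precision = 85.2

f1 = f_score(reported_precision, reported_recall)
print(round(f1, 1))
```

With the abstract's figures this evaluates to about 82.4, matching the reported F-score.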

Author information

Authors and Affiliations

Laboratoire de biométrie et biologie évolutive, Université Lyon I — CNRS-UMR 5558, 43 boulevard du 11 novembre 1918, 69622, Villeurbanne, France
Quentin Gicquel & Marie-Hélène Metzger
Xerox Research Center Europe, Meylan, France
Denys Proux, Caroline Hagège & Frédérique Segond
ER-TIM, INaLCO, Paris, France
Pierre Marchal
Service d’hygiène, CHU de Nice, Nice, France
Yasmina Berrouane
CISMeF, Rouen, France
Stéfan J. Darmoni
Vidal, Issy-les-Moulineaux, France
Suzanne Pereira
Hôpital de la Croix-Rousse, UHE, Lyon, France
Marie-Hélène Metzger

Corresponding author

Correspondence to Quentin Gicquel.

Editor information

Editors and Affiliations

Département d’Information et d’Informatique Médicale, Centre Hospitalier Universitaire de Nice Hôpital de Cimiez, 4 Avenue Reine Victoria, B.P. 1179, 06003, Nice Cedex 1, France
Pascal M. Staccini
Service de Médecine Interne, C.H.U. Mongi Slim - La Marsa, 2046, Sidi-Daoud, Tunis, Tunisie
Ali Harmel
Equipe CISMeF, Cour Leschevin Porte 21, 3ème étage 1 rue de Germont, 76031, Rouen Cedex, France
Stéfan J. Darmoni
Service de Neurologie, C.H.U. Razi, 2010, La Manouba, Tunis, Tunisie
Riadh Gouider

About this chapter

Cite this chapter

Gicquel, Q. et al. (2011). Évaluation d’un outil d’aide à l’anonymisation des documents médicaux basé sur le traitement automatique du langage naturel. In: Staccini, P.M., Harmel, A., Darmoni, S.J., Gouider, R. (eds) Systèmes d’information pour l’amélioration de la qualité en santé. Informatique et Santé, vol 1. Springer, Paris. https://doi.org/10.1007/978-2-8178-0285-5_15

DOI: https://doi.org/10.1007/978-2-8178-0285-5_15
Publisher Name: Springer, Paris
Print ISBN: 978-2-8178-0284-8
Online ISBN: 978-2-8178-0285-5
