Abstract
De-identification in electronic health records is a prerequisite to distribute medical records for further clinical data processing or mining. In this paper, we introduce a framework based on recurrent neural network to solve the de-identification problem, and compare state-of-the-art methods with our framework. It is integrated, which includes records skeleton generation, chunk representation and protected information labeling. We evaluate our framework on three datasets involving two English datasets from i2b2 de-identification challenge and a Chinese dataset we created. To the best of our knowledge, we are the first to apply RNN model to the Chinese de-identification problem. The experimental results indicate that our framework not only achieves high performance but also has strong generalization ability.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Sweeney, L.: Replacing personally-identifying information in medical records, the Scrub system. In: Proceedings of AMIA Annual Fall Symposium, p. 333. American Medical Informatics Association (1996)
Ruch, P., Baud, R.H., Rassinoux, A.M., et al.: Medical document anonymization with a semantic lexicon. In: Proceedings of AMIA Symposium, p. 729. American Medical Informatics Association (2000)
Uzuner, Ö., Luo, Y., Szolovits, P.: Evaluating the state-of-the-art in automatic de-identification. J. Am. Med. Inform. Assoc. 14(5), 550–563 (2007)
Guillen, R.: Automated de-identification and categorization of medical records. In: i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data, p. 116 (2006)
Wellner, B., Huyck, M., Mardis, S., et al.: Rapidly retargetable approaches to de-identification in medical records. J. Am. Med. Inform. Assoc. 14(5), 564–573 (2007)
Hara, K.: Applying a SVM based Chunker and a text classifier to the deid challenge. In: i2b2 Workshop on Challenges in Natural Language Processing for Clinical Data, pp. 10–11. (2006)
Szarvas, G., Farkas, R., Busa-Fekete, R.: State-of-the-art anonymization of medical records using an iterative machine learning framework. J. Am. Med. Inform. Assoc. 14(5), 574–580 (2007)
Sweeney, L.: Uniqueness of simple demographics in the US population. Technical report, Carnegie Mellon University (2000)
Stubbs, A., Kotfila, C., Uzuner, Ö.: Automated systems for the de-identification of longitudinal clinical narratives: overview of 2014 i2b2/UTHealth shared task Track 1. J. Biomed. Inform. 58, S11–S19 (2015)
Stubbs, A., Uzuner, Ö.: Annotating longitudinal clinical narratives for de-identification: the 2014 i2b2/UTHealth corpus. J. Biomed. Inform. 58, S20–S29 (2015)
Dehghan, A., Kovacevic, A., Karystianis, G., et al.: Combining knowledge-and data-driven methods for de-identification of clinical narratives. J. Biomed. Inform. 58, S53–S59 (2015)
Yang, H., Garibaldi, J.M.: Automatic detection of protected health information from clinic narratives. J. Biomed. Inform. 58, S30–S38 (2015)
Dernoncourt, F., Lee, J.Y., Uzuner, O., et al.: De-identification of patient notes with recurrent neural networks (2016). arXiv preprint arXiv:1606.03475
Graves, A.: Supervised Sequence Labelling with Recurrent Neural Networks. Springer, Heidelberg (2012)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Cho, K., Van Merriënboer, B., Gulcehre, C., et al.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014). arXiv preprint arXiv:1406.1078
Mesnil, G., He, X., et al.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: INTERSPEECH, pp. 3771–3775 (2013)
Yao, K., Zweig, G., Hwang, M.Y., et al.: Recurrent neural networks for language understanding. In: INTERSPEECH, pp. 2524–2528 (2013)
Aberdeen, J., Bayer, S., Yeniterzi, R., et al.: The MITRE identification scrubber toolkit: design, training, and assessment. Int. J. Med. Inform. 79(12), 849–859 (2010)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Li, K., Chai, Y., Zhao, H., Nan, X., Zhao, Y. (2016). Learning to Recognize Protected Health Information in Electronic Health Records with Recurrent Neural Network. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL NLPCC 2016 2016. Lecture Notes in Computer Science(), vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_51
Download citation
DOI: https://doi.org/10.1007/978-3-319-50496-4_51
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer ScienceComputer Science (R0)