Learning to Recognize Protected Health Information in Electronic Health Records with Recurrent Neural Network

Li, Kun; Chai, Yumei; Zhao, Hongling; Nan, Xiaofei; Zhao, Yueshu

doi:10.1007/978-3-319-50496-4_51


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10102)


Abstract

De-identification of electronic health records is a prerequisite for distributing medical records for further clinical data processing or mining. In this paper, we introduce a framework based on recurrent neural networks (RNNs) to solve the de-identification problem, and compare it with state-of-the-art methods. The framework is integrated: it comprises record skeleton generation, chunk representation, and protected information labeling. We evaluate it on three datasets: two English datasets from the i2b2 de-identification challenge and a Chinese dataset that we created. To the best of our knowledge, we are the first to apply an RNN model to the Chinese de-identification problem. The experimental results indicate that our framework not only achieves high performance but also generalizes well.
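As a rough illustration of what the final labeling stage feeds into, the sketch below turns per-token labels into redacted text. It is not the framework's implementation: the BIO tag scheme, the `NAME` and `DATE` label names, the helper names `bio_to_spans` and `redact`, and the example sentence are all assumptions made for illustration, and the tags are written by hand rather than predicted by an RNN.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end_exclusive, phi_type) spans."""
    spans = []
    start, current = None, None
    for i, tag in enumerate(tags):
        # A span continues only on an "I-" tag of the same type.
        continues = tag.startswith("I-") and current == tag[2:]
        if not continues:
            if current is not None:
                spans.append((start, i, current))
                start, current = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                # A stray "I-" is treated as opening a new span.
                start, current = i, tag[2:]
    if current is not None:
        spans.append((start, len(tags), current))
    return spans


def redact(tokens: List[str], tags: List[str]) -> List[str]:
    """Replace each labeled span with a single placeholder token."""
    out, pos = [], 0
    for start, end, phi_type in bio_to_spans(tags):
        out.extend(tokens[pos:start])
        out.append(f"[{phi_type}]")
        pos = end
    out.extend(tokens[pos:])
    return out


# Hypothetical sentence with hand-written tags (assumed label set).
tokens = ["Mr.", "John", "Smith", "was", "admitted", "on", "2016-03-01", "."]
tags = ["O", "B-NAME", "I-NAME", "O", "O", "O", "B-DATE", "O"]
print(" ".join(redact(tokens, tags)))
```

In a full system the `tags` list would come from the sequence labeler; keeping span extraction separate from the model makes it possible to swap taggers without touching the redaction step.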



Author information

Authors and Affiliations

Information Engineering School, Zhengzhou University, Zhengzhou, China
Kun Li, Yumei Chai & Xiaofei Nan
Collaborative Innovation Center for Internet Healthcare, Zhengzhou University, Zhengzhou, China
Hongling Zhao & Yueshu Zhao
Hospital of Zhengzhou University, Zhengzhou, China
Yueshu Zhao


Corresponding author

Correspondence to Kun Li.

Editor information

Editors and Affiliations

Microsoft Research Asia, Beijing, China
Chin-Yew Lin
Brandeis University, Waltham, Massachusetts, USA
Nianwen Xue
Peking University, Beijing, China
Dongyan Zhao
Fudan University, Shanghai, China
Xuanjing Huang
Peking University, Beijing, China
Yansong Feng


About this paper

Cite this paper

Li, K., Chai, Y., Zhao, H., Nan, X., Zhao, Y. (2016). Learning to Recognize Protected Health Information in Electronic Health Records with Recurrent Neural Network. In: Lin, CY., Xue, N., Zhao, D., Huang, X., Feng, Y. (eds) Natural Language Understanding and Intelligent Applications. ICCPOL 2016, NLPCC 2016. Lecture Notes in Computer Science, vol 10102. Springer, Cham. https://doi.org/10.1007/978-3-319-50496-4_51


DOI: https://doi.org/10.1007/978-3-319-50496-4_51
Published: 02 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50495-7
Online ISBN: 978-3-319-50496-4
eBook Packages: Computer Science, Computer Science (R0)
