Adverse Drug Event Detection from Electronic Health Records Using Hierarchical Recurrent Neural Networks with Dual-Level Embedding
Adverse drug event (ADE) detection is a vital step towards effective pharmacovigilance and prevention of future incidents caused by potentially harmful ADEs. The electronic health records (EHRs) of patients in hospitals contain valuable information regarding ADEs and hence are an important source for detecting ADE signals. However, EHR texts tend to be noisy, and applying off-the-shelf tools for EHR text preprocessing jeopardizes the subsequent ADE detection performance, which depends on well-tokenized text input.
In this paper, we report our experience with the NLP Challenges for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE1.0), which aims to promote deep innovations on this subject. In particular, we have developed rule-based sentence and word tokenization techniques to deal with the noise in the EHR text.
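As a rough illustration of what rule-based sentence and word tokenization for noisy clinical text can look like, the following is a minimal sketch. The specific rules, the abbreviation list, and the function names `split_sentences` and `tokenize_words` are assumptions made for this example and are not the rules used in the system described here.

```python
import re

# Hypothetical abbreviations that should not end a sentence (illustrative only).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "vs.", "e.g.", "i.e."}

# Word rule: decimal numbers stay whole, alphanumeric runs stay whole,
# and every other non-space character becomes its own token.
TOKEN_RE = re.compile(r"\d+\.\d+|\w+|[^\w\s]")


def split_sentences(text):
    """Split on line breaks and on . ! ? followed by whitespace and an uppercase letter."""
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        start = 0
        for match in re.finditer(r"[.!?]\s+(?=[A-Z])", line):
            candidate = line[start:match.end()].strip()
            last_word = candidate.split()[-1].lower()
            if last_word in ABBREVIATIONS:
                continue  # e.g. "Dr. Smith" is not a sentence boundary
            sentences.append(candidate)
            start = match.end()
        tail = line[start:].strip()
        if tail:
            sentences.append(tail)
    return sentences


def tokenize_words(sentence):
    """Apply the word rule above to one sentence."""
    return TOKEN_RE.findall(sentence)


note = "Pt seen by Dr. Smith. Started warfarin 2.5mg daily.\nDeveloped rash/itching after 3 days."
sentences = split_sentences(note)
tokens = [tokenize_words(s) for s in sentences]
```

In this sketch the dose `2.5mg` is split into `2.5` and `mg`, and `rash/itching` into three tokens, so that each candidate entity word reaches the tagger as a separate unit.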
We propose a detection methodology that adapts a three-layered deep learning architecture: (1) a recurrent neural network [bi-directional long short-term memory (Bi-LSTM)] that builds character-level word representations to encode the morphological features of medical terminology, (2) a Bi-LSTM that captures the contextual information of each word within a sentence, and (3) conditional random fields that make the final label prediction by also considering the surrounding words. We experiment with different word embedding methods commonly used in word-level classification tasks and demonstrate the impact of integrating domain-specific and general-purpose pre-trained word embeddings for detecting ADEs from EHRs.
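To make the role of the final conditional random field layer concrete, the sketch below shows standard Viterbi decoding over per-word label scores and label-to-label transition scores. It is a generic illustration and not the authors' implementation: the label set, the `emissions` values (standing in for Bi-LSTM outputs), and the `transitions` values are all made up for the example.

```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions[t][j]   : score of label j for word t (e.g. from a Bi-LSTM).
    transitions[i][j] : score of moving from label i to label j.
    """
    n_labels = len(labels)
    # best[j] = best score of any path ending in label j at the current word
    best = list(emissions[0])
    backpointers = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_labels):
            scores = [best[i] + transitions[i][j] for i in range(n_labels)]
            i_max = max(range(n_labels), key=scores.__getitem__)
            new_best.append(scores[i_max] + emissions[t][j])
            pointers.append(i_max)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final label.
    j = max(range(n_labels), key=best.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [labels[j] for j in path]


# Illustrative values only (assumed, not taken from the challenge data).
labels = ["O", "B-ADE", "I-ADE"]
emissions = [          # words: "developed", "severe", "rash"
    [2.0, 0.1, 0.1],
    [1.0, 0.8, 0.9],
    [0.2, 0.5, 2.0],
]
transitions = [
    [0.0, 0.0, -10.0],  # from O: an entity cannot continue without a start
    [0.0, 0.0, 1.0],    # from B-ADE
    [0.0, 0.0, 0.5],    # from I-ADE
]

decoded = viterbi_decode(emissions, transitions, labels)
# Labeling each word independently by its best emission score:
greedy = [labels[max(range(len(labels)), key=row.__getitem__)] for row in emissions]
```

With these numbers, independent per-word labeling gives `O O I-ADE`, which starts an entity with a continuation tag, whereas joint decoding returns `O B-ADE I-ADE` because the transition scores penalize moving from `O` directly to `I-ADE`.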
Our system was ranked first for the named entity recognition task in the MADE1.0 challenge, with a micro-averaged F1-score of 0.8290 (official score).
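For readers unfamiliar with the metric, a micro-averaged F1-score pools true positives, false positives, and false negatives across all entity types before computing precision and recall. The sketch below shows the arithmetic; the entity type names and counts are invented for illustration and are not the challenge results.

```python
def micro_f1(counts):
    """counts maps entity type -> (true_positives, false_positives, false_negatives)."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts for two entity types.
example_counts = {
    "Drug": (90, 10, 10),
    "ADE": (30, 20, 30),
}
score = micro_f1(example_counts)
```

Because the counts are pooled, frequent entity types weigh more heavily in the score than rare ones.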
Our results indicate that integrating two widely used, complementary sequence labeling techniques with dual-level embedding (character level and word level) to represent words in the input layer yields a deep learning architecture that achieves excellent information extraction accuracy for EHR notes.
We are grateful to Dr. Marni Hall (former Sr. Program Director), Suranjan De (Deputy Director), Sanjay K. Sahoo and Thang La, Regulatory Science, Office of Surveillance and Epidemiology (OSE), US Food and Drug Administration (FDA) for introducing us to pharmacovigilance in general and the ADE detection problem in particular. We also thank Worcester Polytechnic Institute (WPI) Data Science Research Group (DSRG) members for their valuable feedback.
Compliance with Ethical Standards
Conflict of interest
Susmitha Wunnava, Xiao Qin, Tabassum Kakar, Cansu Sen, Elke A. Rundensteiner and Xiangnan Kong have no conflicts of interest that are directly relevant to the content of this study.
Susmitha Wunnava is thankful to the Seeds of STEM and Institute of Education Sciences, US Department of Education for supporting her PhD studies via Grant R305A150571. Xiao Qin and Tabassum Kakar are grateful to Oak Ridge Associated Universities (ORAU) for granting them an ORISE Fellowship to conduct research with the US Food and Drug Administration.