Drug Safety, Volume 42, Issue 1, pp 113–122

Adverse Drug Event Detection from Electronic Health Records Using Hierarchical Recurrent Neural Networks with Dual-Level Embedding

  • Susmitha Wunnava
  • Xiao Qin
  • Tabassum Kakar
  • Cansu Sen
  • Elke A. Rundensteiner
  • Xiangnan Kong
Original Research Article
Part of the following topical collections:
  1. NLP Challenge for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0)



Adverse drug event (ADE) detection is a vital step towards effective pharmacovigilance and the prevention of future incidents caused by potentially harmful ADEs. The electronic health records (EHRs) of hospital patients contain valuable information about ADEs and are therefore an important source for detecting ADE signals. However, EHR texts tend to be noisy, and applying off-the-shelf tools to preprocess them jeopardizes subsequent ADE detection performance, which depends on well-tokenized text input.


In this paper, we report our experience with the NLP Challenge for Detecting Medication and Adverse Drug Events from Electronic Health Records (MADE 1.0), which aims to promote deep innovations on this subject. In particular, we have developed rule-based sentence and word tokenization techniques to deal with the noise in the EHR text.
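As a rough illustration of what rule-based sentence and word tokenization can look like, the following minimal sketch uses only Python's standard library. The abbreviation list, the token pattern, and the function names `split_sentences` and `tokenize` are assumptions made for this example; they are not the rules used in the system described here.

```python
import re

# Hypothetical abbreviations that should not end a sentence (an assumption for this sketch).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "vs.", "e.g.", "i.e."}

# A token is a decimal number, a run of letters (optionally joined by
# hyphens or apostrophes), or a single punctuation character.
TOKEN_RE = re.compile(r"\d+(?:\.\d+)?|[A-Za-z]+(?:[-'][A-Za-z]+)*|[^\sA-Za-z0-9]")


def split_sentences(text):
    """Split on whitespace-delimited chunks ending in . ? or !, skipping known abbreviations."""
    sentences, current = [], []
    for chunk in text.split():
        current.append(chunk)
        ends_sentence = chunk.endswith((".", "?", "!"))
        if ends_sentence and chunk.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that lacks final punctuation
        sentences.append(" ".join(current))
    return sentences


def tokenize(sentence):
    """Separate numbers from attached units and punctuation from words."""
    return TOKEN_RE.findall(sentence)


note = "Seen by Dr. Smith today. Started prednisone 2.5mg daily."
for sentence in split_sentences(note):
    print(tokenize(sentence))
```

Separating a dose such as `2.5mg` into a number and a unit is one example of a rule that a general-purpose tokenizer may not apply, and it changes which tokens a downstream tagger sees.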


We propose a detection methodology that adapts a three-layered deep learning architecture comprising (1) a recurrent neural network [bi-directional long short-term memory (Bi-LSTM)] for character-level word representation, which encodes the morphological features of medical terminology, (2) a Bi-LSTM that captures the contextual information of each word within a sentence, and (3) conditional random fields for the final label prediction, which also take the labels of surrounding words into account. We experiment with different word embedding methods commonly used in word-level classification tasks and demonstrate the impact of using domain-specific and general-purpose pre-trained word embeddings together for detecting ADEs from EHRs.
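To make the role of the third layer concrete, the sketch below shows Viterbi decoding for a linear-chain conditional random field in plain Python. It assumes per-word label scores (emissions) are already available, as they would be from a word-level Bi-LSTM, together with label-to-label transition scores. The label set, the numeric scores, and the function name `viterbi_decode` are illustrative assumptions, not values or code from the system described here.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label index sequence.

    emissions[t][j]: score of label j at word position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label to recover the path.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


LABELS = ["O", "B-ADE", "I-ADE"]  # hypothetical BIO label set
emissions = [
    [2.0, 0.5, 0.0],
    [0.2, 1.0, 1.5],  # I-ADE scores highest here when the word is viewed alone
    [0.1, 0.3, 2.0],
]
transitions = [
    [0.0, 0.0, -10.0],  # O -> I-ADE is heavily penalized
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
print([LABELS[k] for k in viterbi_decode(emissions, transitions)])
```

With these made-up scores, labeling each word independently would pick `I-ADE` for the second word directly after `O`, which is not a valid BIO sequence. Scoring the whole sequence jointly avoids that, which is the sense in which the final prediction considers the surrounding words.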


Our system was ranked first for the named entity recognition task in the MADE 1.0 challenge, with a micro-averaged F1-score of 0.8290 (official score).
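For readers unfamiliar with the metric, a micro-averaged F1-score pools true positives, false positives, and false negatives across all entity types before computing precision and recall. The minimal sketch below assumes exact matching on `(doc_id, start, end, entity_type)` tuples; this matching criterion and the function name `micro_f1` are assumptions for illustration and may differ from the official challenge scorer.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over sets of (doc_id, start, end, entity_type) tuples."""
    tp = len(gold & predicted)   # entities found with the exact span and type
    fp = len(predicted - gold)   # predicted entities with no gold match
    fn = len(gold - predicted)   # gold entities that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up annotations: three of four predictions match the gold standard.
gold = {
    ("note1", 0, 4, "Drug"),
    ("note1", 10, 14, "ADE"),
    ("note1", 20, 25, "Dose"),
    ("note2", 3, 9, "ADE"),
}
predicted = {
    ("note1", 0, 4, "Drug"),
    ("note1", 10, 14, "ADE"),
    ("note1", 20, 25, "Dose"),
    ("note2", 3, 8, "ADE"),  # span boundary is off by one, so it counts as an error
}
print(micro_f1(gold, predicted))
```

Because counts are pooled before averaging, frequent entity types influence the micro-averaged score more than rare ones.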


Our results indicate that integrating two widely used, complementary sequence labeling techniques with dual-level embedding (character level and word level) to represent words in the input layer yields a deep learning architecture that achieves excellent information extraction accuracy on EHR notes.



We are grateful to Dr. Marni Hall (former Sr. Program Director), Suranjan De (Deputy Director), Sanjay K. Sahoo and Thang La, Regulatory Science, Office of Surveillance and Epidemiology (OSE), US Food and Drug Administration (FDA) for introducing us to pharmacovigilance in general and the ADE detection problem in particular. We also thank Worcester Polytechnic Institute (WPI) Data Science Research Group (DSRG) members for their valuable feedback.

Compliance with Ethical Standards

Conflict of interest

Susmitha Wunnava, Xiao Qin, Tabassum Kakar, Cansu Sen, Elke A. Rundensteiner and Xiangnan Kong have no conflicts of interest that are directly relevant to the content of this study.


Susmitha Wunnava is thankful to the Seeds of STEM and Institute of Education Sciences, US Department of Education for supporting her PhD studies via Grant R305A150571. Xiao Qin and Tabassum Kakar are grateful to Oak Ridge Associated Universities (ORAU) for granting them an ORISE Fellowship to conduct research with the US Food and Drug Administration.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Worcester Polytechnic Institute, Worcester, USA
