A Bidirectional LSTM and Conditional Random Fields Approach to Medical Named Entity Recognition

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 639)

Abstract

Medical named entity recognition is a fundamental task in medical natural language processing. It aims to identify medical concepts and terminology, such as diseases, drugs, treatments, and procedures, in unstructured medical text. A model based on a bidirectional LSTM and conditional random fields (Bi-LSTM-CRF) is proposed for medical named entity recognition. Our model contains three layers and relies on character-based word representations learned from the supervised corpus. The Bi-LSTM-CRF model can learn the characteristic features of a given dataset. Experiments on the publicly available NCBI Disease Corpus, used as a standard evaluation dataset, show that our approach achieves an F1 measure of 0.8022, which outperforms a number of widely used baseline methods.
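The abstract does not describe how the CRF layer chooses its output. In BiLSTM-CRF taggers generally, the recurrent layers produce a score for each tag at each token, the CRF adds a score for each tag-to-tag transition, and the best tag sequence is found with Viterbi decoding. The following is a minimal sketch of that decoding step only, not the authors' implementation. The function name `viterbi_decode`, the tag set, and all scores are hypothetical values chosen for illustration.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][tag] is the per-token score for `tag` at position t
    (in a BiLSTM-CRF these would come from the BiLSTM output).
    transitions[(prev, cur)] is the score for moving from `prev` to `cur`;
    pairs that are not listed default to 0.0.
    """
    if not emissions:
        return [], 0.0

    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    # backptr[t - 1][tag]: previous tag on that best path.
    backptr: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev_best = max(
                tags,
                key=lambda prev: best[t - 1][prev]
                + transitions.get((prev, cur), 0.0),
            )
            scores[cur] = (
                best[t - 1][prev_best]
                + transitions.get((prev_best, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev_best
        best.append(scores)
        backptr.append(pointers)

    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda tag: best[-1][tag])
    path = [last]
    for pointers in reversed(backptr):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical scores for the three tokens "cystic fibrosis patients".
TAGS = ["O", "B-Disease", "I-Disease"]
EMISSIONS = [
    {"O": 1.0, "B-Disease": 0.75, "I-Disease": 0.0},
    {"O": 0.25, "B-Disease": 0.0, "I-Disease": 1.0},
    {"O": 1.5, "B-Disease": 0.0, "I-Disease": 0.0},
]
# Assumed penalty: an entity may not continue (I-) straight after O.
TRANSITIONS = {("O", "I-Disease"): -10.0}

path, score = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
print(path)  # ['B-Disease', 'I-Disease', 'O']
```

Picking the top-scoring tag for each token independently would give O, I-Disease, O here, which is not a valid BIO sequence. The transition penalty makes the decoder prefer B-Disease for the first token, which illustrates why a CRF layer is commonly placed on top of per-token scores.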

Acknowledgement

This work was supported by the National Natural Science Foundation of China (No. 61403088), the programs of Personalized Health Service Public Platform based on Open and Big Data (No. 2014B010118005), Ancient Literature Knowledge Base Platform for the Inheritance and Development of Traditional Chinese Medicine (No. 2014A020221039), and the Innovative School Project in Higher Education of Guangdong (No. YQ2015062).

Author information

Corresponding authors

Correspondence to Tianyong Hao or Wenyin Liu.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Xu, K., Zhou, Z., Hao, T., Liu, W. (2018). A Bidirectional LSTM and Conditional Random Fields Approach to Medical Named Entity Recognition. In: Hassanien, A., Shaalan, K., Gaber, T., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017. AISI 2017. Advances in Intelligent Systems and Computing, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-64861-3_33

  • DOI: https://doi.org/10.1007/978-3-319-64861-3_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64860-6

  • Online ISBN: 978-3-319-64861-3

  • eBook Packages: Engineering, Engineering (R0)
