A Bidirectional LSTM and Conditional Random Fields Approach to Medical Named Entity Recognition

Xu, Kai; Zhou, Zhanfan; Hao, Tianyong; Liu, Wenyin

doi:10.1007/978-3-319-64861-3_33

Conference paper
First Online: 31 August 2017

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 639)

Abstract

Medical named entity recognition is a fundamental research task in medical natural language processing. It aims to identify medical concepts and terminology, such as diseases, drugs, treatments, and procedures, in unstructured medical text. A model based on a bidirectional LSTM and conditional random fields (Bi-LSTM-CRF) is proposed for medical named entity recognition. Our model contains three layers and relies on character-based word representations learned from the supervised corpus. The Bi-LSTM-CRF model can learn the information characteristics of a given dataset. Experiments on the publicly available NCBI Disease Corpus, used as a standard evaluation dataset, show that our approach achieves an F1 measure of 0.8022, which outperforms a number of widely used baseline methods.
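
To make the role of the CRF component more concrete, the following is a minimal sketch of Viterbi decoding over per-token tag scores. It is not the authors' implementation. The BIO tag set, the emission scores (standing in for Bi-LSTM outputs), the transition scores, and the function name `viterbi_decode` are all illustrative assumptions chosen by hand.

```python
from typing import Dict, List

# Hypothetical BIO tag set for a disease-mention task (an assumption).
TAGS = ["O", "B-Disease", "I-Disease"]


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring tag sequence.

    emissions[t][tag] is the score of `tag` at token t (in a Bi-LSTM-CRF
    these would come from the Bi-LSTM); transitions[prev][cur] is the
    score of moving from tag `prev` to tag `cur`.
    """
    if not emissions:
        return []
    # best[t][tag]: score of the best path that ends in `tag` at token t.
    best = [dict(emissions[0])]
    # back[t - 1][tag]: previous tag on that best path.
    back = []
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[t - 1][p] + transitions[p][cur])
            scores[cur] = best[t - 1][prev] + transitions[prev][cur] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(TAGS, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hand-picked scores for the tokens "the cystic fibrosis" (illustrative only).
emissions = [
    {"O": 2.0, "B-Disease": 0.0, "I-Disease": -1.0},
    {"O": 0.5, "B-Disease": 1.0, "I-Disease": 1.2},
    {"O": 0.3, "B-Disease": 0.2, "I-Disease": 1.5},
]
# A large penalty on O -> I-Disease discourages entities that start with I.
transitions = {
    "O": {"O": 0.0, "B-Disease": 0.0, "I-Disease": -10.0},
    "B-Disease": {"O": 0.0, "B-Disease": -1.0, "I-Disease": 0.5},
    "I-Disease": {"O": 0.0, "B-Disease": -1.0, "I-Disease": 0.5},
}

# Per-token argmax ignores transitions; Viterbi scores the whole sequence.
greedy = [max(TAGS, key=lambda tag: e[tag]) for e in emissions]
decoded = viterbi_decode(emissions, transitions)
print(greedy)
print(decoded)
```

With these scores, choosing the best tag per token independently yields an `I-Disease` directly after `O`, which is not a valid BIO sequence, while the sequence-level decoding starts the mention with `B-Disease`. This is the general motivation for placing a CRF on top of token-level scores; the specifics of the paper's model may differ.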

Acknowledgement

This work was supported by the National Natural Science Foundation of China (No. 61403088), the programs of Personalized Health Service Public Platform based on Open and Big Data (No. 2014B010118005), Ancient Literature Knowledge base Platform for the Inheritance and Development of Traditional Chinese Medicine (No. 2014A020221039) and Innovative School Project in Higher Education of Guangdong (No. YQ2015062).

Author information

Authors and Affiliations

School of Computer Science and Technology, Guangdong University of Technology, Guangzhou, China
Kai Xu & Wenyin Liu
School of Informatics, Guangdong University of Foreign Studies, Guangzhou, China
Zhanfan Zhou & Tianyong Hao

Corresponding authors

Correspondence to Tianyong Hao or Wenyin Liu.

Editor information

Editors and Affiliations

Faculty of Computers and Information, Information Technology Department, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Dubai International Academic City, The British University in Dubai, Dubai, United Arab Emirates
Khaled Shaalan
Faculty of Computers and Informatics, Suez Canal University, Ismailia, Egypt
Tarek Gaber
Ain Shams University, Cairo, Egypt
Mohamed F. Tolba

Cite this paper

Xu, K., Zhou, Z., Hao, T., Liu, W. (2018). A Bidirectional LSTM and Conditional Random Fields Approach to Medical Named Entity Recognition. In: Hassanien, A., Shaalan, K., Gaber, T., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017. AISI 2017. Advances in Intelligent Systems and Computing, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-64861-3_33

DOI: https://doi.org/10.1007/978-3-319-64861-3_33
Published: 31 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64860-6
Online ISBN: 978-3-319-64861-3
eBook Packages: Engineering, Engineering (R0)