Abstract
Medical named entity recognition is a fundamental task in medical natural language processing. It aims to identify medical concepts and terminology, such as diseases, drugs, treatments, and procedures, in unstructured medical text. A model based on a bidirectional LSTM and conditional random fields (Bi-LSTM-CRF) is proposed for medical named entity recognition. Our model contains three layers and relies on character-based word representations learned from the supervised corpus. The Bi-LSTM-CRF model learns the characteristics of a given dataset. Experiments on the publicly available NCBI Disease Corpus, used as a standard evaluation dataset, show that our approach achieves an F1 measure of 0.8022, which outperforms a number of widely used baseline methods.
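To make the role of the CRF layer concrete, the following is a minimal sketch of Viterbi decoding over per-token tag scores. It is not the authors' implementation. The tag set, the emission scores (which a Bi-LSTM would produce in a model of this kind), and the transition scores are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence

# Hypothetical BIO tag set for disease mentions (an assumption, not from the paper).
TAGS = ["O", "B-Disease", "I-Disease"]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j]   : score of tag j at token t (e.g. from a Bi-LSTM).
    transitions[i][j] : score of moving from tag i to tag j (CRF parameters).
    """
    n_tags = len(transitions)
    # score[j] holds the best score of any path ending in tag j so far.
    score = list(emissions[0])
    backpointers: List[List[int]] = []

    for emit in emissions[1:]:
        new_score: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at this token.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Follow the backpointers from the best final tag to recover the path.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Illustrative scores for a three-token sentence (made-up numbers).
emissions = [
    [2.0, 0.5, 0.1],
    [0.2, 1.0, 1.2],
    [0.1, 0.3, 1.5],
]
# Rows are the previous tag, columns the next tag; O -> I-Disease is penalized.
transitions = [
    [0.0, 0.0, -1e4],
    [0.0, 0.0, 0.5],
    [0.0, 0.0, 0.5],
]

decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(decoded)
```

With these numbers, picking the best tag for each token independently would yield O, I-Disease, I-Disease, which is not a valid BIO sequence. Scoring whole sequences with transition terms is what lets a CRF layer avoid such outputs.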
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 61403088), the programs of Personalized Health Service Public Platform based on Open and Big Data (No. 2014B010118005), Ancient Literature Knowledge Base Platform for the Inheritance and Development of Traditional Chinese Medicine (No. 2014A020221039), and the Innovative School Project in Higher Education of Guangdong (No. YQ2015062).
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Xu, K., Zhou, Z., Hao, T., Liu, W. (2018). A Bidirectional LSTM and Conditional Random Fields Approach to Medical Named Entity Recognition. In: Hassanien, A., Shaalan, K., Gaber, T., Tolba, M. (eds) Proceedings of the International Conference on Advanced Intelligent Systems and Informatics 2017. AISI 2017. Advances in Intelligent Systems and Computing, vol 639. Springer, Cham. https://doi.org/10.1007/978-3-319-64861-3_33
DOI: https://doi.org/10.1007/978-3-319-64861-3_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64860-6
Online ISBN: 978-3-319-64861-3
eBook Packages: Engineering, Engineering (R0)