Abstract
Recognizing biomedical named entities is a preliminary step in many biomedical text mining tasks. Typical entities in the biomedical domain include diseases, chemicals, genes, and proteins. Deep learning-based approaches to Biomedical Named Entity Recognition (Bio_NER) currently give prominent results, but they require a large amount of training data, and a lack of data is one barrier to Bio_NER performance. A second obstacle is polysemy, which leads to entity misclassification: a biomedical entity may have different meanings in different contexts, so that, for example, a gene mention may be labeled as a disease name. When a Conditional Random Field (CRF) is combined with a deep learning model such as Bidirectional Long Short-Term Memory (Bi-LSTM), the model mistakenly labels the gene entity “BRCA1” as a disease entity, as in the disease mentions “BRCA1 abnormality” or “Braca1-deficient” present in the training dataset. Similarly, the gene name “VHL (Von Hippel-Lindau disease)” is labeled as a disease by the Bi-LSTM CRF model. This chapter also addresses a third problem: biomedical entities are often long and complex, such as the cell-line description “A375M (B-Raf (V600E)) is a human melanoma cell line”. Even though such an entity spans multiple words, it is difficult to capture its context information. To address the lack of data and the entity misclassification problem, this chapter combines multiple Bio_NER models. In the proposed model, models trained on different datasets are connected so that the target model obtains information from the other models, which reduces the false-positive rate. A Recurrent Neural Network (RNN) based on Bi-LSTM gates is introduced to handle the long and complex dependencies in biomedical entities. The BioCreative II GM corpus, PubMed, a gold-standard dataset, and the JNLPBA dataset are used in this work.
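The disambiguation idea can be illustrated with a small sketch. It is not the chapter's architecture, which connects neural models trained on different datasets; it only shows, as a post-hoc filter, how information from a second model might suppress a false positive in a target model. The `Mention` class, the `disambiguate` function, and all confidence scores below are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Mention:
    """A predicted entity span (hypothetical structure for this sketch)."""
    start: int    # index of the first token (inclusive)
    end: int      # index one past the last token (exclusive)
    label: str    # entity type, e.g. "Disease" or "Gene"
    score: float  # model confidence in [0, 1]


def overlaps(a: Mention, b: Mention) -> bool:
    """Return True if the two spans share at least one token."""
    return a.start < b.end and b.start < a.end


def disambiguate(target: List[Mention], auxiliary: List[Mention]) -> List[Mention]:
    """Keep a target mention unless an overlapping auxiliary mention of a
    different entity type has a strictly higher confidence score."""
    kept = []
    for mention in target:
        rivals = [r for r in auxiliary
                  if r.label != mention.label and overlaps(mention, r)]
        if all(r.score <= mention.score for r in rivals):
            kept.append(mention)
    return kept


tokens = ["BRCA1", "mutations", "increase", "breast", "cancer", "risk"]

# Made-up predictions: the disease model wrongly tags the gene "BRCA1".
disease_preds = [Mention(0, 1, "Disease", 0.55), Mention(3, 5, "Disease", 0.93)]
gene_preds = [Mention(0, 1, "Gene", 0.90)]

result = disambiguate(disease_preds, gene_preds)
for m in result:
    print(" ".join(tokens[m.start:m.end]), m.label, m.score)
```

In this toy input, the gene model is more confident about “BRCA1” than the disease model, so the disease label on that span is dropped while the “breast cancer” mention is kept. A strict score comparison is only one possible rule; a learned combination of the models is closer to what the abstract describes.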
Abbreviations
- CRF: Conditional random field
- LSTM: Long short term memory
- BILSTM: Bidirectional long short term memory
- BioNER: Biomedical named entity recognition
- NER: Named entity recognition
- MTM: Multi task model
- WE: Word embedding
- CE: Character embedding
Acknowledgements
The authors would like to thank the National Institute of Technology Raipur for providing the necessary infrastructure and facilities for this research.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kumar, A. (2020). Disambiguation Model for Bio-Medical Named Entity Recognition. In: Dash, S., Acharya, B., Mittal, M., Abraham, A., Kelemen, A. (eds) Deep Learning Techniques for Biomedical and Health Informatics. Studies in Big Data, vol 68. Springer, Cham. https://doi.org/10.1007/978-3-030-33966-1_3
DOI: https://doi.org/10.1007/978-3-030-33966-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33965-4
Online ISBN: 978-3-030-33966-1
eBook Packages: Intelligent Technologies and Robotics (R0)