Disambiguation Model for Bio-Medical Named Entity Recognition

Kumar, A.

doi:10.1007/978-3-030-33966-1_3

A. Kumar

Part of the book series: Studies in Big Data (SBD, volume 68)

Abstract

Identifying biomedical named entities is one of the preliminary steps in many biomedical text mining tasks. Typical entities in the biomedical domain include diseases, chemicals, genes, and proteins. Deep learning-based approaches are currently applied to Biomedical Named Entity Recognition (Bio_NER) and give prominent results. Although these approaches perform satisfactorily, they require a tremendous amount of training data, and a lack of data can be a barrier to Bio_NER performance. A further obstacle is polysemy, or entity misclassification: one biomedical entity may carry different meanings in different places, so a gene entity may, for example, be labeled as a disease name. When a Conditional Random Field (CRF) is combined with a deep learning-based approach, namely Bidirectional Long Short-Term Memory (Bi-LSTM), the model mistakenly labels the gene entity “BRCA1” as the disease entity “BRCA1 abnormality” or “Braca1-deficient” present in the training dataset. Similarly, “VHL (Von Hippel-Lindau disease)”, which names a gene, is labeled as a disease by the Bi-LSTM-CRF model. This chapter addresses one more problem: entities in the biomedical domain are long and complex, such as the cell line described by “A375M (B-Raf (V600E)) is a human melanoma cell line”. Although such an entity spans multiple words, it is still difficult to capture its context information. To address both the lack of data and entity misclassification, this chapter combines multiple Bio_NER models. In the proposed model, models trained on different datasets are connected so that the target model obtains information from the other models, which reduces the false-positive rate. A Recurrent Neural Network (RNN) built on Bi-LSTM gates is introduced to handle the long-range, complex dependencies in biomedical entities. The BioCreative II GM corpus, PubMed, a gold-standard dataset, and the JNLPBA dataset are used in this research work.
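
The idea of letting a target model consult models trained on other datasets can be illustrated with a small sketch. The abstract does not specify how the models are connected, so the rule below is an assumption made purely for illustration: a span predicted by the target (disease) model is dropped when an auxiliary (gene) model claims an overlapping span with a different label and a higher confidence. The names `Span`, `overlaps`, and `disambiguate`, the example sentence, and all scores are hypothetical and are not taken from the chapter.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int    # index of the first token in the span
    end: int      # index one past the last token
    label: str    # entity type, e.g. "Disease" or "Gene"
    score: float  # model confidence in [0, 1]


def overlaps(a: Span, b: Span) -> bool:
    """Return True when two token spans share at least one token."""
    return a.start < b.end and b.start < a.end


def disambiguate(target: List[Span], auxiliary: List[Span]) -> List[Span]:
    """Keep a target-model span unless an auxiliary model claims an
    overlapping span with a different label and a higher score.
    Hypothetical rule, not the chapter's actual combination scheme."""
    kept = []
    for t in target:
        rivals = [a for a in auxiliary
                  if overlaps(t, a) and a.label != t.label and a.score > t.score]
        if not rivals:
            kept.append(t)
    return kept


# Made-up predictions for one sentence (scores are illustrative only).
tokens = ["BRCA1", "mutations", "increase", "breast", "cancer", "risk"]
disease_model = [Span(0, 1, "Disease", 0.55),  # false positive on a gene name
                 Span(3, 5, "Disease", 0.93)]
gene_model = [Span(0, 1, "Gene", 0.97)]

result = disambiguate(disease_model, gene_model)
print([(" ".join(tokens[s.start:s.end]), s.label) for s in result])
```

In this toy setting the low-confidence disease label on “BRCA1” is removed because the gene model is more confident about the same token, while “breast cancer” is kept. A learned combination, such as shared layers in a multi-task network, would replace the hand-written rule in practice.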

Abbreviations

CRF: Conditional random field
LSTM: Long short-term memory
BILSTM: Bidirectional long short-term memory
BioNER: Biomedical named entity recognition
NER: Named entity recognition
MTM: Multi-task model
WE: Word embedding
CE: Character embedding

Acknowledgements

The authors would like to thank the National Institute of Technology Raipur for providing the necessary infrastructure and facilities for this research.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh, 492010, India
A. Kumar

Corresponding author

Correspondence to A. Kumar.

Editor information

Editors and Affiliations

Department of Computer Science, North Orissa University, Takatpur, Odisha, India
Sujata Dash
School of Computer Science and Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, India
Biswa Ranjan Acharya
Computer Science and Engineering Department, G. B. Pant Government Engineering College, New Delhi, Delhi, India
Mamta Mittal
Scientific Network for Innovation and Research Excellence, Machine Intelligence Research Labs, Auburn, AL, USA
Ajith Abraham
Department of Organizational Systems and Adult Health, University of Maryland, Baltimore, MD, USA
Arpad Kelemen

About this chapter

Cite this chapter

Kumar, A. (2020). Disambiguation Model for Bio-Medical Named Entity Recognition. In: Dash, S., Acharya, B., Mittal, M., Abraham, A., Kelemen, A. (eds) Deep Learning Techniques for Biomedical and Health Informatics. Studies in Big Data, vol 68. Springer, Cham. https://doi.org/10.1007/978-3-030-33966-1_3

DOI: https://doi.org/10.1007/978-3-030-33966-1_3
Published: 15 November 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33965-4
Online ISBN: 978-3-030-33966-1
eBook Packages: Intelligent Technologies and Robotics (R0)
