
Disambiguation Model for Bio-Medical Named Entity Recognition

Deep Learning Techniques for Biomedical and Health Informatics

Part of the book series: Studies in Big Data (SBD, volume 68)

Abstract

Discovering biomedical named entities is a preliminary step in many biomedical text-mining tasks. Typical entity types in the biomedical domain include diseases, chemicals, genes, and proteins. Deep learning-based approaches are currently applied to Biomedical Named Entity Recognition (Bio_NER) to find these entities, and they give prominent results. Although these approaches perform satisfactorily, they still require a large amount of training data, and a lack of data can be a barrier to Bio_NER performance. A further obstacle is polysemy, which leads to misclassification of bio-entities: the same biomedical entity can carry different meanings in different contexts, so that, for example, a gene name may be labeled as a disease name. When a Conditional Random Field (CRF) is combined with a deep learning-based approach, i.e., a Bidirectional Long Short-Term Memory (Bi-LSTM) network, the resulting model mistakenly labeled the gene entity “BRCA1” as a disease entity, since disease mentions such as “BRCA1 abnormality” or “Braca1-deficient” are present in the training dataset. Similarly, the Bi-LSTM-CRF model labeled “VHL (Von Hippel-Lindau disease)”, a gene name, as a disease. This chapter also addresses a second problem: biomedical entities are often long and complex, as in the cell-line mention “A375M (B-Raf (V600E)) is a human melanoma cell line”. Such an entity spans multiple words, yet it is still difficult to capture its context information. To address the lack of data and the entity misclassification problem, this chapter combines multiple Bio_NER models: models trained on different datasets are connected so that the target model can obtain information from the other models, which reduces the false-positive rate.
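
In a Bi-LSTM-CRF tagger, the Bi-LSTM assigns each token a score for every label, and the CRF layer then chooses the best label sequence as a whole using label-to-label transition scores. The decoding step can be sketched with the Viterbi algorithm as follows. This is a minimal illustration, not the chapter's code: the BIO label set (`O`, `B-Gene`, `I-Gene`), the hand-set emission scores standing in for Bi-LSTM outputs, the absence of separate start/end scores, and the name `viterbi_decode` are all assumptions.

```python
from typing import Dict, List, Sequence

Scores = Dict[str, float]


def viterbi_decode(
    emissions: Sequence[Scores],
    transitions: Dict[str, Scores],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence of a linear-chain CRF.

    emissions[t][y]   -- score of label y at token t (stand-in for Bi-LSTM output)
    transitions[a][b] -- score of label b following label a
    Assumes at least one token and no separate start/end scores.
    """
    best: List[Scores] = [dict(emissions[0])]  # best path score ending in each label
    back: List[Dict[str, str]] = []            # back-pointers for tokens 1..n-1
    for t in range(1, len(emissions)):
        scores: Scores = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Best previous label when the current label is y.
            prev = max(labels, key=lambda p: best[t - 1][p] + transitions[p][y])
            scores[y] = best[t - 1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final label and follow the back-pointers.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy example with hypothetical scores for the tokens "mutations in BRCA1".
labels = ["O", "B-Gene", "I-Gene"]
transitions = {a: {b: 0.0 for b in labels} for a in labels}
transitions["O"]["I-Gene"] = -5.0  # an entity should not start with an I- tag
emissions = [
    {"O": 1.0, "B-Gene": 0.0, "I-Gene": 0.0},  # "mutations"
    {"O": 1.0, "B-Gene": 0.0, "I-Gene": 0.0},  # "in"
    {"O": 0.2, "B-Gene": 0.8, "I-Gene": 1.0},  # "BRCA1"
]
print(viterbi_decode(emissions, transitions, labels))  # ['O', 'O', 'B-Gene']
```

Picking the best label per token would tag “BRCA1” as `I-Gene`, an invalid entity start; the transition penalty makes the decoder choose `B-Gene` instead. Note that the transitions only enforce consistency between neighbouring labels: if the emission scores themselves favour a disease label for a gene name, the CRF layer has no separate mechanism to correct the entity type, which is the kind of error described above.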
A Recurrent Neural Network (RNN) built on Bi-LSTM gates is introduced to handle the long-range dependencies in long and complex biomedical entities. The BioCreative II GM corpus, PubMed, a gold-standard dataset, and the JNLPBA dataset are used in this research work.
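
How the input, forget, and output gates carry information across a multi-word entity can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions: random toy weights, 2-dimensional token vectors, a hidden size of 3, and the hypothetical names `LSTMCell` and `bilstm`; it is not the chapter's trained model. A forward pass and a backward pass are concatenated so that each token's representation reflects both its left and right context.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(w: Matrix, x: Vector, b: Vector) -> Vector:
    """Compute W·x + b for a weight matrix given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class LSTMCell:
    """LSTM cell with input (i), forget (f) and output (o) gates; toy random weights."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n = input_size + hidden_size

        def matrix() -> Matrix:
            return [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(hidden_size)]

        self.hidden_size = hidden_size
        self.w_i, self.w_f, self.w_o, self.w_g = matrix(), matrix(), matrix(), matrix()
        self.b_i = [0.0] * hidden_size
        self.b_f = [1.0] * hidden_size  # positive forget bias: keep memory by default
        self.b_o = [0.0] * hidden_size
        self.b_g = [0.0] * hidden_size

    def step(self, x: Vector, h: Vector, c: Vector) -> Tuple[Vector, Vector]:
        z = x + h  # concatenate the token vector with the previous hidden state
        i = [sigmoid(v) for v in affine(self.w_i, z, self.b_i)]
        f = [sigmoid(v) for v in affine(self.w_f, z, self.b_f)]
        o = [sigmoid(v) for v in affine(self.w_o, z, self.b_o)]
        g = [math.tanh(v) for v in affine(self.w_g, z, self.b_g)]
        # Additive cell update: the forget gate decides how much old memory survives.
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs: Sequence[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hidden_size, [0.0] * self.hidden_size
        outputs: List[Vector] = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(xs: Sequence[Vector], fwd: LSTMCell, bwd: LSTMCell) -> List[Vector]:
    """Concatenate left-to-right and right-to-left hidden states per token."""
    forward = fwd.run(xs)
    backward = bwd.run(list(xs)[::-1])[::-1]  # re-align with the original token order
    return [hf + hb for hf, hb in zip(forward, backward)]


# Hypothetical 2-d embeddings for three tokens of a long entity mention.
tokens = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4]]
out = bilstm(tokens, LSTMCell(2, 3, seed=1), LSTMCell(2, 3, seed=2))
print(len(out), len(out[0]))  # 3 6
```

The additive cell update `c = f*c + i*g` is what lets information persist across many tokens. A practical tagger would use a framework implementation, for example PyTorch's `nn.LSTM` with `bidirectional=True`, rather than this explicit loop.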


Abbreviations

CRF: Conditional random field
LSTM: Long short-term memory
BILSTM: Bidirectional long short-term memory
BioNER: Biomedical named entity recognition
NER: Named entity recognition
MTM: Multi-task model
WE: Word embedding
CE: Character embedding


Acknowledgements

The authors would like to thank the National Institute of Technology Raipur for providing the necessary infrastructure and facilities for research.

Author information


Correspondence to A. Kumar.


Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Kumar, A. (2020). Disambiguation Model for Bio-Medical Named Entity Recognition. In: Dash, S., Acharya, B., Mittal, M., Abraham, A., Kelemen, A. (eds) Deep Learning Techniques for Biomedical and Health Informatics. Studies in Big Data, vol 68. Springer, Cham. https://doi.org/10.1007/978-3-030-33966-1_3
