Recognizing Biomedical Named Entities Based on the Sentence Vector/Twin Word Embeddings Conditioned Bidirectional LSTM

Li, Lishuang; Jin, Liuke; Jiang, Yuxin; Huang, Degen

doi:10.1007/978-3-319-47674-2_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10035)

Abstract

As a fundamental step in biomedical information extraction, biomedical named entity recognition remains challenging. In recent years, neural networks have been applied to entity recognition to avoid complex hand-designed features derived from various linguistic analyses. However, conventional neural network systems are limited in their ability to exploit long-range dependencies in sentences. In this paper, we adopt a bidirectional recurrent neural network with LSTM units to identify biomedical entities, in which twin word embeddings and a sentence vector are added to enrich the input information. As a result, complex feature extraction can be skipped. In the testing phase, the Viterbi algorithm is also used to filter out illogical label sequences. Experimental results on the BioCreative II GM corpus show that our system achieves an F-score of 88.61%, which outperforms CRF models that use complex hand-designed features and is 6.74% higher than RNNs.
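The abstract states that the Viterbi algorithm is applied at test time to filter out illogical label sequences. The sketch below illustrates that idea under assumptions the abstract does not state: a BIO tagging scheme with a single gene-mention type, per-token log-probabilities such as a BiLSTM softmax layer might produce, and hard constraints forbidding `I` at the start of a sentence or directly after `O`. The names `LABELS`, `allowed` and `viterbi_decode` are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

# Assumed label set (hypothetical): BIO scheme with one entity type (gene mention).
LABELS = ["B", "I", "O"]
NEG_INF = -math.inf


def allowed(prev: str, cur: str) -> bool:
    """Return False for transitions that produce an illogical BIO sequence."""
    # "I" may only continue an entity opened by "B" or "I", never follow "O".
    return not (cur == "I" and prev == "O")


def viterbi_decode(scores: Sequence[Sequence[float]]) -> List[str]:
    """Return the highest-scoring label sequence that respects the BIO constraints.

    scores[t][j] is an additive score (e.g. a log-probability from a BiLSTM
    softmax layer) for token t carrying label LABELS[j].
    """
    if not scores:
        return []
    n_labels = len(LABELS)
    # A sentence may not start inside an entity, so "I" is forbidden at t = 0.
    best = [scores[0][j] if LABELS[j] != "I" else NEG_INF for j in range(n_labels)]
    backptrs: List[List[int]] = []
    for t in range(1, len(scores)):
        new_best: List[float] = []
        ptrs: List[int] = []
        for j in range(n_labels):
            # Score of reaching label j from each previous label i; forbidden
            # transitions get -inf so they can never lie on the best path.
            cand = [
                best[i] if allowed(LABELS[i], LABELS[j]) else NEG_INF
                for i in range(n_labels)
            ]
            i_max = max(range(n_labels), key=cand.__getitem__)
            new_best.append(cand[i_max] + scores[t][j])
            ptrs.append(i_max)
        best = new_best
        backptrs.append(ptrs)
    # Trace back from the highest-scoring label of the last token.
    j = max(range(n_labels), key=best.__getitem__)
    path = [j]
    for ptrs in reversed(backptrs):
        j = ptrs[j]
        path.append(j)
    return [LABELS[k] for k in reversed(path)]


if __name__ == "__main__":
    # Illustrative probabilities for three tokens (columns: B, I, O).
    probs = [[0.1, 0.1, 0.8], [0.35, 0.45, 0.2], [0.1, 0.2, 0.7]]
    log_scores = [[math.log(p) for p in row] for row in probs]
    print(viterbi_decode(log_scores))
```

With these illustrative scores, a greedy per-token argmax would choose `O I O`, which is illogical under BIO, whereas the constrained decoder returns `O B O`. Hard constraints are only one possible design; a CRF-style output layer would instead learn transition scores and decode them with the same Viterbi recursion.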

Acknowledgment

The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China under Nos. 61173101 and 61672126. The Tesla K40 used for this research was donated by the NVIDIA Corporation.

Author information

Authors and Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
Lishuang Li, Liuke Jin, Yuxin Jiang & Degen Huang

Corresponding author

Correspondence to Lishuang Li.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Fudan University, Shanghai, China
Xuanjing Huang
Dalian University of Technology, Dalian, China
Hongfei Lin
Tsinghua University, Beijing, China
Zhiyuan Liu
Tsinghua University, Beijing, China
Yang Liu

About this paper

Cite this paper

Li, L., Jin, L., Jiang, Y., Huang, D. (2016). Recognizing Biomedical Named Entities Based on the Sentence Vector/Twin Word Embeddings Conditioned Bidirectional LSTM. In: Sun, M., Huang, X., Lin, H., Liu, Z., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2016. Lecture Notes in Computer Science, vol. 10035. Springer, Cham. https://doi.org/10.1007/978-3-319-47674-2_15

DOI: https://doi.org/10.1007/978-3-319-47674-2_15
Published: 10 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-47673-5
Online ISBN: 978-3-319-47674-2
eBook Packages: Computer Science, Computer Science (R0)