A Deep Learning Way for Disease Name Representation and Normalization

Liu, Hongwei; Xu, Yun

doi:10.1007/978-3-319-73618-1_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10619)

Included in the following conference series:

National CCF Conference on Natural Language Processing and Chinese Computing

Abstract

Disease name normalization maps the many surface forms of disease names to entries in a standardized disease vocabulary. Disease names vary so widely that dictionary lookup methods cannot achieve high accuracy on this task. DNorm is the first machine learning approach to the task, but it is not robust enough because it depends strongly on its training dataset. In this article, we propose a deep learning approach to disease name representation and normalization. Representations of the words that compose disease names are learned from a large unlabelled literature corpus, and rich semantic and syntactic properties of disease names are encoded in these representations during the process. With this new way of representing disease names, higher accuracy is achieved on the normalization task.
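
The general idea of normalizing by representation can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not say how word representations are composed into a disease name representation, so plain averaging is assumed here as a stand-in, and candidates are ranked by cosine similarity. The word vectors, the identifiers ID_BREAST and ID_LUNG, and the function names are all hypothetical; real word vectors would be learned from a large unlabelled corpus rather than written by hand.

```python
import math

# Hypothetical toy word vectors. In practice these would be learned from a
# large unlabelled literature corpus, not written by hand.
WORD_VECTORS = {
    "breast": [0.9, 0.1, 0.0],
    "lung": [0.0, 0.1, 0.9],
    "cancer": [0.1, 0.9, 0.1],
    "carcinoma": [0.2, 0.8, 0.1],
    "neoplasms": [0.15, 0.85, 0.05],
}

# Hypothetical vocabulary entries: identifier -> standardized name.
VOCABULARY = {
    "ID_BREAST": "breast neoplasms",
    "ID_LUNG": "lung neoplasms",
}


def embed_name(name):
    """Average the vectors of the known words in a name.

    Averaging is an assumed stand-in for a learned composition function.
    Returns None when no word of the name has a vector.
    """
    vectors = [WORD_VECTORS[w] for w in name.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def normalize(mention):
    """Return the vocabulary identifier whose name is most similar to the mention."""
    mention_vec = embed_name(mention)
    if mention_vec is None:
        return None
    best_id, best_score = None, -1.0
    for concept_id, concept_name in VOCABULARY.items():
        concept_vec = embed_name(concept_name)
        if concept_vec is None:
            continue
        score = cosine(mention_vec, concept_vec)
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id


print(normalize("breast carcinoma"))
print(normalize("lung cancer"))
```

In this sketch "breast carcinoma" shares no head word with "breast neoplasms", so an exact dictionary lookup would fail, while the vector comparison still ranks the intended entry first because "carcinoma" and "neoplasms" lie close together in the toy vector space.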

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, China
Hongwei Liu & Yun Xu

Corresponding author

Correspondence to Yun Xu.

Editor information

Editors and Affiliations

Fudan University, Shanghai, China
Xuanjing Huang
Singapore Management University, Singapore, Singapore
Jing Jiang
Peking University, Beijing, China
Dongyan Zhao
Peking University, Beijing, China
Yansong Feng
Soochow University, Suzhou, China
Yu Hong

About this paper

Cite this paper

Liu, H., Xu, Y. (2018). A Deep Learning Way for Disease Name Representation and Normalization. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science, vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_13

DOI: https://doi.org/10.1007/978-3-319-73618-1_13
Published: 05 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73617-4
Online ISBN: 978-3-319-73618-1
eBook Packages: Computer Science, Computer Science (R0)
