Skip to main content

A Deep Learning Way for Disease Name Representation and Normalization

  • Conference paper
  • First Online:
Natural Language Processing and Chinese Computing (NLPCC 2017)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10619))

Abstract

Disease name normalization aims at mapping various disease names to standardized disease vocabulary entries. Disease names have such a wide variation that dictionary lookup method couldn’t get a high accuracy on this task. Dnorm is the first machine learning approach for this task. It is not robust enough due to strong dependence on training dataset. In this article, we propose a deep learning way for disease name representation and normalization. Representations of composing words can be learned from large unlabelled literature corpus. Rich semantic and syntactic properties of disease names are encoded in the representations during the process. With the new way of representations for disease names, a higher accuracy is achieved in the normalization task.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Lu, Z.: PubMed and beyond: a survey of web tools for searching biomedical literature. Database 2011, baq036 (2011)

    Article  Google Scholar 

  2. Garcia-Albornoz, M., Nielsen, J.: Finding directionality and gene-disease predictions in disease associations. BMC Syst. Biol. 9(1), 35 (2015)

    Article  Google Scholar 

  3. Yu, L., Huang, J., Ma, Z., et al.: Inferring drug-disease associations based on known protein complexes. BMC Med. Genomics 8(2), S2 (2015)

    Article  Google Scholar 

  4. Leaman, R., Gonzalez, G.: BANNER: an executable survey of advances in biomedical named entity recognition. In: Pacific Symposium on Biocomputing, vol. 13, pp. 652–663 (2008)

    Google Scholar 

  5. DoÄźan, R.I., Lu, Z.: An inference method for disease name normalization. In: AAAI Fall Symposium Series (2012)

    Google Scholar 

  6. Kang, N., Singh, B., Afzal, Z., et al.: Using rule-based natural language processing to improve disease normalization in biomedical text. J. Am. Med. Inform. Assoc. 20(5), 876–881 (2013)

    Article  Google Scholar 

  7. Leaman, R., Doğan, R.I., Lu, Z.: DNorm: disease name normalization with pairwise learning to rank. Bioinformatics 29(22), 2909–2917 (2013)

    Article  Google Scholar 

  8. Cao, Z., Qin, T., Liu, T.Y., et al.: Learning to rank: from pairwise approach to listwise approach. In: Proceedings of the 24th International Conference on Machine learning, pp. 129–136. ACM (2007)

    Google Scholar 

  9. Mikolov, T., Chen, K., Corrado, G., et al.: Efficient estimation of word representations in vector space. In: ICLR Workshop (2013)

    Google Scholar 

  10. Mikolov, T., Sutskever, I., Chen, K., et al.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)

    Google Scholar 

  11. Tai, K.S., Socher, R., Manning, C.D.: Improved semantic representations from tree-structured long short-term memory networks. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 1556–1566 (2015)

    Google Scholar 

  12. Al-Rfou, R., et al.: Theano: A Python framework for fast computation of mathematical expressions. arXiv preprint (2016)

    Google Scholar 

  13. Medical Subject Headings. https://www.nlm.nih.gov/mesh

  14. An Online Catalog of Human Genes and Genetic Disorders. https://www.omim.org

  15. Davis, A.P., Wiegers, T.C., Rosenstein, M.C., et al.: MEDIC: a practical disease vocabulary used at the comparative toxicogenomics database. Database 2012, bar065 (2012)

    Google Scholar 

  16. Doğan, R.I., Leaman, R., Lu, Z.: NCBI disease corpus: a resource for disease name recognition and concept normalization. J. Biomed. Inform. 47, 1–10 (2014)

    Article  Google Scholar 

  17. US National Labrary of Medicine. https://www.ncbi.nlm.nih.gov/pubmed

  18. Li, J., Sun, Y., Johnson, R.J., et al.: BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database 2016, baw068 (2016)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yun Xu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Liu, H., Xu, Y. (2018). A Deep Learning Way for Disease Name Representation and Normalization. In: Huang, X., Jiang, J., Zhao, D., Feng, Y., Hong, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2017. Lecture Notes in Computer Science(), vol 10619. Springer, Cham. https://doi.org/10.1007/978-3-319-73618-1_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-73618-1_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73617-4

  • Online ISBN: 978-3-319-73618-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics