RENET: A Deep Learning Approach for Extracting Gene-Disease Associations from Literature

  • Ye Wu
  • Ruibang Luo
  • Henry C. M. Leung
  • Hing-Fung Ting
  • Tak-Wah Lam (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11467)

Abstract

Over one million new biomedical articles are published every year. Efficient and accurate text-mining tools are urgently needed to automatically extract knowledge from these articles to support research and genetic testing. Among extraction tasks, gene-disease association extraction has been studied the most. However, existing text-mining tools for extracting gene-disease associations have limited capacity because they consider each sentence separately. Our experiments show that the best existing tools, such as BeFree and DTMiner, achieve a precision of at most 48% and a recall of at most 78%. In this study, we designed and implemented a deep learning approach, named RENET, which considers the correlation between the sentences in an article to extract gene-disease associations. Our method significantly improves precision and recall, to 85.2% and 81.8%, respectively. The source code of RENET is available at https://bitbucket.org/alexwuhkucs/gda-extraction/src/master/.
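
To put the quoted precision and recall figures on a single scale, the sketch below combines each pair into an F1 score (the harmonic mean of precision and recall). The abstract does not report F1, so the `f1_score` helper and the resulting values are illustrative assumptions, not results from the paper. The 48% and 78% figures are stated as upper bounds and may not come from the same tool, so the F1 computed for existing tools is at best an optimistic estimate.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the abstract, converted from percentages to fractions.
# The labels are hypothetical names chosen for this illustration.
reported = {
    "best existing tools (upper bounds)": (0.48, 0.78),
    "RENET": (0.852, 0.818),
}

for name, (p, r) in reported.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

The harmonic mean is used rather than the arithmetic mean because it penalizes a large gap between precision and recall, which is the situation the abstract describes for the existing tools.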

Keywords

Literature mining, Relation extraction, Gene-disease association, Deep learning

Notes

Acknowledgments

This work was supported by Hong Kong ITF Grant ITS/331/17FP and General Research Fund No. 27204518.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Ye Wu (1)
  • Ruibang Luo (1)
  • Henry C. M. Leung (1)
  • Hing-Fung Ting (1)
  • Tak-Wah Lam (1, corresponding author)

  1. Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
