Measuring the Semantic World – How to Map Meaning to High-Dimensional Entity Clusters in PubMed?

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11279)

Abstract

The exponential increase in scientific publications in the medical field urgently calls for innovative access paths beyond the limits of term-based search. For example, the search term “diabetes” returns over 600,000 publications in the medical digital library PubMed. In such cases, the automatic extraction of semantic relations between important entities such as active substances, diseases, and genes can help to reveal entity relationships and thus simplify access to the knowledge embedded in digital libraries. For such semantic-relation tasks, distributional embedding models based on neural networks promise considerable progress in terms of accuracy, performance, and scalability. Yet, despite the recent successes of neural networks in this field, questions arise from their non-deterministic nature: Are the extracted semantic relations meaningful, and do they perhaps even reveal new, previously unknown entity relationships? In this paper, we address this question by measuring the associations between important pharmaceutical entities, such as active substances (drugs) and diseases, in high-dimensional embedding space. Our investigation shows that only a few of the contextualized associations correlate directly with spatial distance. Nevertheless, we find that they have potential for predicting new associations, which makes the method suitable as a new, literature-based technique for important practical tasks such as drug repurposing.
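As a rough illustration of what measuring associations by spatial distance in an embedding space can look like, the following minimal sketch ranks diseases by cosine similarity to a drug vector. It rests on stated assumptions: the three-dimensional vectors are hand-made toy values, the names `disease_a`, `disease_b`, `disease_c`, `cosine_similarity`, and `rank_diseases` are hypothetical, and cosine similarity is used here as one common proximity measure. It does not reproduce the authors' pipeline or their trained embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_diseases(drug_vec, disease_vecs):
    """Return (disease, similarity) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(drug_vec, vec))
        for name, vec in disease_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumed values, not taken from any trained model).
drug = [0.9, 0.1, 0.2]
diseases = {
    "disease_a": [0.8, 0.2, 0.1],
    "disease_b": [0.1, 0.9, 0.3],
    "disease_c": [-0.5, 0.4, 0.7],
}

ranking = rank_diseases(drug, diseases)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In a setting like the one the abstract describes, the highest-ranked diseases that are not yet known to be associated with the drug would be the candidates worth checking against a curated source. Whether proximity really reflects a meaningful association is the question the paper investigates.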

Notes

  1. http://ctdbase.org/
  2. https://deeplearning4j.org/
  3. https://www.ncbi.nlm.nih.gov/pubmed/
  4. https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/
  5. https://www.drugbank.ca/
  6. https://lucene.apache.org/
  7. https://deeplearning4j.org/word2vec

Author information

Corresponding author

Correspondence to Janus Wawrzinek.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Wawrzinek, J., Balke, W.-T. (2018). Measuring the Semantic World – How to Map Meaning to High-Dimensional Entity Clusters in PubMed?. In: Dobreva, M., Hinze, A., Žumer, M. (eds) Maturity and Innovation in Digital Libraries. ICADL 2018. Lecture Notes in Computer Science, vol 11279. Springer, Cham. https://doi.org/10.1007/978-3-030-04257-8_2

  • DOI: https://doi.org/10.1007/978-3-030-04257-8_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04256-1

  • Online ISBN: 978-3-030-04257-8

  • eBook Packages: Computer Science, Computer Science (R0)