Encyclopedia of Big Data Technologies

Living Edition
Editors: Sherif Sakr, Albert Zomaya

Big Semantic Data Processing in the Life Sciences Domain

  • Helena F. Deus
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-63962-8_315-1



Big semantic data processing in the life sciences deals with a set of graph-based techniques and methods used to integrate or analyze empirical evidence obtained in the course of life sciences or biomedical research.


The twofold ambition behind biomedical research and development is to create new knowledge and to apply it to the treatment and prevention of disease. In the life sciences, much research is dedicated to understanding living systems, not just for the sake of knowledge but also to harness and control them. The promise of big biomedical data processing is its potential to accelerate those efforts with predictive analytics informed by empirical results.

The mutability and adaptability inherent in all living things mean that medical success requires a deep understanding of genomics: knowing how the flu virus evolves helps devise vaccines to...


Keywords: Life Science Domain; The Cancer Genome Atlas (TCGA); Biomedical Experts; Open PHACTS; Knowledge Graph



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Elsevier Labs, Cambridge, USA

Section editors and affiliations

  • Philippe Cudré-Mauroux (1)
  • Olaf Hartig (2)
  1. eXascale Infolab, University of Fribourg, Fribourg, Switzerland
  2. Linköping University, Linköping, Sweden