Semantic Data Integration of Big Biomedical Data for Supporting Personalised Medicine

Current Trends in Semantic Web Technologies: Theory and Practice

Part of the book series: Studies in Computational Intelligence (SCI, volume 815)

Abstract

Big biomedical data has grown exponentially over the last decades, and a similar growth rate is expected in the coming years. Semantic web technologies have likewise advanced in recent years, and a great variety of tools, e.g., ontologies and query languages, have been developed by different scientific communities and practitioners. Although a rich variety of tools and big data collections is available, many challenges must be addressed in order to discover insights on which decisions can be based. For instance, interoperability conflicts can exist among data collections, data may be incomplete, and entities may be dispersed across different datasets. These issues hinder knowledge exploration and discovery, so data integration is required in order to unveil meaningful outcomes. In this chapter, we address these challenges and devise a knowledge-driven framework that relies on semantic web technologies to enable knowledge exploration and discovery. The framework receives big data sources and integrates them into a knowledge graph. Semantic data integration methods are used to identify equivalent entities, i.e., entities that correspond to the same real-world elements. Fusion policies enable the merging of equivalent entities inside the knowledge graph, as well as with entities in other knowledge graphs, e.g., DBpedia and Bio2RDF. Knowledge discovery allows for the exploration of knowledge graphs in order to uncover novel patterns and relations. As a proof of concept, we report on the results of applying the knowledge-driven framework in the EU-funded project iASiS (http://project-iasis.eu/) in order to transform big data into actionable knowledge, thus paving the way for personalised medicine.
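
To make the notions of equivalent entities and fusion policies more concrete, the following is a minimal sketch and not the chapter's implementation. It assumes a toy equivalence criterion (two entities are equivalent if they share a case-insensitive label) and a simple union fusion policy that keeps every property of the merged entities. All identifiers (`src1:drug42`, `label`, `treats`, and so on) and function names are hypothetical; a real framework would likely rely on semantic similarity measures rather than label equality.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object). All identifiers are invented
# for illustration and do not come from the chapter.
TRIPLES = [
    ("src1:drug42", "label", "Aspirin"),
    ("src1:drug42", "treats", "src1:pain"),
    ("src2:d7", "label", "aspirin"),
    ("src2:d7", "interactsWith", "src2:warfarin"),
    ("src2:d9", "label", "Ibuprofen"),
]


def group_by_subject(triples):
    """Collect the (predicate, object) pairs of each subject."""
    entities = defaultdict(set)
    for s, p, o in triples:
        entities[s].add((p, o))
    return entities


def match_key(properties):
    """Assumed equivalence criterion: same case-insensitive label."""
    labels = sorted(o.lower() for p, o in properties if p == "label")
    return labels[0] if labels else None


def fuse(triples):
    """Union fusion policy: merge equivalent entities, keeping all properties."""
    entities = group_by_subject(triples)
    canonical = {}  # match key -> representative subject
    same_as = {}    # subject -> representative subject
    for subject in sorted(entities):
        key = match_key(entities[subject])
        rep = canonical.setdefault(key, subject) if key is not None else subject
        same_as[subject] = rep
    # Rewrite subjects (and objects that are merged entities) to representatives.
    fused = {(same_as[s], p, same_as.get(o, o)) for s, p, o in triples}
    return fused, same_as


if __name__ == "__main__":
    fused, same_as = fuse(TRIPLES)
    for triple in sorted(fused):
        print(triple)
```

In this sketch, `src2:d7` is merged into `src1:drug42` because both carry the label "aspirin" (ignoring case), so the fused entity ends up with the properties from both sources. Other fusion policies, for example preferring one source over another when values conflict, would replace the set union in `fuse`.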

Notes

  1. https://trends.google.com/trends/?geo=US
  2. Web services that enable the execution of SPARQL queries following the SPARQL protocol.
  3. https://bioportal.bioontology.org/
  4. https://www.snomed.org/snomed-ct
  5. https://www.nlm.nih.gov/research/umls/
  6. https://hpo.jax.org/app/
  7. http://si.washington.edu/projects/fma
  8. https://meshb.nlm.nih.gov/search
  9. https://www.nlm.nih.gov/research/umls/rxnorm/
  10. https://ncit.nci.nih.gov/ncitbrowser/
  11. https://code.google.com/archive/p/semanticscience/wikis/SIO.wiki
  12. https://www.ncbi.nlm.nih.gov/pubmed/
  13. https://cancer.sanger.ac.uk/cosmic
  14. https://www.drugbank.ca/
  15. http://stitch.embl.de/
  16. https://vocol.iais.fraunhofer.de/iasis/
  17. https://www.kegg.jp/kegg/rest/keggapi.html
  18. http://www.stitch1.embl.de/
  19. http://www.sideeffects.embl.de/
  20. The ten knowledge graphs have 133,873,127 RDF triples.

Acknowledgements

This work has been partially funded by the European Union’s Horizon 2020 research and innovation programme project iASiS under grant agreement No. 727658. Kemele Endris has been sponsored by the EU Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No. 642795 (WDAqua). Farah Karim has been supported by a scholarship from the German Academic Exchange Service (DAAD).

Corresponding author

Correspondence to Maria-Esther Vidal.

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Vidal, ME., Endris, K.M., Jozashoori, S., Karim, F., Palma, G. (2019). Semantic Data Integration of Big Biomedical Data for Supporting Personalised Medicine. In: Alor-Hernández, G., Sánchez-Cervantes, J., Rodríguez-González, A., Valencia-García, R. (eds) Current Trends in Semantic Web Technologies: Theory and Practice. Studies in Computational Intelligence, vol 815. Springer, Cham. https://doi.org/10.1007/978-3-030-06149-4_2
