Computational Infrastructures for Data and Knowledge Management in Systems Biology

  • Chapter
Systems Biology

Abstract

The volume, complexity and heterogeneity of data originating from high-throughput functional genomics technologies have created challenges and opportunities for information technology (IT) departments. These increased demands have also raised the cost of IT infrastructure, such as the necessary computing power and storage devices, and of the personnel required to maintain it. This chapter describes some of the challenges for computational analysis infrastructure, including the bottlenecks and most pressing needs that have to be addressed to effectively support the development of systems biology and its application in medicine.

The authors Fotis Georgatos and Stéphane Ballereau contributed equally to this chapter.

Abbreviations

3D: Three dimensional
4D: Four dimensional, typically 3D plus a time dimension
BOINC: Berkeley Open Infrastructure for Network Computing
CERN: European Council for Nuclear Research
CPU: Central Processing Unit
EBI: European Bioinformatics Institute
EGEE: Enabling Grids for E-sciencE
EGI: European Grid Infrastructure
ELIXIR: European Life Sciences Infrastructure for Biological Information
EMBL: European Molecular Biology Laboratory
Flops: FLoating-point Operations Per Second
GPU: Graphics Processing Unit
HPC: High Performance Computing
HTC: High Throughput Computing
IaaS: Infrastructure as a Service
IT: Information Technology
I/O: Input/Output, typically used in the context of software processing of data
LHC: Large Hadron Collider
MPI: Message Passing Interface
Omics: A collective term for -omics fields such as metabolomics, genomics and proteomics
PaaS: Platform as a Service
PRACE: Partnership for Advanced Computing in Europe
ROI: Return On Investment
SaaS: Software as a Service
SBML: Systems Biology Markup Language
WLCG: Worldwide LHC Computing Grid

Acknowledgements

This work was supported by the CNRS, and in part by the EU grants to CA in the context of the MeDALL consortium (Mechanisms of the Development of Allergy, Grant Agreement FP7 N°264357), the U-BIOPRED consortium (Unbiased Biomarkers for the PREDiction of respiratory disease outcomes, Grant Agreement IMI 115010), and the eTRIKS consortium to CA, DB, MG, YG, RS, RB (European Translational research Information & Knowledge management Services, Grant Agreement n°115446). The formation of the European Institute for Systems Biology & Medicine hosted at Claude Bernard University is supported by the Lyonbiopole competitive cluster and its academic, industrial and local authority partners, including Grand Lyon, Région Rhône-Alpes, Direction de la Recherche et de la Technologie, and the Finovi Foundation (CA). We would like to acknowledge the support of the Luxembourg Centre for Systems Biomedicine and the University of Luxembourg (LH and RB), the NIH General Medical Sciences Center for Systems Biology GM076547 (LH) and a Department of Defense contract on Liver Toxicity W911SR-09-C-0062 (LH).

Corresponding author

Correspondence to Reinhard Schneider.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Georgatos, F. et al. (2013). Computational Infrastructures for Data and Knowledge Management in Systems Biology. In: Prokop, A., Csukás, B. (eds) Systems Biology. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6803-1_13
