Abstract
The volume, complexity and heterogeneity of data originating from high-throughput functional genomics technologies have created both challenges and opportunities for information technology (IT) departments. These demands have also increased the cost of IT infrastructure, such as computing power and storage devices, and the cost of the personnel required to maintain it. This chapter describes some of the challenges facing computational analysis infrastructure, including the bottlenecks and most pressing needs that must be addressed to effectively support the development of systems biology and its application in medicine.
The authors Fotis Georgatos and Stéphane Ballereau contributed equally to this chapter.
Abbreviations
- 3D: Three dimensional
- 4D: Four dimensional, typically 3D plus time dimension
- BOINC: Berkeley Open Infrastructure for Network Computing
- CERN: European Council for Nuclear Research
- CPU: Central Processing Unit
- EBI: European Bioinformatics Institute
- EGEE: Enabling Grids for E-sciencE
- EGI: European Grid Infrastructure
- ELIXIR: European Life Sciences Infrastructure for Biological Information
- EMBL: European Molecular Biology Laboratory
- Flops: FLoating-point Operations Per Second
- GPU: Graphical Processing Unit
- HPC: High Performance Computing
- HTC: High Throughput Computing
- IaaS: Infrastructure as a Service
- IT: Information Technology
- I/O: Input/Output, typically used in the context of software processing of data
- LHC: Large Hadron Collider
- MPI: Message Passing Interface
- Omics: A collective term to refer to -omics keywords like metabolomics, genomics, proteomics etc.
- PaaS: Platform as a Service
- PRACE: Partnership for Advanced Computing in Europe
- ROI: Return On Investment
- SaaS: Software as a Service
- SBML: Systems Biology Markup Language
- WLCG: Worldwide LHC Computing Grid
Acknowledgements
This work was supported by the CNRS, and in part by EU grants to CA in the context of the MeDALL consortium (Mechanisms of the Development of Allergy, Grant Agreement FP7 N°264357), the U-BIOPRED consortium (Unbiased Biomarkers for the PREDiction of respiratory disease outcomes, Grant Agreement IMI 115010), and the eTRIKS consortium to CA, DB, MG, YG, RS, RB (European Translational research Information & Knowledge management Services, Grant Agreement n°115446). The formation of the European Institute for Systems Biology & Medicine hosted at Claude Bernard University is supported by the Lyonbiopole competitive cluster and its academic, industrial and local authority partners, including Grand Lyon, Région Rhône-Alpes, Direction de la Recherche et de la Technologie, and the Finovi Foundation (CA). We would like to acknowledge the support of the Luxembourg Centre for Systems Biomedicine and the University of Luxembourg (LH and RB), the NIH General Medical Sciences Center for Systems Biology GM076547 (LH) and a Department of Defense contract on Liver Toxicity W911SR-09-C-0062 (LH).
Copyright information
© 2013 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Georgatos, F. et al. (2013). Computational Infrastructures for Data and Knowledge Management in Systems Biology. In: Prokop, A., Csukás, B. (eds) Systems Biology. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6803-1_13
DOI: https://doi.org/10.1007/978-94-007-6803-1_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6802-4
Online ISBN: 978-94-007-6803-1
eBook Packages: Biomedical and Life Sciences (R0)