Computational Infrastructures for Data and Knowledge Management in Systems Biology

Georgatos, Fotis; Ballereau, Stéphane; Pellet, Johann; Ghanem, Moustafa; Price, Nathan; Hood, Leroy; Guo, Yi-Ke; Boutigny, Dominique; Auffray, Charles; Balling, Rudi; Schneider, Reinhard

doi:10.1007/978-94-007-6803-1_13

Fotis Georgatos³,
Stéphane Ballereau⁴,
Johann Pellet⁴,
Moustafa Ghanem⁵,
Nathan Price⁶,
Leroy Hood⁶,
Yi-Ke Guo⁵,
Dominique Boutigny⁷,
Charles Auffray⁴,
Rudi Balling³ &
Reinhard Schneider³

Abstract

The volume, complexity and heterogeneity of data originating from high-throughput functional genomics technologies have created both challenges and opportunities for information technology (IT) departments. These demands have also raised the cost of IT infrastructure, such as computing power and storage devices, and of the personnel needed to maintain it. This chapter describes some of the challenges facing computational analysis infrastructure, including the bottlenecks and most pressing needs that must be addressed to effectively support the development of systems biology and its application in medicine.

The authors Fotis Georgatos and Stéphane Ballereau contributed equally to this chapter.

Abbreviations

3D:: Three-dimensional
4D:: Four-dimensional, typically 3D plus a time dimension
BOINC:: Berkeley Open Infrastructure for Network Computing
CERN:: European Council for Nuclear Research
CPU:: Central Processing Unit
EBI:: European Bioinformatics Institute
EGEE:: Enabling Grids for E-sciencE
EGI:: European Grid Infrastructure
ELIXIR:: European Life Sciences Infrastructure for Biological Information
EMBL:: European Molecular Biology Laboratory
Flops:: FLoating-point Operations Per Second
GPU:: Graphics Processing Unit
HPC:: High Performance Computing
HTC:: High Throughput Computing
IaaS:: Infrastructure as a Service
IT:: Information Technology
I/O:: Input/Output—typically used in the context of software processing of data
LHC:: Large Hadron Collider
MPI:: Message Passing Interface
Omics:: A collective term to refer to -omics keywords like metabolomics, genomics, proteomics etc.
PaaS:: Platform as a Service
PRACE:: Partnership for Advanced Computing in Europe
ROI:: Return On Investment
SaaS:: Software as a Service
SBML:: Systems Biology Markup Language
WLCG:: Worldwide LHC Computing Grid

Acknowledgements

This work was supported by the CNRS, and in part by EU grants to CA in the context of the MeDALL consortium (Mechanisms of the Development of Allergy, Grant Agreement FP7 N°264357), the U-BIOPRED consortium (Unbiased Biomarkers for the PREDiction of respiratory disease outcomes, Grant Agreement IMI 115010), and the eTRIKS consortium to CA, DB, MG, YG, RS, RB (European Translational research Information & Knowledge management Services, Grant Agreement n°115446). The formation of the European Institute for Systems Biology & Medicine hosted at Claude Bernard University is supported by the Lyonbiopole competitive cluster and its academic, industrial and local authority partners, including Grand Lyon, Région Rhône-Alpes, Direction de la Recherche et de la Technologie, and the Finovi Foundation (CA). We would like to acknowledge the support of the Luxembourg Centre for Systems Biomedicine and the University of Luxembourg (LH and RB), the NIH General Medical Sciences Center for Systems Biology GM076547 (LH) and a Department of Defense contract on Liver Toxicity W911SR-09-C-0062 (LH).

Author information

Authors and Affiliations

Luxembourg Centre for Systems Biomedicine, University of Luxembourg, 7 Avenue des Hauts-Fourneaux, 4362, Esch-sur-Alzette, Luxembourg
Fotis Georgatos, Rudi Balling & Reinhard Schneider
European Institute for Systems Biology and Medicine—CNRS-UCBL-ENS, Université de Lyon, 50 avenue Tony Garnier, 69007, Lyon, France
Stéphane Ballereau, Johann Pellet & Charles Auffray
Department of Computing, Imperial College London, London, SW7 2AZ, UK
Moustafa Ghanem & Yi-Ke Guo
Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA, 98109-5234, USA
Nathan Price & Leroy Hood
Centre de Calcul de l’IN2P3, USR6402 CNRS/IN2P3, 43 Bd du 11 Novembre 1918, 69622, Villeurbanne Cedex, France
Dominique Boutigny

Corresponding author

Correspondence to Reinhard Schneider.

Editor information

Editors and Affiliations

Chemical and Biological Engineering, Vanderbilt University, Nashville, TN, USA
Aleš Prokop
Research Group on Process Network Engineering, Kaposvár University, Kaposvár, Hungary
Béla Csukás

About this chapter

Cite this chapter

Georgatos, F. et al. (2013). Computational Infrastructures for Data and Knowledge Management in Systems Biology. In: Prokop, A., Csukás, B. (eds) Systems Biology. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6803-1_13

DOI: https://doi.org/10.1007/978-94-007-6803-1_13
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6802-4
Online ISBN: 978-94-007-6803-1
eBook Packages: Biomedical and Life Sciences (R0)