
Current challenges and approaches for the synergistic use of systems biology data in the scientific community

  • Christian H. Ahrens
  • Ulrich Wagner
  • Hubert K. Rehrauer
  • Can Türker
  • Ralph Schlapbach
Part of the Experientia Supplementum book series (EXS, volume 97)

Abstract

The rapid development and broad application of high-throughput analytical technologies are transforming biological research. They provide amounts of data, and opportunities to analyze those data, that were undreamt of in past years, and with them the chance to understand the fundamentals of biological processes. To fully exploit this potential, scientists must be able to understand and interpret the information in an integrative manner. Within each discipline, the sheer data volume and the heterogeneity of technical platforms already pose a significant challenge; across disciplines, the heterogeneity of platforms and data formats makes the integrative management, analysis, and interpretation of data considerably more difficult. This challenge lies at the heart of systems biology, which aims at a quantitative understanding of biological systems to the extent that systemic features can be predicted. In this chapter, we discuss several key issues that must be addressed to bring integrated analysis and mining of systems biology data within reach.

Keywords

Gene Expression Omnibus, Systems Biology Markup Language, Protein Interaction Data, Open Biomedical Ontology, Gene Expression Database

Copyright information

© Birkhäuser Verlag/Switzerland 2007

Authors and Affiliations

  • Christian H. Ahrens, Ulrich Wagner, Hubert K. Rehrauer, Can Türker, Ralph Schlapbach: Functional Genomics Center Zurich, Zurich, Switzerland