Computational Representation of Biological Systems

  • Zach Frazier
  • Jason McDermott
  • Michal Guerquin
  • Ram Samudrala
Part of the Methods in Molecular Biology book series (MIMB, volume 541)


Integration of large and diverse biological data sets is a daunting problem facing systems biology researchers. Exploring the complex issues of data validation, integration, and representation, we present a systematic approach for the management and analysis of large biological data sets based on data warehouses. Our system has been implemented in the Bioverse, a framework combining diverse protein information from a variety of knowledge areas such as molecular interactions, pathway localization, protein structure, and protein function.

Key words

Bioverse, data integration, molecular interactions, protein structure, protein function, data warehouse, database, bioinformatics



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Zach Frazier (1)
  • Jason McDermott (2)
  • Michal Guerquin (1)
  • Ram Samudrala (1)
  1. Department of Microbiology, University of Washington, Seattle, USA
  2. Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, USA
