Abstract
Integrating large and diverse biological data sets is a daunting problem for systems biology researchers. We explore the complex issues of data validation, integration, and representation, and present a systematic approach, based on data warehouses, to managing and analyzing large biological data sets. The approach is implemented in the Bioverse, a framework that combines diverse protein information from knowledge areas such as molecular interactions, pathway localization, protein structure, and protein function.
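To make the warehouse idea concrete, the following is a minimal sketch of how annotations from several knowledge areas might be gathered around a shared protein table. It is not the Bioverse schema: the table names, the `build_warehouse`, `load`, and `annotations_for` helpers, and every record in `RECORDS` are hypothetical placeholders chosen only for illustration.

```python
import sqlite3

# Hypothetical placeholder records: (protein, knowledge area, value, source).
# None of these values come from a real database.
RECORDS = [
    ("protein_A", "molecular interaction", "interacts with protein_B", "source_1"),
    ("protein_A", "protein function", "hypothetical function X", "source_2"),
    ("protein_A", "protein structure", "hypothetical fold Y", "source_3"),
    ("protein_B", "pathway localization", "hypothetical pathway Z", "source_2"),
]


def build_warehouse():
    """Create an in-memory schema: two lookup tables and one annotation table."""
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE protein (
            protein_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE knowledge_area (
            area_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE annotation (
            protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
            area_id INTEGER NOT NULL REFERENCES knowledge_area(area_id),
            value TEXT NOT NULL,
            source TEXT NOT NULL
        );
        """
    )
    return con


def load(con, records):
    """Insert records, creating protein and knowledge-area rows on first use."""
    for protein, area, value, source in records:
        con.execute("INSERT OR IGNORE INTO protein(name) VALUES (?)", (protein,))
        con.execute("INSERT OR IGNORE INTO knowledge_area(name) VALUES (?)", (area,))
        con.execute(
            """
            INSERT INTO annotation (protein_id, area_id, value, source)
            SELECT p.protein_id, a.area_id, ?, ?
            FROM protein p, knowledge_area a
            WHERE p.name = ? AND a.name = ?
            """,
            (value, source, protein, area),
        )
    con.commit()


def annotations_for(con, protein):
    """Return (knowledge area, value, source) rows for one protein, sorted by area."""
    return con.execute(
        """
        SELECT a.name, n.value, n.source
        FROM annotation n
        JOIN protein p ON p.protein_id = n.protein_id
        JOIN knowledge_area a ON a.area_id = n.area_id
        WHERE p.name = ?
        ORDER BY a.name, n.value
        """,
        (protein,),
    ).fetchall()


if __name__ == "__main__":
    con = build_warehouse()
    load(con, RECORDS)
    for row in annotations_for(con, "protein_A"):
        print(row)
```

Keeping the source of each annotation in its own column is one simple way to support the validation step the abstract mentions, since conflicting values for the same protein can then be traced back to where they came from.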
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Frazier, Z., McDermott, J., Guerquin, M., Samudrala, R. (2009). Computational Representation of Biological Systems. In: Ireton, R., Montgomery, K., Bumgarner, R., Samudrala, R., McDermott, J. (eds) Computational Systems Biology. Methods in Molecular Biology, vol 541. Humana Press. https://doi.org/10.1007/978-1-59745-243-4_23
DOI: https://doi.org/10.1007/978-1-59745-243-4_23
Publisher Name: Humana Press
Print ISBN: 978-1-58829-905-5
Online ISBN: 978-1-59745-243-4
eBook Packages: Springer Protocols