
The Bioverse API and Web Application

  • Michal Guerquin
  • Jason McDermott
  • Zach Frazier
  • Ram Samudrala
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 541)

Abstract

The Bioverse is a framework for creating, warehousing and presenting biological information based on hierarchical levels of organisation. The framework is guided by the goal of representing all relationships between all components of biological systems, working towards a holistic picture of organismal biology. Data from various sources are combined into a single repository, and a uniform interface is exposed to access it. The power of the Bioverse approach is that, owing to its inclusive nature, patterns emerge from the acquired data and new predictions are made. We discuss the implementation of this repository (beginning with acquisition of source data, continuing with processing in a pipeline, and concluding with storage in a relational database) and the interfaces to the data it contains, ranging from a programmatic application interface to a user-friendly web application.

Key words

Bioverse, framework, systems biology, proteomics, interaction, protein structure, functional annotation, prediction, visualization, server, programming interface, data warehouse

Notes

Acknowledgements

We acknowledge the invaluable help in the form of comments, contributions, and critiques of the Bioverse from all members of the Samudrala group and the Department of Microbiology at the University of Washington.

Many researchers have helped in the creation of the Bioverse and Protinfo web servers. We thank the scientific community (more properly attributed in Section 3.2) for making available data and techniques we have used and relied on.

This work was, and currently is, supported in part by the University of Washington’s Advanced Technology Initiative in Infectious Diseases, Puget Sound Partners in Global Health, an NSF CAREER Grant, NSF Grant DBI-0217241, NIH Grant GM068152, and a Searle Scholar Award to Ram Samudrala.

Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Michal Guerquin (1)
  • Jason McDermott (2)
  • Zach Frazier (1)
  • Ram Samudrala (1)

  1. Department of Microbiology, University of Washington, Seattle, USA
  2. Computational Biology and Bioinformatics, Pacific Northwest National Laboratory, Richland, USA
