Handling Diverse Protein Interaction Data: Integration, Storage and Retrieval

Part of the book series: Computational Biology (COBO, volume 9)

Abstract

In this chapter we review current approaches to storing, retrieving and integrating diverse protein interaction data. To incorporate the heterogeneous results of computational predictions and protein interaction experiments, data integration methods have been widely used; they provide efficient presentation and analysis of interaction data. Among these, statistical meta-analysis and supervised machine learning methods are becoming especially popular. While integration methods reduce the complexity of the system representation, databases provide efficient storage and retrieval of the data. A large variety of interaction databases exist, differing in scope, type and coverage of data as well as in query and search capabilities. We categorize protein interaction databases into four groups: comprehensive databases, specialized databases, structural databases and databases developed for network analysis. This gives a rough grouping of resources based on how they might be used. In particular, one might often start with a search of a comprehensive database and then refine the results using a database with a more specific focus.

Author information

Correspondence to Benjamin Shoemaker.

Copyright information

© 2008 Springer-Verlag London Limited

About this chapter

Cite this chapter

Shoemaker, B., Panchenko, A. (2008). Handling Diverse Protein Interaction Data: Integration, Storage and Retrieval. In: Panchenko, A., Przytycka, T. (eds) Protein-protein Interactions and Networks. Computational Biology, vol 9. Springer, London. https://doi.org/10.1007/978-1-84800-125-1_2

  • DOI: https://doi.org/10.1007/978-1-84800-125-1_2

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-84800-124-4

  • Online ISBN: 978-1-84800-125-1

  • eBook Packages: Computer Science, Computer Science (R0)
