Computational Prediction of Protein-Protein Interactions

  • Anton J. Enright
  • Lucy Skrabanek
  • Gary D. Bader
Part of the Springer Protocols Handbooks book series (SPH)


One of the current goals of proteomics is to map the protein interaction networks of a large number of model organisms (1). Protein-protein interaction information allows the function of a protein to be defined by its position in a complex web of interacting proteins. Access to such information will greatly aid biological research and potentially make the discovery of novel drug targets much easier. Previously, the detection of protein-protein interactions was limited to labor-intensive experimental techniques such as co-immunoprecipitation or affinity chromatography. High-throughput experimental techniques such as yeast two-hybrid and mass spectrometry have now also become available for large-scale detection of protein interactions. These methods, however, may not be generally applicable to all proteins in all organisms, and may also be prone to systematic error. Recently, a number of complementary computational approaches have been developed for the large-scale prediction of protein-protein interactions based on protein sequence, structure, and evolutionary relationships in complete genomes.
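One of the evolutionary approaches referred to above is the phylogenetic-profile method (43). Each protein is described by the set of complete genomes that contain a homolog of it. Proteins whose profiles match are predicted to be functionally linked. The following is a minimal Python sketch of that idea and is not the authors' implementation. It assumes presence/absence in each genome has already been decided, for example by a homology search with some cutoff. The protein names, the five-genome profiles, and the `max_distance` threshold are hypothetical values chosen for illustration. Pairs whose profiles differ in at most `max_distance` genomes (Hamming distance) are reported as candidate links.

```python
from itertools import combinations

# Hypothetical presence (1) / absence (0) profiles across five complete genomes.
# In a real analysis each bit would come from a homology search; these are invented.
profiles = {
    "protA": (1, 0, 1, 1, 0),
    "protB": (1, 0, 1, 1, 0),
    "protC": (0, 1, 1, 0, 1),
    "protD": (1, 0, 1, 0, 0),
}

def hamming(p, q):
    """Count genomes in which exactly one of the two proteins has a homolog."""
    return sum(a != b for a, b in zip(p, q))

def predict_links(profiles, max_distance=0):
    """Return (protein1, protein2, distance) for pairs within max_distance."""
    links = []
    for (name1, p1), (name2, p2) in combinations(sorted(profiles.items()), 2):
        d = hamming(p1, p2)
        if d <= max_distance:
            links.append((name1, name2, d))
    return links

print(predict_links(profiles))
# [('protA', 'protB', 0)]
print(predict_links(profiles, max_distance=1))
# [('protA', 'protB', 0), ('protA', 'protD', 1), ('protB', 'protD', 1)]
```

With identical profiles required, only protA and protB are linked. Allowing one mismatch also admits protD. A practical pipeline would likely also discount uninformative profiles, such as proteins present in every genome, which trivially match one another. That refinement is omitted here.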




  1. Mendelsohn, A. R. and Brent, R. (1999) Protein interaction methods-toward an endgame. Science 284, 1948–1950.
  2. Eisenberg, D., Marcotte, E. M., Xenarios, I., and Yeates, T. O. (2000) Protein function in the post-genomic era. Nature 405, 823–826.
  3. Huynen, M., Snel, B., Lathe, W., and Bork, P. (2000) Exploitation of gene context. Curr. Opin. Struct. Biol. 10, 366–370.
  4. Grigoriev, A. (2001) A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae. Nucleic Acids Res. 29, 3513–3519.
  5. Ge, H., Liu, Z., Church, G. M., and Vidal, M. (2001) Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat. Genet. 29, 482–486.
  6. Jansen, R., Greenbaum, D., and Gerstein, M. (2002) Relating whole-genome expression data with protein-protein interactions. Genome Res. 12, 37–46.
  7. Marcotte, E. M., Pellegrini, M., Thompson, M. J., Yeates, T. O., and Eisenberg, D. (1999) A combined algorithm for genome-wide prediction of protein function. Nature 402, 83–86.
  8. Jansen, R., Yu, H., Greenbaum, D., et al. (2003) A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science 302, 449–453.
  9. Sussman, J. L., Lin, D., Jiang, J., et al. (1998) Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Crystallogr. D Biol. Crystallogr. 54, 1078–1084.
  10. Chothia, C. and Janin, J. (1975) Principles of protein-protein recognition. Nature 256, 705–708.
  11. Gallet, X., Charloteaux, B., Thomas, A., and Brasseur, R. (2000) A fast method to predict protein interaction sites from sequences. J. Mol. Biol. 302, 917–926.
  12. Korn, A. P. and Burnett, R. M. (1991) Distribution and complementarity of hydropathy in multisubunit proteins. Proteins: Struct. Funct. Genet. 9, 37–55.
  13. Young, L., Jernigan, R. L., and Covell, D. G. (1994) A role for surface hydrophobicity in protein-protein recognition. Protein Sci. 3, 717–729.
  14. Mueller, T. D. and Feigon, J. (2002) Solution structures of UBA domains reveal a conserved hydrophobic surface for protein-protein interactions. J. Mol. Biol. 319, 1243–1255.
  15. Lijnzaad, P. and Argos, P. (1997) Hydrophobic patches on protein subunit interfaces: characteristics and prediction. Proteins: Struct. Funct. Genet. 28, 333–343.
  16. Janin, J., Miller, S., and Chothia, C. (1988) Surface, subunit interfaces and interior of oligomeric proteins. J. Mol. Biol. 204, 155–164.
  17. Argos, P. (1988) An investigation of protein subunit and domain interfaces. Prot. Eng. 2, 101–113.
  18. Jones, S. and Thornton, J. M. (1996) Principles of protein-protein interactions. Proc. Natl. Acad. Sci. USA 93, 13–20.
  19. Ofran, Y. and Rost, B. (2003) Analysing six types of protein-protein interfaces. J. Mol. Biol. 325, 377–387.
  20. Jones, S. and Thornton, J. M. (1997) Analysis of protein-protein interaction sites using surface patches. J. Mol. Biol. 272, 121–132.
  21. Jones, S. and Thornton, J. M. (1997) Prediction of protein-protein interaction sites using patch analysis. J. Mol. Biol. 272, 133–143.
  22. Lawrence, M. and Colman, P. M. (1993) Shape complementarity at protein-protein interfaces. J. Mol. Biol. 234, 946–950.
  23. Gabb, H. A., Jackson, R. M., and Sternberg, M. J. E. (1997) Modelling protein docking using shape complementarity, electrostatics and biochemical information. J. Mol. Biol. 272, 106–120.
  24. Shoichet, B. K. and Kuntz, I. D. (1991) Protein docking and complementarity. J. Mol. Biol. 221, 327–346.
  25. Aloy, P. and Russell, R. B. (2002) Potential artefacts in protein-interaction networks. FEBS Lett. 530, 253–254.
  26. Casari, G., Sander, C., and Valencia, A. (1995) A method to predict functional residues in proteins. Nat. Struct. Biol. 2, 171–178.
  27. Lichtarge, O., Bourne, H. R., and Cohen, F. E. (1996) An evolutionary trace method defines binding surfaces common to protein families. J. Mol. Biol. 257, 342–358.
  28. Pazos, F., Helmer-Citterich, M., Ausiello, G., and Valencia, A. (1997) Correlated mutations contain information about protein-protein interaction. J. Mol. Biol. 271, 511–523.
  29. Zhou, H. X. and Shan, Y. B. (2001) Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins: Struct. Funct. Genet. 44, 336–343.
  30. Fariselli, P., Pazos, F., Valencia, A., and Casadio, R. (2002) Prediction of protein-protein interaction sites in heterocomplexes with neural networks. Eur. J. Biochem. 269, 1356–1361.
  31. Ofran, Y. and Rost, B. (2003) Predicted protein-protein interaction sites from local sequence information. FEBS Lett. 544, 236–239.
  32. Aloy, P. and Russell, R. B. (2002) Interrogating protein interaction networks through structural biology. Proc. Natl. Acad. Sci. USA 99, 5896–5901.
  33. Aloy, P. and Russell, R. B. (2003) InterPreTS: protein Interaction Prediction through Tertiary Structure. Bioinformatics 19, 161–162.
  34. Karp, P. D., Riley, M., Saier, M., Paulsen, I. T., Paley, S. M., and Pellegrini-Toole, A. (2000) The EcoCyc and MetaCyc databases. Nucleic Acids Res. 28, 56–59.
  35. Tamames, J., Casari, G., Ouzounis, C., and Valencia, A. (1997) Conserved clusters of functionally related genes in two bacterial genomes. J. Mol. Evol. 44, 66–73.
  36. Dandekar, T., Snel, B., Huynen, M., and Bork, P. (1998) Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem. Sci. 23, 324–328.
  37. Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G. D., and Maltsev, N. (1999) The use of gene clusters to infer functional coupling. Proc. Natl. Acad. Sci. USA 96, 2896–2901.
  38. Zorio, D. A., Cheng, N. N., Blumenthal, T., and Spieth, J. (1994) Operons as a common form of chromosomal organization in C. elegans. Nature 372, 270–272.
  39. Blumenthal, T. (1998) Gene clusters and polycistronic transcription in eukaryotes. Bioessays 20, 480–487.
  40. Snel, B., Bork, P., and Huynen, M. A. (2002) Genomes in flux: the evolution of archaeal and proteobacterial gene content. Genome Res. 12, 17–25.
  41. Kunin, V., Cases, I., Enright, A. J., de Lorenzo, V., and Ouzounis, C. A. (2003) Myriads of protein families, and still counting. Genome Biol. 4, 401.
  42. Ouzounis, C. (1999) Orthology: another terminology muddle. Trends Genet. 15, 445.
  43. Pellegrini, M., Marcotte, E. M., Thompson, M. J., Eisenberg, D., and Yeates, T. O. (1999) Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl. Acad. Sci. USA 96, 4285–4288.
  44. Ouzounis, C. and Kyrpides, N. (1996) The emergence of major cellular processes in evolution. FEBS Lett. 390, 119–123.
  45. Rivera, M. C., Jain, R., Moore, J. E., and Lake, J. A. (1998) Genomic evidence for two functionally distinct gene classes. Proc. Natl. Acad. Sci. USA 95, 6239–6244.
  46. Marcotte, E. M., Xenarios, I., van Der Bliek, A. M., and Eisenberg, D. (2000) Localizing proteins in the cell from their phylogenetic profiles. Proc. Natl. Acad. Sci. USA 97, 12115–12120.
  47. Galperin, M. Y. and Koonin, E. V. (2000) Who’s your neighbor? New computational approaches for functional genomics. Nat. Biotechnol. 18, 609–613.
  48. Enright, A. J., Iliopoulos, I., Kyrpides, N. C., and Ouzounis, C. A. (1999) Protein interaction maps for complete genomes based on gene fusion events. Nature 402, 86–90.
  49. Marcotte, E. M., Pellegrini, M., Ng, H. L., Rice, D. W., Yeates, T. O., and Eisenberg, D. (1999) Detecting protein function and protein-protein interactions from genome sequences. Science 285, 751–753.
  50. Enright, A. J., Van Dongen, S., and Ouzounis, C. A. (2002) An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30, 1575–1584.
  51. Enright, A. J. and Ouzounis, C. A. (2001) Functional associations of proteins in entire genomes by means of exhaustive detection of gene fusions. Genome Biol. 2, RESEARCH0034.
  52. Pazos, F. and Valencia, A. (2002) In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins 47, 219–227.
  53. Gobel, U., Sander, C., Schneider, R., and Valencia, A. (1994) Correlated mutations and residue contacts in proteins. Proteins 18, 309–317.
  54. Snel, B., Bork, P., and Huynen, M. A. (2002) The identification of functional modules from the genomic association of genes. Proc. Natl. Acad. Sci. USA 99, 5890–5895.
  55. Snel, B., Lehmann, G., Bork, P., and Huynen, M. A. (2000) STRING: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 28, 3442–3444.
  56. Overbeek, R., Larsen, N., Pusch, G. D., et al. (2000) WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res. 28, 123–125.
  57. von Mering, C., Huynen, M., Jaeggi, D., Schmidt, S., Bork, P., and Snel, B. (2003) STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 31, 258–261.
  58. Mellor, J. C., Yanai, I., Clodfelter, K. H., Mintseris, J., and DeLisi, C. (2002) Predictome: a database of putative functional links between proteins. Nucleic Acids Res. 30, 306–309.
  59. Tatusov, R. L., Koonin, E. V., and Lipman, D. J. (1997) A genomic perspective on protein families. Science 278, 631–637.
  60. Tatusov, R. L., Fedorova, N. D., Jackson, J. D., et al. (2003) The COG database: an updated version includes eukaryotes. BMC Bioinformatics 4, 41.
  61. Altschul, S. F., Madden, T. L., Schaffer, A. A., et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
  62. Iliopoulos, I., Enright, A. J., Poullet, P., and Ouzounis, C. (2003) Mapping functional associations in the entire genome of Drosophila melanogaster. Funct. Genomics 4, 337–341.
  63. Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol. 147, 195–197.
  64. Enright, A. J. (2002) Computational Analysis of Protein Function in Complete Genomes. Ph.D. Thesis, University of Cambridge, p. 241.
  65. Enright, A. J., Kunin, V., and Ouzounis, C. A. (2003) Protein families and TRIBES in genome sequence space. Nucleic Acids Res. 31, 4632–4638.
  66. Brazma, A., Hingamp, P., Quackenbush, J., et al. (2001) Minimum information about a microarray experiment (MIAME)-toward standards for microarray data. Nat. Genet. 29, 365–371.
  67. Gollub, J., Ball, C. A., Binkley, G., et al. (2003) The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res. 31, 94–96.
  68. Brazma, A., Parkinson, H., Sarkans, U., et al. (2003) ArrayExpress-a public repository for microarray gene expression data at the EBI. Nucleic Acids Res. 31, 68–71.
  69. Edgar, R., Domrachev, M., and Lash, A. E. (2002) Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210.
  70. Eisen, M., Spellman, P. T., Brown, P. O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863–14868.
  71. Walsh, S., Anderson, M., and Cartinhour, S. W. (1998) ACEDB: a database for genome information. Methods Biochem. Anal. 39, 299–318.
  72. Kanehisa, M. and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30.
  74. Enright, A. J. and Ouzounis, C. A. (2001) BioLayout-an automatic graph layout algorithm for similarity visualization. Bioinformatics 17, 853–854.
  75. Shannon, P., Markiel, A., Ozier, O., et al. (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504.
  76. Bader, G. D., Betel, D., and Hogue, C. W. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 31, 248–250.
  77. Xenarios, I., Rice, D. W., Salwinski, L., Baron, M. K., Marcotte, E. M., and Eisenberg, D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res. 28, 289–291.
  78. Orchard, S., Hermjakob, H., and Apweiler, R. (2003) The proteomics standards initiative. Proteomics 3, 1374–1376.
  79. Mewes, H. W., Frishman, D., Guldener, U., et al. (2002) MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 30, 31–34.
  80. Hermjakob, H., Montecchi-Palazzi, L., Lewington, C., et al. (2004) IntAct: an open source molecular interaction database. Nucleic Acids Res. 32 (Database issue), D452–D455.
  81. Zanzoni, A., Montecchi-Palazzi, L., Quondam, M., Ausiello, G., Helmer-Citterich, M., and Cesareni, G. (2002) MINT: a Molecular INTeraction database. FEBS Lett. 513, 135–140.

Copyright information

© Humana Press Inc., Totowa, NJ 2005

Authors and Affiliations

  • Anton J. Enright (1)
  • Lucy Skrabanek (2)
  • Gary D. Bader (1)
  1. Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York
  2. Department of Physiology and Biophysics and Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York
