Prediction of Protein–Protein Interactions: A Study of the Co-evolution Model

  • Itai Sharon
  • Jason V. Davis
  • Golan Yona
Part of the Methods in Molecular Biology book series (MIMB, volume 541)


The concept of molecular co-evolution has drawn attention in recent years as the basis for several algorithms for the prediction of protein–protein interactions. Although successful on specific data sets, the concept has never been tested on a large set of proteins. In this chapter we analyze the feasibility of the co-evolution principle for protein–protein interaction prediction through one of its derivatives, the correlated divergence model. Given two proteins, the model compares the patterns of divergence of their families and assigns a score based on the correlation between the two patterns. The working hypothesis of the model is that the stronger the correlation, the more likely it is that the two proteins interact. Several novel variants of this model are considered, including algorithms that attempt to identify the subset of the database proteins (the homologs of the query proteins) that are more likely to interact. We test the models on a large set of protein interactions extracted from several sources, including BIND, DIP, and HPRD.
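
The scoring idea described above can be illustrated with a minimal sketch. It is not the chapter's implementation: it assumes that each protein family is summarized by a pairwise distance matrix whose rows and columns follow the same species order, and that the correlation is a plain Pearson coefficient over the upper triangles, in the spirit of mirror-tree methods. The function names are hypothetical.

```python
import math


def upper_triangle(dist):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(dist)
    return [dist[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: a family with no variation in divergence gives no signal.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlated_divergence_score(dist_a, dist_b):
    """Hypothetical score: correlation between the divergence patterns
    (pairwise distances) of two protein families.

    Assumes both matrices are square, of the same size, and indexed by
    the same species in the same order.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("distance matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy distance matrices for three species (made-up numbers).
family_a = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
family_b = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]  # same pattern, scaled
family_c = [[0, 3, 2], [3, 0, 1], [2, 1, 0]]  # reversed pattern

score_ab = correlated_divergence_score(family_a, family_b)
score_ac = correlated_divergence_score(family_a, family_c)
```

Under the working hypothesis, a pair such as `family_a` and `family_b`, whose divergence patterns track each other, would be ranked as more likely to interact than a pair such as `family_a` and `family_c`. The variants mentioned in the abstract, which select a subset of homologs before scoring, are not covered by this sketch.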

Key words

Protein–protein interactions, co-evolution, mirror-tree



This work is supported by the National Science Foundation under Grant No. 0133311 to Golan Yona, and by the National Science Foundation under Grant No. 0218521, as part of the NSF/NIH Collaborative Research in Computational Neuroscience Program.


Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Itai Sharon (1)
  • Jason V. Davis (2)
  • Golan Yona (1, 3)
  1. Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel
  2. Department of Computer Science, University of Texas at Austin, Austin, USA
  3. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA