Co-evolution and Information Signals in Biological Sequences

  • Alessandra Carbone
  • Linda Dib
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5532)


Information content of a pool of sequences has been defined in information theory through entropic measures aimed at capturing the amount of variability within sequences. When dealing with biological sequences coding for proteins, a first approach is to align these sequences, estimate the probability of each amino acid occurring at each alignment position, and combine these values through an “entropy” function, which is maximal (equivalently, the information content is minimal) when, at every position, all amino acids are equally likely to occur. This model is too restrictive when the purpose is to evaluate the sequence constraints that must be conserved to maintain protein function under random mutations. In fact, co-evolution of amino acids appearing at pairs or tuples of positions in sequences constitutes a fine signal of important structural, functional and mechanical information for protein families. It is clear that classical information theory should be revisited when applied to biological data. A large number of approaches to co-evolution of biological sequences have been developed in the last seven years. We present a few of them and discuss their limitations and some related questions, such as the generation of random structures to validate predictions based on co-evolution, which appear crucial for new advances in structural bioinformatics.
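The contrast drawn above, between per-position entropy and co-evolution of pairs of positions, can be made concrete with a minimal Python sketch. It assumes an alignment given as equal-length strings (a gap character, if present, is simply counted as one more symbol), uses plain Shannon entropy and mutual information (MI) estimated from raw counts, and checks a pair score against randomized alignments in which one column is permuted. The function names, the toy alignment and the column-permutation null model are illustrative assumptions, not the authors' method; raw MI is known to be sensitive to column entropy and to shared phylogeny, so this is a baseline rather than a refined co-evolution score.

```python
import math
import random
from collections import Counter
from itertools import combinations


def column(alignment, i):
    """Residues found at alignment position i, one per sequence."""
    return [seq[i] for seq in alignment]


def entropy(symbols):
    """Shannon entropy (bits) of the empirical symbol distribution."""
    n = len(symbols)
    # sum over observed symbols of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in Counter(symbols).values())


def mutual_information(col_i, col_j):
    """MI(i, j) = H(i) + H(j) - H(i, j), estimated from raw counts."""
    joint = list(zip(col_i, col_j))  # residue pairs observed in the same sequence
    return entropy(col_i) + entropy(col_j) - entropy(joint)


def mi_matrix(alignment):
    """MI for every pair of positions i < j (hypothetical helper)."""
    length = len(alignment[0])
    return {
        (i, j): mutual_information(column(alignment, i), column(alignment, j))
        for i, j in combinations(range(length), 2)
    }


def shuffled_mi_pvalue(col_i, col_j, n_shuffles=1000, seed=0):
    """Empirical p-value of MI(i, j) against randomly permuted copies of column j.

    Permuting a column keeps its amino-acid composition (hence its entropy)
    but breaks the pairing of residues that belong to the same sequence.
    This simple null model is an assumption and ignores phylogeny.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_i, col_j)
    shuffled = list(col_j)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        # small tolerance so ties with the observed value count as hits
        if mutual_information(col_i, shuffled) >= observed - 1e-12:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical toy alignment: positions 0 and 2 change together (A with K,
# G with E), position 1 is conserved, position 3 varies independently.
ALIGNMENT = ["ACKL", "ACKM", "ACKV", "ACKI", "GCEL", "GCEM", "GCEV", "GCEI"]

if __name__ == "__main__":
    print([entropy(column(ALIGNMENT, i)) for i in range(4)])  # [1.0, 0.0, 1.0, 2.0]
    for pair, mi in sorted(mi_matrix(ALIGNMENT).items(), key=lambda kv: -kv[1]):
        print(pair, round(mi, 3))
    print(shuffled_mi_pvalue(column(ALIGNMENT, 0), column(ALIGNMENT, 2)))
```

Entropy alone rates positions 0 and 2 as equally variable but cannot tell that they vary together; MI singles out the (0, 2) pair, and the permutation test gives a small empirical p-value only for that pair. Permuting a column preserves its composition but not the tree relating the sequences, so random alignments simulated along a phylogeny would be a stricter control.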


Keywords (added by machine, not by the authors): Multiple Sequence Alignment, Biological Sequence, Compensatory Mutation, Distance Tree, Alignment Position





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Alessandra Carbone (1)
  • Linda Dib (2)
  1. Département d’Informatique, Université Pierre et Marie Curie-Paris 6
  2. Génomique Analytique, FRE3214 CNRS-UPMC, Paris
