Prediction of Functional Sites in Proteins by Evolutionary Methods

  • Pedro López-Romero
  • Manuel J. Gómez
  • Paulino Gómez-Puertas
  • Alfonso Valencia
Part of the Principles and Practice book series (PRINCIPLES)


Functional sites are well-defined regions that are relevant for protein function and that include characteristic groups of amino acids. These regions may be involved in interactions between proteins and other molecules, such as other proteins, nucleic acids, small ligands and substrates. Interaction sites have been studied in great detail in representative protein families; their relationship with natural substrates and drugs has been characterized, as has their role in mediating protein complex formation. In many cases they have been studied for their potential in engineering protein activity. Protein binding sites have also been studied at a more general level, by characterizing their typical structure and their general residue preferences. However, it is the relationship between the conservation of sequence features and protein active sites and binding sites that forms the basis for the development of prediction methods. Analyzing the conservation of the chemical characteristics of amino acids in specific groups of sequences, in the context of large protein families, is one approach within a growing collection of methods aimed at predicting protein binding sites at a genomic scale. In this review we analyze these methods, discuss their similarities, and describe a number of key unsolved problems.
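As a rough illustration of how sequence conservation can be quantified, the sketch below scores each column of a multiple sequence alignment by its Shannon entropy, where low entropy indicates a conserved position. It is not the method of any particular tool discussed in the review; the toy alignment, the function names `column_entropy` and `conservation_profile`, and the choice to ignore gap characters are all assumptions made for the example.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Gap characters ('-') are ignored; this is an assumption of the sketch.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def conservation_profile(alignment):
    """Return one entropy value per column of an equal-length alignment."""
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical four-sequence alignment, invented for illustration only.
toy_alignment = ["MSAK", "MSGR", "MTAK", "MSAE"]
profile = conservation_profile(toy_alignment)

for position, entropy in enumerate(profile, start=1):
    print(f"column {position}: {entropy:.3f} bits")
```

In this toy case the first column is fully conserved and scores zero, while the last column, with three different residues, scores highest. Real applications would typically also weight sequences for redundancy and group amino acids by chemical similarity, which this sketch does not attempt.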


Keywords: Entropy, Serine, Alanine, Methionine, NADH





Copyright information

© Springer-Verlag Berlin Heidelberg 2004
