Physicochemical Correlation between Amino Acid Sites in Short Sequences under Selective Pressure

  • David Campo
  • Zoya Dimitrova
  • Yuri Khudyakov
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4983)


The activities and properties of proteins result from interactions among their constituent amino acids. In the course of natural selection, substitutions that tend to destabilize a particular structure may be compensated by other substitutions that confer stability on that structure. Patterns of coordinated substitutions were studied in two sets of selected peptides. The first is a set of 181 amino acid sequences that were selected in vitro to bind an MHC class I molecule (Kb). The second is a set of 114 sequences of the Hypervariable Region 1 of Hepatitis C virus, which were obtained from infected patients and therefore result from natural selection in vivo. In both datasets, the patterns of coordinated substitutions showed many significant structural and functional links between pairs of positions, as well as conservation of specific, selected physicochemical properties.
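To make the idea of physicochemical correlation between sites concrete, the following is a minimal sketch and not the authors' procedure. It assumes that the peptides are already aligned and of equal length, that a single property (the Kyte-Doolittle hydropathy scale, chosen here only for illustration) stands in for the physicochemical descriptors, and that a plain Pearson correlation is an acceptable measure of association. The function names `pearson` and `site_correlations` are hypothetical, and no significance testing or multiple-testing correction is included.

```python
from itertools import combinations
from math import sqrt

# Kyte-Doolittle hydropathy values; using this scale is an assumption
# made for illustration, not the property set used in the chapter.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(x, y):
    """Pearson correlation of two equal-length lists; None if either is constant."""
    if len(set(x)) < 2 or len(set(y)) < 2:
        return None  # a fully conserved site carries no covariation signal
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sqrt(sxx * syy)


def site_correlations(peptides, scale=HYDROPATHY):
    """Map each pair of positions (i, j), i < j, to the correlation of the
    property values observed at those positions across all peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned and of equal length")
    # One list of property values per alignment column.
    columns = [[scale[p[i]] for p in peptides] for i in range(length)]
    result = {}
    for i, j in combinations(range(length), 2):
        r = pearson(columns[i], columns[j])
        if r is not None:
            result[(i, j)] = r
    return result


# Toy input, invented for this example (not data from the chapter).
toy_peptides = ["AAK", "IIK", "LLK", "GGR"]
toy_result = site_correlations(toy_peptides)
```

In this toy input the first two positions always carry the same residue, so their hydropathy values are perfectly correlated. On real data, each pairwise correlation would still need a significance test and a correction for the number of position pairs examined.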


physicochemical properties amino acid covariation selection 





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • David Campo (1)
  • Zoya Dimitrova (1)
  • Yuri Khudyakov (1)
  1. Molecular Epidemiology & Bioinformatics Laboratory, Division of Viral Hepatitis, Centers for Disease Control and Prevention, Atlanta
