New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein

  • Hongyun Gao
  • Xiaoqing Yu
  • Yongchao Dou
  • Jun Wang (corresponding author)
Original Research Article


Many computational tools have been developed to measure co-evolution among protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues cluster in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the multiple adjacent residues in one distinct region. In this paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Pearson’s correlation coefficient, the R value, is then used to measure the similarity between two MIMs. Multiple sequence alignments (MSAs) from the Catalytic Site Atlas (CSA) database are used for testing. The R value characterizes a specific class of residues. In contrast to individually co-evolved residue pairs, adjacent residues without high individual MI values are detected because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in side-chain composition, such as in the catalytic environment.
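
The idea of comparing two subsequences through their MIMs can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal window length for both subsequences, the use of log base 2, the plug-in frequency estimate of MI, and the choice to correlate only the off-diagonal upper-triangle entries of each MIM are all assumptions made for illustration, and the toy alignment is invented.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Plug-in estimate of MI (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def mi_matrix(msa, start, length):
    """MIM of the subsequence covering columns start .. start+length-1."""
    cols = [[seq[start + k] for seq in msa] for k in range(length)]
    return [[mutual_information(cols[i], cols[j]) for j in range(length)]
            for i in range(length)]


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries (assumed comparison set)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(x, y):
    """Pearson's correlation coefficient; NaN if either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def mim_correlation(msa, start_a, start_b, length):
    """R value between the MIMs of two equal-length subsequences."""
    mim_a = upper_triangle(mi_matrix(msa, start_a, length))
    mim_b = upper_triangle(mi_matrix(msa, start_b, length))
    return pearson_r(mim_a, mim_b)


# Invented toy alignment: columns 3-5 repeat the pattern of columns 0-2.
toy_msa = ["ACDACD", "ACEACE", "GTDGTD", "GTEGTE"]
r_value = mim_correlation(toy_msa, 0, 3, 3)
print(r_value)
```

In the toy alignment the two windows have the same internal co-variation pattern, so their MIMs match and the R value is close to 1. A real analysis would also need to handle gaps, small-sample bias in the MI estimate, and windows of length below 3, for which the correlation is undefined.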


Mutual information; Adjacent; Correlation; Subsequence; Pearson’s correlation coefficient



This work was supported by the National Natural Science Foundation of China (Nos. 11171224 and 11231004) and by the Subsidy Scheme of Young Teachers in Universities of Shanghai (No. ZZyyy13017). Hongyun Gao was also sponsored by the China Scholarship Council (CSC) for a 17-month study at Purdue University, supervised by Professor Daisuke Kihara.

Compliance with Ethical Standards

Conflict of interest

The authors declare that they have no competing interests.



Copyright information

© International Association of Scientists in the Interdisciplinary Areas and Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  • Hongyun Gao (1, 2)
  • Xiaoqing Yu (3)
  • Yongchao Dou (4)
  • Jun Wang (5, corresponding author)
  1. School of Mathematical Sciences, Dalian University of Technology, Dalian, China
  2. Information and Engineering College, Dalian University, Dalian, China
  3. College of Sciences, Shanghai Institute of Technology, Shanghai, China
  4. Center for Plant Science and Innovation, School of Biological Sciences, University of Nebraska, Lincoln, USA
  5. Department of Mathematics, Shanghai Normal University, Shanghai, China
