Journal of Molecular Evolution, Volume 77, Issue 4, pp 159–169

Unearthing the Root of Amino Acid Similarity

  • James D. Stephenson
  • Stephen J. Freeland
Original Article


Similarities and differences between amino acids define the rates at which they substitute for one another within protein sequences and the patterns by which these sequences form protein structures. However, there exist many ways to measure similarity, whether one considers the molecular attributes of individual amino acids, the roles that they play within proteins, or some nuanced contribution of each. One popular approach to representing these relationships is to divide the 20 amino acids of the standard genetic code into groups, thereby forming a simplified amino acid alphabet. Here, we develop a method to compare or combine different simplified alphabets, and apply it to 34 simplified alphabets from the scientific literature. We use this method to show that while different suggestions vary and agree in non-intuitive ways, they combine to reveal a consensus view of amino acid similarity that is clearly rooted in physico-chemistry.
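As a concrete illustration of what a simplified alphabet is and how several of them might be combined, the sketch below treats each alphabet as a partition of the 20 amino acids and scores a pair of amino acids by the fraction of alphabets that place both in the same group. This is only one simple way to pool alphabets and is not necessarily the method developed in the article; the two toy alphabets and the names `co_grouping_counts` and `similarity` are hypothetical and are not taken from the 34 alphabets analysed.

```python
from itertools import combinations

# One-letter codes for the 20 amino acids of the standard genetic code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def is_partition(groups):
    """Return True if the groups cover every amino acid exactly once."""
    letters = "".join(groups)
    return len(letters) == len(AMINO_ACIDS) and set(letters) == set(AMINO_ACIDS)


def co_grouping_counts(alphabets):
    """For each amino acid pair, count the alphabets that put both in one group."""
    counts = {frozenset(pair): 0 for pair in combinations(AMINO_ACIDS, 2)}
    for groups in alphabets:
        if not is_partition(groups):
            raise ValueError(f"not a partition of the 20 amino acids: {groups}")
        for group in groups:
            for pair in combinations(group, 2):
                counts[frozenset(pair)] += 1
    return counts


def similarity(counts, a, b, n_alphabets):
    """Fraction of alphabets in which amino acids a and b share a group."""
    return counts[frozenset((a, b))] / n_alphabets


# Hypothetical toy alphabets, invented for illustration only.
toy_alphabets = [
    ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"],  # six groups
    ["AVLIMFWC", "STNQYHGP", "KRDE"],              # three groups
]

counts = co_grouping_counts(toy_alphabets)
k_r = similarity(counts, "K", "R", len(toy_alphabets))  # grouped in both
k_d = similarity(counts, "K", "D", len(toy_alphabets))  # grouped in one
a_k = similarity(counts, "A", "K", len(toy_alphabets))  # never grouped
```

Scoring pairs by co-grouping frequency makes alphabets with different numbers of groups directly comparable, because each alphabet contributes only a yes or no answer for every pair.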


Keywords

Amino acids · Simplified alphabets · Similarity measures · Chemical properties · Protein structure



This material is based upon work supported by the National Aeronautics and Space Administration through the NASA Astrobiology Institute under Cooperative Agreement No. NNA09DA77A issued through the Office of Space Science.

Supplementary material

239_2013_9565_MOESM1_ESM.pdf (235 kb)
Supplementary material 1 (PDF 236 kb)



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. NASA Astrobiology Institute, University of Hawaii, Honolulu, USA
