Protein Substitution Model and Evolutionary Distance
In addition to nucleotide-based substitution models, there are also models based on amino acid and codon sequences. Observed substitutions between two sense codons depend on codon frequencies, the difference between the two encoded amino acids, and the number of nucleotide sites at which the two codons differ (one, two, or all three). Similarly, observed substitutions between two amino acids depend on amino acid frequencies and amino acid dissimilarities. This chapter focuses on amino acid substitution models whose parameters are derived from empirical substitution matrices. How is an empirical substitution matrix compiled? How are transition probability and rate matrices derived from an empirical matrix? How are evolutionary distances derived from these matrices? Under what circumstances may one fail to obtain an evolutionary distance? These questions are addressed in detail with numerical illustrations.