Estimating Phylogenies with Invariant Functions of Data
Estimating phylogenies, or evolutionary trees, is a complex task even under the best of circumstances, and it becomes particularly difficult when molecular data are used to investigate distantly related species. In recent years researchers have studied how methods of inferring phylogenetic relations, such as those based on parsimony, behave under simple models of nucleic acid evolution. The results are not entirely encouraging: HENDY AND PENNY (1989), for example, illustrated simple cases in which parsimony will converge to an incorrect phylogenetic tree, even for equal rates of evolution. What is encouraging, however, is that researchers are beginning to develop methods of estimating phylogenies that may be robust under conditions where parsimony is not. A strategy shared by some of these methods (CAVENDER AND FELSENSTEIN 1987; LAKE 1987a) is to use invariant functions of the data to identify the correct topology of the corresponding phylogeny. But which invariants, and how? What assumptions underlie these approaches? I discuss these issues and indicate the direction this research seems to be taking.
Keywords: Macromolecule, Stein, Pyrimidine, Purine
- Cavender, J. A. (1989), “Mechanized Derivation of Linear Invariants,” Molecular Biology and Evolution, 6, 301–316. [Using assumptions no stronger than those of LAKE (1987a), the author calculates all linear invariants for rooted phylogenies with four species.]
- Cavender, J. A. (1990), “Necessary Conditions for the Method of Inferring Phylogeny by Linear Invariants,” Mathematical Biosciences, submitted. [The sufficient conditions of CAVENDER (1989) for deriving linear invariants are also necessary.]
- Drolet, S., AND Sankoff, D. (1990), “Quadratic Tree Invariants for Multivalued Characters,” Journal of Theoretical Biology, 144, 117–129. [The authors generalize the work of CAVENDER AND FELSENSTEIN (1987) to obtain quadratic invariants for character data involving four species and having more than two states.]
- Felsenstein, J. (1978), “Cases in which Parsimony or Compatibility Methods will be Positively Misleading,” Systematic Zoology, 27, 401–410. [The author examines conditions under which methods of phylogenetic inference will fail to converge to a correct phylogeny as more and more data are accumulated.]
- Felsenstein, J. (1990), “Counting Phylogenetic Invariants,” manuscript. [The author counts the invariants that exist in cases involving four-state characters, four species, and different models of nucleotide substitution.]
- Hendy, M. D., AND Penny, D. (1989), “A Framework for the Quantitative Study of Evolutionary Trees,” Systematic Zoology, 38, 297–309. [The authors extend the work of FELSENSTEIN (1978) by finding new conditions under which parsimony methods will fail to converge to a correct phylogeny as more and more data are accumulated.]
- Lake, J. A. (1987a), “A Rate-independent Technique for Analysis of Nucleic Acid Sequences: Evolutionary Parsimony,” Molecular Biology and Evolution, 4, 167–191. [The author develops linear invariants for four-state character data involving four species.]
- Lake, J. A. (1990), “Comparative Simulations of Evolutionary Parsimony and Augmented Distance Matrix Phylogenetic Reconstruction Algorithms,” manuscript. [The author concludes that, in general, evolutionary parsimony (LAKE 1987a) is a more robust algorithm than those for maximum parsimony or the augmented distance method of Kimura.]
- Pearl, J., AND Tarsi, M. (1986), “Structuring Causal Trees,” Journal of Complexity, 2, 60–77. [The problem is to infer treelike models of complex phenomena where the leaves represent observable random binary variables, and the interior vertices represent hidden causes which explain interleaf dependencies. The authors derive a relationship on which the invariants of CAVENDER AND FELSENSTEIN (1987) are based.]
- Sankoff, D. (1990), “Designer Invariants for Large Phylogenies,” Molecular Biology and Evolution, to appear. [For two-state character data, the author develops quadratic invariants for phylogenies of five species, or for individual edges in phylogenies of any larger size.]
- Sidow, A., AND Wilson, A. C. (1989), “Compositional Parsimony in the Statistical Testing of DNA Trees,” Second International Symposium on Macromolecules, Genes, and Computers, Waterville Valley, NH, USA, August 1989. [The authors extend the method of evolutionary parsimony (LAKE 1987a) to account for heterogeneity in the compositions of bases in DNA sequences.]