Estimating Phylogenies with Invariant Functions of Data

  • William H. E. Day
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Estimating phylogenies, or evolutionary trees, is a complex task even under the best of circumstances, and it encounters particular difficulties when using molecular data to investigate distantly related species. In recent years researchers have studied how methods to infer phylogenetic relations, such as those based on parsimony, behave for simple models of nucleic acid evolution. The results are not entirely encouraging: HENDY AND PENNY (1989), for example, illustrated simple cases under which parsimony will converge to an incorrect phylogenetic tree, even for equal rates of evolution. What is encouraging, however, is that researchers are beginning to develop methods of estimating phylogenies which may be robust under conditions where parsimony is not. A strategy shared by some of these methods (CAVENDER AND FELSENSTEIN (1987), LAKE (1987a)) is to use invariant functions of the data to identify the correct topology of the corresponding phylogeny. But which invariants, and how? What assumptions underlie these approaches? I discuss these issues and indicate the direction this research seems to be taking.
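To make the idea of an invariant concrete, here is a minimal sketch under stated assumptions. It is not taken from the chapter and is not claimed to be the exact K- or L-invariant of CAVENDER AND FELSENSTEIN (1987). It assumes a two-state symmetric model on four species and uses a quadratic quantity, in the spirit of the four-point condition, that vanishes on the expected site-pattern frequencies of the correct unrooted topology. The function names and parameter values are hypothetical.

```python
from itertools import product


def pattern_probs(p_leaf, p_mid):
    """Exact site-pattern probabilities for the unrooted tree ((1,2),(3,4)).

    Assumed model: two states, and a state flips along an edge with the
    given probability. p_leaf holds the four pendant-edge flip
    probabilities; p_mid is the interior-edge flip probability.
    """
    def step(a, b, p):
        return p if a != b else 1.0 - p

    probs = {}
    for leaves in product((0, 1), repeat=4):
        total = 0.0
        # Sum over the two hidden interior vertices u (joins 1,2) and v (joins 3,4).
        for u, v in product((0, 1), repeat=2):
            pr = 0.5 * step(u, v, p_mid)  # uniform state at u
            pr *= step(u, leaves[0], p_leaf[0]) * step(u, leaves[1], p_leaf[1])
            pr *= step(v, leaves[2], p_leaf[2]) * step(v, leaves[3], p_leaf[3])
            total += pr
        probs[leaves] = total
    return probs


def agreement(probs, i, j):
    """P(species i and j agree) minus P(they differ)."""
    return sum(pr if s[i] == s[j] else -pr for s, pr in probs.items())


def quartet_invariants(probs):
    """One quadratic quantity per topology; it is zero for the true one."""
    a = agreement(probs, 0, 1) * agreement(probs, 2, 3)
    b = agreement(probs, 0, 2) * agreement(probs, 1, 3)
    c = agreement(probs, 0, 3) * agreement(probs, 1, 2)
    return {"12|34": b - c, "13|24": a - c, "14|23": a - b}


# Unequal rates: two long pendant edges on opposite sides of the interior edge.
probs = pattern_probs((0.4, 0.05, 0.4, 0.05), 0.05)
for topology, value in quartet_invariants(probs).items():
    print(topology, value)
```

With exact pattern probabilities, the entry for the generating topology 12|34 is zero up to rounding regardless of the edge flip probabilities, while the other two entries are not. With real data the pattern frequencies are estimates, so a statistical test of which quantity is closest to zero would be needed; that step is not sketched here.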






  1. Cavender, J. A. (1989), “Mechanized Derivation of Linear Invariants,” Molecular Biology and Evolution, 6, 301–316. [Using assumptions no stronger than those of LAKE (1987a), the author calculates all linear invariants for rooted phylogenies with four species.]
  2. Cavender, J. A. (1990), “Necessary Conditions for the Method of Inferring Phylogeny by Linear Invariants,” Mathematical Biosciences, submitted. [The sufficient conditions of CAVENDER (1989) for deriving linear invariants are also necessary.]
  3. Cavender, J. A., AND Felsenstein, J. (1987), “Invariants of Phylogenies in a Simple Case with Discrete States,” Journal of Classification, 4, 57–71. [The authors develop quadratic invariants (K- and L-invariants) for two-state character data involving four species.]
  4. Drolet, S., AND Sankoff, D. (1990), “Quadratic Tree Invariants for Multivalued Characters,” Journal of Theoretical Biology, 144, 117–129. [The authors generalize the work of CAVENDER AND FELSENSTEIN (1987) to obtain quadratic invariants for character data involving four species and having more than two states.]
  5. Felsenstein, J. (1978), “Cases in which Parsimony or Compatibility Methods will be Positively Misleading,” Systematic Zoology, 27, 401–410. [The author examines conditions under which methods of phylogenetic inference will fail to converge to a correct phylogeny as more and more data are accumulated.]
  6. Felsenstein, J. (1982), “Numerical Methods for Inferring Evolutionary Trees,” Quarterly Review of Biology, 57, 379–404. [The author surveys methods of inferring phylogenies from character or distance data.]
  7. Felsenstein, J. (1988), “Phylogenies from Molecular Sequences: Inference and Reliability,” Annual Review of Genetics, 22, 521–565. [The author surveys methods of inferring and evaluating phylogenies from sequence data.]
  8. Felsenstein, J. (1990), “Counting Phylogenetic Invariants,” manuscript. [The author counts the invariants that exist in cases involving four-state characters, four species, and different models of nucleotide substitution.]
  9. Hendy, M. D., AND Penny, D. (1989), “A Framework for the Quantitative Study of Evolutionary Trees,” Systematic Zoology, 38, 297–309. [The authors extend the work of FELSENSTEIN (1978) by finding new conditions under which parsimony methods will fail to converge to a correct phylogeny as more and more data are accumulated.]
  10. Lake, J. A. (1987a), “A Rate-independent Technique for Analysis of Nucleic Acid Sequences: Evolutionary Parsimony,” Molecular Biology and Evolution, 4, 167–191. [The author develops linear invariants for four-state character data involving four species.]
  11. Lake, J. A. (1987b), “Origin of the Eukaryotic Nucleus Determined by Rate-invariant Analysis of rRNA Sequences,” Nature, 331, 184–186. [The author applies the method of evolutionary parsimony (LAKE 1987a) to propose a new parkaryotic-karyotic classification.]
  12. Lake, J. A. (1990), “Comparative Simulations of Evolutionary Parsimony and Augmented Distance Matrix Phylogenetic Reconstruction Algorithms,” manuscript. [The author concludes that, in general, evolutionary parsimony (LAKE 1987a) is a more robust algorithm than maximum parsimony or the augmented distance method of Kimura.]
  13. Pearl, J., AND Tarsi, M. (1986), “Structuring Causal Trees,” Journal of Complexity, 2, 60–77. [The problem is to infer treelike models of complex phenomena where the leaves represent observable random binary variables, and the interior vertices represent hidden causes which explain interleaf dependencies. The authors derive a relationship on which the invariants of CAVENDER AND FELSENSTEIN (1987) are based.]
  14. Pearl, J. (1986), “Fusion, Propagation, and Structuring in Belief Networks,” Artificial Intelligence, 29, 241–288. [Section 3 of this paper, entitled “Structuring Causal Trees,” includes most of the material found in PEARL AND TARSI (1986).]
  15. Sankoff, D. (1990), “Designer Invariants for Large Phylogenies,” Molecular Biology and Evolution, to appear. [For two-state character data, the author develops quadratic invariants for phylogenies of five species, or for individual edges in phylogenies of any larger size.]
  16. Sidow, A., AND Wilson, A. C. (1989), “Compositional Parsimony in the Statistical Testing of DNA Trees,” Second International Symposium on Macromolecules, Genes, and Computers, Waterville Valley, NH, USA, August 1989. [The authors extend the method of evolutionary parsimony (LAKE 1987a) to account for heterogeneity in the compositions of bases in DNA sequences.]

Copyright information

© Springer-Verlag Berlin · Heidelberg 1991

Authors and Affiliations

  • William H. E. Day, Computer Science, Memorial University of Newfoundland, St. John’s, Canada
