Using Invariants for Phylogenetic Tree Construction

  • Nicholas Eriksson
Part of The IMA Volumes in Mathematics and its Applications book series (IMA, volume 149)


Phylogenetic invariants are certain polynomials in the joint probability distribution of a Markov model on a phylogenetic tree. Such polynomials are of theoretical interest in the field of algebraic statistics, and they are also of practical interest: they can be used to construct phylogenetic trees. This paper is a self-contained introduction to the algebraic, statistical, and computational challenges involved in the practical use of phylogenetic invariants. We survey the relevant literature and provide some partial answers and many open problems.

Key words

Algebraic statistics, phylogenetics, semidefinite programming, Mahalanobis norm





Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Statistics, University of Chicago, Chicago, USA
