Phylogenetic Regression for Binary Dependent Variables

  • Anthony R. Ives
  • Theodore Garland Jr.


We compare three methods for phylogenetic regression analyses designed for binary dependent variables (traits with two discrete states), both with each other and with “standard” methods that either ignore phylogenetic relationships or ignore the binary character of the dependent variable. In simulations designed to reveal statistical problems arising in different methods, PLogReg (Ives and Garland 2010) performed better than PGLMM (Ives and Helmus 2011) and MCMCglmm (Hadfield 2010) at identifying phylogenetic signal in the absence of independent variables; PLogReg also outperformed a standard method for detecting phylogenetic signal in binary data, ancestral character estimation (Schluter et al. 1997; Pagel 1994). All three phylogenetic methods performed similarly at identifying relationships with a continuously valued independent variable x: all had at most moderately inflated Type I error rates, and MCMCglmm had slightly greater power. In contrast, standard logistic regression that ignores phylogeny had seriously inflated Type I error rates when x had phylogenetic signal. Perhaps surprisingly, phylogenetic regression that ignored the binary nature of the dependent variable, RegOU (Lavin et al. 2008), performed as well as or better than the other methods, at least for larger sample sizes (≥64 species), although this approach does not yield a model that can be used to simulate data (e.g., for bootstrapping). We also apply the methods to a data set describing whether antelopes fight or flee versus hide from predators as a function of their group size (Brashares et al. 2000). We end with rough guidelines for analyzing binary dependent variables; the main recommendation is that multiple methods and simulations should be used to give confidence in the statistical results.
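The inflated Type I error rates of standard logistic regression can be illustrated with a small simulation. The sketch below is not the simulation design used in the chapter; it rests on assumptions chosen for brevity: a balanced tree with 32 tips, Brownian motion for x, and a binary y obtained by thresholding a latent Brownian liability that evolves independently of x. All function names are hypothetical. It fits an ordinary (non-phylogenetic) logistic regression by Newton-Raphson and counts how often a Wald test at the nominal 5% level rejects the true null hypothesis of zero slope.

```python
import math
import random


def simulate_tips(depth, rng, phylo=True):
    """Simulate a trait at the 2**depth tips of a balanced tree.

    With phylo=True the trait follows Brownian motion; each of the `depth`
    levels has branch length 1/depth, so every tip has variance 1.
    With phylo=False the tips are independent draws (a star phylogeny).
    """
    n = 2 ** depth
    if not phylo:
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    sd = math.sqrt(1.0 / depth)
    values = [0.0]  # root state
    for _ in range(depth):
        # Every lineage splits in two; each daughter adds its own increment.
        values = [v + rng.gauss(0.0, sd) for v in values for _ in range(2)]
    return values


def slope_is_significant(x, y, z_crit=1.96, max_iter=50):
    """Fit logit P(y=1) = b0 + b1*x ignoring phylogeny; Wald test on b1."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            eta = max(-30.0, min(30.0, b0 + b1 * xi))  # avoid overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-10:
            # Degenerate information matrix (e.g. separation): no rejection.
            return False
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-8:
            break
    # Standard error from the information matrix of the last iteration.
    se_b1 = math.sqrt(h00 / det)
    return abs(b1 / se_b1) > z_crit


def type1_error_rate(depth=5, reps=300, phylo=True, seed=1):
    """Fraction of null data sets in which the slope is declared significant."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = simulate_tips(depth, rng, phylo)
        while True:
            # y depends only on a liability that evolves independently of x.
            liability = simulate_tips(depth, rng, phylo)
            y = [1 if v > 0 else 0 for v in liability]
            if 0 < sum(y) < len(y):  # redraw if only one state is present
                break
        rejections += slope_is_significant(x, y)
    return rejections / reps


if __name__ == "__main__":
    print("star phylogeny:", type1_error_rate(phylo=False))
    print("balanced tree: ", type1_error_rate(phylo=True))
```

Under these assumptions the rejection rate should sit near the nominal level when the tips are independent and rise well above it when both x and y carry phylogenetic signal. The exact values depend on tree shape, sample size, and the model used to generate y.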


  1. Blomberg SP, Garland T Jr, Ives AR (2003) Testing for phylogenetic signal in comparative data: behavioral traits are more labile. Evolution 57:717–745
  2. Bolker BM, Brooks ME, Clark CJ, Geange SW, Poulsen JR, Stevens MHH, White JSS (2009) Generalized linear mixed models: a practical guide for ecology and evolution. Trends Ecol Evol 24(3):127–135. doi: 10.1016/j.tree.2008.10.008
  3. Bonine KE, Gleeson TT, Garland T Jr (2005) Muscle fibre-type variation in lizards (Squamata) and phylogenetic reconstruction of hypothesized ancestral states. J Exp Biol 208:4529–4547
  4. Brashares JS, Garland T Jr, Arcese P (2000) Phylogenetic analysis of coadaptation in behavior, diet, and body size in the African antelope. Behav Ecol 11(4):452–463
  5. Diaz-Uriarte R, Garland T Jr (1996) Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Syst Biol 45(1):27–47
  6. Diggle P, Heagerty P, Liang K, Zeger S (2004) Analysis of longitudinal data, 2nd edn. Oxford University Press, Oxford
  7. Dlugosz EM, Chappell MA, Meek TH, Szafrańska P, Zub K, Konarzewski M, Jones JH, Bicudo JEPW, Careau V, Garland T Jr (2013) Phylogenetic analysis of mammalian maximal oxygen consumption during exercise. J Exp Biol 216:4712–4721
  8. Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman and Hall, New York
  9. Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125:1–15
  10. Felsenstein J (1988) Phylogenies and quantitative characters. Annu Rev Ecol Syst 19:445–471
  11. Felsenstein J (2012) A comparative method for both discrete and continuous characters using the threshold model. Am Nat 179(2):145–156. doi: 10.1086/663681
  12. Firth D (1993) Bias reduction of maximum likelihood estimates. Biometrika 80(1):27–38
  13. Freckleton RP, Harvey PH, Pagel M (2002) Phylogenetic analysis and comparative data: a test and review of evidence. Am Nat 160:712–726
  14. Garland T Jr, Dickerman AW, Janis CM, Jones JA (1993) Phylogenetic analysis of covariance by computer-simulation. Syst Biol 42(3):265–292
  15. Garland T Jr, Harvey PH, Ives AR (1992) Procedures for the analysis of comparative data using phylogenetically independent contrasts. Syst Biol 41:18–32
  16. Garland T Jr, Midford PE, Ives AR (1999) An introduction to phylogenetically based statistical methods, with a new method for confidence intervals on ancestral values. Am Zool 39:374–388
  17. Gelman A, Carlin JB, Stern HS, Rubin DB (1995) Bayesian data analysis, 1st edn. Chapman and Hall, London
  18. Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York
  19. Grafen A (1989) The phylogenetic regression. Trans R Soc Lond B, Biol Sci 326:119–157
  20. Hadfield JD (2010) MCMC methods for multi-response generalized linear mixed models: the MCMCglmm R package. J Stat Softw 33:1–22
  21. Hadfield JD, Nakagawa S (2010) General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters. J Evol Biol 23(3):494–508. doi: 10.1111/j.1420-9101.2009.01915.x
  22. Hansen TF (1997) Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351
  23. Hansen TF, Orzack SH (2005) Assessing current adaptive and phylogenetic inertia explanations of trait evolution: the need for controlled comparisons. Evolution 59:2063–2072
  24. Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford University Press, Oxford
  25. Heinze G, Ploner M, Dunkler D, Southworth H (2013) logistf: Firth’s bias reduced logistic regression. R package version 1.21
  26. Heinze G, Schemper M (2002) A solution to the problem of separation in logistic regression. Stat Med 21:2409–2419
  27. Ho LST, Ane C (2014) A linear-time algorithm for gaussian and non-gaussian trait evolution models. Systematic Biology, in press
  28. Ives AR, Garland T (2010) Phylogenetic logistic regression for binary dependent variables. Syst Biol 59(1):9–26. doi: 10.1093/sysbio/syp074
  29. Ives AR, Helmus MR (2011) Generalized linear mixed models for phylogenetic analyses of community structure. Ecol Monogr 81:511–525
  30. Jarman PJ (1974) The social organisation of antelope in relation to their ecology. Behaviour 48:215–267
  31. Judge GG, Griffiths WE, Hill RC, Lutkepohl H, Lee T-C (1985) The theory and practice of econometrics, 2nd edn. Wiley, New York
  32. Kembel SW, Cowan PD, Helmus MR, Cornwell WK, Morlon H, Ackerly DD, Blomberg SP, Webb CO (2010) Picante: R tools for integrating phylogenies and ecology. Bioinformatics 26:1463–1464
  33. Lavin SR, Karasov WH, Ives AR, Middleton KM, Garland T Jr (2008) Morphometrics of the avian small intestine, compared with non-flying mammals: a phylogenetic approach. Physiol Biochem Zool 81:526–550
  34. Martins EP, Diniz JAF, Housworth EA (2002) Adaptive constraints and the phylogenetic comparative method: a computer simulation test. Evolution 56(1):1–13
  35. Martins EP, Garland T Jr (1991) Phylogenetic analyses of the correlated evolution of continuous characters: a simulation study. Evolution 45:534–557
  36. Martins EP, Hansen TF (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. Am Nat 149:646–667. Erratum 153:448
  37. McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London
  38. McCulloch CE, Searle SR, Neuhaus JM (2008) Generalized, linear, and mixed models. Wiley, Hoboken, NJ
  39. Pagel M (1994) Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proc R Soc Lond Ser B Biol Sci 255(1342):37–45
  40. Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20:289–290
  41. Revell LJ (2010) Phylogenetic signal and linear regression on species data. Methods Ecol Evol 1:319–329
  42. Revell LJ (2012) Analyzing continuous character evolution on a phylogeny. Integr Comp Biol 52:E145–E145
  43. Revell LJ, Harmon LJ, Collar DC (2008) Phylogenetic signal, evolutionary process, and rate. Syst Biol 57(4):591–601. doi: 10.1080/10635150802302427
  44. Rezende EL, Diniz JAF (2012) Phylogenetic analyses: comparing species to infer adaptations and physiological mechanisms. Compr Physiol 2(1):639–674. doi: 10.1002/cphy.c100079
  45. Schluter D, Price T, Mooers AO, Ludwig D (1997) Likelihood of ancestor states in adaptive radiation. Evolution 51:1699–1711
  46. Villemereuil P, Gimenez O, Doligez B (2013) Comparing parent-offspring regression with frequentist and Bayesian animal models to estimate heritability in wild populations: a simulation study for Gaussian and binary traits. Methods Ecol Evol 4(3):260–275. doi: 10.1111/2041-210x.12011

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. UW-Madison, Madison, USA
  2. UC-Riverside, Riverside, USA
