Abstract
Phylogenetic generalised least squares (PGLS) is one of the most commonly employed phylogenetic comparative methods. The technique, a modification of generalised least squares, uses knowledge of phylogenetic relationships to produce an estimate of expected covariance in cross-species data. Closely related species are assumed to have more similar traits because of their shared ancestry and hence produce more similar residuals from the least squares regression line. By taking into account the expected covariance structure of these residuals, modified slope and intercept estimates are generated that can account for interspecific autocorrelation due to phylogeny. Here, we provide a basic conceptual background to PGLS, for those unfamiliar with the approach. We describe the requirements for a PGLS analysis and highlight the packages that can be used to implement the method. We show how phylogeny is used to calculate the expected covariance structure in the data and how this is applied to the generalised least squares regression equation. We demonstrate how PGLS can incorporate information about phylogenetic signal, the extent to which closely related species truly are similar, and how it controls for this signal appropriately, thereby negating concerns about unnecessarily ‘correcting’ for phylogeny. In addition to discussing the appropriate way to present the results of PGLS analyses, we highlight some common misconceptions about the approach and commonly encountered problems with the method. These include misunderstandings about what phylogenetic signal refers to in the context of PGLS (residual errors, not the traits themselves), and issues associated with unknown or uncertain phylogeny.
Acknowledgments
We are grateful to László Zsolt Garamszegi for his advice and encouragement during the writing of this chapter. Alan Grafen provided insightful comments on an earlier draft.
Appendix
1.1 Further Mathematical Details of the Calculation of OLS and PGLS Using Our Worked Example
Ordinary least squares regression can also be expressed using matrix algebra, a formulation that is quicker and more effective for analyses with more than one predictor. Here, the equation to obtain the regression estimates is given as
\[ {\boldsymbol{\beta }} = ({\mathbf{X}}^{{\prime }} {\mathbf{X}})^{ - 1} {\mathbf{X}}^{{\prime }} {\mathbf{y}} \]
In this case, β is the vector consisting of the parameter estimates (b0, b1, and so on if more than one predictor variable). X is a matrix consisting of n rows and (m + 1) columns (m is the number of predictor variables), where the first column represents a constant (given the value 1 on each row), and the subsequent columns are the X values for each predictor variable. In the matrix formulation, the term \( {\mathbf{X}}^{{\prime }} \) denotes the ‘transpose’ of X—simply put, the rows become columns, and the columns become rows.
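When multiplied together, these become \( {\mathbf{X}}^{{\prime }} {\mathbf{X}} \), calculated as follows:
\[ {\mathbf{X}}^{{\prime }} {\mathbf{X}} = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 1.02 & 1.06 & 0.96 & 0.92 & 0.89 \end{pmatrix} \begin{pmatrix} 1 & 1.02 \\ 1 & 1.06 \\ 1 & 0.96 \\ 1 & 0.92 \\ 1 & 0.89 \end{pmatrix} = \begin{pmatrix} 5 & 4.85 \\ 4.85 & 4.724 \end{pmatrix} \]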
Here, the value in row i, column j of \( {\mathbf{X}}^{{\prime }} {\mathbf{X}} \) equals the sum total of row i elements of \( {\mathbf{X}}^{{\prime }} \) multiplied by their respective column j elements of X. So for example, row 2, column 2 of \( {\mathbf{X}}^{{\prime }} {\mathbf{X}} \) is (1.02 × 1.02) + (1.06 × 1.06) + (0.96 × 0.96) + (0.92 × 0.92) + (0.89 × 0.89) = 4.724.
Finally, the superscript −1 applied to \( {\mathbf{X}}^{{\prime }} {\mathbf{X}} \) indicates the ‘inverse’ matrix. The way the inverse matrix is calculated is somewhat complex, but it is the matrix that, when multiplied by its original form (\( {\mathbf{X}}^{{\prime }} {\mathbf{X}} \)), produces a matrix with 1s in the diagonal elements and 0s in the off-diagonals (this is known as the identity matrix; see below).
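The defining property of the inverse can be checked numerically. The following is a minimal sketch in Python with NumPy (a tool choice assumed here, not one used in the chapter); the variable names are illustrative, and the predictor values are those quoted in the worked example.

```python
import numpy as np

# Predictor values quoted in the worked example, preceded by a column of 1s
# (the constant that yields the intercept).
x = np.array([1.02, 1.06, 0.96, 0.92, 0.89])
X = np.column_stack([np.ones_like(x), x])

XtX = X.T @ X                  # 2 x 2 matrix X'X
XtX_inv = np.linalg.inv(XtX)   # its inverse

# A matrix multiplied by its inverse gives the identity matrix
# (1s on the diagonal, 0s elsewhere), up to floating-point rounding.
product = XtX_inv @ XtX
print(np.round(product, 6))
```

In practice the inverse is rarely formed explicitly; solving the linear system directly is usually preferred for numerical stability, but the explicit inverse mirrors the notation used here.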
y is the vector of n rows, containing the values of Y.
As with \( {\mathbf{X}}^{{\prime }} {\mathbf{X}} \), for the \( {\mathbf{X}}^{{\prime }} {\mathbf{y}} \) vector, the row i value is the overall total of each of the row i elements of \( {\mathbf{X}}^{{\prime }} \) multiplied by their respective counterparts in the column of y (i.e. row 2 = (1.02 × 1.38) + (1.06 × 1.41) + (0.96 × 1.36) + (0.92 × 1.22) + (0.89 × 1.13) = 6.336).
Hence, when \( ({\mathbf{X}}^{{\prime }} {\mathbf{X}})^{ - 1} \) is then multiplied by \( {\mathbf{X}}^{{\prime }} {\mathbf{y}} \), we get the OLS solution for β
where the first value (−0.229) is the intercept (b0) and the second value is the slope estimate (b1).
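The OLS steps described above can be reproduced in a few lines. This is a sketch only, assuming Python with NumPy; the x and y values are those quoted in the worked example, while the variable names are illustrative.

```python
import numpy as np

# Data quoted in the worked example (five species).
x = np.array([1.02, 1.06, 0.96, 0.92, 0.89])
y = np.array([1.38, 1.41, 1.36, 1.22, 1.13])

# Design matrix: a column of 1s for the intercept, then the predictor.
X = np.column_stack([np.ones_like(x), x])

XtX = X.T @ X   # X'X
Xty = X.T @ y   # X'y

# Solve (X'X) beta = X'y, which is equivalent to beta = (X'X)^-1 X'y.
beta_ols = np.linalg.solve(XtX, Xty)
b0, b1 = beta_ols
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

With these inputs the intercept rounds to −0.229, matching the value given in the text, and the second row of X′y rounds to 6.336.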
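For generalised least squares, an additional element is added to the regression equation, in the form of the variance–covariance matrix, which represents the expected covariance structure of the residuals from the regression equation. In the case of OLS regression, the assumption is that there is no covariance between residuals (i.e. all species are independent of each other, and residuals from closely related species are not more similar on average than residuals from distantly related species). This (n × n) variance–covariance matrix is denoted as C, and the regression equation becomes
\[ {\boldsymbol{\beta }} = ({\mathbf{X}}^{{\prime }} {\mathbf{C}}^{ - 1} {\mathbf{X}})^{ - 1} {\mathbf{X}}^{{\prime }} {\mathbf{C}}^{ - 1} {\mathbf{y}} \]
Under the assumption that there is no covariance among the residuals and they are normally distributed, with mean = 0 and standard deviation σε, then
\[ {\mathbf{C}} = \sigma_{\varepsilon }^{2} {\mathbf{I}} = \begin{pmatrix} \sigma_{\varepsilon }^{2} & 0 & \cdots & 0 \\ 0 & \sigma_{\varepsilon }^{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{\varepsilon }^{2} \end{pmatrix} \]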
The diagonal elements (the line of values from top left to bottom right) therefore represent the variance of the residuals, while the other off-diagonal elements = 0, meaning there is no covariation among the residuals. The inverse of this matrix, \( {\mathbf{C}}^{ - 1} \), has essentially the same properties (all the off-diagonal elements remain as 0) except the diagonal elements now equal \( 1/\sigma_{\varepsilon }^{2} \). When this variance–covariance structure is assumed, the results of GLS are the same as those of OLS (the C part of the regression equation essentially drops out). On the other hand, if the variances are not equal, then you have a standard weighted least squares regression.
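To see why C drops out, it may help to substitute the diagonal matrix directly into the GLS estimator. The short derivation below is a sketch that assumes the standard GLS form, β = (X′C⁻¹X)⁻¹X′C⁻¹y, with C equal to the residual variance times the identity matrix I.

```latex
\begin{aligned}
\boldsymbol{\beta}
  &= \left(\mathbf{X}'(\sigma_{\varepsilon}^{2}\mathbf{I})^{-1}\mathbf{X}\right)^{-1}
     \mathbf{X}'(\sigma_{\varepsilon}^{2}\mathbf{I})^{-1}\mathbf{y} \\
  &= \left(\sigma_{\varepsilon}^{-2}\,\mathbf{X}'\mathbf{X}\right)^{-1}
     \sigma_{\varepsilon}^{-2}\,\mathbf{X}'\mathbf{y} \\
  &= \sigma_{\varepsilon}^{2}\,(\mathbf{X}'\mathbf{X})^{-1}\,
     \sigma_{\varepsilon}^{-2}\,\mathbf{X}'\mathbf{y} \\
  &= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\end{aligned}
```

The scalar variance cancels, leaving the OLS estimator. When the diagonal entries differ, no such cancellation occurs and each observation is weighted by the reciprocal of its own variance, which is the weighted least squares case mentioned above.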
For phylogenetic generalised least squares, our expected variance–covariance matrix is Cphyl (see main text), and its inverse
Taking apart the components of the GLS regression equation, we first calculate the product \( {\mathbf{X}}^{{\prime }} {\mathbf{C}}^{ - 1} \), whose row i and column j values are the total of the ith row of \( {\mathbf{X}}^{{\prime }} \) multiplied by the jth column of \( {\mathbf{C}}^{ - 1} \). So, for example, row 2, column 3 of \( {\mathbf{X}}^{{\prime }} {\mathbf{C}}^{ - 1} \) is (1.02 × −0.048) + (1.06 × −0.048) + (0.96 × 0.619) + (0.92 × −0.381) + (0.89 × 0) = 0.144.
In similar fashion, \( {\mathbf{X}}^{{\prime }} {\mathbf{C}}^{ - 1} {\mathbf{X}} \) is therefore
The inverse of which is
The second component of the GLS regression equation \( {\mathbf{X}}^{{\prime }} {\mathbf{C}}^{ - 1} {\mathbf{y}} \) follows likewise as
where, for example, the first row value (1.139) = (0.142 × 1.38) + (0.142 × 1.41) + (0.142 × 1.36) + (0.142 × 1.22) + (0.333 × 1.13).
Finally, we can combine our two products to obtain the PGLS solution for β.
where b0 = −0.276 and b1 = 1.616.
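The full PGLS calculation can be sketched in the same way. The chapter's covariance matrix is given in the main text rather than in this appendix, so the matrix `C_phyl` below is an assumption: it was reconstructed so that the third column of its inverse matches the entries quoted above (−0.048, −0.048, 0.619, −0.381, 0). It corresponds to a hypothetical tree of height 3 in which species 1 and 2 are sisters, species 3 and 4 are sisters, and species 5 shares no history with the others. Python with NumPy and all variable names are likewise illustrative choices.

```python
import numpy as np

# Data quoted in the worked example (five species).
x = np.array([1.02, 1.06, 0.96, 0.92, 0.89])
y = np.array([1.38, 1.41, 1.36, 1.22, 1.13])
X = np.column_stack([np.ones_like(x), x])

# ASSUMPTION: reconstructed phylogenetic variance-covariance matrix.
# Diagonal = root-to-tip distance; off-diagonal = shared branch length.
C_phyl = np.array([
    [3, 2, 1, 1, 0],
    [2, 3, 1, 1, 0],
    [1, 1, 3, 2, 0],
    [1, 1, 2, 3, 0],
    [0, 0, 0, 0, 3],
], dtype=float)

C_inv = np.linalg.inv(C_phyl)

# Components of the GLS equation, in the order used in the text.
XtCinv = X.T @ C_inv      # X'C^-1   (2 x 5)
XtCinvX = XtCinv @ X      # X'C^-1 X (2 x 2)
XtCinvy = XtCinv @ y      # X'C^-1 y (length 2)

# beta = (X'C^-1 X)^-1 X'C^-1 y, obtained by solving the linear system.
beta_pgls = np.linalg.solve(XtCinvX, XtCinvy)
b0, b1 = beta_pgls
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Under this reconstructed matrix the estimates round to b0 = −0.276 and b1 = 1.616, in agreement with the values reported above. Intermediate quantities computed at full precision can differ in the third decimal place from those in the text (for example, the first entry of X′C⁻¹y), which is consistent with the text having used weights rounded to three decimals.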
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Symonds, M.R.E., Blomberg, S.P. (2014). A Primer on Phylogenetic Generalised Least Squares. In: Garamszegi, L. (eds) Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43550-2_5
DOI: https://doi.org/10.1007/978-3-662-43550-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43549-6
Online ISBN: 978-3-662-43550-2
eBook Packages: Biomedical and Life Sciences (R0)