Abstract

Fitting a phylogenetic generalized least squares (PGLS) model means fitting a linear regression that investigates the impact of one or several predictor variables on a single response variable while controlling for potential phylogenetic signal in the response (and, hence, non-independence of the residuals). The key difference between PGLS and standard (multiple) regression is that PGLS allows for residuals that are potentially non-independent owing to the phylogenetic history of the taxa investigated. While the assumptions of PGLS regarding the underlying processes of evolution and the correlation of the predictor and response variables with the phylogeny have received considerable attention, much less attention has been paid to the checks of model reliability and stability commonly used for standard general linear models. However, several of these checks can be applied in the context of PGLS as well. Here, I describe how such checks of model stability and reliability can be applied to a PGLS and what can be done if they reveal potential problems. Besides treating general questions regarding the conceptual and technical validity of the model, I consider issues regarding sample size, collinearity among the predictors, the distribution of the predictors and the residuals, model stability, and drawing inference based on P-values. Finally, I emphasize the need for reporting checks of assumptions (and their results) in publications.
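
The difference from ordinary regression can be made concrete with a minimal sketch, which is not taken from the chapter. It assumes that the residual covariance is proportional to a known matrix `V` (for example, shared branch lengths under Brownian motion) and fits an intercept and a slope by whitening the data with the Cholesky factor of `V` and then applying ordinary least squares. The function names and the toy four-taxon matrix are hypothetical; real analyses would typically rely on established software and may additionally estimate parameters such as lambda.

```python
import math


def cholesky(v):
    """Return lower-triangular L with L L^T = v (v symmetric positive definite)."""
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(v[i][i] - s)
            else:
                low[i][j] = (v[i][j] - s) / low[j][j]
    return low


def forward_solve(low, b):
    """Solve low z = b for z by forward substitution."""
    z = []
    for i, row in enumerate(low):
        z.append((b[i] - sum(row[k] * z[k] for k in range(i))) / row[i])
    return z


def gls_simple(x, y, v):
    """Intercept and slope of y ~ x when residual covariance is proportional to v."""
    low = cholesky(v)
    ones_w = forward_solve(low, [1.0] * len(x))  # whitened intercept column
    x_w = forward_solve(low, x)                  # whitened predictor
    y_w = forward_solve(low, y)                  # whitened response
    # Ordinary least squares on the whitened data (2x2 normal equations).
    a11 = sum(o * o for o in ones_w)
    a12 = sum(o * xi for o, xi in zip(ones_w, x_w))
    a22 = sum(xi * xi for xi in x_w)
    b1 = sum(o * yi for o, yi in zip(ones_w, y_w))
    b2 = sum(xi * yi for xi, yi in zip(x_w, y_w))
    det = a11 * a22 - a12 * a12
    intercept = (a22 * b1 - a12 * b2) / det
    slope = (a11 * b2 - a12 * b1) / det
    return intercept, slope


# Hypothetical example: two pairs of sister taxa, each pair sharing half its history.
v = [
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]  # exactly y = 1 + 2x
intercept, slope = gls_simple(x, y, v)
print(round(intercept, 6), round(slope, 6))
```

With an identity matrix for `V`, the same function reduces to ordinary least squares, which is one way to see that standard regression is the special case of independent residuals.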

The original version of this chapter was revised: the Online Practical Material website has been updated. The erratum to this chapter is available at https://doi.org/10.1007/978-3-662-43550-2_23

Notes

  1. Having an interaction between two predictors in a model means allowing for a situation where the impact of one of the two on the response depends on the value or state of the other, and vice versa. Interactions can involve two or more covariates, two or more factors, and any mixture of covariates and factors.

  2. Note that ‘number of predictors’ should actually be labeled ‘number of estimated terms’ (meaning that a factor is counted as its number of levels minus 1, that interactions and squared terms need to be considered, and that in the context of a PGLS a parameter like lambda needs to be counted as well).

  3. To estimate the effect of an interaction reasonably well, more cases per combination of the levels of the factors are needed.

  4. Note that residuals of a PGLS are actually multivariate normal (Freckleton et al. 2011), which has implications for practical checks of their distribution; see the Online Practical Material (http://www.mpcm-evolution.com) for more.

  5. Note that this requires the model to be fitted using maximum likelihood; see the Online Practical Material (http://www.mpcm-evolution.com) for more details.
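
The counting rule in Note 2 can be sketched as a small helper. This is an illustration only: the function name, its arguments, and the example variables are hypothetical, and the intercept is deliberately left out of the count, which is an assumption not stated in the note. An interaction is counted as the product of the terms contributed by its components, which is the usual parameterisation under dummy coding.

```python
from math import prod


def count_estimated_terms(covariates, factor_levels, interactions=(),
                          squared=(), estimates_lambda=True):
    """Count estimated terms (intercept excluded, by assumption).

    covariates:       names of quantitative predictors (one term each)
    factor_levels:    dict mapping factor name to its number of levels
    interactions:     tuples of predictor names that interact
    squared:          names of covariates entered with a squared term
    estimates_lambda: whether a parameter like lambda is estimated as well
    """
    # Terms contributed by each main effect.
    terms = {name: 1 for name in covariates}
    terms.update({name: levels - 1 for name, levels in factor_levels.items()})

    total = sum(terms.values())
    # An interaction contributes the product of its components' terms.
    total += sum(prod(terms[name] for name in combo) for combo in interactions)
    total += len(squared)
    if estimates_lambda:
        total += 1
    return total


# Hypothetical model: body mass (covariate, also squared), diet (3 levels),
# their interaction, and an estimated lambda.
n_terms = count_estimated_terms(
    covariates=["body_mass"],
    factor_levels={"diet": 3},
    interactions=[("body_mass", "diet")],
    squared=["body_mass"],
)
print(n_terms)  # 1 + 2 + 2 + 1 + 1 = 7
```

Comparing this count, rather than the number of predictor variables, with the number of cases gives a more realistic impression of how much the data are asked to support.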

References

  • Aiken LS, West SG (1991) Multiple regression: testing and interpreting interactions. Sage, Newbury Park

  • Arnold C, Nunn CL (2010) Phylogenetic targeting of research effort in evolutionary biology. Am Nat 176:601–612

  • Budaev SV (2010) Using principal components and factor analysis in animal behaviour research: caveats and guidelines. Ethology 116:472–480

  • Burnham KP, Anderson DR (2002) Model selection and multimodel inference, 2nd edn. Springer, Berlin

  • Chatfield C (1995) Model uncertainty, data mining and statistical inference. J Roy Stat Soc A 158:419–466

  • Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, New York

  • Cohen J, Cohen P (1983) Applied multiple regression/correlation analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates Inc., New Jersey

  • Cooper N, Jetz W, Freckleton RP (2010) Phylogenetic comparative approaches for studying niche conservatism. J Evol Biol 23:2529–2539

  • Díaz-Uriarte R, Garland T Jr (1996) Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Syst Biol 45:27–47

  • Díaz-Uriarte R, Garland T Jr (1998) Effects of branch length errors on the performance of phylogenetically independent contrasts. Syst Biol 47:654–672

  • Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125:1–15

  • Felsenstein J (1988) Phylogenies and quantitative characters. Ann Rev Ecol Syst 19:445–471

  • Field A (2005) Discovering statistics using SPSS. Sage Publications, London

  • Forstmeier W, Schielzeth H (2011) Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner’s curse. Behav Ecol Sociobiol 65:47–55

  • Fox J, Monette G (1992) Generalized collinearity diagnostics. J Am Stat Assoc 87:178–183

  • Freckleton RP (2009) The seven deadly sins of comparative analysis. J Evol Biol 22:1367–1375

  • Freckleton RP (2011) Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol 65:91–101

  • Freckleton RP, Cooper N, Jetz W (2011) Comparative methods as a statistical fix: the dangers of ignoring an evolutionary model. Am Nat 178:E10–E17

  • Freckleton RP, Jetz W (2009) Space versus phylogeny: disentangling phylogenetic and spatial signals in comparative data. Proc Roy Soc B—Biol Sci 276:21–30

  • Garamszegi LZ, Møller AP (2012) Untested assumptions about within-species sample size and missing data in interspecific studies. Behav Ecol Sociobiol 66:1363–1373

  • Garland T Jr, Ives AR (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. Am Nat 155:346–364

  • Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, Cambridge

  • Grafen A (1989) The phylogenetic regression. Phil Trans Roy Soc Lond B, Biol Sci 326:119–157

  • Grafen A, Ridley M (1996) Statistical tests for discrete cross-species data. J Theor Biol 183:225–267

  • Hansen TF (1997) Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351

  • Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford University Press, Oxford

  • Ives AR, Garland T Jr (2010) Phylogenetic logistic regression for binary dependent variables. Syst Biol 59:9–26

  • Martins EP, Diniz-Filho JAF, Housworth EA (2002) Adaptive constraints and the phylogenetic comparative method: a computer simulation test. Evolution 56:1–13

  • Mundry R (2011) Issues in information theory based statistical inference—a commentary from a frequentist’s perspective. Behav Ecol Sociobiol 65:57–68

  • Nunn CL (2011) The comparative approach in evolutionary anthropology and biology. The University of Chicago Press, Chicago

  • Pagel M (1999) Inferring the historical patterns of biological evolution. Nature 401:877–884

  • Polly PD, Lawing AM, Fabre A-C, Goswami A (2013) Phylogenetic principal components analysis and geometric morphometrics. Hystrix, Ital J Mammal 24:33–41

  • Quinn GP, Keough MJ (2002) Experimental designs and data analysis for biologists. Cambridge University Press, Cambridge

  • Ramsey PH (1980) Exact type 1 error rates for robustness of Student’s t test with unequal variances. J Educ Stat 5:337–349

  • R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria

  • Revell LJ (2009) Size-correction and principal components for interspecific comparative studies. Evolution 63:3258–3268

  • Revell LJ (2010) Phylogenetic signal and linear regression on species data. Methods Ecol Evol 1:319–329

  • Rohlf FJ (2006) A comment on phylogenetic correction. Evolution 60:1509–1515

  • Schielzeth H (2010) Simple means to improve the interpretability of regression coefficients. Meth Ecol Evol 1:103–113

  • Zuur AF, Ieno EN, Elphick CS (2010) A protocol for data exploration to avoid common statistical problems. Meth Ecol Evol 1:3–14

Acknowledgments

First of all, I would like to thank László Zsolt Garamszegi for inviting me to write this chapter. I also thank László Zsolt Garamszegi and two anonymous reviewers for very helpful comments on an earlier draft of this chapter. I equally owe thanks to Charles L. Nunn for initially drawing my attention to the need for and rationale of phylogenetically corrected statistical analyses. During the three AnthroTree workshops held in Amherst, MA, U.S.A., in 2010–2012 and supported by the NSF (BCS-0923791) and the National Evolutionary Synthesis Center (NSF grant EF-0905606), I learnt a lot about the philosophy and practical implementation of phylogenetic approaches to statistical analyses, and I am very grateful to have had the opportunity to attend them. This article was mainly written during a stay on the wonderful island of Læsø, Denmark, and I owe warm thanks to the staff of the hotel Havnebakken for their hospitality, which made my stay both very enjoyable and productive.

Author information

Corresponding author

Correspondence to Roger Mundry.

Glossary

Case

Set of entries in the data referring to the same taxon; represented by one row in the data set and corresponding to one tip in the phylogeny.

Covariate

Quantitative predictor variable.

Dummy coding

Way of representing a factor in a linear model by turning it into a set of ‘quantitative’ variables. One level of the factor is defined as the ‘reference’ level (or reference category), and for each of the other levels a variable is created which is one if the respective case in the data set is of that level and zero otherwise. The estimate derived for a dummy coded variable reveals the degree by which the response in the coded level differs from that in the reference level.
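
As a minimal illustration of this definition (not taken from the chapter), the following sketch turns a factor into dummy coded variables. The function name and the example factor ‘diet’ with its levels are hypothetical.

```python
def dummy_code(values, reference):
    """Dummy code a factor given as a list of level labels.

    Returns a dict mapping each non-reference level to a 0/1 variable that
    is 1 where the case has that level and 0 otherwise.
    """
    if reference not in values:
        raise ValueError("reference level does not occur in the data")
    # The reference level gets no variable of its own.
    other_levels = sorted(set(values) - {reference})
    return {
        level: [1 if value == level else 0 for value in values]
        for level in other_levels
    }


# Hypothetical factor with three levels; 'fruit' serves as the reference level.
diet = ["fruit", "leaves", "insects", "fruit"]
coded = dummy_code(diet, reference="fruit")
print(coded)
```

A factor with three levels thus yields two dummy variables, and cases of the reference level are those for which all dummy variables are zero.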

Factor

Qualitative (or categorical) predictor variable.

General linear model

Unified approach to testing the effect(s) of one or several quantitative or categorical predictors on a single quantitative response; assumes normally distributed residuals with homogeneous variance; multiple regression, ANOVA, ANCOVA, and the t-tests are all special cases of the general linear model.

Level

Particular value of a factor (for instance, the factor ‘sex’ has the levels ‘female’ and ‘male’).

Predictor (variable)

Variable whose influence on the response variable is to be investigated or controlled for; can be a factor or a covariate.

Response (variable)

Variable at the focus of the study, for which the influence of one or several predictors is to be investigated.

Right (left) skewed distribution

Distribution with many small and few large values (a left skewed distribution shows the opposite pattern).
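
One common way to quantify this pattern, offered here as a sketch rather than as the chapter's own procedure, is the moment coefficient of skewness, which is positive for right skewed and negative for left skewed data. The function name and the example values are hypothetical.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5.

    Positive for right skewed data, negative for left skewed data.
    Undefined (division by zero) if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical predictor with many small and few large values.
right_skewed = [1, 1, 1, 2, 2, 3, 10]
left_skewed = [-v for v in right_skewed]  # mirror image of the same data

print(skewness(right_skewed) > 0)
print(skewness(left_skewed) < 0)
```

Such a number is only a rough summary; inspecting a histogram of the variable remains the more informative check.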

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mundry, R. (2014). Statistical Issues and Assumptions of Phylogenetic Generalized Least Squares. In: Garamszegi, L. (ed) Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43550-2_6
