Statistical Issues and Assumptions of Phylogenetic Generalized Least Squares

Mundry, Roger

doi:10.1007/978-3-662-43550-2_6

Abstract

Phylogenetic generalized least squares (PGLS) fits a linear regression to investigate the effect of one or several predictor variables on a single response variable while controlling for potential phylogenetic signal in the response (and, hence, non-independence of the residuals). The key difference between PGLS and standard (multiple) regression is that PGLS allows for residuals that are non-independent because of the phylogenetic history of the taxa investigated. While the assumptions of PGLS regarding the underlying evolutionary processes and the correlation of the predictor and response variables with the phylogeny have received considerable attention, much less attention has been paid to the checks of model reliability and stability commonly used for standard general linear models. Several of these checks, however, can be applied in much the same way to PGLS. Here, I describe how such checks of model stability and reliability can be applied to a PGLS and what can be done when they reveal potential problems. Besides treating general questions regarding the conceptual and technical validity of the model, I consider issues regarding sample size, collinearity among the predictors, the distribution of the predictors and the residuals, model stability, and drawing inference based on P-values. Finally, I emphasize the need for reporting checks of assumptions (and their results) in publications.
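
To make the contrast with standard regression concrete, the following is a minimal sketch of the generalized least squares estimator that PGLS relies on, assuming the phylogenetic covariance matrix V of the taxa is already known. It is not taken from the chapter or its Online Practical Material; the function names `gls_fit` and `lambda_transform`, the four-taxon matrix, and the data values are all hypothetical illustrations. The lambda scaling follows the usual convention of multiplying the off-diagonal elements of V by lambda while leaving the diagonal unchanged.

```python
import numpy as np


def lambda_transform(V, lam):
    """Scale the off-diagonal elements of V by lam, keeping the diagonal."""
    V_lam = lam * V
    np.fill_diagonal(V_lam, np.diag(V))
    return V_lam


def gls_fit(X, y, V):
    """GLS coefficients for y = X b + e, with Cov(e) proportional to V."""
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    # Solve (X' V^-1 X) b = X' V^-1 y for b
    return np.linalg.solve(XtVi @ X, XtVi @ y)


# Hypothetical tree: two pairs of sister taxa, each pair sharing half
# of the total root-to-tip distance (shared branch length 0.5, depth 1).
V = np.array([[1.0, 0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.5],
              [0.0, 0.0, 0.5, 1.0]])

x = np.array([1.0, 2.0, 3.0, 4.0])          # hypothetical covariate
X = np.column_stack([np.ones_like(x), x])   # intercept + covariate
y = np.array([2.4, 3.1, 3.4, 4.2])          # hypothetical response

beta_pgls = gls_fit(X, y, V)                        # lambda = 1
beta_ols = gls_fit(X, y, lambda_transform(V, 0.0))  # lambda = 0

print("PGLS (lambda = 1):", beta_pgls)
print("lambda = 0 (no phylogenetic covariance):", beta_ols)
```

With lambda set to 0 the off-diagonal covariances vanish, and because all diagonal elements are equal in this example the fit reduces to ordinary least squares. In practice lambda is usually estimated from the data rather than fixed, which this sketch does not attempt.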

The original version of this chapter was revised: Online Practical Material website has been updated. The erratum to this chapter is available at https://doi.org/10.1007/978-3-662-43550-2_23

Notes

1. Including an interaction between two predictors in a model allows the effect of one of them on the response to depend on the value or state of the other, and vice versa. Interactions can involve two or more covariates, two or more factors, or any mixture of covariates and factors.
2. Note that ‘number of predictors’ should actually be labeled ‘number of estimated terms’ (meaning that a factor is counted as the number of its levels minus 1, that interactions and squared terms need to be considered, and that in the context of a PGLS a parameter like lambda needs to be counted as well).
3. Estimating the effect of an interaction reasonably well requires more cases per combination of the levels of the factors.
4. Note that residuals of a PGLS are actually multivariate normal (Freckleton et al. 2011), which has implications for practical checks of their distribution; see the Online Practical Material (http://www.mpcm-evolution.com) for more.
5. Note that this requires the model to be fitted using maximum likelihood; see the Online Practical Material (http://www.mpcm-evolution.com) for more details.
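
The counting rule in Note 2 can be illustrated with a short sketch. The function `count_estimated_terms` and the predictor names are hypothetical, and two points are assumptions rather than statements from the chapter: an interaction is counted as the product of the counts of its components (the usual result under dummy coding), and the intercept is not counted.

```python
from math import prod


def count_estimated_terms(predictors, terms, estimate_lambda=True):
    """Count estimated terms in a model.

    predictors: dict mapping name -> None (covariate) or number of levels (factor)
    terms: list of tuples of predictor names; a one-element tuple is a main
           effect, several names form an interaction, and a repeated
           covariate name such as ("x", "x") stands for a squared term
    estimate_lambda: add one term for a parameter like lambda
    """
    # A covariate contributes 1, a factor its number of levels minus 1
    df = {name: 1 if levels is None else levels - 1
          for name, levels in predictors.items()}
    # Assumption: an interaction contributes the product of its components
    n_terms = sum(prod(df[name] for name in term) for term in terms)
    return n_terms + (1 if estimate_lambda else 0)


# Hypothetical model: one covariate, one three-level factor, their
# interaction, and the squared covariate, with lambda estimated.
predictors = {"body_mass": None, "diet": 3}
terms = [("body_mass",), ("diet",), ("body_mass", "diet"),
         ("body_mass", "body_mass")]

n_estimated = count_estimated_terms(predictors, terms)
print(n_estimated)
```

Here what looks like a model with two predictors involves 1 + 2 + 2 + 1 estimated terms plus lambda, which is the number that should be weighed against the sample size.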

References

Aiken LS, West SG (1991) Multiple regression: testing and interpreting interactions. Sage, Newbury Park
Arnold C, Nunn CL (2010) Phylogenetic targeting of research effort in evolutionary biology. Am Nat 176:601–612
Budaev SV (2010) Using principal components and factor analysis in animal behaviour research: caveats and guidelines. Ethology 116:472–480
Burnham KP, Anderson DR (2002) Model selection and multimodel inference, 2nd edn. Springer, Berlin
Chatfield C (1995) Model uncertainty, data mining and statistical inference. J Roy Stat Soc A 158:419–466
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, New York
Cohen J, Cohen P (1983) Applied multiple regression/correlation analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates Inc., New Jersey
Cooper N, Jetz W, Freckleton RP (2010) Phylogenetic comparative approaches for studying niche conservatism. J Evol Biol 23:2529–2539
Díaz-Uriarte R, Garland T Jr (1996) Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Syst Biol 45:27–47
Díaz-Uriarte R, Garland T Jr (1998) Effects of branch length errors on the performance of phylogenetically independent contrasts. Syst Biol 47:654–672
Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125:1–15
Felsenstein J (1988) Phylogenies and quantitative characters. Ann Rev Ecol Syst 19:445–471
Field A (2005) Discovering statistics using SPSS. Sage Publications, London
Forstmeier W, Schielzeth H (2011) Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner’s curse. Behav Ecol Sociobiol 65:47–55
Fox J, Monette G (1992) Generalized collinearity diagnostics. J Am Stat Assoc 87:178–183
Freckleton RP (2009) The seven deadly sins of comparative analysis. J Evol Biol 22:1367–1375
Freckleton RP (2011) Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol 65:91–101
Freckleton RP, Cooper N, Jetz W (2011) Comparative methods as a statistical fix: the dangers of ignoring an evolutionary model. Am Nat 178:E10–E17
Freckleton RP, Jetz W (2009) Space versus phylogeny: disentangling phylogenetic and spatial signals in comparative data. Proc Roy Soc B—Biol Sci 276:21–30
Garamszegi LZ, Møller AP (2012) Untested assumptions about within-species sample size and missing data in interspecific studies. Behav Ecol Sociobiol 66:1363–1373
Garland T Jr, Ives AR (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. Am Nat 155:346–364
Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, Cambridge
Grafen A (1989) The phylogenetic regression. Phil Trans Roy Soc Lond B, Biol Sci 326:119–157
Grafen A, Ridley M (1996) Statistical tests for discrete cross-species data. J Theor Biol 183:225–267
Hansen TF (1997) Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351
Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford University Press, Oxford
Ives AR, Garland T Jr (2010) Phylogenetic logistic regression for binary dependent variables. Syst Biol 59:9–26
Martins EP, Diniz-Filho JAF, Housworth EA (2002) Adaptive constraints and the phylogenetic comparative method: a computer simulation test. Evolution 56:1–13
Mundry R (2011) Issues in information theory based statistical inference—a commentary from a frequentist’s perspective. Behav Ecol Sociobiol 65:57–68
Nunn CL (2011) The comparative approach in evolutionary anthropology and biology. The University of Chicago Press, Chicago
Pagel M (1999) Inferring the historical patterns of biological evolution. Nature 401:877–884
Polly PD, Lawing AM, Fabre A-C, Goswami A (2013) Phylogenetic principal components analysis and geometric morphometrics. Hystrix, Ital J Mammal 24:33–41
Quinn GP, Keough MJ (2002) Experimental designs and data analysis for biologists. Cambridge University Press, Cambridge
Ramsey PH (1980) Exact type 1 error rates for robustness of Student’s t test with unequal variances. J Educ Stat 5:337–349
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Revell LJ (2009) Size-correction and principal components for interspecific comparative studies. Evolution 63:3258–3268
Revell LJ (2010) Phylogenetic signal and linear regression on species data. Methods Ecol Evol 1:319–329
Rohlf FJ (2006) A comment on phylogenetic correction. Evolution 60:1509–1515
Schielzeth H (2010) Simple means to improve the interpretability of regression coefficients. Methods Ecol Evol 1:103–113
Zuur AF, Ieno EN, Elphick CS (2010) A protocol for data exploration to avoid common statistical problems. Methods Ecol Evol 1:3–14

Acknowledgments

First of all, I would like to thank László Zsolt Garamszegi for inviting me to write this chapter. I also thank László Zsolt Garamszegi and two anonymous reviewers for very helpful comments on an earlier draft of this chapter. I equally owe thanks to Charles L. Nunn for first drawing my attention to the need for and rationale of phylogenetically corrected statistical analyses. During the three AnthroTree workshops held in Amherst, MA, U.S.A., in 2010–2012 and supported by the NSF (BCS-0923791) and the National Evolutionary Synthesis Center (NSF grant EF-0905606), I learnt a lot about the philosophy and practical implementation of phylogenetic approaches to statistical analyses, and I am very grateful to have had the opportunity to attend them. This article was mainly written during a stay on the wonderful island of Læsø, Denmark, and I owe warm thanks to the staff of the hotel Havnebakken for their hospitality, which made my stay both very enjoyable and productive.

Author information

Authors and Affiliations

Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
Roger Mundry

Corresponding author

Correspondence to Roger Mundry.

Editor information

Editors and Affiliations

Department of Evolutionary Ecology, Estación Biológica de Doñana-CSIC, Sevilla, Spain
László Zsolt Garamszegi

Glossary

Case: Set of entries in the data referring to the same taxon; represented by one row in the data set and corresponding to one tip in the phylogeny.
Covariate: Quantitative predictor variable.
Dummy coding: Way of representing a factor in a linear model by turning it into a set of ‘quantitative’ variables. One level of the factor is defined as the ‘reference’ level (or reference category), and for each of the other levels a variable is created which is one if the respective case in the data set is of that level and zero otherwise. The estimate derived for a dummy coded variable reveals the degree to which the response in the coded level differs from that in the reference level.
Factor: Qualitative (or categorical) predictor variable.
General linear model: Unified approach to testing the effect(s) of one or several quantitative or categorical predictors on a single quantitative response; assumes normally and homogeneously distributed residuals; multiple regression, ANOVA, ANCOVA, and the t-tests are all special cases of the general linear model.
Level: Particular value of a factor (for instance, the factor ‘sex’ has the levels ‘female’ and ‘male’).
Predictor (variable): Variable whose influence on the response variable is to be investigated or controlled for; can be a factor or a covariate.
Response (variable): Variable that is the focus of the study and whose dependence on one or several predictors is investigated.
Right (left) skewed distribution: Distribution with many small and few large values (a left skewed distribution shows the opposite pattern).
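
The glossary entry on dummy coding can be illustrated with a minimal sketch. The helper `dummy_code` and the example factor ‘diet’ are hypothetical and are not taken from the chapter; the sketch only shows how a factor with three levels turns into two 0/1 variables once a reference level has been chosen.

```python
def dummy_code(values, reference):
    """Return one 0/1 variable per non-reference level of a factor.

    values: list with the factor level of each case
    reference: level to be used as the reference category
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    # One variable per remaining level: 1 if the case has that level, else 0
    return {level: [1 if v == level else 0 for v in values]
            for level in levels if level != reference}


# Hypothetical factor with three levels, observed in four cases (taxa)
diet = ["fruit", "leaves", "insects", "fruit"]
coded = dummy_code(diet, reference="fruit")

for level, column in coded.items():
    print(level, column)
```

Cases of the reference level are zero in every created variable, so the coefficient of each variable describes how the response in that level differs from the response in the reference level.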

About this chapter

Cite this chapter

Mundry, R. (2014). Statistical Issues and Assumptions of Phylogenetic Generalized Least Squares. In: Garamszegi, L. (eds) Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43550-2_6

DOI: https://doi.org/10.1007/978-3-662-43550-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43549-6
Online ISBN: 978-3-662-43550-2
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)