Abstract
Using phylogenetic generalized least squares (PGLS) means fitting a linear regression to investigate the impact of one or several predictor variables on a single response variable while controlling for potential phylogenetic signal in the response (and, hence, non-independence of the residuals). The key difference between PGLS and standard (multiple) regression is that PGLS allows us to account for residuals that are potentially non-independent because of the phylogenetic history of the taxa investigated. While the assumptions of PGLS regarding the underlying processes of evolution and the correlation of the predictor and response variables with the phylogeny have received considerable attention, much less attention has been paid to the checks of model reliability and stability commonly used for standard general linear models. However, several of these checks can be applied in the context of PGLS as well. Here, I describe how such checks of model stability and reliability could be applied to a PGLS and what could be done if they reveal potential problems. Besides treating general questions regarding the conceptual and technical validity of the model, I consider issues regarding sample size, collinearity among the predictors, the distribution of the predictors and the residuals, model stability, and drawing inference based on P-values. Finally, I emphasize the need to report checks of assumptions (and their results) in publications.
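To make the contrast with standard regression concrete, the following is a minimal Python/NumPy sketch of a PGLS fit by generalized least squares, beta = (X' V^-1 X)^-1 X' V^-1 y, where V is the phylogenetic covariance matrix expected under Brownian motion. The six-taxon tree, the data, and the helper names `pagel_lambda` and `pgls` are hypothetical illustrations, not the author's code. Lambda is fixed here for simplicity, whereas PGLS software typically estimates it (for example by maximum likelihood).

```python
import numpy as np

def pagel_lambda(V, lam):
    """Pagel's lambda: scale off-diagonal (shared-history) covariances by lam."""
    V_lam = lam * V
    np.fill_diagonal(V_lam, np.diag(V))  # tip variances stay unchanged
    return V_lam

def pgls(X, y, V):
    """GLS fit: beta = (X' V^-1 X)^-1 X' V^-1 y, with standard errors."""
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    cov_unscaled = np.linalg.inv(XtVi @ X)
    beta = cov_unscaled @ XtVi @ y
    resid = y - X @ beta
    n, p = X.shape
    # Residual variance from the V-weighted residual sum of squares
    sigma2 = (resid @ V_inv @ resid) / (n - p)
    se = np.sqrt(sigma2 * np.diag(cov_unscaled))
    return beta, se, resid

# Hypothetical ultrametric tree (((A:1,B:1):1,C:2):1,((D:0.5,E:0.5):2,F:2.5):0.5);
# under Brownian motion, V[i, j] = branch length shared by taxa i and j from the root.
V_BM = np.array([
    [3.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [2.0, 3.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 3.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 3.0, 2.5, 0.5],
    [0.0, 0.0, 0.0, 2.5, 3.0, 0.5],
    [0.0, 0.0, 0.0, 0.5, 0.5, 3.0],
])

# Hypothetical predictor and response for taxa A to F
x = np.array([1.0, 1.2, 2.0, 3.1, 3.0, 4.2])
y = np.array([2.1, 2.4, 3.9, 6.0, 6.3, 8.1])
X = np.column_stack([np.ones_like(x), x])  # intercept + slope

beta, se, resid = pgls(X, y, pagel_lambda(V_BM, lam=1.0))
print("estimates:", beta, "SE:", se)
```

Setting `lam=0.0` removes all phylogenetic covariance, leaving a diagonal V with equal entries, so the estimates reduce to those of ordinary least squares; this is the sense in which standard regression is a special case of PGLS.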
The original version of this chapter was revised: the Online Practical Material website has been updated. The erratum to this chapter is available at https://doi.org/10.1007/978-3-662-43550-2_23
Notes
1. Having an interaction between two predictors in a model means allowing for a situation where the impact of one of the two on the response depends on the value or state of the other, and vice versa. Interactions can involve two or more covariates, two or more factors, or any mixture of covariates and factors.
2. Note that ‘number of predictors’ should actually be labeled ‘number of estimated terms’ (meaning that a factor counts as the number of its levels minus 1, that interactions and squared terms need to be considered, and that in the context of a PGLS a parameter like lambda needs to be counted as well).
3. To estimate the effect of an interaction reasonably well, more cases per combination of factor levels would be needed.
4. Note that residuals of a PGLS are actually multivariate normal (Freckleton et al. 2011), which has implications for practical checks of their distribution; see the Online Practical Material (http://www.mpcm-evolution.com) for more.
5. Note that this requires the model to be fitted using maximum likelihood; see the Online Practical Material (http://www.mpcm-evolution.com) for more details.
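Note 4 has a practical consequence: PGLS residuals are expected to be correlated according to the phylogeny, so a QQ-plot of the raw residuals implicitly treats them as independent when they are not. One possible check, sketched here in Python/NumPy under stated assumptions, decorrelates the residuals with the Cholesky factor L of the phylogenetic covariance matrix V (V = L L'), in the spirit of the "normalized" residuals offered for GLS fits; it is not necessarily the procedure described in the Online Practical Material. The clade-structured tree, the simulated data, and the helper `normalized_residuals` are hypothetical. If lambda has been estimated, V should be the lambda-transformed matrix.

```python
import numpy as np

def normalized_residuals(resid, V):
    """Decorrelate GLS residuals: with V = L L', return z = L^-1 resid."""
    L = np.linalg.cholesky(V)         # lower-triangular Cholesky factor
    return np.linalg.solve(L, resid)  # solves L z = resid

# Hypothetical tree: 4 clades of 10 species; root-to-clade branches of 0.7,
# clade-to-tip branches of 0.3, so V = 0.3*I + 0.7*(same clade)
n_clades, per_clade = 4, 10
n = n_clades * per_clade
clade = np.repeat(np.arange(n_clades), per_clade)
V = 0.3 * np.eye(n) + 0.7 * (clade[:, None] == clade[None, :])

# Simulate data whose residuals follow Brownian motion on this tree
rng = np.random.default_rng(42)
L = np.linalg.cholesky(V)
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + L @ rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# GLS fit by whitening: OLS on L^-1 X and L^-1 y
Xs, ys = np.linalg.solve(L, X), np.linalg.solve(L, y)
beta = np.linalg.lstsq(Xs, ys, rcond=None)[0]
resid = y - X @ beta                # raw residuals: correlated within clades
z = normalized_residuals(resid, V)  # behave like ordinary OLS residuals under the model
# z (not resid) is what a QQ-plot or histogram should be drawn for
```

The design choice is that whitening turns the PGLS into an ordinary regression on transformed data, so the familiar residual diagnostics of a general linear model can be reused on `z`.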
References
Aiken LS, West SG (1991) Multiple regression: testing and interpreting interactions. Sage, Newbury Park
Arnold C, Nunn CL (2010) Phylogenetic targeting of research effort in evolutionary biology. Am Nat 176:601–612
Budaev SV (2010) Using principal components and factor analysis in animal behaviour research: caveats and guidelines. Ethology 116:472–480
Burnham KP, Anderson DR (2002) Model selection and multimodel inference, 2nd edn. Springer, Berlin
Chatfield C (1995) Model uncertainty, data mining and statistical inference. J Roy Stat Soc A 158:419–466
Cohen J (1988) Statistical power analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates, New York
Cohen J, Cohen P (1983) Applied multiple regression/correlation analysis for the behavioral sciences, 2nd edn. Lawrence Erlbaum Associates Inc., New Jersey
Cooper N, Jetz W, Freckleton RP (2010) Phylogenetic comparative approaches for studying niche conservatism. J Evol Biol 23:2529–2539
Díaz-Uriarte R, Garland T Jr (1996) Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Syst Biol 45:27–47
Díaz-Uriarte R, Garland T Jr (1998) Effects of branch length errors on the performance of phylogenetically independent contrasts. Syst Biol 47:654–672
Felsenstein J (1985) Phylogenies and the comparative method. Am Nat 125:1–15
Felsenstein J (1988) Phylogenies and quantitative characters. Ann Rev Ecol Syst 19:445–471
Field A (2005) Discovering statistics using SPSS. Sage Publications, London
Forstmeier W, Schielzeth H (2011) Cryptic multiple hypotheses testing in linear models: overestimated effect sizes and the winner’s curse. Behav Ecol Sociobiol 65:47–55
Fox J, Monette G (1992) Generalized collinearity diagnostics. J Am Stat Assoc 87:178–183
Freckleton RP (2009) The seven deadly sins of comparative analysis. J Evol Biol 22:1367–1375
Freckleton RP (2011) Dealing with collinearity in behavioural and ecological data: model averaging and the problems of measurement error. Behav Ecol Sociobiol 65:91–101
Freckleton RP, Cooper N, Jetz W (2011) Comparative methods as a statistical fix: the dangers of ignoring an evolutionary model. Am Nat 178:E10–E17
Freckleton RP, Jetz W (2009) Space versus phylogeny: disentangling phylogenetic and spatial signals in comparative data. Proc Roy Soc B—Biol Sci 276:21–30
Garamszegi LZ, Møller AP (2012) Untested assumptions about within-species sample size and missing data in interspecific studies. Behav Ecol Sociobiol 66:1363–1373
Garland T Jr, Ives AR (2000) Using the past to predict the present: confidence intervals for regression equations in phylogenetic comparative methods. Am Nat 155:346–364
Gelman A, Hill J (2007) Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, Cambridge
Grafen A (1989) The phylogenetic regression. Phil Trans Roy Soc Lond B, Biol Sci 326:119–157
Grafen A, Ridley M (1996) Statistical tests for discrete cross-species data. J Theor Biol 183:225–267
Hansen TF (1997) Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341–1351
Harvey PH, Pagel MD (1991) The comparative method in evolutionary biology. Oxford University Press, Oxford
Ives AR, Garland T Jr (2010) Phylogenetic logistic regression for binary dependent variables. Syst Biol 59:9–26
Martins EP, Diniz-Filho JAF, Housworth EA (2002) Adaptive constraints and the phylogenetic comparative method: a computer simulation test. Evolution 56:1–13
Mundry R (2011) Issues in information theory based statistical inference—a commentary from a frequentist’s perspective. Behav Ecol Sociobiol 65:57–68
Nunn CL (2011) The comparative approach in evolutionary anthropology and biology. The University of Chicago Press, Chicago
Pagel M (1999) Inferring the historical patterns of biological evolution. Nature 401:877–884
Polly PD, Lawing AM, Fabre A-C, Goswami A (2013) Phylogenetic principal components analysis and geometric morphometrics. Hystrix, Ital J Mammal 24:33–41
Quinn GP, Keough MJ (2002) Experimental designs and data analysis for biologists. Cambridge University Press, Cambridge
Ramsey PH (1980) Exact type 1 error rates for robustness of Student’s t test with unequal variances. J Educ Stat 5:337–349
R Core Team (2013) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria
Revell LJ (2009) Size-correction and principal components for interspecific comparative studies. Evolution 63:3258–3268
Revell LJ (2010) Phylogenetic signal and linear regression on species data. Methods Ecol Evol 1:319–329
Rohlf FJ (2006) A comment on phylogenetic correction. Evolution 60:1509–1515
Schielzeth H (2010) Simple means to improve the interpretability of regression coefficients. Methods Ecol Evol 1:103–113
Zuur AF, Ieno EN, Elphick CS (2010) A protocol for data exploration to avoid common statistical problems. Methods Ecol Evol 1:3–14
Acknowledgments
First of all, I would like to thank László Zsolt Garamszegi for inviting me to write this chapter. I also thank László Zsolt Garamszegi and two anonymous reviewers for very helpful comments on an earlier draft of this chapter. I am equally indebted to Charles L. Nunn for first drawing my attention to the need for, and rationale of, phylogenetically corrected statistical analyses. During the three AnthroTree workshops held in Amherst, MA, U.S.A., in 2010–2012 and supported by the NSF (BCS-0923791) and the National Evolutionary Synthesis Center (NSF grant EF-0905606), I learnt a lot about the philosophy and practical implementation of phylogenetic approaches to statistical analyses, and I am very grateful to have had the opportunity to attend them. This article was mainly written during a stay on the wonderful island of Læsø, Denmark, and I owe warm thanks to the staff of the hotel Havnebakken for their hospitality, which made my stay both very enjoyable and productive.
Glossary
- Case: Set of entries in the data referring to the same taxon; represented by one row in the data set and corresponding to one tip in the phylogeny.
- Covariate: Quantitative predictor variable.
- Dummy coding: Way of representing a factor in a linear model by turning it into a set of ‘quantitative’ variables. One level of the factor is defined as the ‘reference’ level (or reference category), and for each of the other levels a variable is created which is one if the respective case in the data set is of that level and zero otherwise. The estimate derived for a dummy-coded variable reveals the degree by which the response in the coded level differs from that in the reference level.
- Factor: Qualitative (or categorical) predictor variable.
- General linear model: Unified approach to test the effect(s) of one or several quantitative or categorical predictors on a single quantitative response; assumes normally distributed residuals with homogeneous variance; multiple regression, ANOVA, ANCOVA, and the t-tests are all special cases of the general linear model.
- Level: Particular value of a factor (for instance, the factor ‘sex’ has the levels ‘female’ and ‘male’).
- Predictor (variable): Variable whose influence on the response variable is to be investigated or controlled for; can be a factor or a covariate.
- Response (variable): Variable in the focus of the study; the aim is to investigate how one or several predictors influence it.
- Right (left) skewed distribution: Distribution with many small and few large values (a left skewed distribution shows the opposite pattern).
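The glossary entries on factors, levels and dummy coding, together with notes 1 and 2, can be illustrated with a small design matrix. The Python/NumPy sketch below dummy-codes a hypothetical three-level factor `diet` against the reference level ‘fruit’, adds a hypothetical covariate `body_mass`, its interaction with the dummies, and a squared term, and then counts the estimated terms. The variable names and data are invented for illustration, and excluding the intercept from the count is an assumption of this sketch.

```python
import numpy as np

def dummy_code(values, reference):
    """Dummy coding: one 0/1 column per non-reference level of a factor."""
    levels = [lv for lv in sorted(set(values)) if lv != reference]
    D = np.column_stack([[1.0 if v == lv else 0.0 for v in values] for lv in levels])
    return D, levels

# Hypothetical data: a factor 'diet' (3 levels) and a covariate 'body_mass'
diet = ["fruit", "leaves", "insects", "fruit", "leaves", "insects", "fruit", "leaves"]
body_mass = np.array([1.2, 3.4, 0.8, 1.5, 2.9, 0.6, 1.1, 3.8])

D, dummy_levels = dummy_code(diet, reference="fruit")

# Design matrix: intercept, covariate, 2 dummies, covariate x dummy interactions
# (letting the body-mass slope differ between diets), and a squared term
X = np.column_stack([
    np.ones(len(diet)),
    body_mass,
    D,
    body_mass[:, None] * D,
    body_mass ** 2,
])

# Estimated terms: all columns except the intercept (an assumption of this
# sketch), plus one for lambda if it is estimated in the PGLS.
# Illustration only: 8 cases are far too few for this many terms.
estimate_lambda = True
n_terms = (X.shape[1] - 1) + int(estimate_lambda)
print(dummy_levels, X.shape, n_terms)  # -> ['insects', 'leaves'] (8, 7) 7
```

Even this modest-looking model (one covariate and one three-level factor) requires seven estimated terms, which is why counting terms rather than predictors matters for judging whether the sample size is adequate.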
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Mundry, R. (2014). Statistical Issues and Assumptions of Phylogenetic Generalized Least Squares. In: Garamszegi, L. (eds) Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43550-2_6
DOI: https://doi.org/10.1007/978-3-662-43550-2_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43549-6
Online ISBN: 978-3-662-43550-2
eBook Packages: Biomedical and Life Sciences (R0)