
Behavior Research Methods, Volume 47, Issue 4, pp 913–917

A power fallacy

  • Eric-Jan Wagenmakers
  • Josine Verhagen
  • Alexander Ly
  • Marjan Bakker
  • Michael D. Lee
  • Dora Matzke
  • Jeffrey N. Rouder
  • Richard D. Morey

Abstract

The power fallacy refers to the misconception that what holds on average (across an ensemble of hypothetical experiments) also holds for each case individually. According to the fallacy, high-power experiments always yield more informative data than do low-power experiments. Here we expose the fallacy with concrete examples, demonstrating that a particular outcome from a high-power experiment can be completely uninformative, whereas a particular outcome from a low-power experiment can be highly informative. Although power is useful in planning an experiment, it is less useful, and sometimes even misleading, for making inferences from observed data. To make inferences from data, we recommend the use of likelihood ratios or Bayes factors, which are the extension of likelihood ratios beyond point hypotheses. These methods of inference do not average over hypothetical replications of an experiment, but instead condition on the data that have actually been observed. In this way, likelihood ratios and Bayes factors rationally quantify the evidence that a particular data set provides for or against the null or any other hypothesis.
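The contrast described in the abstract can be illustrated with a small binomial sketch (this example is not taken from the paper; the point hypotheses p = 0.5 vs. p = 0.75, the sample sizes, and the significance level are assumptions chosen for illustration). Power is a pre-data average over all possible outcomes, whereas a likelihood ratio conditions on the one outcome actually observed, so a high-powered design can still produce a near-uninformative likelihood ratio, and a small design can produce a decisive one.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def likelihood_ratio(k, n, p0=0.5, p1=0.75):
    """Likelihood ratio for H1 (p = p1) over H0 (p = p0), given the observed data."""
    return binom_pmf(k, n, p1) / binom_pmf(k, n, p0)

def power_one_sided(n, alpha=0.05, p0=0.5, p1=0.75):
    """Power of the one-sided exact binomial test of H0, if p = p1 is true."""
    # smallest critical count c with P(X >= c | p0) <= alpha
    c = next(c for c in range(n + 1)
             if sum(binom_pmf(i, n, p0) for i in range(c, n + 1)) <= alpha)
    return sum(binom_pmf(i, n, p1) for i in range(c, n + 1))

# A large experiment (n = 100) has very high power to detect p = 0.75 ...
print(power_one_sided(100))          # power is near 1
# ... yet the particular outcome k = 63 is almost perfectly uninformative:
print(likelihood_ratio(63, 100))     # likelihood ratio close to 1
# whereas a small experiment (n = 10) can yield strong evidence:
print(likelihood_ratio(10, 10))      # likelihood ratio far above 1
```

The sketch uses point hypotheses, so the likelihood ratio is also the Bayes factor for this comparison; with composite hypotheses one would instead average the likelihood over a prior, as the abstract notes.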

Keywords

Hypothesis test · Likelihood ratio · Statistical evidence · Bayes factor


Copyright information

© Psychonomic Society, Inc. 2014

Authors and Affiliations

  • Eric-Jan Wagenmakers (1)
  • Josine Verhagen (1)
  • Alexander Ly (1)
  • Marjan Bakker (1)
  • Michael D. Lee (2)
  • Dora Matzke (1)
  • Jeffrey N. Rouder (3)
  • Richard D. Morey (4)
  1. Department of Psychology, University of Amsterdam, Amsterdam, The Netherlands
  2. University of California, Irvine, Irvine, USA
  3. University of Missouri, Columbia, USA
  4. University of Groningen, Groningen, The Netherlands
