Perspectives on Behavior Science, Volume 42, Issue 1, pp 77–89

The “Reproducibility Crisis:” Might the Methods Used Frequently in Behavior-Analysis Research Help?

  • Marc N. Branch


Mainstream biomedical and behavioral sciences are facing what has been dubbed “the reproducibility crisis.” The crisis is born of failures to replicate the results of published research at an average rate of roughly 50%. In this paper I make the case that the prime culprit behind this unsatisfactory state of affairs is the widespread use of p-values from tests of statistical significance as a criterion for publication. Even though it has been known, and made public, for decades that p-values provide no quantitative information about how likely experimental results are to be repeatable, they remain a fundamental criterion for publication. A growing realization among researchers that p-values provide no information bearing on repeatability may offer an opportunity for wider application of research methods frequently used in the research specialty known as Behavior Analysis, as well as in a few other research traditions. These alternative approaches are founded on within- and between-participant replication as integral parts of research designs. The erosion of public confidence in science, fueled by the reproducibility crisis, is a serious threat. Anything the field of Behavior Analysis can offer to help ameliorate the problem should be welcomed.
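The abstract's central claim, that a p-value below .05 carries little information about whether a result will replicate, can be illustrated with a small simulation. The sketch below is not from the paper; all parameters (a true standardized effect of 0.5, 20 participants per group, a two-sided z-test with known unit variance) are illustrative assumptions. Under these conditions, a "significant" original study is followed by a significant replication only about a third of the time, because the replication rate is governed by statistical power, not by the original p-value.

```python
import math
import random
import statistics

random.seed(42)

def experiment(n=20, effect=0.5):
    """Simulate one two-group study and return the two-sided p-value
    of a z-test (sigma = 1 assumed known, so no external libraries)."""
    control = [random.gauss(0.0, 1.0) for _ in range(n)]
    treated = [random.gauss(effect, 1.0) for _ in range(n)]
    se = math.sqrt(2.0 / n)  # standard error of the mean difference
    z = (statistics.fmean(treated) - statistics.fmean(control)) / se
    # two-sided p-value from the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

# Run "original" studies; whenever one reaches p < .05, run a single
# replication and record whether it also reaches p < .05.
replicated, originals_significant = 0, 0
for _ in range(5000):
    if experiment() < 0.05:
        originals_significant += 1
        if experiment() < 0.05:
            replicated += 1

rate = replicated / originals_significant
print(f"replication rate given a significant original: {rate:.2f}")
```

Because the original and the replication are independent draws from the same design, the conditional replication rate simply equals the design's power (about 0.35 here), which is the point the abstract makes: the original p-value adds nothing.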


Keywords: Statistical significance · P-values · Replication · Individual-case designs



Copyright information

© Association for Behavior Analysis International 2018

Authors and Affiliations

  1. Psychology Department, University of Florida, Gainesville, USA
