Perspectives on Behavior Science, Volume 42, Issue 1, pp 91–108

How I Learned to Stop Worrying and Love Replication Failures

  • Michael Perone


Worries about the reproducibility of experiments in the behavioral and social sciences arise from evidence that many published reports contain false positive results. Misunderstanding and misuse of statistical procedures are key sources of false positives. In behavior analysis, however, statistical procedures have not been used much. Instead, the investigator must show that the behavior of an individual is consistent over time within an experimental condition, that the behavior changes systematically across conditions, and that these changes can be reproduced – and then the whole pattern must be shown in additional individuals. These high standards of within- and between-subject replication protect behavior analysis from the publication of false positive findings. When a properly designed and executed experiment fails to replicate a previously published finding, the failure exposes flaws in our understanding of the phenomenon under study – perhaps in recognizing the boundary conditions of the phenomenon, identifying the relevant variables, or bringing the variables under sufficient control. We must accept the contradictory findings as valid and pursue an experimental analysis of the possible reasons. In this way, we resolve the contradiction and advance our science. To illustrate, two research programs are described, each initiated because of a replication failure.


Replication; Replication failure; Open Science Collaboration; Statistical significance; Fixed-ratio pausing; Conditioned reinforcement


Compliance with Ethical Standards

Conflict of Interest

The author declares that he has no conflict of interest.

Ethical Standards

This paper does not report original empirical research, but rather reviews previous work, almost all of it published in peer-reviewed journals. The research in which I directly participated was reviewed and approved by the responsible committee (IACUC or IRB).



Copyright information

© Association for Behavior Analysis International 2018

Authors and Affiliations

  1. Department of Psychology, West Virginia University, Morgantown, USA
