Perspectives on Behavior Science, Volume 42, Issue 1, pp. 33–57

An Overview of Scientific Reproducibility: Consideration of Relevant Issues for Behavior Science/Analysis

  • Sean Laraway
  • Susan Snycerski
  • Sean Pradhan
  • Bradley E. Huitema

Abstract

For over a decade, failures to reproduce findings in several disciplines, including the biomedical, behavioral, and social sciences, have led some authors to claim that those disciplines face a so-called “replication (or reproducibility) crisis.” The current article examines: (a) various aspects of the reproducibility of scientific studies, including definitions of reproducibility; (b) published concerns about reproducibility in the scientific literature and the popular press; (c) variables involved in assessing the success of attempts to reproduce a study; (d) factors suggested as responsible for reproducibility failures; (e) types of validity of experimental studies and threats to validity as they relate to reproducibility; and (f) evidence for threats to reproducibility in the behavior science/analysis literature. Suggestions for improving the reproducibility of studies in behavior science/analysis are offered throughout.

Keywords

Reproducibility · Replication · Null hypothesis significance testing · Statistical power · Effect size measures · Statistical conclusion validity · Construct validity

Copyright information

© Association for Behavior Analysis International 2019

Authors and Affiliations

  1. Department of Psychology, San José State University, San José, USA
  2. Menlo College, Atherton, USA
  3. Western Michigan University, Kalamazoo, USA