Predict, Control, and Replicate to Understand: How Statistics Can Foster the Fundamental Goals of Science
Scientists abstract hypotheses from observations of the world and then deploy those hypotheses in tests of their reliability. The best way to test reliability is to predict an effect before it occurs. If we can manipulate the independent variables (the efficient causes) that make the effect occur, then the ability to predict makes it possible to control. Such control helps to isolate the relevant variables. Control also refers to a comparison condition, conducted to see what would have happened had we not deployed the key ingredient of the hypothesis: scientific knowledge accrues only when we compare what happens in one condition against what happens in another. When the results of such comparisons are not definitive, metrics of the efficacy of the manipulation are required. Many of these metrics derive from statistical inference, and many of them serve the cumulation of knowledge poorly. Without the ability to replicate an effect, the utility of the principle used to predict or control it is dubious. Traditional models of statistical inference are weak guides to the replicability and utility of results. Several alternatives to null hypothesis significance testing are sketched: Bayesian inference, model comparison, and predictive inference (p_rep). Predictive inference shows, for example, that the failure to replicate most results in the Open Science Collaboration's replication project was predictable. Replicability is but one aspect of scientific understanding: it establishes the reliability of our data and the predictive ability of our formal models. It is a necessary aspect of scientific progress, even if not by itself sufficient for understanding.
Keywords: Control, Predict, Replicate, Understand, NHST, Open Science Collaboration, Four causes, p_rep
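The idea behind predictive inference (p_rep) can be sketched in a few lines. This is a minimal sketch, assuming the usual normal approximation: an observed one-tailed p value is converted to a z score, and because the original study and an equal-sized replication each carry their own sampling error, the variance of the difference between them doubles, so the z score is shrunk by a factor of √2 before being converted back into a probability. The function name `p_rep` and the one-tailed convention are illustrative assumptions, not code from the article.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1), used in both directions.
_STD_NORMAL = NormalDist()


def p_rep(p_one_tailed: float) -> float:
    """Approximate probability that a replication finds an effect in the same direction.

    Assumption: normal sampling distributions and a replication of the same size
    as the original, so the sampling variance of the original-minus-replication
    difference is twice that of a single study.
    """
    if not 0.0 < p_one_tailed < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    # z score corresponding to the observed one-tailed p value.
    z = _STD_NORMAL.inv_cdf(1.0 - p_one_tailed)
    # Doubling the variance divides the standardized effect by sqrt(2).
    return _STD_NORMAL.cdf(z / 2 ** 0.5)
```

Under these assumptions, `p_rep(0.05)` comes out near 0.878, so even a result sitting exactly at the conventional threshold leaves a substantial chance that a replication will point the other way; `p_rep(0.5)` returns 0.5, a coin flip.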