Tests of Statistical Significance Made Sound

Brian D. Haig
Part of the Studies in Applied Philosophy, Epistemology and Rational Ethics book series (SAPERE, volume 45)


This chapter considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to the topic, psychology's understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis significance testing (NHST), an indefensible amalgam of ideas adapted from Fisher's thinking on the subject and from Neyman and Pearson's alternative account. To remedy the deficiencies of this hybrid, it is suggested that psychology avail itself of two important and more recent viewpoints on ToSS: the neo-Fisherian and the error-statistical perspectives. These outlooks are argued to be a definite improvement on standard NHST. It is concluded that ToSS can play a useful, if limited, role in psychological research.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Psychology, University of Canterbury, Christchurch, New Zealand
