Abstract
We review the recent debate on the lack of reliability of scientific results and its connections to the statistical methodologies at the core of the discovery paradigm. Null hypothesis statistical testing (NHST), in particular, has often been related to, if not blamed for, the present situation. We argue that a loose relation exists: although NHST, if properly used, cannot be seen as a cause, some common misuses may mask or even favour bad practices that lead to the lack of reliability. We discuss various proposals that have been put forward to deal with these issues.
Acknowledgements
This work was supported by the University of Trieste within the FRA project “Politiche strutturali e riforme. Analisi degli indicatori e valutazione degli effetti”.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Pauli, F. (2018). The p-value Case, a Review of the Debate: Issues and Plausible Remedies. In: Perna, C., Pratesi, M., Ruiz-Gazen, A. (eds) Studies in Theoretical and Applied Statistics. SIS 2016. Springer Proceedings in Mathematics & Statistics, vol 227. Springer, Cham. https://doi.org/10.1007/978-3-319-73906-9_9
DOI: https://doi.org/10.1007/978-3-319-73906-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73905-2
Online ISBN: 978-3-319-73906-9
eBook Packages: Mathematics and Statistics (R0)