Abstract
Null hypothesis testing applies to confirmatory research, where substantive hypotheses are tested. The preferred approach is to construct a confidence interval (CI), because a CI simultaneously assesses the precision of a parameter estimate and tests a null hypothesis on the parameter. Two- and one-sided CIs and two- and one-tailed tests are considered. The CI approach is demonstrated for conventional tests of the null hypothesis of equal means of paired (Student’s t test) and independent (Student’s t and Welch tests) variables. Bootstrap methods make weaker assumptions than the conventional tests. The bootstrap t method for the means of paired and independent variables and the modified percentile bootstrap method for the product–moment correlation are described. Null hypothesis testing is often incorrectly understood and applied. Several methods to correct these flaws are discussed. First, overlap of the CIs of two means does not imply that the difference between the two means is nonsignificant. Second, a two-step procedure, where the choice of a test is based on the results of tests of the assumptions of that test, inflates the Type I error. Third, standardized effect sizes can be computed in different ways, which hampers the comparability of effect sizes in meta-analysis. Fourth, an observed power analysis, where the effect size is estimated from sample data, cannot explain nonsignificant results. Fifth, testing multiple null hypotheses increases the probability of rejecting at least one true null hypothesis, which is prevented by applying multiple null hypothesis testing methods (e.g., Hochberg’s method). Sixth, data exploration may yield interesting substantive hypotheses, but these have to be confirmed with new data in a cross-validation or replication study.
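The fifth point names Hochberg’s (1988) step-up procedure for controlling the family-wise error rate. As a minimal sketch (the function name and example p-values are illustrative, not from the chapter), the procedure orders the m p-values and rejects the k hypotheses with the smallest p-values, where k is the largest index satisfying p_(k) ≤ α/(m − k + 1):

```python
def hochberg(pvalues, alpha=0.05):
    """Hochberg's step-up procedure.

    Returns a list of booleans, aligned with the input, that is True
    for every null hypothesis rejected at family-wise error rate alpha.
    """
    m = len(pvalues)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    # Step up: find the largest k with p_(k) <= alpha / (m - k + 1),
    # then reject the k hypotheses with the smallest p-values.
    for k in range(m, 0, -1):  # k = m, m-1, ..., 1
        if pvalues[order[k - 1]] <= alpha / (m - k + 1):
            for i in order[:k]:
                reject[i] = True
            break
    return reject
```

Because Hochberg’s procedure is a step-up method, it is uniformly at least as powerful as the classical Bonferroni correction while still controlling the Type I error for independent tests.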
Seventh, adding participants to the sample until the null hypothesis is rejected inflates the Type I error, which is prevented by using sequential testing methods (e.g., the group sequential testing procedure). Finally, if researchers do not want to reject a null hypothesis, they have to apply equivalence testing.
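The closing point on equivalence testing is commonly operationalized as two one-sided tests (TOST): equivalence is declared only when the mean difference is shown to be both above a lower bound and below an upper bound. The sketch below uses a large-sample normal approximation rather than the t-based TOST of the literature; the function name and the equivalence bounds are illustrative assumptions, not from the chapter:

```python
from statistics import NormalDist

def tost_equivalence(mean_diff, se, low, high, alpha=0.05):
    """Two one-sided tests (TOST) for equivalence of two means,
    using a large-sample normal (z) approximation.

    The null hypothesis of non-equivalence is rejected (equivalence is
    declared) only when both one-sided tests are significant, i.e. when
    the larger of the two one-sided p-values is below alpha.
    Returns (equivalent, p) with p = max of the two one-sided p-values.
    """
    z = NormalDist()
    # Test H0: difference <= low against H1: difference > low.
    p_lower = 1.0 - z.cdf((mean_diff - low) / se)
    # Test H0: difference >= high against H1: difference < high.
    p_upper = z.cdf((mean_diff - high) / se)
    p = max(p_lower, p_upper)
    return p < alpha, p
```

For example, a mean difference of 0.05 with standard error 0.1 falls convincingly inside bounds of ±0.5, whereas a difference of 0.6 lies outside them and equivalence is not declared.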
References
American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35, 33–40.
American Psychological Association. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.
Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.
Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10, 389–396.
Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391–401.
Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the Journal of Educational Measurement. Journal of Educational Measurement, 10, 71–74.
Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234–237.
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.
Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York, NY: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London, England: Routledge.
de Groot, A. D. (1956/2014). De betekenis van ‘significant’ bij verschillende typen onderzoek [The meaning of ‘significance’ for different types of research]. Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden, 11, 398–409. (Translated and annotated by E.-J. Wagenmakers et al., Acta Psychologica, 148, 188–194.)
Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York, NY: Chapman and Hall.
Elstrodt, M., & Mellenbergh, G. J. (1978). Eén minus de vergeten fout [One minus the forgotten Type II error]. Nederlands Tijdschrift voor de Psychologie, 33, 33–47.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.
Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analysis of variance and analysis of covariance. Review of Educational Research, 42, 237–288.
Goldstein, H., & Healy, M. J. R. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society, Series A, 158, 175–177.
Hayes, A. F., & Cai, L. (2007). Further evaluating the conditional decision rule for comparing two independent means. British Journal of Mathematical and Statistical Psychology, 60, 217–244.
Hedges, L. V., & Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychological Methods, 6, 203–217.
Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802.
Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.
Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculation for data analysis. American Statistician, 55, 19–24.
Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137–152.
Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s handbook (4th ed.). Upper Saddle River, NJ: Pearson.
Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129.
Lakens, D. (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social Psychological and Personality Science, 8, 355–362.
Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical guide. Cambridge, UK: Cambridge University Press.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.
Morrison, D. F. (1990). Multivariate statistical methods (3rd ed.). New York, NY: McGraw-Hill.
Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.
O’Keefe, D. J. (2007). Post hoc power, observed power, a priori power, retrospective power, prospective power, achieved power: Sorting out appropriate uses of statistical power analysis. Communication Methods and Measures, 1, 291–299.
Onwuegbuzie, A. J., & Leech, N. L. (2004). Post hoc power: A concept whose time has come. Understanding Statistics, 3, 201–230.
Peng, C.-J. J., & Chen, L.-T. (2014). Beyond Cohen’s d: Alternative effect size measures for between-subject designs. Journal of Experimental Education, 82, 22–50.
Piantadosi, S. (2005). Clinical trials: A methodologic perspective (2nd ed.). Hoboken, NJ: Wiley.
Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553–565.
Rom, D. M. (2013). An improved Hochberg procedure for multiple tests of significance. British Journal of Mathematical and Statistical Psychology, 66, 189–196.
Ruscio, J., & Roche, B. (2012). Variance heterogeneity in published psychological research: A review and a new index. Methodology, 8, 1–11.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.
Sun, S., Pan, W., & Wang, L. L. (2010). Rethinking observed power: Concept, practice, and implications. Methodology, 7, 81–87.
van Belle, G. (2002). Statistical rules of thumb. New York, NY: Wiley.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 627–633.
Walker, E., & Nowacki, A. S. (2010). Understanding equivalence and noninferiority testing. Journal of General Internal Medicine, 26, 192–196.
Westlake, W. J. (1981). Bioequivalence testing: A need to rethink. Biometrics, 37, 591–593.
Wilcox, R. R. (1998). How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53, 300–314.
Wilcox, R. R. (2010). Fundamentals of modern statistical methods. New York, NY: Springer.
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals. American Psychologist, 54, 594–604.
Yuan, K.-H., & Maxwell, S. (2005). On the post hoc power in testing mean differences. Journal of Educational and Behavioral Statistics, 30, 141–167.
Zimmerman, D. W. (1996). Some properties of preliminary tests of equality of variances in the two-sample location problem. The Journal of General Psychology, 123, 217–231.
Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this chapter
Mellenbergh, G.J. (2019). Null Hypothesis Testing. In: Counteracting Methodological Errors in Behavioral Research. Springer, Cham. https://doi.org/10.1007/978-3-030-12272-0_12
Print ISBN: 978-3-319-74352-3
Online ISBN: 978-3-030-12272-0
eBook Packages: Behavioral Science and Psychology (R0)