Null Hypothesis Testing

  • Chapter

Abstract

Null hypothesis testing applies to confirmatory research, where substantive hypotheses are tested. The preferred approach is to construct a confidence interval (CI), because a CI simultaneously assesses the precision of a parameter estimate and tests a null hypothesis on the parameter. Two- and one-sided CIs and two- and one-tailed tests are considered. The CI approach is demonstrated for conventional tests of the null hypothesis of equal means of paired (Student’s t test) and independent (Student’s t and Welch tests) variables. Bootstrap methods make weaker assumptions than the conventional tests; the bootstrap t method for the means of paired and independent variables and the modified percentile bootstrap method for the product–moment correlation are described. Null hypothesis testing is often incorrectly understood and applied, and several methods to correct these flaws are discussed. First, overlap of the CIs of two means does not imply that the difference between the two means is nonsignificant. Second, a two-step procedure, where the choice of a test is based on the results of tests of that test’s assumptions, inflates the Type I error. Third, standardized effect sizes can be computed in different ways, which hampers the comparability of effect sizes in meta-analysis. Fourth, an observed power analysis, where the effect size is estimated from sample data, cannot explain nonsignificant results. Fifth, testing multiple null hypotheses increases the probability of rejecting at least one true null hypothesis, which is prevented by applying multiple-hypothesis testing methods (e.g., Hochberg’s method). Sixth, data exploration may yield interesting substantive hypotheses, but these have to be confirmed with new data in a cross-validation or replication study.
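The CI approach for independent means sketched above can be made concrete in code. The following is a minimal Python illustration, assuming NumPy and SciPy are available and using made-up data (the chapter itself contains no code): the Welch statistic, its Satterthwaite degrees of freedom, and the resulting two-sided 95% CI, which simultaneously quantifies precision and tests the null hypothesis of equal means.

```python
# Sketch: two-sided 95% CI for the difference of two independent means via
# Welch's test (no equal-variance assumption). The data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=40)   # hypothetical group 1
b = rng.normal(11.0, 3.0, size=35)   # hypothetical group 2

d = a.mean() - b.mean()
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
se = np.sqrt(va + vb)
# Welch-Satterthwaite degrees of freedom
df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
tcrit = stats.t.ppf(0.975, df)
lo, hi = d - tcrit * se, d + tcrit * se

# The CI does double duty: the null hypothesis of equal means is rejected
# at alpha = .05 exactly when 0 falls outside the interval.
p = 2 * stats.t.sf(abs(d) / se, df)
print((lo, hi), p, not (lo <= 0.0 <= hi))
```

As a cross-check, `scipy.stats.ttest_ind(a, b, equal_var=False)` reproduces the same p value, since it uses the same Welch statistic and Satterthwaite degrees of freedom.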
Seventh, adding participants to the sample until the null hypothesis is rejected inflates the Type I error, which is prevented by using sequential testing methods (e.g., the group sequential testing procedure). Finally, researchers who aim to support rather than reject a null hypothesis have to apply equivalence testing.
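The equivalence-testing idea can likewise be sketched as two one-sided tests (TOST) in the spirit of Lakens (2017). Everything below (the data, the equivalence margin `delta`, and the helper name `tost_equivalence`) is a hypothetical illustration built on Welch's test, not the chapter's own code: equivalence is declared only if both one-sided null hypotheses, that the mean difference is at most -delta and that it is at least +delta, are rejected.

```python
# Sketch: TOST equivalence test for two independent means, Welch-based.
# The margin `delta` must be chosen on substantive grounds beforehand.
import numpy as np
from scipy import stats

def tost_equivalence(a, b, delta, alpha=0.05):
    """Declare equivalence if both one-sided nulls (difference <= -delta,
    difference >= +delta) are rejected at level alpha."""
    na, nb = len(a), len(b)
    va, vb = a.var(ddof=1) / na, b.var(ddof=1) / nb
    se = np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    d = a.mean() - b.mean()
    p_lower = stats.t.sf((d + delta) / se, df)   # H0: difference <= -delta
    p_upper = stats.t.cdf((d - delta) / se, df)  # H0: difference >= +delta
    return bool(max(p_lower, p_upper) < alpha)

# Hypothetical data: a small shift well inside a margin of 1.0.
x = np.array([1.0, 1.1, 0.9, 1.05, 0.95] * 8)
print(tost_equivalence(x, x + 0.01, delta=1.0))
```

Note that a nonsignificant ordinary t test is not evidence of equivalence; only rejecting both one-sided hypotheses against the prespecified margin supports the claim that the means differ by less than `delta`.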


References

  • American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35, 33–40.

  • APA. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.

  • Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.

  • Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10, 389–396.

  • Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391–401.

  • Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the Journal of Educational Measurement. Journal of Educational Measurement, 10, 71–74.

  • Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234–237.

  • Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.

  • Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York, NY: Academic Press.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

  • Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London, England: Routledge.

  • de Groot, A. D. (1956/2014). De betekenis van ‘significant’ bij verschillende typen onderzoek [The meaning of ‘significance’ for different types of research]. Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden, 11, 398–409. (E.-J. Wagenmakers et al., Trans. and annotated, Acta Psychologica, 148, 188–194.)

  • Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York, NY: Chapman and Hall.

  • Elstrodt, M., & Mellenbergh, G. J. (1978). Eén minus de vergeten fout [One minus the forgotten Type II error]. Nederlands Tijdschrift voor de Psychologie, 33, 33–47.

  • Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.

  • Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analysis of variance and analysis of covariance. Review of Educational Research, 42, 237–288.

  • Goldstein, H., & Healy, M. J. R. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society, Series A, 158, 175–177.

  • Hayes, A. F., & Cai, L. (2007). Further evaluating the conditional decision rule for comparing two independent means. British Journal of Mathematical and Statistical Psychology, 60, 217–244.

  • Hedges, L. V., & Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychological Methods, 6, 203–217.

  • Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802.

  • Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.

  • Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. American Statistician, 55, 19–24.

  • Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137–152.

  • Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s handbook (4th ed.). Upper Saddle River, NJ: Pearson.

  • Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129.

  • Lakens, D. (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social Psychological and Personality Science, 8, 355–362.

  • Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical guide. Cambridge, UK: Cambridge University Press.

  • Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.

  • Morrison, D. F. (1990). Multivariate statistical methods (3rd ed.). New York, NY: McGraw-Hill.

  • Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.

  • O’Keefe, D. J. (2007). Post hoc power, observed power, a priori power, retrospective power, prospective power, achieved power: Sorting out appropriate uses of statistical power analyses. Communication Methods and Measures, 1, 291–299.

  • Onwuegbuzie, A. J., & Leech, N. L. (2004). Post hoc power: A concept whose time has come. Understanding Statistics, 3, 201–230.

  • Peng, C.-J. J., & Chen, L.-T. (2014). Beyond Cohen’s d: Alternative effect size measures for between-subject designs. Journal of Experimental Education, 82, 22–50.

  • Piantadosi, S. (2005). Clinical trials: A methodologic perspective (2nd ed.). Hoboken, NJ: Wiley.

  • Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553–565.

  • Rom, D. M. (2013). An improved Hochberg procedure for multiple tests of significance. British Journal of Mathematical and Statistical Psychology, 66, 189–196.

  • Ruscio, J., & Roche, B. (2012). Variance heterogeneity in published psychological research: A review and a new index. Methodology, 8, 1–11.

  • Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.

  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.

  • Sun, S., Pan, W., & Wang, L. L. (2010). Rethinking observed power: Concept, practice, and implications. Methodology, 7, 81–87.

  • van Belle, G. (2002). Statistical rules of thumb. New York, NY: Wiley.

  • Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 627–633.

  • Walker, E., & Nowacki, A. S. (2010). Understanding equivalence and noninferiority testing. Journal of General Internal Medicine, 26, 192–196.

  • Westlake, W. J. (1981). Bioequivalence testing - A need to rethink. Biometrics, 37, 591–593.

  • Wilcox, R. R. (1998). How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53, 300–314.

  • Wilcox, R. R. (2010). Fundamentals of modern statistical methods. New York, NY: Springer.

  • Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.

  • Yuan, K.-H., & Maxwell, S. (2005). On the post hoc power in testing mean differences. Journal of Educational and Behavioral Statistics, 30, 141–167.

  • Zimmerman, D. W. (1996). Some properties of preliminary tests of equality of variances in the two-sample location problem. The Journal of General Psychology, 123, 217–231.

  • Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.


Author information

Corresponding author

Correspondence to Gideon J. Mellenbergh.


Copyright information

© 2019 Springer Nature Switzerland AG


Cite this chapter

Mellenbergh, G.J. (2019). Null Hypothesis Testing. In: Counteracting Methodological Errors in Behavioral Research. Springer, Cham. https://doi.org/10.1007/978-3-030-12272-0_12
