Null Hypothesis Testing

  • Chapter

Abstract

Null hypothesis testing applies to confirmatory research, where substantive hypotheses are tested. The preferred approach is to construct a confidence interval (CI), because a CI simultaneously assesses the precision of a parameter estimate and tests a null hypothesis on the parameter. Two- and one-sided CIs and two- and one-tailed tests are considered. The CI approach is demonstrated for conventional tests of the null hypothesis of equal means of paired (Student’s t test) and independent (Student’s t and Welch tests) variables. Bootstrap methods make weaker assumptions than the conventional tests; the bootstrap t method for the means of paired and independent variables and the modified percentile bootstrap method for the product–moment correlation are described. Null hypothesis testing is often incorrectly understood and applied, and several methods to correct these flaws are discussed. First, overlap of the CIs of two means does not imply that the difference between the two means is nonsignificant. Second, a two-step procedure, where the choice of a test is based on the results of tests of that test’s assumptions, inflates the Type I error. Third, standardized effect sizes can be computed in different ways, which hampers the comparability of effect sizes in meta-analysis. Fourth, an observed power analysis, where the effect size is estimated from sample data, cannot explain nonsignificant results. Fifth, testing multiple null hypotheses increases the probability of rejecting at least one true null hypothesis, which is prevented by applying multiple-hypothesis testing methods (e.g., Hochberg’s method). Sixth, data exploration may yield interesting substantive hypotheses, but these have to be confirmed with new data in a cross-validation or replication study.
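The CI approach for independent means sketched above can be made concrete in code. The following is a minimal Python illustration, assuming NumPy and SciPy are available and using made-up data (the chapter itself contains no code): the Welch statistic, its Satterthwaite degrees of freedom, and the resulting two-sided 95% CI, which simultaneously quantifies precision and tests the null hypothesis of equal means.

```python
# Sketch: two-sided 95% CI for the difference of two independent means via
# Welch's test (no equal-variance assumption). The data are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10.0, 2.0, size=40)   # hypothetical group 1
b = rng.normal(11.0, 3.0, size=35)   # hypothetical group 2

d = a.mean() - b.mean()
va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
se = np.sqrt(va + vb)
# Welch-Satterthwaite degrees of freedom
df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
tcrit = stats.t.ppf(0.975, df)
lo, hi = d - tcrit * se, d + tcrit * se

# The CI does double duty: the null hypothesis of equal means is rejected
# at alpha = .05 exactly when 0 falls outside the interval.
p = 2 * stats.t.sf(abs(d) / se, df)
print((lo, hi), p, not (lo <= 0.0 <= hi))
```

As a cross-check, `scipy.stats.ttest_ind(a, b, equal_var=False)` reproduces the same p value, since it uses the same Welch statistic and Satterthwaite degrees of freedom.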
Seventh, adding participants to the sample until the null hypothesis is rejected inflates the Type I error, which is prevented by using sequential testing methods (e.g., the group sequential testing procedure). Finally, researchers who aim to support rather than reject a null hypothesis have to apply equivalence testing.
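The equivalence-testing idea can likewise be sketched as two one-sided tests (TOST) in the spirit of Lakens (2017). Everything below (the data, the equivalence margin `delta`, and the helper name `tost_equivalence`) is a hypothetical illustration built on Welch's test, not the chapter's own code: equivalence is declared only if both one-sided null hypotheses, that the mean difference is at most -delta and that it is at least +delta, are rejected.

```python
# Sketch: TOST equivalence test for two independent means, Welch-based.
# The margin `delta` must be chosen on substantive grounds beforehand.
import numpy as np
from scipy import stats

def tost_equivalence(a, b, delta, alpha=0.05):
    """Declare equivalence if both one-sided nulls (difference <= -delta,
    difference >= +delta) are rejected at level alpha."""
    na, nb = len(a), len(b)
    va, vb = a.var(ddof=1) / na, b.var(ddof=1) / nb
    se = np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    d = a.mean() - b.mean()
    p_lower = stats.t.sf((d + delta) / se, df)   # H0: difference <= -delta
    p_upper = stats.t.cdf((d - delta) / se, df)  # H0: difference >= +delta
    return bool(max(p_lower, p_upper) < alpha)

# Hypothetical data: a small shift well inside a margin of 1.0.
x = np.array([1.0, 1.1, 0.9, 1.05, 0.95] * 8)
print(tost_equivalence(x, x + 0.01, delta=1.0))
```

Note that a nonsignificant ordinary t test is not evidence of equivalence; only rejecting both one-sided hypotheses against the prespecified margin supports the claim that the means differ by less than `delta`.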


References

  • American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35, 33–40.

  • APA. (2010). Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association.

  • Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66, 423–437.

  • Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10, 389–396.

  • Brewer, J. K. (1972). On the power of statistical tests in the American Educational Research Journal. American Educational Research Journal, 9, 391–401.

  • Brewer, J. K., & Owen, P. W. (1973). A note on the power of statistical tests in the Journal of Educational Measurement. Journal of Educational Measurement, 10, 71–74.

  • Chase, L. J., & Chase, R. B. (1976). A statistical power analysis of applied psychological research. Journal of Applied Psychology, 61, 234–237.

  • Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65, 145–153.

  • Cohen, J. (1969). Statistical power analysis for the behavioral sciences. New York, NY: Academic Press.

  • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

  • Cumming, G. (2012). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. London, England: Routledge.

  • de Groot, A. D. (1956/2014). De betekenis van ‘significant’ bij verschillende typen onderzoek [The meaning of ‘significance’ for different types of research]. Nederlands Tijdschrift voor de Psychologie en haar Grensgebieden, 11, 398–409. (E.-J. Wagenmakers et al., Trans. and annotated, Acta Psychologica, 148, 188–194.)

  • Efron, B., & Tibshirani, R. J. (1993). An introduction to the bootstrap. New York, NY: Chapman and Hall.

  • Elstrodt, M., & Mellenbergh, G. J. (1978). Eén minus de vergeten fout [One minus the forgotten Type II error]. Nederlands Tijdschrift voor de Psychologie, 33, 33–47.

  • Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175–191.

  • Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the analysis of variance and analysis of covariance. Review of Educational Research, 42, 237–288.

  • Goldstein, H., & Healy, M. J. R. (1995). The graphical presentation of a collection of means. Journal of the Royal Statistical Society, Series A, 158, 175–177.

  • Hayes, A. F., & Cai, L. (2007). Further evaluating the conditional decision rule for comparing two independent means. British Journal of Mathematical and Statistical Psychology, 60, 217–244.

  • Hedges, L. V., & Pigott, T. D. (2001). The power of statistical tests in meta-analysis. Psychological Methods, 6, 203–217.

  • Hochberg, Y. (1988). A sharper Bonferroni procedure for multiple tests of significance. Biometrika, 75, 800–802.

  • Hoekstra, R., Morey, R. D., Rouder, J. N., & Wagenmakers, E.-J. (2014). Robust misinterpretation of confidence intervals. Psychonomic Bulletin & Review, 21, 1157–1164.

  • Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. American Statistician, 55, 19–24.

  • Kelley, K., & Preacher, K. J. (2012). On effect size. Psychological Methods, 17, 137–152.

  • Keppel, G., & Wickens, T. D. (2004). Design and analysis: A researcher’s handbook (4th ed.). Upper Saddle River, NJ: Pearson.

  • Keselman, H. J., Algina, J., Lix, L. M., Wilcox, R. R., & Deering, K. N. (2008). A generally robust approach for testing hypotheses and setting confidence intervals for effect sizes. Psychological Methods, 13, 110–129.

  • Lakens, D. (2017). Equivalence tests: A practical primer for t tests, correlations, and meta-analyses. Social Psychological and Personality Science, 8, 355–362.

  • Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical guide. Cambridge, UK: Cambridge University Press.

  • Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychological Bulletin, 105, 156–166.

  • Morrison, D. F. (1990). Multivariate statistical methods (3rd ed.). New York, NY: McGraw-Hill.

  • Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5, 241–301.

  • O’Keefe, D. J. (2007). Post hoc power, observed power, a priori power, retrospective power, prospective power, achieved power: Sorting out appropriate uses of statistical power analyses. Communication Methods and Measures, 1, 291–299.

  • Onwuegbuzie, A. J., & Leech, N. L. (2004). Post hoc power: A concept whose time has come. Understanding Statistics, 3, 201–230.

  • Peng, C.-J. J., & Chen, L.-T. (2014). Beyond Cohen’s d: Alternative effect size measures for between-subject designs. Journal of Experimental Education, 82, 22–50.

  • Piantadosi, S. (2005). Clinical trials: A methodologic perspective (2nd ed.). Hoboken, NJ: Wiley.

  • Rogers, J. L., Howard, K. I., & Vessey, J. T. (1993). Using significance tests to evaluate equivalence between two experimental groups. Psychological Bulletin, 113, 553–565.

  • Rom, D. M. (2013). An improved Hochberg procedure for multiple tests of significance. British Journal of Mathematical and Statistical Psychology, 66, 189–196.

  • Ruscio, J., & Roche, B. (2012). Variance heterogeneity in published psychological research: A review and a new index. Methodology, 8, 1–11.

  • Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309–316.

  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366.

  • Sun, S., Pan, W., & Wang, L. L. (2010). Rethinking observed power: Concept, practice, and implications. Methodology, 7, 81–87.

  • van Belle, G. (2002). Statistical rules of thumb. New York, NY: Wiley.

  • Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. L. J., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7, 627–633.

  • Walker, E., & Nowacki, A. S. (2010). Understanding equivalence and noninferiority testing. Journal of General Internal Medicine, 26, 192–196.

  • Westlake, W. J. (1981). Bioequivalence testing - A need to rethink. Biometrics, 37, 591–593.

  • Wilcox, R. R. (1998). How many discoveries have been lost by ignoring modern statistical methods? American Psychologist, 53, 300–314.

  • Wilcox, R. R. (2010). Fundamentals of modern statistical methods. New York, NY: Springer.

  • Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.

  • Yuan, K.-H., & Maxwell, S. (2005). On the post hoc power in testing mean differences. Journal of Educational and Behavioral Statistics, 30, 141–167.

  • Zimmerman, D. W. (1996). Some properties of preliminary tests of equality of variances in the two-sample location problem. The Journal of General Psychology, 123, 217–231.

  • Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181.


Author information

Corresponding author

Correspondence to Gideon J. Mellenbergh.


Copyright information

© 2019 Springer Nature Switzerland AG


Cite this chapter

Mellenbergh, G.J. (2019). Null Hypothesis Testing. In: Counteracting Methodological Errors in Behavioral Research. Springer, Cham. https://doi.org/10.1007/978-3-030-12272-0_12
