How to guarantee finding a statistically significant difference: the use and abuse of subgroup analyses

Fayers, Peter M.; King, Madeleine T.

doi:10.1007/s11136-009-9473-3


Published: 02 April 2009

Volume 18, pages 527–530, (2009)

Quality of Life Research





Author information

Authors and Affiliations

Institute of Applied Health Sciences, University of Aberdeen Medical School, Foresterhill, Aberdeen, AB25 2ZD, UK
Peter M. Fayers
Department of Cancer Research and Molecular Medicine, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim, Norway
Peter M. Fayers
Quality of Life Office, Psycho-Oncology Cooperative Research Group, School of Psychology, University of Sydney, Sydney, NSW, Australia
Madeleine T. King


Corresponding author

Correspondence to Peter M. Fayers.

Appendix

Type 1 error for independent hypothesis tests

If k independent hypothesis tests are carried out, each at a significance level of α₀ (the threshold applied to each P value), the overall probability of at least one type 1 error (false positive) is α = 1 − (1 − α₀)^k. Thus the risk of a false positive result increases rapidly as k increases. Suppose a factor used for subgroup analyses has two response levels, thus dividing the data into two subgroups. If this factor is unrelated to outcome, each of the two portions of the data is equivalent to an independent random sample. Thus the probability that at least one of these subgroups is falsely significant at P < 0.05 is 1 − (1 − 0.05)² = 1 − 0.95² = 0.0975, nearly double the nominal 0.05.
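As a rough illustration of how quickly this probability grows, the formula can be evaluated for a few values of k. This is a minimal sketch, not part of the original appendix; the function name `familywise_error` and the chosen values of k are illustrative assumptions.

```python
def familywise_error(alpha0, k):
    """Probability of at least one false positive among k independent tests,
    each carried out at significance level alpha0 (all null hypotheses true)."""
    return 1 - (1 - alpha0) ** k


if __name__ == "__main__":
    # Illustrative values of k (hypothetical choices, not taken from the article).
    for k in (1, 2, 5, 10, 20):
        print(f"k = {k:2d}: overall type 1 error = {familywise_error(0.05, k):.4f}")
```

With k = 2 and α₀ = 0.05 this reproduces the value of 0.0975 quoted above.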

Type 1 error for multiple subgroup analyses—simulations

Suppose m factors are used for subgroup analyses, each dividing the data into two approximately equal halves. Also, assume that these factors are independent of each other. For example, one factor might be gender, and another factor might be age grouping defined as above or below the median age. Although these factors are independent, the subgroups formed by them will include overlapping subjects. For example, roughly half of the female respondents will also be included in the young age group. Therefore, even though the factors are independent, the resultant P values will be correlated. This makes analytical solutions more difficult.

A simple way to estimate the type 1 error is to use computer simulations. We assumed that there was, in truth, no treatment effect in any of the subgroups, that the outcome of interest followed a normal distribution, and that a t-test would be applied. Binary factors were applied, each dichotomising the data into two halves. Datasets of 50, 100, 200, 300, 400 and 500 normally distributed random observations were generated. Each simulation was repeated 40,000 times, and the proportion of studies in which at least one subgroup had a P value below 0.05 was recorded. As might be anticipated, sample size did not affect these proportions (sample size should affect only the type 2 error).

The results of the simulation are summarised in Fig. 1.
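A simulation along the lines described above might be sketched as follows. This is not the authors' code, and several details are assumptions made for illustration: subjects are randomised to two arms with probability 0.5, each binary factor is drawn independently with probability 0.5 rather than split exactly at the median, and a two-sided z-test with known unit variance is substituted for the t-test so that only the Python standard library is needed. The number of repetitions is far smaller than 40,000 to keep the run time short, and all function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p(treated, control):
    """Two-sided P value for a difference in means, assuming known unit variance.
    (A z-test stands in for the t-test described in the text.)"""
    se = (1 / len(treated) + 1 / len(control)) ** 0.5
    z = (fmean(treated) - fmean(control)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def any_subgroup_significant(n, m, rng, alpha=0.05):
    """Simulate one null trial of n subjects and m binary factors; return True
    if the treatment comparison is 'significant' in at least one subgroup."""
    arm = [rng.random() < 0.5 for _ in range(n)]   # assumed 1:1 randomisation
    y = [rng.gauss(0, 1) for _ in range(n)]        # outcome with no treatment effect
    for _ in range(m):
        factor = [rng.random() < 0.5 for _ in range(n)]  # independent binary factor
        for level in (False, True):
            treated = [y[i] for i in range(n) if factor[i] == level and arm[i]]
            control = [y[i] for i in range(n) if factor[i] == level and not arm[i]]
            # Skip the (very unlikely) case of an empty arm within a subgroup.
            if treated and control and z_test_p(treated, control) < alpha:
                return True
    return False


def false_positive_rate(n, m, reps, seed=0):
    """Proportion of simulated null trials with at least one significant subgroup."""
    rng = random.Random(seed)
    hits = sum(any_subgroup_significant(n, m, rng) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for m in (1, 2, 3, 5):
        print(f"{m} factor(s): {false_positive_rate(n=100, m=m, reps=1000):.3f}")
```

For a single factor the estimate should lie close to the analytical value of 0.0975, and it should rise as more factors are added. Because the subgroups formed by different factors overlap, the estimates for several factors need not match the independent-test formula exactly.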


About this article

Cite this article

Fayers, P.M., King, M.T. How to guarantee finding a statistically significant difference: the use and abuse of subgroup analyses. Qual Life Res 18, 527–530 (2009). https://doi.org/10.1007/s11136-009-9473-3


Received: 14 October 2008
Accepted: 16 March 2009
Published: 02 April 2009
Issue Date: June 2009
DOI: https://doi.org/10.1007/s11136-009-9473-3
