Appendix
Type 1 error for independent hypothesis tests
If k independent hypothesis tests are carried out, each at a significance level (P value threshold) of $\alpha_0$, the overall probability of at least one type 1 error (false positive) is $\alpha = 1 - (1 - \alpha_0)^k$. Thus the risk of a false positive result increases rapidly as k increases. Suppose a factor used for subgroup analyses has two levels, thus dividing the data into two subgroups. If this factor is unrelated to outcome, each of the two portions of the data is equivalent to an independent random sample. Thus the probability that at least one of these subgroups is falsely significant at P < 0.05 is $1 - (1 - 0.05)^2 = 1 - 0.95^2 = 0.0975$, nearly double the nominal 0.05.
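The formula for independent tests can be evaluated directly for several values of k. The following is a minimal sketch; the function name `familywise_error` and the chosen values of k are illustrative assumptions, not part of the original analysis.

```python
def familywise_error(alpha0: float, k: int) -> float:
    """Probability of at least one false positive among k independent tests,
    each carried out at significance level alpha0."""
    return 1.0 - (1.0 - alpha0) ** k


# Illustrative values of k (an assumption; the text works through k = 2 only).
for k in (1, 2, 5, 10, 20):
    print(f"k = {k:2d}: overall type 1 error = {familywise_error(0.05, k):.4f}")
```

For k = 2 this reproduces the value 0.0975 derived above.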
Type 1 error for multiple subgroup analyses—simulations
Suppose m factors are used for subgroup analyses, each dividing the data into two approximately equal halves. Also, assume that these factors are independent of each other. For example, one factor might be gender, and another factor might be age grouping defined as above or below the median age. Although these factors are independent, the subgroups formed by them will include overlapping subjects. For example, roughly half of the female respondents will also be included in the young age group. Therefore, even though the factors are independent, the resultant P values will be correlated. This makes analytical solutions more difficult.
A simple way to estimate the type 1 error is to use computer simulations. We assumed that there was, in truth, no treatment effect in any of the subgroups, that the outcome of interest followed a normal distribution, and that a t-test would be applied. Binary factors were applied, each dichotomising the data into two halves. Samples of 50, 100, 200, 300, 400 and 500 normally distributed random observations were generated. Each of these simulations was repeated 40,000 times, and the proportion of simulated studies in which at least one subgroup had a P value below 0.05 was recorded. As might be anticipated, sample size did not affect these proportions (sample size should only affect the type 2 error).
The results of the simulation are summarised in Fig. 1.
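A simulation of this kind might be sketched as follows. This is not the authors' code, and it makes several assumptions that are not in the text: two equally sized treatment arms, each factor splitting both arms exactly in half at random, a large-sample z-test with estimated variances standing in for the t-test (so that only the standard library is needed), and far fewer replications than the 40,000 described above. All function names are hypothetical.

```python
import math
import random


def mean_var(x):
    """Sample mean and unbiased sample variance."""
    mu = sum(x) / len(x)
    return mu, sum((v - mu) ** 2 for v in x) / (len(x) - 1)


def two_sample_p(a, b):
    """Two-sided P value from a z-test with estimated variances.
    Assumption: used here as a large-sample stand-in for the t-test."""
    ma, va = mean_var(a)
    mb, vb = mean_var(b)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(z) / math.sqrt(2.0))


def any_subgroup_significant(n, m, rng, alpha=0.05):
    """Simulate one null trial of n subjects with m independent binary factors;
    return True if any of the 2*m subgroup comparisons has P < alpha."""
    half = n // 2
    outcome = [rng.gauss(0.0, 1.0) for _ in range(n)]  # no true treatment effect
    arms = [list(range(half)), list(range(half, n))]   # treatment, control
    q = half // 2
    for _ in range(m):
        # Each factor is an independent random split of both arms into halves,
        # so subgroups from different factors share subjects.
        for arm in arms:
            rng.shuffle(arm)
        for part in (slice(0, q), slice(q, None)):
            treated = [outcome[i] for i in arms[0][part]]
            control = [outcome[i] for i in arms[1][part]]
            if two_sample_p(treated, control) < alpha:
                return True
    return False


def familywise_rate(n, m, reps, seed=0):
    """Proportion of simulated null trials with at least one 'significant' subgroup."""
    rng = random.Random(seed)
    hits = sum(any_subgroup_significant(n, m, rng) for _ in range(reps))
    return hits / reps


# Reduced replication count (an assumption, chosen to keep the run short).
for m in (1, 2, 3):
    print(f"{m} factor(s): estimated type 1 error = {familywise_rate(200, m, 2000):.3f}")
```

With a single factor the two subgroups do not overlap, so the estimate should lie close to the 0.0975 obtained analytically for two independent tests; with more factors the subgroups overlap and the P values are correlated, which is the situation the simulation is meant to address.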
Cite this article
Fayers, P.M., King, M.T. How to guarantee finding a statistically significant difference: the use and abuse of subgroup analyses. Qual Life Res 18, 527–530 (2009). https://doi.org/10.1007/s11136-009-9473-3