Appropriate statistical methods for analysing partially nested randomised controlled trials with continuous outcomes: a simulation study
Abstract
Background
In individually randomised trials we might expect interventions delivered in groups or by care providers to result in clustering of outcomes for participants treated in the same group or by the same care provider. In partially nested randomised controlled trials (pnRCTs) this clustering only occurs in one trial arm, commonly the intervention arm. It is important to measure and account for betweencluster variability in trial design and analysis. We compare analysis approaches for pnRCTs with continuous outcomes, investigating the impact on statistical inference of cluster sizes, coding of the nonclustered arm, intracluster correlation coefficient (ICCs), and differential variance between intervention and control arm, and provide recommendations for analysis.
Methods
We performed a simulation study assessing the performance of six analysis approaches for a twoarm pnRCT with a continuous outcome. These include: linear regression model; fully clustered mixedeffects model with singleton clusters in control arm; fully clustered mixedeffects model with one large cluster in control arm; fully clustered mixedeffects model with pseudo clusters in control arm; partially nested homoscedastic mixed effects model, and partially nested heteroscedastic mixed effects model. We varied the cluster size, number of clusters, ICC, and individual variance between the two trial arms.
Results
All models provided unbiased intervention effect estimates. In the partially nested mixedeffects models, methods for classifying the nonclustered control arm had negligible impact. Failure to account for even small ICCs resulted in inflated Type I error rates and overcoverage of confidence intervals. Fully clustered mixed effects models provided poor control of the Type I error rates and biased ICC estimates. The heteroscedastic partially nested mixedeffects model maintained relatively good control of Type I error rates, unbiased ICC estimation, and did not noticeably reduce power even with homoscedastic individual variances across arms.
Conclusions
In general, we recommend the use of a heteroscedastic partially nested mixedeffects model, which models the clustering in only one arm, for continuous outcomes similar to those generated under the scenarios of our simulations study. However, with few clusters (3–6), small cluster sizes (5–10), and small ICC (≤0.05) this model underestimates Type I error rates and there is no optimal model.
Keywords
Clustering Randomised controlled trial Partially nested Partially clustered Therapist effects Individually randomised group treatment Individually randomised cluster trial Intervention studiesAbbreviations
 cRCT
Cluster randomised controlled trials
 ICC
Intracluster correlation coefficient
 iRCT
Individually randomised controlled trials
 pnRCT
Partially nested randomised controlled trials
Background
Randomised controlled trials (RCTs) are often categorised into two types: individually randomised controlled trials (iRCTs) where participants are individually randomised to treatment arms to receive one of the investigative treatments; and cluster randomised controlled trials (cRCTs) where groups of participants (clusters) are randomised to treatment arms. We may expect outcomes for participants within the same cluster to be more similar than those from different clusters. The similarity can arise due to participants in the same cluster receiving care from the same health provider or interacting with one another [1]. The implications of clustering in cRCTs are widely acknowledged [1, 2]. Clustering results in a reduction in statistical efficiency in cRCTs and if ignored standard errors and pvalues for intervention effects are typically underestimated.
Clustering can also occur in iRCTs. For instance, clustering of participants outcomes due to receiving treatment as part of a groupbased parenting intervention [3], treatment in specialist clinics for the treatment of venous leg ulcers [4], or participants under the care of a surgeon for comparison for hemostasis in elective benign thyroid surgery [5]. The care provider or group dynamics may play a role in the causal pathway of the intervention effect. We might expect correlated outcomes between individuals either in the same group or receiving treatment from the same care provider.
Standard sample size and analysis methods for iRCTs rely on the assumption of independence between participants, which is violated when clustering is present. The ‘clustering effect’ is commonly quantified using the intracluster correlation coefficient (ICC). The ICC measures the extent to which outcomes from participants within the same cluster are correlated to one another [1]. When designing and analysing iRCTs with clustering we need to consider implications of the potential lack of independence. Ignoring clustering in the analysis can lead to over precise results and consequently incorrect conclusions [1]. Clustering of any form results in a reduction in the effective sample size, hence, there is a reduction in the power to detect an intervention effect if it truly exists.
In addition to obtaining sufficient power and accurate results, accounting for clustering enables us to estimate the ICC. ICCs are often important for the interpretation of trial results; we may be directly interested in the intervention group or care provider effects. ICCs are also key when calculating sample sizes for RCTs with clustering, in order to maintain power [1].
An increasingly applied design in healthcare and education research is a partially nested randomised controlled trial (pnRCT) [6], where participants are individually randomised to trial arms and clustering of outcomes occurs in only one arm of the trial [7] (also termed partially clustered trials). The STEPWISE trial is an example of a pnRCT, assessing a structured lifestyle education programme aimed at supporting weight loss for adults with schizophrenia and first episode psychosis in a community mental health setting. Individuals were randomised to either an intervention arm of groupbased lifestyle education sessions or a control arm receiving usual care at the individual level [8].
Specific statistical methods need to be used for analysing pnRCTs. Consequently, there has been a considerable growth in the methodology literature (particularly in the fields of psychotherapy and educational research) in the past few decades both proposing and reviewing statistical methods for pnRCTs.
Summary of relevant literature on analysis of pnRCTs
Paper  Relevant themes  Range of values^{a}  Findings 

Schweig & Pane [16]  Describe and compare models for pnRCTs with noncompliance using a simulation study.  Simulation for two levels of clustering, exact cluster sizes (m) unclear in paper, c_{school} = 37, c_{class} = 177, λ_{B} = 2, 8, ρ_{school} = 0.005, 0.05, 0.15, ρ_{class} = 0.0004, 0.10, 0.25, and θ = 0.087.  Clustering and noncompliance may have a substantial impact on statistical inference about intentiontotreat effects. Provide methods that may accommodate pnRCT with noncompliance, recommend using complier average causal effect estimate (CACE) and scaling by the proportion of compliers. No mention of degrees of freedom, we have assumed they used default degrees of freedom method available in R lme packages. 
Flight et al. [9]  Compare models applied to four examples of pnRCTs. Compare three different methods for classifying the nonclustered control arm in pnRCTs, including: singleton clusters, one large cluster and pseudo clusters.  Examples with {m, c} = {36, 8; 24, 7; 14, 8; 5, 6}, and estimated ρ = < 0.0001, 0.02, 0.007.  Recommend use of the heteroscedastic model, recommendations based only on reanalysis of case studies. Methods for classifying the nonclustered control arm in pnRCTs had a large impact in fully clustered mixed effects models and no measurable impact in partially nested mixedeffects models. ICCs in four examples were small. 
Sterba [27]  Review of modelling developments for pnRCTs, focused on those particularly relevant to psychotherapy trials.  Recommend the inclusion of cluster variability in analysis model as it provides insight into treatment process (rather than treating it as a nuisance). Annotated Mplus commands for models  
Lohr, Schochet & Sanders [19]  Report presenting a guide to design and analysis issues for pnRCTs in education research, using example trials. Discussion of degrees of freedom issue in Appendix.  Guidance document, defines pnRCT in context of education research and show methods to analyse these using SAS. Provide SAS commands for model fitting in examples.  
Korendijk [18]  Compare models for pnRCTs using simulation study, investigate misspecification for the estimation of the parameters and their standard errors.  Simulation study with m = 5, c = 10, 30, 50, 100, ρ = 0.05, 0.1, 0.2, λ_{A} = 1, d = 0.3.  All models perform comparably with respect to fixed effect estimates. Recommend use of partially nested mixedeffects model. Simulations were under null and ICC always greater than zero. No mention of degrees of freedom, we have we assumed default degrees of freedom used from MLwiN software, and homoscedasticity was assumed for ndividual variances between the two arms. 
Sanders [17]  Compare models for pnRCTs using simulation study in terms of Type I error and power  Simulation study with {m, c} = {2, 10; 4, 4; 5, 4; 10, 2}, ρ = 0, 0.1, 0.2, 0.3, 0.4, 0.5, λ_{A} = 1, and ω^{2} = 0, 0.01, 0.059, 0.138.  Type I error rate increased as ICC increased, Satterthwaite degrees of freedom had better control than KenwardRoger degrees of freedom. Found using mixedeffects model for pnRCT when ICC is zero likely leads to never detecting intervention effects, observed Type I error rates nearly nonexistent under all scenarios with ICC equal to zero. Recommend to evaluate if ICC is significantly different from zero prior to selecting analysis method. Homoscedasticity was assumed for individual variances between the two arms. 
Baldwin et al. [15]  Compare analysis models for pnRCT simulation study, comparing three degrees of freedom calculations, and a pnRCT example.  Simulation for m = 5, 15, 30, c = 2, 4, 8, 16, ρ = 0, 0.05, 0.1, 0.15, 0.3, λ_{B} = 0.25, 1, 4, and d = 0, 0.5.  Recommend pnRCTs take account of heteroscedasticity. Satterthwaite and KenwardRoger degrees of freedom control Type I error rate. The heteroscedastic model provides an unbiased estimate and little reduction in power compared to the homoscedastic model. Argue that using a partially nested mixedeffects model only a problem for statistical inference when the number of clusters is small. The number of clusters has greater impact on power in pnRCTs. At least eight, preferably 16 clusters, to maintain Type I error rate. 
Bauer et al. [6]  Review of RCTs to ascertain the prevalence of pnRCTS in four public health and clinical research journals. Analysis models for pnRCTs extended to include pretest measures as covariates, individual and group level covariates, and example of pnRCT  Example with clustering in one arm c = 41, m = 9, and estimated ρ = 0.02.  Out of 94 RCTs, 32% were pnRCTs, 40% iRCTs and 27% cRCT. None used methods specific to pnRCTs. Example pnRCT data could be analysed using mixedeffects models. Argue pnRCTs “often increase external validity at the expense of internal validity” (p.20). 
Roberts & Roberts [7]  Examine the case of pnRCTs, heterogeneity, comparison of analysis methods for simulation study and present an example.  Simulation for m = 6, c = 8, ρ = 0, 0.1, 0.2, 0.3, λ_{A} = 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2 and d = 0.  Recommend pnRCTs take account of heteroscedasticity. Satterthwaite unequal variances ttest gave robust to heteroscedasticity. The heteroscedastic model gives slightly inflated test size for large ρ: suggest Satterthwaite degrees of freedom as a solution. 
Lee & Thompson [28]  Describe analysis models for iRCTs with clustering and apply to two examples (using Bayesian approach)  Show that ignoring clustering may underestimate uncertainty, leading to incorrect conclusions.  
Hoover [34]  Statistical tests for RCTs with clustering that differ across trial arms.  Example with clustering in both arms with m = 7 − 12, c = 3.  Provide an adjustment for the independent samples ttest for pnRCTs. Statistical impact of heterogeneity effect increases as the cluster size increases, and as heterogeneity increases. The test does not allow for the inclusion of covariates, multiple treatments, baseline measures, or nonnormally distributed outcomes. 
In this study, we use a series of simulations to evaluate the statistical analysis models for twoarm parallel pnRCTs with continuous outcomes, assessing a range of scenarios including the effect of cluster size and the number of clusters. In theory, the mixedeffects models can be formulated so that they do not model clustering in the control arm, however, when running these models in statistical software packages it is necessary to impose some form of clustering in the control arm. The literature identified in Table 1 highlighted that research to date is lacking in addressing the best way to treat the nonclustered control arm when fitting the models in statistical software, using scenarios of relevance in the field of public health with small clusters and small ICCs [9], and evaluating the effect of the variance ratio of the residuals on the model fit. In pnRCTs we may have small numbers of clusters [9], thus we evaluate the impact of the number of clusters on statistical inference and if statistical inference remains valid using mixedeffects models.
 1)
where mixedeffects models are necessary,
 2)
methods of specifying the clusters in the nonclustered arm when fitting a model and the impact these have on the analysis,
 3)
the impact of cluster sizes and the number of clusters on statistical inference and,
 4)
the impact of heteroscedastic individual variance between trial arms on statistical inference.
Methods
Methods for analysis of partially nested trials
In this section, we present the main modelling approaches currently available and used for pnRCTs, including ignoring clustering altogether, imposing clustering in the nonclustered control arm, and explicitly modelling the partially nested design by modelling clustering only in the intervention arm.
It is possible to account for clustering by including each cluster as a fixed effect in a standard regression model, in addition to a fixed effect representing the intervention effect. Although this method is simple to implement, it is not recommended. Firstly, it does not reflect the study design of a pnRCT and may require a large number of fixed effects to be fitted lowering the degrees of freedom [9]. Secondly, if clusters are of size one there is insufficient information to estimate both the intervention effect and the cluster effect for each cluster. Finally, it will generally underestimate the intervention effect variability as the cluster level variability is removed.
Models for the analysis of pnRCTs
Model description  Statistical model  Heteroscedastic residuals  

Model 1  Linear regression (ignore clustering)  y_{i} = β_{0} + θt_{i} + ϵ_{i} • \( {\epsilon}_i\sim N\left(0,{\sigma}_{\epsilon}^2\right) \) the individual level variation  No 
Model 2  Fully clustered (impose clustering)  y_{ij} = β_{0} + θt_{ij} + u_{j} + ϵ_{ij} • \( {u}_j\sim N\left(0,{\sigma}_u^2\right) \) a random effects term representing between cluster variation • \( {\epsilon}_{ij}\sim N\left(0,{\sigma}_{\epsilon}^2\right) \) the individual level variation  No 
Model 3  Partially nested homoscedastic  y_{ij} = β_{0} + θt_{ij} + u_{j}t_{ij} + ϵ_{ij} • \( {u}_j\sim N\left(0,{\sigma}_u^2\right) \) a random effects term representing betweencluster variation in clustered arm • \( {\epsilon}_{ij}\sim N\left(0,{\sigma}_{\epsilon}^2\right) \) the individual level variation  No 
Model 4  Partially nested heteroscedastic  y_{ij} = β_{0} + θt_{ij} + u_{j}t_{ij} + r_{ij}(1 − t_{ij}) + ϵ_{ij}t_{ij} • \( {u}_j\sim N\left(0,{\sigma}_u^2\right) \) a random effects term representing between clustervariation in clustered arm • \( {r}_{ij}\sim N\left(0,{\sigma}_r^2\right) \) the individual level variation in the nonclustered control arm. • \( {\epsilon}_{ij}\sim N\left(0,{\sigma}_{\epsilon}^2\right) \) the individual level variation in the clustered arm  Yes 
Model 1 (Table 2) is the linear regression model which ignores the clustering and uses analysis for nonclustered trials, assuming independence between individuals regardless of whether they are in the same cluster. This infers that the conditional variance of y in both the intervention and control arms is equal. If the outcomes of individuals in the same cluster are correlated, the independence assumption is violated and we underestimate uncertainty around intervention effects when using the linear regression model above.
Options for imposing clustering in the nonclustered control arm
Option  Cluster  Intervention 

1  j = 0  j = 1, …, c 
2  j = 1, …, l  j = l + 1, …, c 
3  j = 1, …, k  j = k + 1, …, c 
Individuals in the nonclustered control arm are assumed independent. This accurately reflects the design of the study with clustering only in one arm. In the partially nested homoscedastic model, we apply the random effect u_{j} to the clustered intervention arm only; betweencluster variability is only present for the intervention arm. Model 3 is homoscedastic as the variance of the individual errors, ϵ_{ij}, between arms is the same. In practice, the variance of the individual errors may differ between trial arms [7]. Therefore, model 3 is extended to a partially nested heteroscedastic model, model 4, this allows for differing individual errors between intervention and control arms but does not constrain the form of heteroscedasticity.
Imposed clustering in the control arm
Regardless of whether or not the model assumes clustering in one (models 3 and 4) or both arms (model 2), within the statistical software package a decision must be made about how to code the cluster indicators in the control arm. One method is to impose clusters for all individuals, including those in the control arm, and use analysis for cRCTs with clustering in both arms.
Table 3 represents the different options for imposing clustering, j, in the control arm, l is the number of individuals in the control arm and k is the number of arbitrary clusters in the control group. Option one treats the control group as one single cluster; option two treats each individual in the control arm as their own cluster of size one (singleton clusters) giving j=l clusters in the control arm. ICC estimation can be problematic with options one and two, in theory, it is not possible to estimate betweencluster variability in option one, or estimate within cluster variability in the control group using option two [20]. Option three imposes artificial pseudorandom clusters in the control group to overcome the problem of estimating within or betweencluster variability. The number of arbitrary clusters, k, needs to be considered. We chose it to be equal across treatment arms. In addition, option three will likely result in a lower ICC estimation due to the assumed independence of control participants.

Model 2.1 fully clustered mixedeffects model with singleton clusters in the control arm;

Model 2.2 fully clustered mixedeffects model with one large cluster in the control arm;

Model 2.3 fully clustered mixedeffects model with pseudo clusters in the control arm.
Flight et al. [9] investigated the effect of the different methods of imposing clustering in the control arm presented in Table 3 in four pnRCT casestudies. The four casestudies covered trials evaluating the effect of: specialist leg ulcer clinics (clustered by clinic), acupuncture for low back pain (clustered by acupuncturist), postnatal support in the community (clustered by community support worker), and telephone befriending for maintaining quality of life in older people (clustered by volunteer facilitator). Little difference was found between the different methods for the fully clustered mixedeffects models and there was no difference between the different methods for the partially nested mixedeffects models.
We anticipated that the method of imposing the clustering in the control arm does not affect the results of the methods which model clustering in only one arm, however, this evaluated in the simulation study.
Degrees of freedom for fixed effect estimates
In the mixedeffects models above we wish to carry out significance tests for the intervention effect. In addition to the correct choice of model, the test statistics and degrees of freedom in mixedeffects models also need to be considered. For large sample sizes in mixedeffects models, the test statistics for fixed effects can be assumed Normally distributed. However, for small samples, the tdistribution is generally used as an approximation of the distribution of the test statistic. Estimating the degrees of freedom for the tdistribution is unclear for pnRCTs and will affect both the significance test and the confidence intervals of the intervention effect estimate.
Comparison of degrees of freedom correction methods has been undertaken for cRCTs and pnRCTs with small numbers of clusters [15, 21]. The Satterthwaite smallsample degrees of freedom correction takes into account the variance structure of the data, for pnRCTs, it has been shown to be superior to the betweenwithin method for maintaining Type I error rates (and comparable to the KenwardRoger method) [15]. Following these results, the Satterthwaite approximation was used to calculate degrees of freedom (using dfmethod() option for mixed, available in Stata 14 onwards [22]).
Simulation study
Overview
We performed a simulation study to evaluate the statistical analysis models for pnRCTs presented in Table 2, and the imposed clustering of the control arm in Table 3 [23]. All models were fitted using a restricted maximum likelihood procedure (REML). All simulations were done in Stata [22], graphs produced using ggplot2 [24] in R [25]. See Additional file 1 for simulation code.
Datagenerating mechanism
We simulated data to replicate a twoarm parallel pnRCT trial with a nonclustered control arm and a clustered intervention arm (randomised 1:1) and a continuous outcome. We simulated data under various design scenarios and under both the null (θ = 0) and alternative hypothesis (θ = A, where A ≠ 0).

For the intervention arm \( \left(t=1\right)\kern0.28em {y}_{ij}=\theta +{u}_j\sqrt{\rho }+{z}_{ij}\sqrt{1\rho } \)

For the control arm \( \left(t=0\right)\kern0.28em {y}_{ij}={z}_{ij}\sqrt{\gamma \left(1\rho \right)} \)
Simulation input scenario values (total 1440 scenarios)
Variable  Notation  Values 

Number of clusters  c  3, 6, 12, 24 
Cluster size  m  5, 10, 20, 30 
Intervention effect  θ  0, 0.2, 0.5 
ICC  ρ  0, 0.01, 0.05, 0.1, 0.2^{a}, 0.3^{a} 
Ratio of individual variance between control and cluster trial arms  γ  0.25^{a}, 0.5, 1, 2, 4^{a} 
Methods
Each simulated dataset was analysed using the models described in Table 2.
Estimand
Our estimands of interest are the REML estimate of the intervention effect θ and the model estimate of the ICC ρ.
Performance measures
We used the following performance measures: bias, mean square error (MSE), and coverage of 95% confidence intervals for \( \widehat{\theta} \), Type I error rate and power (calculated as the proportion of simulated results with a statistically significant intervention effect at the 5% level when the null or alternative hypothesis were true, for Type I error and power respectively) and where applicable, model estimated ICC. See Additional file 2 for more detail on performance measures. For each of the 1440 scenarios 1000 datasets were generated; a 5% significance level and 95% confidence interval based on 1000 simulations has a Monte Carlo error of 0.7%.
Results
Model convergence was generally satisfactory for all models with models converging 95–100% of the time across the different scenarios.
Imposed clustering in the control arm
Methods for imposing clustering in the control arm, presented in Table 3, had a negligible impact on the performance measures of the partially nested mixedeffects models (models 3 and 4). Under the simulation scenarios, the differences in the pvalue, confidence intervals and estimated ICC between the methods were only present at four decimal places. Model fitting was considerably faster (around four to five times faster) using either one large cluster or the pseudo clusters compared to the singleton clusters, however, this will likely be immaterial when fitting only a small number of models.
Methods for imposing clustering in the control arm had a large impact on the performance measures of the fully clustered mixedeffects models (models 2.1, 2.2, and 2.3). Specific results for each performance measure are presented in the following sections.
Results are reported only for the partially nested mixedeffects models (models 3 and 4) with the nonclustered controls classified as one large cluster, as other methods were comparable. All three methods for classifying the nonclustered control arms for the fully clustered mixedeffects model (models 2.1, 2.2, and 2.3) are reported.
Bias
The bias of the intervention effect estimate was not affected by the analysis model used, individual variances (γ) or the ICC (ρ). The maximum absolute bias of the intervention effect was 0.057, 0.043, and 0.053 for θ = 0, 0.2 and 0.5, respectively.
Mean square error
The models produced unbiased estimators with no difference in the observed MSE between the different models. The MSE of the intervention effect estimate had a mean of 0.051 (standard deviation (SD) 0.056) and maximum of 0.346.
Type I error
Model 2, the fully clustered models with imposed clustering in the control arm resulted in biased Type I error rates. Imposing clustering as singleton clusters (model 2.1) led to Type I error rates which were largely affected by the ratio of individual variances (γ) and ICC (ρ). Imposing one large cluster in the control arm (model 2.2) resulted in Type I error rates of zero (due to the Satterthwaite degrees of freedom correction resulting in large degrees of freedom when imposing one large cluster in the control arm). Imposing pseudo clusters in the control arm of the same size as the intervention arm (model 2.3) provided relatively good control of Type I error rates, mean Type I error of 0.039 (SD 0.018), but was affected slightly by both the ratio of individual variances (γ) and ICC (ρ).
Both the homoscedastic and heteroscedastic partially nested models (models 3 and 4) provided good control of Type I error rates (model 3: mean Type I error 0.045 (SD 0.016) and model 4: mean Type I error 0.044 (SD 0.014)) with little difference present between the two models.
The Satterthwaite correction used in Stata mixed (dfmethod(sat)) did not fully correct the Type I error rates to the nominal 5% level, even with the use of the heteroscedastic model 4. The heteroscedastic model 4 did have slightly improved control of Type I error rates than the homoscedastic model 3.
Coverage
Both the homoscedastic and heteroscedastic partially nested models (models 3 and 4) provided good control of coverage rates of 95% confidence intervals (model 3: mean coverage rate 0.956 (SD 0.014) and model 4: mean coverage rate 0.956 (SD 0.014)) with little difference between the two models. In the simulations over coverage of the 95% confidence intervals for the heteroscedastic model 4 occurred when ICC ρ ≤ 0.05, except when the ratio of individual variances γ = 4. Hence, the results were generally conservative when ICC was small. Under coverage of the 95% confidence intervals for the heteroscedastic model 4 only occurred for large ICC (ρ) and ratio of individual variances (γ).
Power
For ICC ρ ≤ 0.05, which is commonly assumed when planning complex intervention trials in healthcare, power of 80% was generally achieved with: 24 clusters of any size, 12 clusters of size ten or more, and six clusters of size 20 or more (120 in each arm).
Mean and SD of power of model 4 versus model 1 under ρ = 0 over all scenarios
Intervention effect (θ)  Model  Power 

Mean (SD)  
0  1  0.050 (0.007) 
4  0.033 (0.014)  
0.2  1  0.388 (0.276) 
4  0.327 (0.286)  
0.5  1  0.803 (0.254) 
4  0.740 (0.298) 
ICC
Summary of results
Summary of simulation results by different models split by ρ, m, and c averaged over all γ
Discussion
In this study, we have investigated six modelling strategies for the analysis of pnRCTs with a continuous outcome. Our simulation study showed that when analysing pnRCTs the use of the heteroscedastic partially nested mixedeffects model for normally distributed outcome data (using Satterthwaite degrees of freedom) in general provides: unbiased effect estimates; maintains relatively good control of Type I error rates; and did not noticeably cause a reduction in power even with homoscedastic individual variances across arms. The heteroscedastic partially nested model takes account of the betweencluster variance (if present) and therefore provides valid inferences for the intervention effect. Additional file 2 presents modelfitting code for R, Stata and SAS. When using the partially nested mixedeffects model, the method of classifying the nonclustered controls had a negligible impact on statistical inference under the simulation scenarios, agreeing with findings from analysis of four example pnRCTs by Flight et al. [9].
Our findings were broadly similar to those of Baldwin et al. [15]. However, they did not assess the method of classifying the nonclustered controls or performance of models under small ICC (ρ = 0.01, lowest value used in our study) which commonly occur in pnRCTs [7, 8, 9, 26, 29]. Unlike findings from Baldwin et al. [15], the Satterthwaite degrees of freedom correction did not fully control the Type I error rate in our simulations. The largest discrepancy from the nominal level occurred when the ICC was small, ratio of individual variances <1, and under small sample sizes.
We have illustrated that using a naïve linear regression model, which ignores clustering in pnRCTs, gave inflated Type I error rates and resulted in under coverage of confidence intervals when clustering of outcomes was present. When ICC 0.01 ≤ ρ ≤ 0.05, which we believe is common in pnRCTs [9], ignoring clustering led to largely inflated Type I error rates using the linear regression model. A low ICC may still have a large impact, particularly when cluster sizes are large.
When ICC was small and/or with very few clusters and small cluster sizes using the partially nested mixedeffects models 3 and 4 resulted in underestimated Type I error rates. These models correctly reflect the design of the trials; however, they can result in conservativism regarding the precision of estimates due to the bias in estimating the variance estimates when we have a small number of clusters. Consequently, using the partially nested mixed effects models with small ICC may make it difficult to detect differences between the trial arms when present.
Sanders [17] recommend evaluating whether ICC is significantly different from zero prior to selecting an analysis method. We caution such significance testing for ICC, and similarly testing for heteroscedasticity [7]. These tests will generally lack power in a pnRCT and it is not the statistical significance of the ICC that matters but impact of the magnitude on inference. In general, we recommend the use of the partially nested models when analysing pnRCT trials, particularly if conservatism and an ICC estimate are desired. However, the choice of model and the requirement or not for conservatism needs to be considered in the context of the specific trial setting.
Similar to cRCTs [1], in a pnRCT increasing the number of clusters rather than increasing the cluster size has a greater increase in power for a fixed total sample size. Our simulation results showed that this will also provide a more accurate estimation of the ICC. When the number of clusters is small, for example, three clusters in the intervention arm, the ICC estimation will likely be upwardly biased. With six clusters in the intervention arm, the ICC estimate was relatively unbiased once the true ICC ≥0.1. The ICC estimation became consistent regardless of cluster size or true ICC only once there were 24 clusters in the simulation scenarios. This reflects findings from previous research that to reliably estimate the size of clustering effects a large number of clusters are required [30].
This study investigated the case of analysing partially nested trials under complete compliance. Noncompliance in the clustered arm of a pnRCT may occur when some participants randomised to a particular treatment group or care provider do not attend any sessions or receive treatment as part of different treatment group or care provider intended at randomisation. Consequently, noncomplier outcomes may be assumed independent if they do not receive the clustered intervention. Schweig and Pane [16] describe and compare models for pnRCTs with noncompliance using a simulation study. They argue that an unbiased intentiontotreat (ITT) estimate under noncompliance on a pnRCT may be obtained using a Complier Average Causal Effects (CACE) model. This method involves estimating the treatment effect for compliers and scaling this CACE effect estimate by the proportion of compliers to provide an ITT effect estimate. The issues posed by noncompliance warrant further investigation, considering a broader range of scenarios and investigating the degrees of freedom corrections for valid statistical inferences.
The design and analysis of trials with clustering in one arm needs to take account of heterogeneity and ICC to have a sufficiently powered sample size and accurate intervention effect. We strongly recommend the reporting of ICCs in trials results papers. The framework developed for cRCTs is also broadly applicable in iRCTs with clustering, identifying three dimensions to consider when reporting an ICC: a description of the dataset (including characteristics of the outcome and the intervention); how the ICC was calculated; and the precision of the ICC [31]. This has the potential to improve the assumptions about ICCs in iRCTs, adhere to CONSORT reporting guidelines for RCTs of nonpharmacological interventions [32], and raise awareness of the need to account for clustering in both the sample size and analysis in iRCTs with clustering.
A wide variety of terminology are used in iRCTs with clustering in one arm, including partially nested, partially clustered, multilevel, and individually randomized group intervention. A more consistent use of terminology would reduce confusion, improve reporting and make finding relevant ICCs from previous trials easier. We suggest the terminology partially nested randomised trial (pnRCT) to describe an iRCT with clustering in one arm.
Limitations
All the mixedeffects models assume that the cluster level means follow a Normal distribution. This may not be a valid assumption, for example, when we have a small number of clusters.
In the simulations, we have used fixed cluster sizes. In practice, cluster size may vary, causing a loss in efficiency when estimating the intervention effect. A simulation study by Candel and Van Breukelen [10] found the efficiency loss in the intervention effect estimate was rarely more than 10%, requiring recruitment of 11% more clusters for the intervention arm and 11% more individuals for the control arm. The loss of efficiency in the intercept variance reached to 15%, requiring 19% more clusters in the clustered arm, and no additional recruitment in the control arm. Additionally, it has been shown in cluster trials if the coefficient of variation in cluster size is small, less than 0.23, then the correction on sample size is negligible [33]. It should be noted that cluster sizes are likely to be more similar in group administered interventions compared to trials which impose clustering by being treated by the same care provider [7].
Throughout the simulations we assumed there was no effect of clustering in the control arm, this may not strictly be true in practice. In healthcare intervention trials, a commonly used control intervention is ‘care as usual’. This type of control may induce some form of lowlevel clustering, for instance, treatment by a healthcare practitioner. If the same practitioner treats numerous individuals, we can assume, in the same sense as we have done for the intervention arm that these individuals are clustered and include this in the modelling procedure. However, this information is often not available in trial data and is not unique to pnRCTs.
Partially nested trials pose a number of challenges, in particular, the issue of internal validity [6]. The grouping of individuals as part of the delivery of a treatment may affect the outcome. However, taking a pragmatic viewpoint, we consider the grouping as part of the treatment as a whole if this is reflective of treatment delivery in realworld practice. In addition, if the ungrouped controls are the true comparison in real life a pnRCT design will provide external validity.
Conclusion
Partially nested RCTs are increasingly used in complex intervention research. Ignoring clustering can lead to inflations of the Type I error rates, even for small ICCs. When analysing a pnRCT with continuous outcomes we recommend the use of a heteroscedastic partially nested mixedeffects model with corrected degrees of freedoms such as using the Satterhwaite method, for outcomes similar to those generated under the scenarios of our simulations study. The method used for classifying the nonclustered controls had a negligible impact on the results using the partially nested mixedeffects model. The model is easy to implement in standard statistical software and does not cause a notable reduction in power under homoscedastic variances. With few clusters, small cluster sizes and small ICC, the partially nested model underestimated Type I error rates and gave largely inflated ICC estimates, hence, for such designs there is no optimal model and we need to be cautious in model interpretation. Finally, to aid the design and prior selection of an appropriate analysis plan for pnRCTs, we strongly recommend the reporting of estimated ICC when publishing trials results.
Notes
Funding
JC was funded by the University of Sheffield Harry Worthington PhD Scholarship. MDT, SJW, MD, and LM were funded by the University of Sheffield. This is a summary of independent research funded by the National Institute for Health Research (NIHR)’s Doctoral Research Fellowship funding LF (DRF201508013). The views expressed are those of the author and not necessarily those of the NHS, the NIHR, the Department of Health or the University of Sheffield.
Availability of data and materials
All data used was simulated, simulations code is available in Additional file 1.
Authors’ contributions
JC designed and implemented the simulations and drafted and edited the manuscript. MD, LF, and LM provided assistance and consultation in running the simulations. MDT, MD, LF, LM, and SJW provided input into the simulation conception and design and revised the manuscript critically. All authors approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary material
References
 1.Campbell MJ, Walters SJ. How to Design, Analyse and Report Cluster Randomised Trials in Medicine and Health Related Research. Chichester: Statistics in Practice, Wiley; 2014.Google Scholar
 2.Campbell MK, Piaggio G, Elbourne DR, Altman DG, Group C. Consort 2010 statement: extension to cluster randomised trials. BMJ. 2012;345:e5661.CrossRefGoogle Scholar
 3.Hutchings J, Gardner F, Bywater T, Daley D, Whitaker C, Jones K, et al. Parenting intervention in sure start services for children at risk of developing conduct disorder: pragmatic randomised controlled trial. BMJ. 2007;334(7595):678.CrossRefGoogle Scholar
 4.Morrell CJ, Walters SJ, Dixon S, Collins KA, Brereton LM, Peters J, et al. Cost effectiveness of community leg ulcer clinics: randomised controlled trial. BMJ. 1998;316(7143):1487–91.CrossRefGoogle Scholar
 5.Diener MK, Seiler CM, von Frankenberg M, Rendel K, Schüle S, Maschuw K, et al. Vascular clips versus ligatures in thyroid surgery—results of a multicenter randomized controlled trial (CLIVIT trial). Langenbeck’s Arch Surg. 2012;397(7):1117–26.CrossRefGoogle Scholar
 6.Bauer DJ, Sterba SK, Hallfors DD. Evaluating groupbased interventions when control participants are ungrouped. Multivariate Behav Res. 2008;43(2):210–36.CrossRefGoogle Scholar
 7.Roberts C, Roberts SA. Design and analysis of clinical trials with clustering effects due to treatment. Clin Trials. 2005;2(2):152–62.CrossRefGoogle Scholar
 8.GossageWorrall R, Holt RIG, Barnard K, Carey ME, Davies MJ, Dickens C, et al. STEPWISE – STructured lifestyle education for people WIth SchizophrEnia: a study protocol for a randomised controlled trial. Trails. 2016;17(1):475.Google Scholar
 9.Flight L, Allison A, Dimairo M, Lee E, Mandefield L, Walters SJ. Recommendations for the analysis of individually randomised controlled trials with clustering in one arm – a case of continuous outcomes. BMC Med Res Methodol. 2016;16(1):165.CrossRefGoogle Scholar
 10.Candel MJJM, Van Breukelen GJP. Varying cluster sizes in trials with clusters in one treatment arm: sample size adjustments when testing treatment effects with linear mixed models. Stat Med. 2009;28(18):2307–24.CrossRefGoogle Scholar
 11.Fazzari MJ, Kim MY, Heo M. Sample size determination for threelevel randomized clinical trials with randomization at the first or second level. J Biopharm Stat. 2014;24(3):579–99.CrossRefGoogle Scholar
 12.Batistatou EE, Roberts C, Roberts S. Sample size and power calculations for trials and quasiexperimental studies with clustering. Stata J. 2014;14(1):159–75.Google Scholar
 13.Moerbeek M, Wong WK. Sample size formulae for trials comparing group and individual treatments in a multilevel model. Stat Med. 2008;27(15):2850–64.CrossRefGoogle Scholar
 14.Moerbeek M, Teerenstra S. Power analysis of trials with multilevel data. Boca Raton: Chapman & Hall; 2016.Google Scholar
 15.Baldwin SA, Bauer DJ, Stice E, Rohde P. Evaluating models for partially clustered designs. Psychol Methods. 2011;16(2):149–65.CrossRefGoogle Scholar
 16.Schweig JD, Pane JF. Intentiontotreat analysis in partially nested randomized controlled trials with realworld complexity. Int J Res Method Educ. 2016;39(3):268–86.CrossRefGoogle Scholar
 17.Sanders E. Multilevel Analysis Methods for Partially Nested Cluster Randomized Trials. University of Washington; 2011. Available online at: https://eric.ed.gov/?id=ED529306. Accessed 04 Sep 2018.
 18.Korendijk EJH. Robustness and optimal design issues for cluster randomized trials. Utrecht University; 2012. Available online at: https://dspace.library.uu.nl/handle/1874/240965. Accessed 04 Sep 2018.
 19.Lohr S, Schochet PZ, Sanders E. Partially Nested Randomized Controlled Trials in Education Research: A Guide to Design and Analysis. National Center for Education Research (NCER). Washington, DC; 2014. Available online at: https://ies.ed.gov/ncer/pubs/20142000/pdf/20142000.pdf. Accessed 04 Sep 2018.
 20.Bland M. Grouping in individually randomised trials. In: 4th Annual Conference on Randomised Controlled Trials in the Social Sciences. York; 2009. Available online at: https://wwwusers.york.ac.uk/~mb55/talks/individ.pdf. Accessed 4 Sept 2018.
 21.Kahan BC, Forbes G, Ali Y, Jairath V, Bremner S, Harhay MO, et al. Increased risk of type I errors in cluster randomised trials with small or medium numbers of clusters: a review, reanalysis, and simulation study. Trials. 2016;17(1):438.CrossRefGoogle Scholar
 22.StataCorp. Stata Statistical Software: Release 14. College Station: StataCorp LP; 2015. p. 2015.Google Scholar
 23.Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med. 2006;25(24):4279–92.CrossRefGoogle Scholar
 24.Wickham H. ggplot2: elegant graphics for data analysis: New York: SpringerVerlag; 2016.Google Scholar
 25.R Core Team. R: a language and environment for statistical computing. Vienna: R Foundation for Statistical Computing; 2017. http://www.Rproject.org/ Google Scholar
 26.Baldwin SA, Murray DM, Shadish WR, Pals SL, Holland JM, Abramowitz JS, et al. Intraclass correlation associated with therapists: estimates and applications in planning psychotherapy research. Cogn Behav Ther. 2011;40(1):15–33.CrossRefGoogle Scholar
 27.Sterba SK. Partially nested designs in psychotherapy trials: a review of modeling developments. Psychother Res. 2017;27(4):425–36.CrossRefGoogle Scholar
 28.Lee KJ, Thompson SG. The use of random effects models to allow for clustering in individually randomized trials. Clin Trials. 2005;2(2):163–73.CrossRefGoogle Scholar
 29.Cook JA, Bruckner T, Maclennan GS, Seiler CM. Clustering in surgical trials : database of intracluster correlations. Trials. 2012;13:2.CrossRefGoogle Scholar
 30.Maas CJM, Hox JJ. Robustness issues in multilevel regression analysis. Stat Neerl. 2004;58(2):127–37.CrossRefGoogle Scholar
 31.Campbell MK, Grimshaw JM, Elbourne DR, Pocock S, Campbell M, Grimshaw J, et al. Intracluster correlation coefficients in cluster randomized trials: empirical insights into how should they be reported. BMC Med Res Methodol. 2004;4:9.CrossRefGoogle Scholar
 32.Boutron I, Moher D, Altman DG, Schulz KF, Ravaud P, Grp C. Extending the CONSORT statement to randomized trials of nonpharmacologic treatment: explanation and elaboration. Ann Intern Med. 2008;148(4):295–309.CrossRefGoogle Scholar
 33.Eldridge SM, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol. 2006;35(5):1292–300.CrossRefGoogle Scholar
 34.Hoover DR. Clinical trials of behavioural interventions with heterogeneous teaching subgroup effects. Stat Med. 2002;21(10):1351–64.CrossRefGoogle Scholar
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.