1 Introduction

There is an increasing demand from decision makers for information not only on the effectiveness of new interventions, but also on their cost-effectiveness (value for money). Cost-effectiveness studies of new medical therapies, treatments or health care technologies are therefore increasingly carried out, and such studies frequently have a nested design: costs and effectiveness data are collected at the individual person level, but these persons are nested within larger units called clusters, for instance, patients within general practices. In such two-level nested designs, treatments are randomly assigned either to whole clusters (cluster randomized trial (CRT)) or to individuals within clusters (multicentre trial). Examples of cost-effectiveness CRTs are the SPHERE study (Ng et al. 2016) and the PONDER study (Morrell et al. 2009). Examples of multicentre cost-effectiveness trials are the EVALUATE trial (Sculpher et al. 2004; Manca et al. 2005) and the Comparison of European Stroke Costs and Survival study (Grieve et al. 2001, 2007).

The planning of such trials itself involves cost-effectiveness considerations, as the trial must have sufficient power and precision for testing and estimating the cost-effectiveness of the new treatment within the constraints of a budget for sampling, treating and measuring clusters and individuals. In particular, the question is how many clusters and how many individuals per cluster are needed to demonstrate the cost-effectiveness of treatments. This problem can be addressed by optimal design theory. The optimal design, henceforth called the optimal sample size, is the design which maximizes the efficiency of the treatment effect estimator, and thus results in maximum power for testing the treatment effect, given a constraint on the total cost for sampling and measuring clusters and individuals. Alternatively, and equivalently, the optimal design is the design which minimizes the total costs for sampling and measuring given a constraint on the statistical power or efficiency. This paper presents software for computing the optimal design for cost-effectiveness CRTs and multicentre cost-effectiveness trials. The optimal design usually depends on some model parameters which are unknown in the design stage and is thus optimal only for a specific set of parameter values. This is known as the local optimality problem. One solution to this problem is the so-called maximin design (MMD). This design maximizes the minimum efficiency and can be considered a design which is efficient over a plausible range of values for the unknown model parameters (Berger and Wong 2009; Atkinson et al. 2007).

The statistical theory for constructing both the optimal design and the MMD for cost-effectiveness CRTs and multicentre cost-effectiveness trials is presented by Manju et al. (2014, 2015). This paper presents its software implementation in a computer program with a graphical user interface (GUI). This computer program, called SamP2CeT (Sample size and Power calculation for 2-level Cost-effectiveness Trials), provides a user-friendly environment for researchers to design cost-effectiveness trials (CRTs and multicentre trials) with regard to the optimal number of clusters and the optimal number of subjects per cluster. In the case of limited information on the model parameters, SamP2CeT also provides maximin (i.e., worst case) numbers of clusters and subjects per cluster. The software presented here can either be used to minimize the budget for a desired level of power, or to maximize the power for a fixed budget. In addition, SamP2CeT can evaluate the power of a fixed, user-specified design. Note that other software programs have been developed for sample size calculation for cluster randomized and multicentre trials, such as OD (Raudenbush and Liu 2000; Spybrook et al. 2011), PINT (Snijders and Bosker 1993; Bosker et al. 2003) and ML-DE (Cools et al. 2008). These programs only take into account the effectiveness of interventions, whereas SamP2CeT also considers the costs of interventions. Furthermore, these programs do not allow for optimizing the power for a given budget or minimizing the costs for a desired power level.

The paper is organized as follows. In Sect. 2, the design problem for cost-effectiveness trials is briefly discussed. The methodologies for CRTs and multicentre trials are outlined respectively in Sects. 3 and 4. The model, the cost function, the optimal design and the MMD for both CRTs and multicentre trials are explained. Section 5 describes the computer program and its input values. An illustration of the computer program with practical examples for both a CRT and a multicentre trial is provided in Sect. 6. Section 7 addresses the benefits of using optimal design and MMD. Finally, a discussion and conclusion are presented in Sect. 8.

2 Design problem for cost-effectiveness trials

A fundamental issue in both types of cost-effectiveness trials (CRTs and multicentre trials) is that individuals are nested within clusters, and that there will usually be variability in costs and effects not only between persons within clusters, but also between clusters, and so there is sampling variance at both design levels: person and cluster. Statistical methods that accommodate between-cluster variability of outcomes in the analysis of cost-effectiveness CRTs and multicentre cost-effectiveness trials are relatively well established in the literature (Grieve et al. 2010; Gomes et al. 2012; Manju et al. 2014; Grieve et al. 2007; Manju et al. 2015). These methods address the variability in both costs and effects at both design levels, clusters and persons, and they also take into account the correlation between costs and effects at each level. The methods start from a bivariate mixed-effects model where costs and effects are assumed to follow a bivariate normal distribution. This bivariate mixed-effects model is then used to estimate average incremental costs and effects from treatment as compared with the control condition. These average incremental costs and effects in turn are combined into a single outcome called the average incremental net-monetary benefit (INMB), which is a widely used measure of a treatment's cost-effectiveness (Stinnett and Mullahy 1998). The new treatment or intervention is said to be cost-effective if the average INMB exceeds 0. In this case the intervention is considered for adoption in health care practice. Technical details of the average INMB procedure are given in the next two sections.

When constructing an optimal design for cost-effectiveness trials, we want to estimate the average INMB as precisely, and to test it with as much power, as possible. Therefore, the optimal design minimizes the sampling variance (i.e. the squared standard error) of the estimator of the average INMB, subject to a certain cost function. This cost function relates the required research budget to the sampling costs per person and per cluster. The results of the design optimization are sample size formulae for the numbers of individuals and clusters that maximize the study power and precision given a fixed research budget, or that minimize the study costs given a desired level of precision and power for testing the average INMB.

These optimal design or sample size formulae depend on some model parameters which are not known in the planning stage, and so the optimal design is said to be locally optimal (Berger and Wong 2009; Atkinson et al. 2007), that is, it is typically optimal only for specific values of the model parameters, and not for other values. The maximin approach solves this local optimality problem by the following two-step procedure: (i) first, find the set of model parameter values, within their plausible ranges, which minimizes the design efficiency, as expressed by the inverse of the sampling variance of the estimator of the average INMB, and (ii) then choose the design which maximizes this minimum efficiency, thereby also maximizing the precision and power in the worst-case scenario, given a fixed budget. The resulting design, called the MMD, is in fact the optimal design for the worst case.

3 Cluster randomized trials (CRTs)

3.1 Model

Suppose that the cost-effectiveness of a new treatment is evaluated in a CRT, where \(k_{t}\) and \(k_{c}\) clusters (e.g., general practices or hospitals) are randomly assigned to the new treatment and control treatment respectively. Within each treatment cluster j (j= 1,2, ..., \(k_{t}\)) there are m individuals (e.g., patients), and within each control cluster j (j= \(k_{t}\)+1, \(k_{t}\)+2, ..., \(k_{t}\)+\(k_{c}\)) there are n individuals. All individuals receive the treatment to which their cluster is allocated. To describe the modeling framework for the cost-effectiveness analysis of CRTs, the following bivariate linear mixed model is considered (Grieve et al. 2010; Gomes et al. 2012; Manju et al. 2014):

$$\begin{aligned}&E_{ij}=\beta ^e_{0}+\beta ^e_{1} x_{j}+u^e_{0j}+\epsilon ^e_{ij},\quad \text {and}\quad C_{ij}=\beta ^c_{0}+\beta ^c_{1} x_{j}+u^c_{0j}+\epsilon ^c_{ij},\nonumber \\&\begin{bmatrix} u^e_{0j}\\ u^c_{0j} \end{bmatrix} \sim N \left( \begin{bmatrix} 0\\ 0 \end{bmatrix}, \begin{bmatrix} \sigma ^2_{e0} & \rho _{u0}\sigma _{e0}\sigma _{c0}\\ \rho _{u0}\sigma _{e0}\sigma _{c0} & \sigma ^2_{c0} \end{bmatrix} \right) \quad \text {and}\quad \begin{bmatrix} \epsilon ^e_{ij}\\ \epsilon ^c_{ij} \end{bmatrix} \sim N \left( \begin{bmatrix} 0\\ 0 \end{bmatrix}, \begin{bmatrix} \sigma ^2_{\epsilon ^e} & \rho _{\epsilon }\sigma _{\epsilon ^e}\sigma _{\epsilon ^c}\\ \rho _{\epsilon }\sigma _{\epsilon ^e}\sigma _{\epsilon ^c} & \sigma ^2_{\epsilon ^c} \end{bmatrix} \right), \end{aligned}$$
(1)

where \(E_{ij}\) and \(C_{ij}\) are the effects and costs for individual i in cluster j respectively. The variable \(x_{j}\) represents the treatment assigned to cluster j and will be coded 0 and 1 for the control and treatment clusters respectively, although other coding schemes are also possible and lead to the same sample size formulas for the optimal design and MMD. The coefficients \(\beta ^e_{0}\) and \(\beta ^c_{0}\) can be regarded as the average effects and costs respectively for the group of individuals receiving the control treatment. The random intercepts \(u^e_{0j}\) and \(u^c_{0j}\) reflect the deviations from the averages \(\beta ^e_{0}\) and \(\beta ^c_{0}\) for the \(j^{th}\) cluster respectively and are assumed to follow a bivariate normal distribution, with variances \(\sigma ^2_{e0}\) and \(\sigma ^2_{c0}\) and correlation \(\rho _{u0}(-1\le \rho _{u0}\le 1)\). The slope coefficients \(\beta ^e_{1}\) and \(\beta ^c_{1}\) correspond to the average incremental effects and costs from treatment respectively. Finally, \(\epsilon ^e_{ij}\) and \(\epsilon ^c_{ij}\) are individual level residuals for effects and costs respectively, assumed to follow a bivariate normal distribution, with variances \(\sigma ^2_{\epsilon ^e}\) and \(\sigma ^2_{\epsilon ^c}\) and correlation \(\rho _{\epsilon }(-1\le \rho _{\epsilon }\le 1)\).
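To make the model structure concrete, the following R sketch simulates costs and effects from Eq. (1). All parameter values and the helper name simulate_crt are our own illustrative assumptions, not part of SamP2CeT.

```r
## A minimal simulation sketch of the bivariate mixed model in Eq. (1);
## all parameter values and the helper name simulate_crt are hypothetical.
library(MASS)  # for mvrnorm()

simulate_crt <- function(kt, kc, m, n,
                         beta_e = c(0.60, 0.05),   # control mean and incremental effect
                         beta_c = c(3000, 500),    # control mean and incremental cost
                         sd_e0 = 0.05, sd_c0 = 200, rho_u0 = 0.2,    # cluster level
                         sd_ee = 0.15, sd_ec = 700, rho_eps = 0.3) { # individual level
  Sigma_u   <- matrix(c(sd_e0^2, rho_u0 * sd_e0 * sd_c0,
                        rho_u0 * sd_e0 * sd_c0, sd_c0^2), 2, 2)
  Sigma_eps <- matrix(c(sd_ee^2, rho_eps * sd_ee * sd_ec,
                        rho_eps * sd_ee * sd_ec, sd_ec^2), 2, 2)
  x    <- rep(c(1, 0), c(kt, kc))              # cluster-level treatment indicator
  size <- rep(c(m, n), c(kt, kc))              # individuals per cluster
  u    <- mvrnorm(kt + kc, c(0, 0), Sigma_u)   # random cluster intercepts
  do.call(rbind, lapply(seq_len(kt + kc), function(j) {
    eps <- mvrnorm(size[j], c(0, 0), Sigma_eps)  # individual-level residuals
    data.frame(cluster = j, x = x[j],
               E = beta_e[1] + beta_e[2] * x[j] + u[j, 1] + eps[, 1],
               C = beta_c[1] + beta_c[2] * x[j] + u[j, 2] + eps[, 2])
  }))
}

head(simulate_crt(kt = 10, kc = 10, m = 15, n = 15))
```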

In Eq. (1), the costs and effects are expressed on their original scales, but the net-monetary benefit (NMB) framework, where both the effects and costs of interventions are scaled onto the same monetary scale on the basis of a threshold value \(\lambda \), offers a convenient way of modeling the data from cost-effectiveness CRTs (Stinnett and Mullahy 1998). Here the NMB is calculated for each individual i in cluster j as \(NMB_{ij}=\lambda E_{ij}-C_{ij}\), where the value \(\lambda (0\le \lambda <\infty )\) is the threshold willingness to pay for a unit of health gain. Costs and effects are thus combined into the average incremental net-monetary benefit, INMB (\(\beta _{1}\)), a measure of the relative value for money of the intervention as compared with the control, which is defined as \(\beta _{1}=\lambda \beta ^e_{1}-\beta ^c_{1}\). In a CRT, the outcomes of two individuals in an arbitrary cluster are correlated, and these intraclass correlations (ICCs) for effects and costs are equal to \(0\le \rho _{e}=\frac{\sigma ^2_{e0}}{\sigma ^2_{e0}+\sigma ^2_{\epsilon ^e}}\le 1\) and \(0\le \rho _{c}=\frac{\sigma ^2_{c0}}{\sigma ^2_{c0}+\sigma ^2_{\epsilon ^c}}\le 1\) respectively. We furthermore define \(\varphi =\frac{\lambda ^2(\sigma ^2_{e0}+\sigma ^2_{\epsilon ^e})}{\sigma ^2_{c0}+\sigma ^2_{\epsilon ^c}}\). This \(\varphi (0\le \varphi <\infty )\), which we call the variance ratio, can be interpreted as the ratio of the total variance for effects translated into costs (\(\lambda E_{ij}\)) to the total variance for costs (\(C_{ij}\)). The ICCs and the variance ratio will be seen to play an important role in the design optimization.
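As a concrete illustration of these definitions, the short R sketch below computes the ICCs, the variance ratio and the individual-level NMB from the variance components of Eq. (1); all numerical values are hypothetical and serve only to show the arithmetic.

```r
## R sketch of the NMB, ICC and variance ratio definitions above; the
## willingness-to-pay threshold and all variance components are hypothetical.
lambda    <- 20000    # willingness to pay per unit of health gain (assumed)
sigma2_e0 <- 0.0025   # cluster-level variance of effects (assumed)
sigma2_c0 <- 40000    # cluster-level variance of costs (assumed)
sigma2_ee <- 0.0225   # individual-level variance of effects (assumed)
sigma2_ec <- 490000   # individual-level variance of costs (assumed)

rho_e  <- sigma2_e0 / (sigma2_e0 + sigma2_ee)                           # ICC for effects
rho_c  <- sigma2_c0 / (sigma2_c0 + sigma2_ec)                           # ICC for costs
varphi <- lambda^2 * (sigma2_e0 + sigma2_ee) / (sigma2_c0 + sigma2_ec)  # variance ratio
v_c    <- sigma2_c0 + sigma2_ec                                         # total cost variance

nmb <- function(E_ij, C_ij, lambda) lambda * E_ij - C_ij                # individual-level NMB
```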

3.2 Sample size calculation

The aim of optimal design theory is to find the design which minimizes some function of the covariance matrix of one or more parameter estimators of interest; this function is called the optimality criterion. Here, the optimality criterion is simply the sampling variance of \(\hat{\beta _{1}}\), the maximum likelihood (ML) estimator of the average INMB, \(\beta _{1}\), which measures the treatment's cost-effectiveness. The following asymptotic variance of the ML estimator of \(\beta _{1}\) was derived by Manju et al. (2014) from the general expression for the variance-covariance matrix of the fixed-effects estimators of a linear mixed model (Verbeke and Molenberghs 2000):

$$\begin{aligned} var (\hat{\beta _{1}})= \Big \{\Big (\frac{1}{k_{t}}+\frac{1}{k_{c}}\Big )A+\Big (\frac{1}{k_{t}m}+\frac{1}{k_{c}n}\Big )B\Big \}v_{c}, \end{aligned}$$
(2)

where \(A=\varphi \rho _{e}+\rho _{c}-2\rho _{u0}\sqrt{\rho _{e}\rho _{c}\varphi }\), \(B=\varphi \left( 1-\rho _{e}\right) +\left( 1-\rho _{c}\right) -2\rho _{\epsilon }\sqrt{\left( 1-\rho _{e}\right) \left( 1-\rho _{c}\right) \varphi }\) and \(v_{c}=var (C_{ij})=\sigma ^2_{c0}+\sigma ^2_{\epsilon ^c}\). Note that A is the contribution of random cluster effects, and B is the contribution of random person and measurement error effects, to the sampling variance of \(\hat{\beta _{1}}\).
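For reference, Eq. (2) can be transcribed directly into R as follows; the function name var_inmb_crt and its argument names are ours and not part of SamP2CeT, and the example call reuses the hypothetical values defined earlier plus assumed cost-effect correlations.

```r
## Direct R transcription of Eq. (2); function and argument names are ours.
var_inmb_crt <- function(kt, kc, m, n,
                         rho_e, rho_c, rho_u0, rho_eps, varphi, v_c) {
  A <- varphi * rho_e + rho_c - 2 * rho_u0 * sqrt(rho_e * rho_c * varphi)
  B <- varphi * (1 - rho_e) + (1 - rho_c) -
       2 * rho_eps * sqrt((1 - rho_e) * (1 - rho_c) * varphi)
  ((1 / kt + 1 / kc) * A + (1 / (kt * m) + 1 / (kc * n)) * B) * v_c
}

## Example with the hypothetical parameter values defined earlier and
## assumed cost-effect correlations at the cluster and individual level
var_inmb_crt(kt = 12, kc = 12, m = 20, n = 20,
             rho_e, rho_c, rho_u0 = -0.2, rho_eps = 0.3, varphi, v_c)
```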

Designs are usually restricted by a budget. Therefore, it is important to take the costs for measuring and sampling clusters and individuals into account when planning a CRT. Let C represent the total budget for sampling, treating and measuring clusters and persons, and let \(c_{t}>0\) and \(c_{c}>0\) be the cost per cluster in the treatment and control arm, respectively. Furthermore, let \(s_{t}>0\) and \(s_{c}>0\) be the cost per individual in the treatment and control cluster, respectively. The required budget (or total costs) is then given by (Tokola et al. 2011; Manju et al. 2014):

$$\begin{aligned} C=k_{t}c_{t}+k_{c}c_{c}+k_{t}ms_{t}+k_{c}ns_{c}. \end{aligned}$$
(3)
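Eq. (3) translates into a one-line R helper; the name budget_crt and the argument names are ours.

```r
## Eq. (3): total budget for a CRT with cluster- and individual-level costs
budget_crt <- function(kt, kc, m, n, ct, cc, st, sc) {
  kt * ct + kc * cc + kt * m * st + kc * n * sc
}
```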

Finding an optimal design means finding the m, n, \(k_{t}\) and \(k_{c}\) which minimize \(var (\hat{\beta _{1}})\) in Eq.  (2), and thus maximize the statistical power, given the budget constraint in Eq. (3). The optimal design formulas have been derived by Manju et al. (2014) and are here provided in Appendix A.
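Appendix A gives the closed-form solution; purely to illustrate the optimization problem itself, the sketch below searches over integer designs that fit a given budget, using the two helpers above. The unit costs, parameter values and the restriction to equal numbers of clusters per arm are our own simplifications.

```r
## Brute-force illustration: among integer designs fitting the budget C,
## find the one minimizing Eq. (2). All inputs here are hypothetical, and
## kt = kc is imposed only to keep the search small.
C <- 75000; ct <- cc <- 1000; st <- sc <- 100
best <- NULL
for (m in 2:40) for (n in 2:40) {
  k <- floor(C / (ct + cc + m * st + n * sc))  # largest k with budget_crt(k, k, m, n, ...) <= C
  if (k < 2) next
  v <- var_inmb_crt(k, k, m, n, rho_e, rho_c, rho_u0 = -0.2, rho_eps = 0.3, varphi, v_c)
  if (is.null(best) || v < best$var) best <- list(kt = k, kc = k, m = m, n = n, var = v)
}
best
```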

Computing the optimal design requires knowledge of the model parameters \(\rho _{e}\), \(\rho _{c}\), \(\rho _{u0}\), \(\rho _{\epsilon }\) and \(\varphi \). As described in the previous section, specifying these parameter values may not be possible at the planning stage of a trial; in that case an MMD can be determined, that is, a design which is efficient over a plausible range of values for these unknown model parameters. The expressions for the MMDs for planning a cost-effectiveness CRT, as derived by Manju et al. (2014), are presented in Appendix A. These MMDs depend on the upper bounds of the plausible ranges for \(\rho _{e}\) and \(\rho _{c}\).

The optimal design and the MMD as proposed by Manju et al. (2014) have been derived for a fixed budget. Therefore, these designs give the smallest variance of the treatment effect estimator, and thus the largest precision and power, given the budget and given either the model parameter values (optimal design) or plausible ranges of these model parameters (maximin design). In designing a cost-effectiveness study, however, one will often aim at achieving a certain power for the treatment effect test instead of fixing the budget beforehand. This can be achieved with the same optimal and maximin design equations as for a fixed budget, by noting that the optimal (respectively maximin) cluster sizes m and n do not depend on the budget, whereas the optimal (respectively maximin) numbers of clusters do (see Appendix A for details). So if a given budget does not give sufficient power even after design optimization, the budget must be increased, and this will lead to an increase of the numbers of clusters \(k_{t}\) and \(k_{c}\), not of the cluster sizes m and n. The required budget can thus be calculated either by repeatedly increasing the budget and recomputing the optimal cluster sizes and power until the desired power is attained, or by rewriting the optimal sample size equations for a given budget into those for a given power. The latter approach underlies the present computer program SamP2CeT. To compute the power and required budget the user also needs to specify the effect size (ES) of interest. The program uses as its measure of ES an adaptation of the classical Cohen's d (Cohen 1992) to the present nested design. For details, see Appendix A.
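The first, iterative route can be sketched as follows. The exact effect size and power definitions used by SamP2CeT are given in Appendix A and are not reproduced here; the sketch instead assumes a two-sided Wald-type z-test of the average INMB, equal numbers of clusters per arm, and fixed cluster sizes m and n, so it only illustrates the logic of raising the budget until the target power is reached.

```r
## Iteratively raise the budget (and hence the numbers of clusters) until a
## two-sided z-test of the average INMB reaches the target power. The z-test
## approximation, equal cluster numbers per arm and the function name
## required_budget are our own simplifying assumptions.
required_budget <- function(beta1, m, n, ct, cc, st, sc,
                            rho_e, rho_c, rho_u0, rho_eps, varphi, v_c,
                            target = 0.80, alpha = 0.05, step = 1000) {
  C <- step
  repeat {
    k <- floor(C / (ct + cc + m * st + n * sc))  # clusters per arm affordable at budget C
    if (k >= 2) {
      v  <- var_inmb_crt(k, k, m, n, rho_e, rho_c, rho_u0, rho_eps, varphi, v_c)
      pw <- pnorm(abs(beta1) / sqrt(v) - qnorm(1 - alpha / 2))
      if (pw >= target) return(list(budget = C, kt = k, kc = k, power = pw))
    }
    C <- C + step
  }
}

## e.g. required_budget(beta1 = 500, m = 20, n = 20, ct = 1000, cc = 1000,
##                      st = 100, sc = 100, rho_e, rho_c, -0.2, 0.3, varphi, v_c)
```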

A third and last way of using SamP2CeT is to calculate the power for a fixed design, that is, for user-specified values of m, n, \(k_{t}\) and \(k_{c}\). This is relevant if a researcher has several designs that are practically feasible and wants to evaluate these in terms of their power. In this case, the same procedure as above for computing the power of the optimal design for a given budget is followed, except that the user-specified values of m, n, \(k_{t}\) and \(k_{c}\) are used instead of the optimal ones. Furthermore, if the user does not have precise information on the model parameters that affect the power, the user can evaluate the worst-case (lowest) power. This yields the power of the user-specified design under a worst-case scenario, comparable to the power of the MMD for a given budget.

4 Multicentre trials

4.1 Model

In addition to CRTs, multicentre trials, in which individuals are randomized to treatments within clusters or centres, are often employed in cost-effectiveness studies. This section describes the modeling approach for analyzing data from multicentre cost-effectiveness trials involving a treatment-by-centre interaction and assuming an m:n randomization in each centre j (j = 1, 2, ..., k). We have the following bivariate linear mixed model for the effect of treatment x on the quantitative outcomes, effects (\(E_{ij}\)) and costs (\(C_{ij}\)), for individual i in centre j (Grieve et al. 2007; Manju et al. 2015):

$$\begin{aligned}&E_{ij}=\beta ^e_{0j}+\beta ^e_{1} x_{ij}+u^e_{1j} x_{ij}+\epsilon ^e_{ij},\quad \text {and}\quad C_{ij}=\beta ^c_{0j}+\beta ^c_{1} x_{ij}+u^c_{1j} x_{ij}+\epsilon ^c_{ij},\nonumber \\&\begin{bmatrix} u^e_{1j}\\ u^c_{1j} \end{bmatrix} \sim N \left( \begin{bmatrix} 0\\ 0 \end{bmatrix}, \begin{bmatrix} \sigma ^2_{e1} & \rho _{u1}\sigma _{e1}\sigma _{c1}\\ \rho _{u1}\sigma _{e1}\sigma _{c1} & \sigma ^2_{c1} \end{bmatrix} \right) \quad \text {and}\quad \begin{bmatrix} \epsilon ^e_{ij}\\ \epsilon ^c_{ij} \end{bmatrix} \sim N \left( \begin{bmatrix} 0\\ 0 \end{bmatrix}, \begin{bmatrix} \sigma ^2_{\epsilon ^e} & \rho _{\epsilon }\sigma _{\epsilon ^e}\sigma _{\epsilon ^c}\\ \rho _{\epsilon }\sigma _{\epsilon ^e}\sigma _{\epsilon ^c} & \sigma ^2_{\epsilon ^c} \end{bmatrix} \right), \end{aligned}$$
(4)

where \(x_{ij}\) is coded as b for treated and a for control individuals (a and b are real numbers and \(a\ne b\)), so that the average incremental effects and costs from treatment are \((b-a)\beta ^{e}_{1}\) and \((b-a)\beta ^{c}_{1}\) respectively. In the case of a multicentre trial, the \(NMB_{ij}\) and the average INMB are defined analogously as \(NMB_{ij}=\lambda E_{ij}-C_{ij}\) and \((b-a)\beta _{1}=(b-a)(\lambda \beta ^{e}_{1}-\beta ^{c}_{1})\) respectively. The intercepts \(\beta ^{e}_{0j}\) and \(\beta ^{c}_{0j}\) for centre j, which represent the main effects of the centres on effects and costs respectively, may be either fixed or random. If the randomization is m:n in each centre j (j = 1, 2, ..., k), it makes no difference for the optimal design and MMD whether the components \(\beta ^{e}_{0j}\) and \(\beta ^{c}_{0j}\) are treated as random or fixed (Senn 2014). The random components \(u^{c}_{1j}\) and \(u^{e}_{1j}\) represent the treatment-by-centre interactions for costs and effects respectively and are assumed to follow a bivariate normal distribution with variances \(\sigma ^2_{c1}\) and \(\sigma ^2_{e1}\) and correlation \(\rho _{u1}(-1\le \rho _{u1}\le 1)\). We make the same distributional assumptions regarding the random individual-level residuals for costs and effects (\(\epsilon ^{c}_{ij}, \epsilon ^{e}_{ij}\)) in Eq. (4) as in Eq. (1). In deriving the optimal design and MMD for multicentre cost-effectiveness trials, Manju et al. (2015) used the following re-parameterization of the model parameters in Eq. (4):

\(0\le \theta _{e}=\frac{(b-a)^2 \sigma ^{2}_{e1}}{(b-a)^2 \sigma ^{2}_{e1}+\sigma ^{2}_{\epsilon ^e}}\le 1\), \(0\le \theta _{c}=\frac{(b-a)^2 \sigma ^{2}_{c1}}{(b-a)^2 \sigma ^{2}_{c1}+\sigma ^{2}_{\epsilon ^c}}\le 1\) and \(0\le \phi =\frac{\lambda ^2 \{(b-a)^2 \sigma ^{2}_{e1}+\sigma ^{2}_{\epsilon ^e}\}}{\{(b-a)^2 \sigma ^{2}_{c1}+\sigma ^{2}_{\epsilon ^c}\}}<\infty \). Here \(\theta _{e}\) and \(\theta _{c}\) play the same role as the ICCs in the CRT model discussed in the previous section, and \(\phi \) is similar to the ratio of the variance for effects translated into costs (i.e., \(\lambda E_{ij}\)) to the variance for costs, except that the intercept variances \(\sigma ^{2}_{e0}\) and \(\sigma ^{2}_{c0}\) are replaced with the slope variances \(\sigma ^{2}_{e1}\) and \(\sigma ^{2}_{c1}\) multiplied by a scaling factor \((b-a)^2\) which depends on the coding of the treatment indicator \(x_{ij}\) in Eq. (4). Note that the quantities \((b-a)^2 \sigma ^{2}_{e1}\) and \((b-a)^2 \sigma ^{2}_{c1}\) do not depend on the treatment coding. For instance, changing from 0/1 coding to \(-1/{+}1\) coding makes \((b-a)^2\) four times larger, but \(\sigma ^{2}_{e1}\) four times smaller, since \(\beta ^{e}_{1}+u^{e}_{1j}\) is the effects difference between treatment and control in centre j in the case of 0/1 treatment coding, but only half that difference in the case of \(-1/{+}1\) coding. Because of the similarity with the ICCs and the variance ratio for CRTs, in the remainder of this paper we will call \(\theta _{e}\) and \(\theta _{c}\) the quasi-ICCs for effects and costs respectively, and denote \(\phi \) as the quasi-variance ratio.
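A short R sketch of this re-parameterization, assuming a 0/1 treatment coding and hypothetical variance components:

```r
## Quasi-ICCs and quasi-variance ratio for a multicentre trial; the treatment
## coding and all variance components below are hypothetical.
b <- 1; a <- 0          # 0/1 treatment coding (assumed)
lambda    <- 20000      # willingness to pay per unit of health gain (assumed)
sigma2_e1 <- 0.0010     # treatment-by-centre variance for effects (assumed)
sigma2_c1 <- 30000      # treatment-by-centre variance for costs (assumed)
sigma2_ee <- 0.0225     # residual variance for effects (assumed)
sigma2_ec <- 490000     # residual variance for costs (assumed)

theta_e <- (b - a)^2 * sigma2_e1 / ((b - a)^2 * sigma2_e1 + sigma2_ee)
theta_c <- (b - a)^2 * sigma2_c1 / ((b - a)^2 * sigma2_c1 + sigma2_ec)
phi     <- lambda^2 * ((b - a)^2 * sigma2_e1 + sigma2_ee) /
                      ((b - a)^2 * sigma2_c1 + sigma2_ec)
v_c_prime <- sigma2_c1 + (b - a)^(-2) * sigma2_ec   # analogue of v_c, used in Eq. (5)
```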

4.2 Sample size calculation

If m and n individuals are randomly allocated to the treatment and control condition in each of the k centres, the following expression for the variance of the average INMB estimator has been derived by Manju et al. (2015), based on ML estimation for linear mixed models (Verbeke and Molenberghs 2000):

$$\begin{aligned} var (\hat{\beta _{1}})=\frac{v^{\prime }_{c}}{k}\Big \{A^{\prime }+\Big (\frac{1}{m}+\frac{1}{n}\Big )B^{\prime }\Big \}, \end{aligned}$$
(5)

where \(A^{\prime }=\phi \theta _{e}+\theta _{c}-2\rho _{u1}\sqrt{\theta _{e}\theta _{c}\phi }\), \(B^{\prime }=\phi \left( 1-\theta _{e}\right) +\left( 1-\theta _{c}\right) -2\rho _{\epsilon }\sqrt{\left( 1-\theta _{e}\right) \left( 1-\theta _{c}\right) \phi }\) and \(v^{\prime }_{c}=\sigma ^2_{c1}+(b-a)^{-2}\sigma ^2_{\epsilon ^c}\). Note that the random slope and random error variances for the NMB as specified in the previous section are \(\sigma ^{2}_{1}=var (\lambda u^{e}_{1j}-u^{c}_{1j})=A^{\prime }v^{\prime }_{c}\) and \(\sigma ^{2}=var (\lambda \epsilon ^{e}_{ij}-\epsilon ^{c}_{ij})=(b-a)^{2}B^{\prime }v^{\prime }_{c}\) respectively. Further, \(v^{\prime }_{c}=\sigma ^2_{c1}+(b-a)^{-2}\sigma ^2_{\epsilon ^c}\) is analogous to the total cost variance \(v_{c}\) in Eq. (2), except that the intercept variance is replaced with the slope variance.
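As before, Eq. (5) transcribes directly into R; the function name var_inmb_mc and its arguments are ours, and the example call combines the hypothetical re-parameterized values above with assumed correlations.

```r
## Direct R transcription of Eq. (5); function and argument names are ours.
var_inmb_mc <- function(k, m, n, theta_e, theta_c, rho_u1, rho_eps, phi, v_c_prime) {
  A_prime <- phi * theta_e + theta_c - 2 * rho_u1 * sqrt(theta_e * theta_c * phi)
  B_prime <- phi * (1 - theta_e) + (1 - theta_c) -
             2 * rho_eps * sqrt((1 - theta_e) * (1 - theta_c) * phi)
  (v_c_prime / k) * (A_prime + (1 / m + 1 / n) * B_prime)
}

## Example with the hypothetical values defined above and assumed correlations
var_inmb_mc(k = 25, m = 23, n = 11, theta_e, theta_c,
            rho_u1 = 0.5, rho_eps = 0.3, phi, v_c_prime)
```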

To find the optimal design, we need a function that relates sample size to costs. Assume that the cost per included centre is \(c>0\), whereas the cost per included individual is \(s_{t}>0\) in the treatment arm and \(s_{c}>0\) in the control arm. The budget C needed for including k centres of \(m+n\) individuals each is then (Liu 2003; Manju et al. 2015):

$$\begin{aligned} C=k(c+ms_{t}+ns_{c}). \end{aligned}$$
(6)
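Eq. (6) likewise translates into a one-line helper; the name budget_mc is ours.

```r
## Eq. (6): budget for k centres, each with a centre cost c plus m treated
## and n control individuals
budget_mc <- function(k, m, n, c, st, sc) k * (c + m * st + n * sc)
```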

Similar to what has been discussed for CRTs, the optimal design as well as the MMD for multicentre cost-effectiveness trials have been derived by Manju et al. (2015), and the equations for these designs are given in Appendix B. Analogously to the cluster randomized case, these equations for multicentre trials can be used to maximize the power for a given budget, to minimize the budget needed for a given power, or to evaluate the power of a given design. Again, the effect size upon which the power depends is expressed in a form close to the classical Cohen's d (Cohen 1992); see Appendix B for details.

5 SamP2CeT program

The computer program SamP2CeT (Sample size and Power calculation for 2-level Cost-effectiveness Trials) is an easy-to-use tool for designing cost-effectiveness trials. SamP2CeT runs within the R (\(\ge \) 3.2.4) system for statistical computing (R Development Core Team 2015). R was chosen because it is a free software environment for statistical computing and a popular program within the field of medical science for the design and analysis of clinical trials (Bornkamp et al. 2009; Peace and Chen 2010). SamP2CeT asks the user to specify various design characteristics as well as input parameters through a graphical user interface (GUI). The GUI uses the R-Tcl/Tk interface (Welch 1995) implemented in the R package tcltk (Dalgaard 2001).

This section describes the GUI of the program which calculates the optimal (or maximin) number of clusters and number of individuals per cluster for a cost-effectiveness CRT or a multicentre cost-effectiveness trial. The program also calculates the minimum budget for a desired level of power, the maximum power for a given budget, or the power for a user-specified design. The program is available as supplementary material from the journal webpage. The program SamP2CeT was developed and tested on a Windows 7 operating system.

To start the program the user types library(SamP2CeT) and then SamP2CeT() in the command window of R and presses the enter key. The directory containing SamP2CeT should be the current working directory of the R session. The main window of SamP2CeT will appear as presented in Fig. 1. The main menu requires the user to make three choices (see Fig. 1): (a) between a cluster randomized and a multicentre trial, (b) between minimizing the study costs, maximizing the study power, or computing the power of a user-specified design, and (c) between specific values and ranges for the outcome variances and correlations. Pushing the button Next displays one of 12 different windows (submenus), depending on the user's choices concerning level of randomization, purpose and knowledge of parameters. The input parameters for these 12 submenus are discussed in the SamP2CeT user's manual, which is included as supplementary material.
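In R code, the launch step described above is simply (assuming the package has been installed):

```r
## Launch the SamP2CeT GUI; the directory containing SamP2CeT should be
## the current working directory of the R session.
library(SamP2CeT)
SamP2CeT()   # opens the main menu shown in Fig. 1
```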

6 Illustrations

The program SamP2CeT will be illustrated for a cost-effectiveness CRT known as the SPHERE study and for a multicentre cost-effectiveness trial known as the EVALUATE trial. The user can calculate the sample sizes for both trials either for a fixed power or for a fixed budget. Because of the similarity of the input and result windows for fixed power and fixed budget, we will only consider a fixed power in the following illustrations, but we will consider the case of a fixed budget in Sect. 7. In addition to calculating the sample sizes, the user can evaluate the power of the actual design of these trials. For the sake of illustration, we assume imprecise knowledge of variances and correlations for the SPHERE study, leading to a maximin design and precise knowledge for the EVALUATE trial, leading to an optimal design.

Fig. 1 Main menu of the SamP2CeT program

6.1 Example of a cost-effectiveness CRT: the SPHERE study

We illustrate the program SamP2CeT on the SPHERE study, which assessed the cost-effectiveness of a secondary prevention strategy for patients with coronary heart disease in general practice (Ng et al. 2016). General practices were randomised to control (usual care) or treatment (tailored care). The data were collected from 48 general practices (24 treatment clusters and 24 control clusters) with, on average, 18 and 19 patients per treatment and control cluster respectively. The main endpoints, costs and effects, were expressed in Euros (€) and quality-adjusted life-years (QALYs), respectively.

Fig. 2 Entries in SamP2CeT submenu to calculate the MMD for the SPHERE study when the power is fixed

Fig. 3 Result window of SamP2CeT appearing for the MMD for the SPHERE study when the power is fixed

6.1.1 Maximin sample sizes

It is assumed that the user wants to calculate the MMD for a cost-effectiveness CRT with the same research question as the SPHERE study, and that the power level is fixed (so that the optimization minimizes the study costs). In that case, the user needs to specify the upper bounds for the intraclass correlation (ICC) for effects (\(\rho _{e}\)) and the ICC for costs (\(\rho _{c}\)). The ICC upper bounds are based on Gomes et al. (2012), and the cluster and subject-specific costs were chosen by us for the sake of illustration, lacking any information on the actual costs of the SPHERE trial. All input values are shown in Fig. 2, which yields the maximin design shown in Fig. 3. Note that the small number of subjects per cluster is due to the large upper bound for the ICCs (see Appendix A for details).

6.1.2 Power calculation for a user-specified design

Fig. 4 Entries in SamP2CeT submenu to calculate power for the actual design of the SPHERE study when the information on correlation and variance parameters is not precise

Suppose that the user wants to check the power of the SPHERE study. Choosing “Evaluate Power for a User-specified Design” in Fig. 1 and then filling in the parameter values and the actual design as in Fig. 4 gives a power of 0.83, or 83%. Note that, for a power slightly larger than 80%, this actual design requires a budget of €156000, more than twice as much as the MMD in Sect. 6.1.1.

6.2 Example of a multicentre cost-effectiveness trial: the EVALUATE trial

This section illustrates the program SamP2CeT for a multicentre cost-effectiveness trial known as the EVALUATE trial, which compared laparoscopic-assisted hysterectomy (treatment) and standard hysterectomy (control) (Sculpher et al. 2004; Manca et al. 2005). The data were collected from 25 centres with, on average, 23 treated and 11 control patients per centre.

Fig. 5 Entries in SamP2CeT submenu to calculate the optimal design for the EVALUATE trial when the power is fixed

6.2.1 Optimal sample sizes

Suppose that the user wants to calculate the optimal design for the EVALUATE trial for a fixed power level. For this illustration, we use as parameter values the estimates from the EVALUATE trial as published in Manca et al. (2005). These input values are displayed in Fig. 5. Furthermore, the costs per centre and per individual in each treatment arm were assumed by us in the absence of empirical information on the planning costs of the EVALUATE trial. Figure 6 shows the optimal design for the EVALUATE trial for a given power level. To run this trial at a power level of 80%, a budget of €209000 is required.

Fig. 6 Result window of SamP2CeT appearing for the optimal design for the EVALUATE trial when the power is fixed

Fig. 7 Entries in SamP2CeT submenu to calculate power for the actual design of the EVALUATE trial when the information on correlation and variance parameters is precise

6.2.2 Power calculation for a user-specified design

To evaluate the power for the actual design (\(k=25\), \(m=23\) and \(n=11\)) of the EVALUATE trial, the user again needs to supply the information on the correlation and variance parameters (\(\theta _{e}\), \(\theta _{c}\), \(\rho _{u1}\), \(\rho _{\epsilon }\) and \(\phi \)), as displayed in Fig. 7. The resulting power of the actual design is close to 1.0. This very high power reflects the large number of centres in the actual EVALUATE design, and its price is of course a high study budget: the actual design requires \(C=\) €812500 when assuming the same costs as in the previous section.

7 Benefits of using optimal and maximin sample sizes

The benefits of optimal and maximin designs can be assessed by considering the percentage budget gain (for fixed power) or the percentage power gain (for fixed budget) of these designs compared with the actual design of a study. This comparison is made for the SPHERE study and the EVALUATE trial in Table 1, using the cost ratios in Manju et al. (2014, 2015), the parameter values in Ng et al. (2016), Sculpher et al. (2004) and Manca et al. (2005) for the optimal designs, and the parameter boundaries in Gomes et al. (2012) and Manju et al. (2015) for the maximin designs. The parameter values are \(\rho _{e}=0.001\), \(\rho _{c}=0.007\), \(\rho _{u0}=-0.18\), \(\rho _{\epsilon }=-0.04\), \(\varphi =0.232\) for the SPHERE study, and \(\theta _{e}=0.26\), \(\theta _{c}=0.05\), \(\rho _{u1}=0.66\), \(\rho _{\epsilon }=-0.083\), \(\phi =0.60\) for the EVALUATE trial. The chosen parameter boundaries are ICC upper bounds of 0.30 for both effects and costs for the SPHERE study, and quasi-ICC upper bounds of 0.30 for both effects and costs for the EVALUATE trial.

In the power comparison of the designs we took an ES of 0.20, which implies powers of 0.79 and 0.21 for the actual design of the SPHERE study, and of 0.59 and 0.25 for the actual design of the EVALUATE trial, under the parameter values obtained in these studies and under the values assumed for the maximin scenarios, respectively. In contrast, the ES does not play a role in the budget comparison between the actual, optimal and maximin designs. This is because the ratio of the budgets needed for any two designs to have the same power for a given ES only depends on the relative efficiency of these two designs, that is, on the ratio of their \(var (\hat{\beta }_{1})\), which in turn does not depend on the ES. Further, for given cost ratios, the actual design defines the available budget; that is, the budget needed for the actual design was computed from the assumed costs and the actual design (number of clusters, number of individuals per cluster). The optimal and maximin designs were then calculated for that budget and the same cost ratios. The percentage power gains of these optimal and maximin designs are shown in the last two columns for each set of cost ratios. The budget is thus the same within each row (allowing comparison of powers within a row), but may differ from one row to the other.

Table 1 shows that using optimal and maximin designs can substantially reduce the total study costs and increase the power of the study as compared with the actual design, depending on the cost ratios. Table 1 also shows that in some cases the MMD reduces the budget and increases the power more than the optimal design does. This happens because the optimal design and the MMD were computed based on two different sets of parameters, so these two designs are not directly comparable with each other in Table 1, but only with the actual study designs of the SPHERE and EVALUATE trials. Actually, the maximin design is the optimal design for those parameter values, within the chosen range of parameter values, that give the largest (worst case) sampling variance of the treatment effect (for details, see Manju et al. 2014, 2015). In the present examples, these worst case parameter values are \(\rho _{e}=0.30\) and \(\rho _{c}=0.30\) for the SPHERE study, and \(\theta _{e}=0.30\) and \(\theta _{c}=0.30\) for the EVALUATE trial. Choosing these values for the optimal design would have given the same design as the maximin design.

Table 1 Budget and power gain \((\%)\) of optimal and maximin versus actual samples

8 Discussion

In this paper, we have briefly reviewed recent methodological developments on sample size and power calculation for both cost-effectiveness CRTs and multicentre cost-effectiveness trials, and we have introduced SamP2CeT, a software implementation with a graphical user interface. The sample sizes concern the numbers of clusters and subjects per cluster in each treatment arm for a cost-effectiveness CRT, or the numbers of centres and subjects per treatment arm per centre for a multicentre cost-effectiveness trial. This paper describes the use of the program SamP2CeT to compute the smallest budget needed for a user-specified power level, or the largest power attainable with a user-specified budget, as well as the corresponding sample sizes. In addition, the program allows for power calculation for a user-specified design. Calculating either the sample sizes, or the power for a user-specified design, requires prior knowledge of correlation and variance parameters. Therefore, the computer program allows the user to specify either values for these parameters or plausible ranges for them, depending on the precision of the user's knowledge of these parameters.

Following Manju et al. (2014, 2015), this paper and its software use the maximin design to handle uncertainty about the parameter values on which the optimal design depends. Two other approaches in the literature are the Bayesian and the sequential approach. The Bayesian approach (Spiegelhalter et al. 2007) maximizes the expected instead of the minimum efficiency, which is a more optimistic but also less safe approach than maximin design. Apart from this, Bayesian design has two drawbacks that need consideration before preferring it to maximin design. First, maximizing expected efficiency, maximizing expected power, and minimizing expected sampling variance give different designs, because efficiency, power and sampling variance are nonlinearly related. In contrast, maximin design gives the same design for all three criteria because they are monotonically related. Secondly, Bayesian design depends on the chosen prior for the parameters on which the design depends, whereas maximin design only depends on the range of those parameters. The sequential procedure handles parameter uncertainty by updating the design based on parameter estimates at interim analyses (Wu 1985). This second approach could be combined with maximin design, for instance by choosing the maximin design as the initial design, which can then be updated at interim.