INTRODUCTION

Multi-arm clinical trials are increasingly used in modern clinical research. Some examples of multi-arm trials include phase II dose–response studies (1), drug combination studies (2), multi-arm multi-stage (MAMS) designs (3,4), and master protocols to study multiple therapies, multiple diseases, or both (5). A benefit of multi-arm trials is the ability to test many new promising treatments and address multiple research objectives within a single protocol, thereby potentially speeding up research and development processes compared to a sequence of single-arm or two-arm trials (6).

When designing a multi-arm trial, an important consideration is the choice of the allocation ratio, i.e., the target allocation proportions across the treatment arms. The choice of the allocation ratio usually stems from the study objectives. Many clinical trials are designed with an intent to have equal allocation to the treatment groups, which is consistent with the principle of “clinical equipoise” and frequently leads to maximum statistical power for treatment comparisons (e.g., if the primary outcome variance is constant across the groups) (7). On the other hand, unequal allocation designs have recently gained considerable traction (8,9). For instance, unequal allocation designs may be preferred over equal allocation designs under the following circumstances: (i) in studies with nonlinear dose–response estimation objectives (10,11,12,13); (ii) when there is heterogeneity of the outcome variance across the treatment arms (14,15,16); (iii) when there is an ethical imperative to allocate a greater proportion of study patients to superior treatment arms (17,18,19); (iv) when there is unequal interest in certain treatment comparisons (20); and (v) when treatment costs differ and an investigator wants to maximize power for a given budget (21). Importantly, unequal allocation designs can involve non-integer (even irrational) target proportions. For example, in a \( \left(K>2\right) \)-arm trial comparing \( \left(K-1\right) \) experimental treatments versus control (Dunnett’s procedure), the optimal allocation ratio minimizing the sum of variances of the \( \left(K-1\right) \) pairwise comparisons is \( {\sigma}_1\sqrt{K-1}:{\sigma}_2:\dots :{\sigma}_K \), where σi is the standard deviation of the outcome in the ith treatment group (7).

Once the target allocation ratio is chosen, a question is how to implement it in practice. It is well recognized that randomization is the hallmark of any well-conducted clinical trial (22). When properly implemented, randomization can promote selected study objectives while maintaining validity and integrity of the study results (23). There is a variety of randomization designs that can be applied in multi-arm trials with equal or unequal integer-valued allocation ratios. The most common one is the permuted block design (PBD) for which treatment assignments are made at random in blocks of a given size to achieve the desired allocation ratio C1 : C2 : … : CK, where Ci’s are positive, not necessarily equal, integers with the greatest common divisor of 1. The PBD has been criticized by some authors as being too restrictive and susceptible to selection bias (24,25). Some alternatives to the PBD have been developed recently (26,27,28).

For multi-arm trials with unequal allocation involving non-integer (irrational) proportions, the choice of a randomization design is less straightforward, as highlighted in (29). The simplest approach is to use complete randomization (CR), for which treatment assignments are generated independently, according to a multinomial distribution with cell probabilities equal to the target allocation proportions. A major drawback of CR is that, with non-negligible probability, it can result in large departures from the desired allocation, especially in small trials. One useful alternative to CR is the mass weighted urn design (MWUD), which has been shown to maintain a good tradeoff between treatment balance and allocation randomness (29). Other designs for irrational target allocations can be constructed by adopting the methodology of optimal response-adaptive randomization (30). Some promising designs for this purpose are the doubly adaptive biased coin design (31), the generalized drop-the-loser urn (32), and the optimal adaptive generalized Pólya urn (33), to name a few. However, all these designs rely on asymptotic results, which may not hold for the small to moderate sample sizes that are common in practice.

The present paper is motivated by our recent work (34) which investigated the structure of the D-optimal design for dose-finding experiments with time-to-event data. In particular, we found that for a quadratic dose–response model with Weibull outcomes that are subject to right censoring, the equal allocation (1:1:1) design can be highly inefficient when the amount of censoring is high. The D-optimal design is supported at 3 points, but the location of these points in the dose interval, as well as the optimal allocation proportions at these points, depend on the true model and the amount of censored data in the experiment. As such, the D-optimal allocation proportions are found through numerical optimization and they are generally quite different from the equal allocation. A two-stage adaptive design was proposed and it was found to be nearly as efficient as the true D-optimal design. The authors of (34) also mentioned that practical implementation of the adaptive D-optimal design requires a judicious choice of a randomization procedure. Given that, in practice, dose-finding studies are relatively small (due to budgetary and ethical constraints), it is imperative that the chosen randomization procedure can closely attain the desired optimal allocation for small and moderate samples while maintaining the randomized nature of the experiment. Our main conjecture in the present paper is that the choice of randomization for the D-optimal design does matter as far as statistical properties such as quality of dose–response curve estimation are concerned.

The remainder of this paper is organized as follows. In the “MATERIALS AND METHODS” section, we give a statistical background and an overview of randomization designs that can be used to target multi-arm unequal allocation with possibly non-integer (irrational) proportions for trials with small and moderate sample sizes. In the “SIMULATION STUDY PLAN” section, we outline a strategy to investigate statistical properties of selected randomization procedures targeting the D-optimal design. The “RESULTS” section presents findings from our simulations, including a study of single-stage randomization procedures targeting the locally D-optimal design, two-stage adaptive optimal designs, and multi-stage adaptive designs with early stopping rules. We also explore the robustness of our proposed designs to experimental (chronological and selection) biases. The “DISCUSSION” section concludes with a summary of our main findings and outlines some important future work.

MATERIALS AND METHODS

D-Optimal Design

Following (34), we consider a second-order polynomial model for log-transformed event times:

$$ \log T={\beta}_0+{\beta}_1x+{\beta}_2{x}^2+ b\varepsilon, $$
(1)

where x is the dose level chosen from the interval \( \mathcal{X}=\left[0,1\right] \), ε is an error term following the standard extreme value distribution, b > 0 is a scale parameter that determines the Weibull hazard pattern, and (β0, β1, β2) are the regression coefficients. Under the model in Eq. (1), T follows a Weibull distribution with \( \mathrm{Median}\left(T|x\right)={e}^{\beta_0+{\beta}_1x+{\beta}_2{x}^2}{\left(\log 2\right)}^b \), and the hazard function of T (conditional on x) is \( h\left(t|x\right)={b}^{-1}{e}^{-\left({\beta}_0+{\beta}_1x+{\beta}_2{x}^2\right)/b}{t}^{1/b-1} \). For a given x, the hazard is monotone increasing if 0 < b < 1; it is constant (exponential distribution) if b = 1; and it is decreasing if b > 1. Furthermore, we assume that each subject in the study has a fixed follow-up time τ > 0, and T is right-censored by τ such that the observed time is t = min(T, τ). Let δ = 1{T ≤ τ} denote the event indicator. For a study of size n, the data structure is \( {\mathcal{F}}_n=\left\{\left({t}_i,{\delta}_i,{x}_i\right),i=1,\dots, n\right\} \).
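To make the data-generating mechanism concrete, the following R sketch (our illustration; the paper’s own code is available from the first author) simulates right-censored event times from the model in Eq. (1). It uses the fact that if U ~ Uniform(0, 1), then log(−log U) follows the standard extreme value (minimum) distribution; the follow-up time tau below is a hypothetical value chosen for illustration, since the paper calibrates censoring through the average event probability rather than a stated τ.

```r
## Minimal sketch: simulate right-censored data from model (1).
## tau is a placeholder follow-up time chosen for illustration only.
simulate_trial <- function(x, beta0 = 1.90, beta1 = 0.60, beta2 = 2.80,
                           b = 0.65, tau = 5) {
  n   <- length(x)
  eps <- log(-log(runif(n)))   # standard extreme value (minimum) errors
  T   <- exp(beta0 + beta1 * x + beta2 * x^2 + b * eps)  # event times, Eq. (1)
  data.frame(t     = pmin(T, tau),          # observed (possibly censored) time
             delta = as.numeric(T <= tau),  # event indicator 1{T <= tau}
             x     = x)
}

set.seed(1)
dat <- simulate_trial(x = rep(c(0, 0.5, 1), each = 10))  # 30 subjects, uniform design
```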

The study objective is to estimate the vector of model parameters θ = (β0, β1, β2, b) as precisely as possible. For this purpose, we consider designs of the form ξ = {(x1, ρ1); (x2, ρ2); (x3, ρ3)}, where the xk’s are distinct dose levels in \( \mathcal{X}=\left[0,1\right] \) and the ρk’s are allocation proportions at these doses (0 < ρk < 1 and \( {\sum}_{k=1}^3{\rho}_k=1 \)). The design’s Fisher information matrix is a weighted sum \( \mathbf{M}\left(\xi, \boldsymbol{\theta} \right)={\sum}_{k=1}^3{\rho}_k{\mathbf{M}}_{x_k}\left(\boldsymbol{\theta} \right) \), where \( {\mathbf{M}}_{x_k}\left(\boldsymbol{\theta} \right) \) is the Fisher information matrix for a single observation at dose xk (a 4 × 4 matrix whose expression is given in Eq. (6) of (34)). The locally D-optimal design ξ∗ minimizes − log  ∣ M(ξ, θ)∣, which leads to the smallest volume of the confidence ellipsoid for θ. In (34), it was found that if there are no censored data in the experiment, then ξ∗ is the uniform (equal allocation) design supported at dose levels 0, 1/2, and 1. However, in the presence of censoring, the structure of ξ∗ is more complex: both the optimal dose levels and the allocation proportions depend on the true model and the amount of censoring, i.e., ξ∗ = {(xk(θ), ρk(θ)), k = 1, 2, 3}, and ξ∗ must be found numerically, using, for example, a first-order (exchange) algorithm (35). Since in practice θ is unknown, one can construct a two-stage adaptive D-optimal design as follows. At stage 1, a cohort of n(1) subjects is allocated to doses according to the uniform design ξ(1) = {(0, 1/3); (0.5, 1/3); (1, 1/3)}. Based on the observed data \( {\mathcal{F}}_{n^{(1)}}=\left\{\left({t}_i,{\delta}_i,{x}_i\right),i=1,\dots, {n}^{(1)}\right\} \), compute \( {\widehat{\boldsymbol{\theta}}}_{MLE}^{(1)} \), the maximum likelihood estimate (MLE) of θ, and approximate ξ∗ by \( {\overset{\sim }{\xi}}^{\ast }=\left\{\left({x}_k\left({\widehat{\boldsymbol{\theta}}}_{MLE}^{(1)}\right),{\rho}_k\left({\widehat{\boldsymbol{\theta}}}_{MLE}^{(1)}\right)\right),k=1,2,3\right\} \). At stage 2, an additional n(2) subjects are allocated to doses according to \( {\overset{\sim }{\xi}}^{\ast } \). The final analysis is based on the pooled sample of n = n(1) + n(2) subjects. In (34), it was shown that such a two-stage design provides a very good approximation to, and is nearly as efficient as, the true D-optimal design, without requiring prior knowledge of the model parameters before the start of the trial.
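A sketch of the stage 1 model fit, assuming data simulated as in the previous snippet. With dist = "weibull", the survreg() function in the survival package parameterizes log T = β0 + β1x + β2x² + b·ε with ε following the standard (minimum) extreme value distribution, so its coefficients and scale map directly onto (β0, β1, β2, b); the numerical search for the optimal (xk, ρk) is a separate step not shown here.

```r
library(survival)

## Stage 1 MLE of theta = (beta0, beta1, beta2, b) from censored data
fit <- survreg(Surv(t, delta) ~ x + I(x^2), data = dat, dist = "weibull")
theta_hat <- c(coef(fit), b = fit$scale)   # plugged into the numerical search for xi*
```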

An important open question is how to allocate subjects to doses in both stage 1 and stage 2 of these adaptive designs. The cohort sizes n(1) and n(2) can be small in practice, and an experimenter must ensure that the actual allocation numbers are as close as possible to (ideally match) the targeted ones. At the same time, the allocation must involve a random element to minimize the potential for selection bias (36). Thus, balance and randomization are two competing requirements. There are many randomization procedures that can be used to implement the D-optimal allocation (22). In the next section, we describe a selection of procedures that are relevant to our study.

Randomization Procedures for Implementing D-Optimal Design

To fix ideas, we start with an “idealized” setting in which both the true model (θ) and the amount of censored data are known, and therefore the D-optimal design ξ∗ = {(xk(θ), ρk(θ)), k = 1, 2, 3} is available to the experimenter. We shall also use the notation d1, d2, and d3 for the optimal dose levels x1(θ), x2(θ), and x3(θ), respectively. For dose dk (k = 1, 2, 3), the optimal proportion is \( {\rho}_k^{\ast }={\rho}_k\left(\boldsymbol{\theta} \right) \) (possibly an irrational number), with the obvious constraint \( {\rho}_1^{\ast }+{\rho}_2^{\ast }+{\rho}_3^{\ast }=1 \).

Assume that the total sample size n is fixed and pre-determined. Let Nk(j) denote the sample size for dose dk after j subjects (1 ≤ j ≤ n) have been randomized into the study. The vector N(j) = (N1(j), N2(j), N3(j)) is random with N1(j) + N2(j) + N3(j) = j, and the distribution of N(j) for j = 1, …, n is determined by a randomization procedure used in the study. A restricted randomization procedure can be formally defined by specifying conditional randomization probabilities for allocating the jth subject to doses d1, d2, and d3 as follows:

$$ {\displaystyle \begin{array}{l}\kern1.8em {P}_k(j)=\Pr \left(\mathrm{Subject}\ j\ \mathrm{is}\ \mathrm{assigned}\ \mathrm{to}\ \mathrm{dose}\ {d}_k\right)=\Pr \left({d}_k|\boldsymbol{N}\left(j-1\right)\right),j=2,\dots, n\\ {}\mathrm{and}\kern0.5em {P}_k(1)={\rho}_k^{\ast },k=1,2,3.\end{array}} $$
(2)

In other words, for any restricted randomization procedure, the randomization probability for the next eligible subject depends on the current numbers of the dose assignments in the trial.

We shall study the following randomization procedures; a minimal R sketch of their conditional randomization rules is given after the list:

  • Completely randomized design (CRD): Every subject is randomized to the dose groups with probabilities equal to the D-optimal allocation, i.e., \( {P}_k(j)={\rho}_k^{\ast } \), j = 1, …, n, k = 1, 2, 3. The CRD is very simple to implement and it provides the highest degree of randomness, but for small samples, it can lead to deviations from the desired allocation with non-negligible probability (22).

  • Permuted block design (PBD): To implement the allocation (\( {\rho}_1^{\ast },{\rho}_2^{\ast },{\rho}_3^{\ast } \)) for a cohort of size n, the desired split of the sample size among the doses is \( n{\rho}_1^{\ast }:n{\rho}_2^{\ast }:n{\rho}_3^{\ast } \), which, after rounding to integer values, is, say, C1 : C2 : C3, where the Ck’s are positive integers with C1 + C2 + C3 = n. For the PBD, the conditional randomization probabilities are:

$$ {P}_k(j)=\frac{C_k-{N}_k\left(j-1\right)}{C_1+{C}_2+{C}_3-\left(j-1\right)},j=2,\dots, n\ \mathrm{and}\ {P}_k(1)=\frac{C_k}{C_1+{C}_2+{C}_3},k=1,2,3. $$
(3)

Note that Nk(j − 1) in Eq. (3) can take values 0, 1, …, Ck for j = 2, …, n.

  • Doubly adaptive biased coin design (DBCD): Initial dose assignments (j = 1, …, m0) are made using PBD with a block size that is a multiple of 3, e.g., m0 = 3, 6, or 9. Subsequently, the (j + 1)st subject (j = m0, …, n−1) is randomized to dose dk with probability

$$ {P}_k\left(j+1\right)=\frac{\rho_k^{\ast }{\left({\rho}_k^{\ast }/\frac{N_k(j)}{j}\right)}^{\gamma }}{\sum_{l=1}^3{\rho}_l^{\ast }{\left({\rho}_l^{\ast }/\frac{N_l(j)}{j}\right)}^{\gamma }},k=1,2,3, $$
(4)

where γ ≥ 0 is a user-defined parameter controlling the degree of randomness of the procedure (γ = 0 is most random and γ → ∞ is an almost deterministic procedure). The DBCD has established asymptotic properties (31). For practical purposes, γ = 2 is recommended (37).

  • Generalized drop-the-loser urn design (GDLUD): The GDLUD (32) utilizes an urn containing balls of four types: type 0 is the immigration ball, and types 1, 2, and 3 represent “dose” balls. Dose assignments for eligible subjects are made sequentially by drawing a ball at random from the urn. Let \( {Z}_0=\left(1,{\rho}_1^{\ast },{\rho}_2^{\ast },{\rho}_3^{\ast}\right) \) denote the initial urn composition (one immigration ball and \( {\rho}_1^{\ast }+{\rho}_2^{\ast }+{\rho}_3^{\ast } \) “dose” balls). The urn composition is changed adaptively during the course of the trial. Let Zj − 1 = (Zj − 1,0, Zj − 1,1, Zj − 1,2, Zj − 1,3) denote the urn composition after j − 1 steps (the numbers Zj − 1,i, i = 1, 2, 3, can be negative and/or irrational). Let \( {Z}_{j-1,k}^{+}=\max \left(0,{Z}_{j-1,k}\right) \), k = 0, 1, 2, 3. At the jth step, the probability of selecting a ball of type k is \( {Z}_{j-1,k}^{+}/{\sum}_{i=0}^3{Z}_{j-1,i}^{+} \), k = 0, 1, 2, 3. If the selected ball is of type 0 (immigration), no dose is assigned and the ball is replaced into the urn together with \( C{\rho}_1^{\ast }+C{\rho}_2^{\ast }+C{\rho}_3^{\ast } \) additional “dose” balls (C is some positive constant); the urn composition becomes Zj,0 = Zj − 1,0 and \( {Z}_{j,i}={Z}_{j-1,i}+C{\rho}_i^{\ast } \), i = 1, 2, 3. If the selected ball is of type ℓ (ℓ = 1, 2, 3), then it is not replaced, the eligible subject is assigned to the corresponding dose level, and the urn composition becomes Zj,ℓ = Zj − 1,ℓ − 1 and Zj,i = Zj − 1,i for i ≠ ℓ. The described procedure is repeated until a pre-specified number of subjects (n) has been randomized in the study. The GDLUD has established asymptotic properties (32): the allocation proportions are strongly consistent for the target proportions and follow an asymptotically normal distribution with known variance structure.

  • Mass weighted urn design (MWUD): The MWUD (29) uses an urn containing three “dose” balls. Initially, each ball has mass proportional to the target allocation: \( {m}_{0,i}=\alpha {\rho}_i^{\ast } \), i = 1, 2, 3 (the parameter α is a positive integer controlling the maximum tolerated imbalance). The mass of the balls changes adaptively, according to the history of dose assignments. Among the balls with positive mass, a ball is drawn with probability proportional to its mass, and the corresponding dose is assigned to the next eligible subject. One unit of mass is taken from the selected ball and redistributed among the three balls in the ratio \( {\rho}_1^{\ast }:{\rho}_2^{\ast }:{\rho}_3^{\ast } \), after which the ball is returned to the urn. Therefore, after (j − 1) assignments, the probability mass for the ith dose group is \( {m}_{j-1,i}=\alpha {\rho}_i^{\ast }-{N}_i\left(j-1\right)+\left(j-1\right){\rho}_i^{\ast } \), and the total mass of the three balls in the urn at each step is \( {\sum}_{i=1}^3{m}_{j-1,i}\equiv \alpha \). These steps are repeated until the pre-specified number of subjects is enrolled in the study. The MWUD has a simple explicit formula for the conditional randomization probability:

$$ {P}_k(j)=\frac{\max \left\{{\alpha \rho}_k^{\ast }-{N}_k\left(j-1\right)+\left(j-1\right){\rho}_k^{\ast },0\right\}}{\sum_{l=1}^3\max \left\{{\alpha \rho}_l^{\ast }-{N}_l\left(j-1\right)+\left(j-1\right){\rho}_l^{\ast },0\right\}},\kern0.5em k=1,2,3. $$
(5)

It was proved in (29) that, for the MWUD, maximum imbalance (defined as Euclidean distance from the vector of current allocation proportions to the target allocation proportions) at each allocation step is controlled by the value of α.

  • Maximum entropy constrained balance randomization (MaxEnt): The MaxEnt procedure is an extension of Efron’s biased coin design (38) to a multi-arm setting with unequal allocation (39,40). Dose assignments for eligible subjects are made sequentially. Consider a point in the trial when j − 1 subjects have been randomized into the study, with Ni(j − 1) subjects assigned to dose di, i = 1, 2, 3. The randomization rule for the jth subject is as follows: Compute B1, B2, B3, the hypothetical treatment imbalances which would result from assigning the jth subject to doses d1, d2, d3:

$$ {B}_k=\sqrt{\sum \limits_{i=1}^3{\left({N}_{ik}(j)-j{\rho}_i^{\ast}\right)}^2},\kern0.5em \mathrm{where}\kern0.5em {N}_{ik}(j)=\left\{\begin{array}{cc}{N}_i\left(j-1\right)+1,& \mathrm{if}\ i=k;\\ {}{N}_i\left(j-1\right),& \mathrm{if}\ i\ne k.\end{array}\right. $$

The vector of randomization probabilities P(j) = (P1(j), P2(j), P3(j)) is obtained by maximizing entropy (minimizing the Kullback-Leibler divergence between P(j) and the target allocation \( {\boldsymbol{\rho}}^{\ast}=\left({\rho}_1^{\ast },{\rho}_2^{\ast },{\rho}_3^{\ast}\right) \)) subject to a constraint on expected imbalance. Mathematically, it is derived as the solution to the following constrained optimization problem:

$$ {\displaystyle \begin{array}{l}\underset{\boldsymbol{P}(j)}{\operatorname{maximize}}\kern0.5em \left\{-{\sum}_{i=1}^3{P}_i(j)\log \left({P}_i(j)/{\rho}_i^{\ast}\right)\right\}\\ {}\mathrm{subject}\ \mathrm{to}\kern0.5em {\sum}_{i=1}^3{B}_i{P}_i(j)\le \eta {B}_{(1)}+\left(1-\eta \right){\sum}_{i=1}^3{B}_i{\rho}_i^{\ast}\\ {}\mathrm{and}\kern0.5em {\sum}_{i=1}^3{P}_i(j)=1,\kern0.5em 0\le {P}_i(j)\le 1,\kern0.5em i=1,2,3.\end{array}} $$
(6)

In Eq. (6), \( {B}_{(1)}=\underset{i}{\min }{B}_i \) and η is a user-defined parameter (0 ≤ η ≤ 1) that controls the amount of randomness of the procedure (η = 0 is most random and η = 1 yields an almost deterministic procedure). The explicit solution to the problem in Eq. (6) can be found in (40).
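To make these rules concrete, here is a minimal R sketch (our illustration, not the paper’s supplied code) of the conditional randomization probabilities for the four procedures with closed-form rules; the MaxEnt probabilities require solving the optimization problem in Eq. (6) (explicit solution in (40)) and are omitted here. Each function returns the probability vector for the next subject given the current counts N = (N1, N2, N3).

```r
p_crd  <- function(N, rho) rho                        # CRD: always the target

p_pbd  <- function(N, C) (C - N) / (sum(C) - sum(N))  # PBD, Eq. (3); C = block counts

p_dbcd <- function(N, rho, gamma = 2) {               # DBCD, Eq. (4); needs all N_k > 0
  j <- sum(N)
  w <- rho * (rho / (N / j))^gamma
  w / sum(w)
}

p_mwud <- function(N, rho, alpha = 10) {              # MWUD, Eq. (5)
  jm1 <- sum(N)                                       # j - 1 subjects assigned so far
  m   <- pmax(alpha * rho - N + jm1 * rho, 0)
  m / sum(m)
}

rho <- c(0.407, 0.336, 0.257)            # D-optimal target used later in the paper
p_dbcd(c(6, 5, 4), rho)                  # randomization probabilities for subject 16
```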

The six described randomization procedures can be used as building blocks for constructing adaptive randomization designs. For instance, a two-stage adaptive design with n(1) = n(2) = 30 can be implemented as follows. The first 30 patients are randomized in equal proportions among the doses 0, 1/2, and 1 using PBD, in which case Eq. (3) becomes

$$ {P}_k(j)=\frac{10-{N}_k\left(j-1\right)}{30-\left(j-1\right)},j=2,\dots, 30\ \mathrm{and}\ {P}_k(1)=\frac{1}{3},k=1,2,3. $$

Based on the observed data, the D-optimal design is estimated as \( {\widehat{\xi}}^{\ast }=\left\{\left({\widehat{d}}_k,{\widehat{\rho}}_k^{\ast}\right),k=1,2,3\right\} \), and in stage 2, an additional 30 patients are randomized using CRD; namely, the jth patient (j = 31, …, 60) is randomized among the doses \( {\widehat{d}}_1 \), \( {\widehat{d}}_2 \), and \( {\widehat{d}}_3 \) with probabilities \( {P}_k(j)={\widehat{\rho}}_k^{\ast } \), k = 1, 2, 3. We denote such a two-stage adaptive randomization design by PBD → CRD, emphasizing that PBD is used in stage 1 and CRD is used in stage 2.
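As a hypothetical illustration reusing the probability functions sketched above, the PBD → CRD strategy with n(1) = n(2) = 30 could be coded as follows (the stage 2 target here is a placeholder, standing in for the estimate obtained from the stage 1 fit):

```r
## Stage 1: PBD over doses (0, 0.5, 1) with block counts (10, 10, 10)
N <- c(0, 0, 0)
stage1 <- integer(30)
for (j in 1:30) {
  stage1[j] <- sample(1:3, 1, prob = p_pbd(N, C = c(10, 10, 10)))
  N[stage1[j]] <- N[stage1[j]] + 1
}

## ... fit the model and estimate the D-optimal design (d_hat, rho_hat) ...
rho_hat <- c(0.407, 0.336, 0.257)   # placeholder estimate for illustration

## Stage 2: CRD targeting the estimated optimal proportions
stage2 <- sample(1:3, 30, replace = TRUE, prob = rho_hat)
```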

Likewise, a multi-stage design PBD → CRD → CRD → ... means that the first cohort of patients is randomized into the study using PBD; the second cohort is randomized according to an updated D-optimal design using CRD; the third cohort is randomized according to an updated (using cumulative outcome data from first two cohorts) D-optimal design using CRD, etc.

Statistical Criteria for Comparison of Randomization Procedures

The primary objective of a dose–response study is to estimate the dose–response curve as precisely as possible. The D-optimal design fulfills this objective. For a realized design ξn = {(dk, Nk(n)), k = 1, 2, 3}, where Nk(n) subjects have been randomized to dose dk, one can compute the D-efficiency relative to the true D-optimal design ξ∗ as:

$$ \mathrm{D}-\mathrm{eff}(n)={\left\{\frac{\left|\mathbf{M}\left({\xi}_n,\boldsymbol{\theta} \right)\right|}{\left|\mathbf{M}\left({\xi}^{\ast },\boldsymbol{\theta} \right)\right|}\right\}}^{1/4} $$
(7)

For given values of n and θ, D-eff(n) is, in general, a random variable because ξn depends on N(n) = (N1(n), N2(n), N3(n)), whose distribution is determined by the randomization procedure used in the study. We take E(D-eff(n)) as a measure of estimation precision of a randomization procedure targeting the D-optimal allocation. High values of E(D-eff(n)) are desirable.
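Given the two 4 × 4 information matrices (computed, e.g., from Eq. (6) of (34)), Eq. (7) reduces to a one-line computation; E(D-eff(n)) is then approximated by averaging this quantity over simulated trials. A sketch, with M_n and M_star as assumed names for the two matrices:

```r
## D-efficiency of a realized design relative to the D-optimal design, Eq. (7)
d_eff <- function(M_n, M_star) (det(M_n) / det(M_star))^(1 / 4)
```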

Another measure of estimation accuracy is based on the mean squared error, which takes into account both bias and variance. For a trial of size n, for a given design, compute the mean squared errors of the model parameters: \( \left({MSE}_{\beta_0}, MS{E}_{\beta_1}, MS{E}_{\beta_2}, MS{E}_b\right) \), where \( {MSE}_{\beta_0}={\left\{E\left({\widehat{\beta}}_0\right)-{\beta}_0\right\}}^2+ Var\left({\widehat{\beta}}_0\right) \), and the other MSEs are defined similarly. When comparing Design 1 vs. Design 2, we take the ratio of MSE values: \( {R}_{\beta_0}=\frac{\mathrm{MS}{\mathrm{E}}_{\beta_0}\left(\mathrm{Design}\ 2\right)}{\mathrm{MS}{\mathrm{E}}_{\beta_0}\left(\mathrm{Design}\ 1\right)} \). A value of \( {R}_{\beta_0}=0.9 \) implies that Design 1 is 90% as efficient as Design 2 for estimating the parameter β0. Likewise, compute the ratios of MSEs for the three other parameters, \( {R}_{\beta_1} \), \( {R}_{\beta_2} \), and Rb, and take the average as an overall measure of relative efficiency:

$$ \mathrm{RE}(n)=\frac{1}{4}\left({R}_{\beta_0}+{R}_{\beta_1}+{R}_{\beta_2}+{R}_b\right) $$
(8)

In addition to statistical estimation, we consider several other useful metrics. A measure of allocation accuracy of a randomization procedure is the closeness of the realized allocation to the true D-optimal allocation. For a design with n subjects, the imbalance (using Euclidean distance) is \( Imb(n)=\sqrt{\sum_{k=1}^3{\left({N}_k(n)-n{\rho}_k^{\ast}\right)}^2} \). Small values of Imb(n) are desirable; ideally, Imb(n) = 0. Since the Nk(n)’s are random variables, we take the expected value, E(Imb(n)), which is referred to as the momentum of probability mass (MPM) (41).

To quantify the lack of randomness of a randomization procedure, at the jth allocation step we compute the distance between the conditional randomization probability vector P(j) and the D-optimal allocation vector ρ∗ as \( d(j)=\sqrt{\sum_{k=1}^3{\left({P}_k(j)-{\rho}_k^{\ast}\right)}^2} \), j = 1, …, n. If \( {P}_k(j)={\rho}_k^{\ast } \) for k = 1, 2, 3, then the dose assignment for the jth subject is made completely at random. A cumulative measure of lack of randomness (the forcing index, FI) (40,42) is defined as:

$$ FI(n)={n}^{-1}{\sum}_{j=1}^nd(j). $$

The smaller FI(n) is, the more random (and therefore, potentially less predictable) a randomization procedure is. FI(n) ≡ 0 corresponds to CRD, which is most random and provides no potential for selection bias in the study.

Finally, we consider the variability of randomization procedures by examining the average standard deviation of the allocation proportions: \( ASD(n)=\sqrt{n{\sum}_{i=1}^3{\left\{ SD\left({N}_i(n)/n\right)\right\}}^2} \). Randomization procedures with low values of ASD(n) are expected to be more concentrated around the target D-optimal allocation and, therefore, to lead to more efficient dose–response estimation.
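A sketch of how these three metrics can be estimated from simulation output, assuming N_mat is an nsim × 3 matrix of final counts N(n) across simulated trials and d_mat is an nsim × n matrix of the per-step distances d(j) (both names are our own):

```r
## MPM: mean Euclidean distance between realized and target allocation
mpm <- function(N_mat, n, rho) mean(sqrt(rowSums(sweep(N_mat, 2, n * rho)^2)))

## FI: forcing index, the average of the per-step distances d(j)
fi  <- function(d_mat) mean(rowMeans(d_mat))

## ASD: average standard deviation of the allocation proportions
asd <- function(N_mat, n) sqrt(n * sum(apply(N_mat / n, 2, sd)^2))
```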

SIMULATION STUDY PLAN

Throughout, we assume that responses follow the model in Eq. (1) with the following parameters: β0 = 1.90, β1 = 0.60, β2 = 2.80, and b = 0.65. The average event probability is assumed to be 0.50. Under these assumptions, the D-optimal dose levels are d1 = 0, d2 = 0.269, and d3 = 0.726; the optimal allocation proportions are \( {\rho}_1^{\ast }=0.407 \), \( {\rho}_2^{\ast }=0.336 \), and \( {\rho}_3^{\ast }=0.257 \). Figure 1 shows the true dose–response curve, the optimal dose levels, and the optimal allocation proportions at these doses.

Fig. 1. True dose–response curve (black line), D-optimal doses (x-location of red bars), and allocation proportions (height of red bars) for model (1) with β0 = 1.90, β1 = 0.60, β2 = 2.80, b = 0.65, and an average probability of event of 50%

Our simulation study consists of four major parts.

First, we evaluate various single-stage randomization procedures targeting the locally D-optimal design (assuming the true model is known) for small and moderate sample sizes. The design operating characteristics include measures of estimation precision, balance, and randomness, as described in the section “Statistical Criteria for Comparison of Randomization Procedures”.

Second, we implement a two-stage adaptive optimal design (34) using different combinations of randomization procedures in stages 1 and 2. For each of these strategies, the target allocation in stage 1 is (1/3, 1/3, 1/3), and the target allocation in stage 2 is derived from the estimated D-optimal design. In this setting, the D-efficiency of a two-stage design relative to the D-optimal design is computed as follows:

$$ \mathrm{D}-\mathrm{eff}(n)={\left\{\frac{\left|{n}^{(1)}\mathbf{M}\left({\xi}_{\mathrm{obs}}^{(1)},\boldsymbol{\theta} \right)+\left(n-{n}^{(1)}\right)\mathbf{M}\left({\xi}_{\mathrm{obs}}^{(2)},\boldsymbol{\theta} \right)\right|}{\left|n\mathbf{M}\left({\xi}^{\ast },\boldsymbol{\theta} \right)\right|}\right\}}^{1/4}, $$
(9)

where \( {\xi}_{\mathrm{obs}}^{(1)} \) is the stage 1 randomization design using n(1) subjects, \( {\xi}_{\mathrm{obs}}^{(2)} \) is the stage 2 randomization design using n − n(1) subjects, and ξ∗ is the (theoretical) D-optimal design. The relative efficiency of a two-stage design, RE(n), is computed with respect to a single-stage locally D-optimal design implemented using the MaxEnt(η = 1) procedure (which, by construction, leads to the most balanced allocation).
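A sketch of Eq. (9), assuming the per-stage information matrices M1, M2 and the optimal-design matrix M_star (hypothetical names) have been computed elsewhere:

```r
## Two-stage D-efficiency, Eq. (9): pooled information relative to n * M(xi*, theta)
d_eff_2stage <- function(M1, M2, M_star, n1, n) {
  (det(n1 * M1 + (n - n1) * M2) / det(n * M_star))^(1 / 4)
}
```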

Third, we implement a multi-stage adaptive design with early stopping criteria (34) using different combinations of randomization procedures. In this setting, all designs aim at achieving the same pre-defined level of estimation precision, and the key operating characteristic is the sample size at study termination. Our conjecture is that there are randomization procedures that require a smaller sample size, given the stopping rule.

Fourth, we evaluate the robustness of different adaptive randomization strategies to two types of experimental bias: chronological bias and selection bias. Chronological bias can arise if patient outcomes over time are affected by unobserved time trends (43,44). Selection bias can occur when an investigator knows, or is able to guess with high probability, which treatment is to be assigned to an upcoming patient (36). The advance knowledge of the treatment assignment can motivate an investigator to selectively enroll a particular type of patient thought to benefit most from the given treatment, thereby confounding the true treatment effect. The importance of assessing the robustness of randomization designs to chronological and selection biases has been recently documented by Hilgers and coauthors in the Evaluation of Randomization procedures for Design Optimization (ERDO) template (45).

Note that our simulation plan here is by no means exhaustive. However, we supply an R code (available upon request from the first author) that can be used to reproduce all results in this paper and generate additional findings under user-defined experimental scenarios and other combinations of adaptive randomization strategies.

RESULTS

Targeting Locally D-Optimal Design

We consider seven randomization designs targeting D-optimal allocation (0.407, 0.336, 0.257). These designs are as follows: (I) CRD, (II) DBCD (γ = 2), (III) GDLUD (C = 10), (IV) MWUD (α = 10), (V) MaxEnt(η = 0.5), (VI) MaxEnt(η = 1), and (VII) PBD. We also consider a uniform allocation design which randomizes study subjects among the dose levels 0, 0.5, and 1 in equal proportions by means of PBD, i.e., (VIII) Uniform PBD.

Table I summarizes the performance of the eight randomization designs, evaluated for four values of the sample size: n = 15; 30; 45; 60. In regard to balance and randomness, the most variable and least restrictive design is CRD: it has the highest values of MPM(n) and ASD(n) and the lowest possible forcing index (FI(n) = 0). By construction, the most balanced designs are MaxEnt(η = 1), PBD, and Uniform PBD: they have constant values of MPM(n) (0.50, 1.14, and 0.54, respectively), regardless of n. MaxEnt(η = 1) is the most restrictive: it has FI(n) = 0.66, the highest among all designs. In comparison, PBD (which also attains a balanced allocation for the given sample size) has FI(n) = 0.11. The difference between these two designs is that MaxEnt(η = 1) forces balance at each allocation step, whereas for PBD, the allocation ratio may deviate from the target at intermediate steps. The other designs provide a tradeoff between balance and randomness and have values of MPM(n), ASD(n), and FI(n) between the two extremes (CRD and MaxEnt(η = 1)). Clearly, balance (a low value of MPM(n)) is achieved at the cost of randomness (a high value of FI(n)).

Table I Operating Characteristics of Eight Randomization Designs for a Single-Stage Trial with Locally D-optimal Design

Figure 2 shows box plots of the simulated distributions of D-eff(n) for the eight randomization designs, for sample sizes n = 15, 30, 45, and 60. One can see that CRD has the largest spread of D-eff(n), especially for small sample sizes (n = 15 and 30). As n increases, CRD becomes more efficient: for n = 45 and n = 60, the minimum value of D-eff(n) is >0.75 and the median value of D-eff(n) is ~0.99. At the same time, the most balanced designs, MaxEnt(η = 1) and PBD, have the highest values of D-eff(n) (~1.0), regardless of n. The other designs targeting the D-optimal allocation perform reasonably well: their distributions of D-eff(n) become less spread out and closer to 1.0 as n increases. Finally, Uniform PBD has a constant D-eff(n) = 0.74 for all values of n, which is, as expected, much lower than the other designs, given the non-optimal doses of the uniform design.

Fig. 2. Distribution of D-efficiency of eight randomization designs for a single-stage trial with the locally D-optimal design

Figure 3 shows measures of estimation accuracy (Bias², Variance, and MSE) for the four model parameters (β0, β1, β2, b) for the eight randomization designs and sample sizes n = 15; 30; 45; 60. The seven designs targeting the D-optimal allocation exhibit improvement (lower Bias², Variance, and MSE) as n increases. More variable designs (e.g., CRD) have somewhat higher Bias², Variance, and MSE than less variable designs (e.g., PBD). The Uniform PBD (purple curve) stands out in the plots: its overall performance, as assessed by MSE, is substantially worse than that of the seven designs targeting the D-optimal allocation. From Table I, we observe that Uniform PBD performs well relative to the D-optimal design only for small sample sizes: when n = 15, its RE(n) = 1.03; however, for n = 30, 45, and 60, its RE(n) values drop to 0.82, 0.67, and 0.38, respectively.

Fig. 3. Estimation precision (Bias², Variance, MSE) of eight randomization designs for a single-stage trial with the locally D-optimal design

From these results, we can make an important intermediate observation. In an “idealized” setting of a known nonlinear dose–response model with censored time-to-event data, the quality of estimation depends on both the choice of the allocation design and the randomization procedure used to implement the target allocation. The D-optimal allocation implemented by a randomization procedure with low variability (e.g., MaxEnt(η = 1) or PBD) results in the most accurate estimation of the dose–response relationship, especially when the sample size is small, e.g., n = 15 (cf. Fig. 2). Using a less restrictive (more random) randomization procedure can result in some deterioration of statistical estimation in small samples; however, the quality of estimation improves with larger sample sizes. For instance, when the “most random” CRD procedure is applied to target the D-optimal allocation, the average (across 10,000 simulation runs) D-efficiency values are 0.93, 0.97, 0.98, and 0.99 for sample sizes n = 15, 30, 45, and 60, respectively (cf. Table I). Using a non-optimal allocation (e.g., the uniform design), even with the most restrictive and most balanced randomization procedure, leads to inferior performance that does not improve with increasing sample size. In our example, the average D-efficiency of Uniform PBD was 0.74 for n = 15, 30, 45, and 60 (cf. Table I).

Of course, our observations here are based on data generated from the selected model in Eq. (1), under one experimental scenario (visualized in Fig. 1) and four choices of the sample size, with n = 15 being the smallest. Additional simulations under other experimental scenarios and with smaller values of n (e.g., n = 9 or 12) could be performed to investigate the loss in efficiency due to imbalance induced by randomization in very small samples. We defer this task to future work.

Overall, our findings from the considered example are in line with the template of Hu and Rosenberger (46) which suggests that for randomized comparative trials the performance of a randomization design is determined by an interplay between optimality (power) of a fixed allocation design, speed of convergence of a randomization procedure to the desired allocation, and variability of the allocation proportions. In our setting, we deal with estimation, not hypothesis testing; yet, we arrive at a similar conclusion: both D-optimality and variability of a randomization procedure (and, of course, the study size) determine the design performance.

Two-Stage Adaptive Optimal Design

To appreciate the impact of randomization on the performance of a two-stage adaptive optimal design, we compare five adaptive design strategies using different combinations of randomization procedures (CRD, MaxEnt(η = 1), and PBD) at stages 1 and 2. We also include the non-adaptive Uniform PBD as a reference procedure. We use a fixed total sample size of n = 60 and investigate three choices of the first-stage cohort size: n(1) = 15, 30, and 45. For each design strategy, the main concern is the quality of dose–response estimation, as assessed by D-eff(n) and RE(n).

Table II summarizes the performance of the two-stage adaptive design strategies. The two-stage design that uses CRD at stage 1 results in a 1–2% loss in average D-efficiency compared to the adaptive designs that utilize MaxEnt(η = 1) or PBD in stage 1. In regard to RE(n), there is no single best strategy. The combination MaxEnt(η = 1) → PBD is a top performer in scenarios with n(1) = 15 and n(1) = 30, and MaxEnt(η = 1) → CRD seems to perform best when n(1) = 45. Note that in the scenario where the total sample size is split equally between stages 1 and 2 (i.e., n(1) = n(2) = 30), the two-stage adaptive designs have the highest values of both D-eff(n) (0.84–0.85) and RE(n) (0.67–0.70). By contrast, when n(1) = 15, D-eff(n) is 0.79–0.81 and RE(n) is 0.63–0.66; and when n(1) = 45, D-eff(n) is 0.82–0.83 and RE(n) is 0.61–0.68. This indicates that adaptive designs using the MaxEnt(η = 1) → PBD or PBD → PBD combinations, with an adaptation after 50% of the total sample size, provide the best performance in our example. The two-stage uniform design applied with the (non-optimal) equal allocation has the worst performance (D-eff(n) = 0.74 and RE(n) = 0.38), which reinforces the importance of D-optimality in trial design.

Table II Operating Characteristics of 5 Two-Stage Adaptive Optimal Design Strategies and a Fixed Uniform Allocation Design for a Total Sample Size of n = 60

Based on the results in Table II, one may argue that the improvements due to the use of a “more balanced” randomization method such as PBD over a “less balanced” procedure such as CRD are very modest (1–2% in our example). However, these results are obtained under only one experimental scenario, with a limited selection of sample sizes for stages 1 and 2 and a limited selection of stage 1/stage 2 sample size ratios (15:45, 30:30, and 45:15 in our example). A more thorough study would be needed to carefully assess the impact of these parameters on the performance of various two-stage randomization design strategies.

Table III illustrates the importance of having a sufficiently large stage 1. Displayed in Table III is the percentage of simulation runs for which the MLE of θ (and, therefore, an estimate of the D-optimal design) could not be obtained from stage 1 data (in which case stage 2 of the trial was implemented using Uniform PBD). One can see that when n(1) = 15, the percentage of “failed” stage 1 trials was 23% for CRD, 11% for MaxEnt(η = 1), and 12% for PBD. In other words, if CRD is used with n(1) = 15, there is almost a 1 in 4 chance that the D-optimal design cannot be estimated after stage 1. On the other hand, for n(1) = 30 and n(1) = 45, the probability of not being able to estimate the D-optimal design after stage 1 with CRD drops to 4% and 2%, respectively. As expected, for MaxEnt(η = 1) and PBD, the corresponding numbers are lower (~1% and <1%, respectively) because these two designs have better balancing properties than CRD.

Table III Percentage of Simulation Runs for a Two-Stage Adaptive Design for Which the MLE of θ (and Therefore, an Estimate of the D-optimal Design) Could not Be Obtained Based on Data from Stage 1

Adaptive Optimal Designs with Early Stopping

To evaluate the impact of randomization on adaptive designs with early stopping, we consider four competing adaptive design strategies. All designs randomize the first cohort of 15 subjects among the doses 0, 1/2, and 1 using the target allocation (1/3, 1/3, 1/3). Thereafter, additional cohorts of 15 subjects are randomized into the study using different randomization procedures targeting an updated D-optimal design, until either the maximum sample size nmax is reached or the study stopping criterion is met. In our simulations, we set nmax = 1000. For the stopping criterion, we use the rule based on the volume of the confidence ellipsoid described in (34): the study stops once \( \left|{\mathbf{M}}_{obs}^{-1}\left({\widehat{\boldsymbol{\theta}}}_{MLE},\xi \right)\right|\le {\left({\eta}^4\left|{\widehat{\beta}}_0\right|\left|{\widehat{\beta}}_1\right|\left|{\widehat{\beta}}_2\right|\left|\widehat{b}\right|\right)}^2 \), where 0 < η < 1 is a user-defined constant. In our simulations, we explore four choices of η = 0.15; 0.20; 0.25; 0.35. We also include Uniform PBD with the same stopping rule as a reference procedure.
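The stopping rule itself is straightforward to code; a sketch (our illustration), assuming M_obs holds the observed 4 × 4 information matrix and theta_hat the current estimates (β̂0, β̂1, β̂2, b̂):

```r
## Stop once |M_obs^{-1}| <= (eta^4 * |b0_hat| * |b1_hat| * |b2_hat| * |b_hat|)^2
stop_trial <- function(M_obs, theta_hat, eta = 0.25) {
  det(solve(M_obs)) <= (eta^4 * prod(abs(theta_hat)))^2
}
```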

Figure 4 shows distributions of the sample size at study termination for the different adaptive design strategies and the uniform allocation design. Overall, for a given value of η, there is little difference among the adaptive optimal designs. Smaller values of η set a higher bar for estimation accuracy, and therefore a larger sample size is required. When η = 0.35, the CRD → CRD → ... strategy has a median sample size of 45 and a maximum sample size of 120; the three other adaptive design strategies (MaxEnt(η = 1) → PBD → ...; MaxEnt(η = 1) → MaxEnt(η = 1) → ...; and MaxEnt(η = 1) → CRD → ...) have the same median sample size of 45 but a lower maximum sample size of 60. By contrast, the median sample size for Uniform PBD is substantially larger than for the adaptive designs: based on our simulations, 7–11 additional cohorts of size 15 are required for Uniform PBD to achieve the same level of estimation accuracy as the adaptive designs.

Fig. 4. Distribution of sample size at study termination for multi-stage adaptive designs

Robustness to Experimental Biases

A recently published ERDO template (45) emphasizes the importance of assessing the robustness of randomization procedures to chronological bias and selection bias. Chronological bias can arise, for example, in a long-term study with slow recruitment, where patients enrolled later in the study may be healthier due to an overall improved standard of care; if treatment assignments are not balanced over time, then the treatment comparison may be biased. To mitigate the impact of chronological bias, it is recommended that a randomization design balance treatment assignments over time, e.g., by means of some kind of restricted randomization (44,47). The potential negative impact of selection bias on statistical inference (test decisions) is acknowledged and well documented (48,49,50,51). Strategies to reduce the risk of selection bias exist (52,53); one recommendation is to use less restrictive randomization procedures, such as the maximal procedure (47,54).

The ERDO template provides a general framework for justifying the choice of a randomization procedure in practice. Here, we apply it in a setting of an adaptive randomized three-arm trial with censored time-to-event outcomes and the D-optimal allocation.

Chronological Bias

We assume that there is an effect due to a time trend, which means that the true model for the jth subject in the study has the form:

$$ \log {T}_j={\beta}_0+{\beta}_1{x}_j+{\beta}_2{x}_j^2+{\eta}_j+b{\varepsilon}_j, $$
(10)

where xj is the dose, εj is the error term following the standard extreme value distribution, and ηj is the time trend, which can take one of the following forms (44):

$$ {\eta}_j=\nu \left\{\begin{array}{cc}\frac{j}{n}& \mathrm{linear}\ \mathrm{time}\ \mathrm{trend},\\ {}{1}_{j\ge c}(j)& \mathrm{stepwise}\ \mathrm{trend},\\ {}\log \left(\frac{j}{n}\right)& \mathrm{logarithmic}\ \mathrm{trend},\end{array}\right. $$

where the time trend effect ν is a positive number. A sensible choice for ν is a fraction of the variation in the data, e.g., of the standard deviation or the range. Furthermore, we assume that ηj and εj are independent.
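The three trend shapes translate directly into code; a sketch (our illustration, with the change point c = 30 matching the stepwise scenario used in our simulations):

```r
## Time trend eta_j for subject j of n under model (10)
time_trend <- function(j, n, nu, type = c("linear", "stepwise", "log"), c = 30) {
  type <- match.arg(type)
  nu * switch(type,
              linear   = j / n,
              stepwise = as.numeric(j >= c),
              log      = log(j / n))
}
```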

We consider a two-stage adaptive design with n = 60 and n(1) = 30 in which data are generated according to the model in Eq. (10) with the three kinds of time trend described above. For the stepwise trend, we take c = 30, which means that the patients recruited in stage 2 (after the interim analysis) are systematically different from the patients recruited in stage 1. We consider six choices for ν: ν = 0 (no time trend) and ν = 0.5; 1; 2; 5; 10 (time trend present). We evaluate three different two-stage adaptive designs and the uniform allocation design. The key interest is the quality of estimation, as assessed by D-eff(n) and RE(n).

First, we found that chronological bias had no impact on D-efficiency for any of the considered design strategies: the average values of D-eff(n) were identical in the no-trend case (ν = 0) and in the cases where a trend was present (ν > 0) (results not displayed here). Figure 5 shows a plot of the MSE values for estimating (β0, β1, β2, b) vs. ν, for the three kinds of time trend. Overall, there is no apparent evidence that a greater amount of chronological bias (higher values of ν) leads to an increase in the MSE values. Interestingly, the presence of chronological bias (ν > 0) resulted in a substantial decrease of the MSE values for the parameters b and β0 for Uniform PBD (purple curve).

Fig. 5. MSE of two-stage adaptive strategies with and without chronological bias

Selection Bias

We adopt the approach described in (51), but accounting for a three-arm randomization setting. We assume the outcome is survival time and that longer times indicate better treatment efficacy. An investigator favors the experimental treatment, and if she anticipates that the next treatment assignment is either the low or the high dose, then she can select a terminally ill patient who meets all eligibility criteria to be allocated to this dose group.

We assume the investigator knows the target allocation (ρ1, ρ2, ρ3) and the current treatment distribution (N1(j), N2(j), N3(j)). Furthermore, assume that the investigator uses the minimum imbalance guessing strategy, as described in (55). Compute the imbalance between the current allocation and the target allocation:

$$ {Imb}_i=\frac{N_i(j)}{j}-{\rho}_i,\kern0.5em i=1,2,3. $$

The treatment (dose level) with the minimum value of imbalance is predicted, i.e.,

$$ \ell =\arg \underset{i=1,2,3}{\min }{Imb}_i $$

In other words, ℓ can take values 1 ⇒ dose level 0 (placebo), 2 ⇒ low active dose, or 3 ⇒ high active dose. The biasing strategy is as follows. If ℓ = 2 or 3, then the investigator enrolls a “sicker” patient into the study. If ℓ = 1, then the investigator enrolls a “healthier” patient. If there is a tie between 1 and 2 (i.e., Imb1 = Imb2 < Imb3), a tie between 1 and 3 (i.e., Imb1 = Imb3 < Imb2), or a tie between 1, 2, and 3 (i.e., Imb1 = Imb2 = Imb3), then the investigator enrolls a “normal” patient. Therefore, the model for the jth subject in the study has the form:

$$ \log {T}_j=\left({\beta}_0+{\beta}_1{x}_j+{\beta}_2{x}_j^2\right){\eta}_j+b{\varepsilon}_j, $$
(11)

where xj is the dose, εj is the error term following the standard extreme value distribution, and ηj is given by the following:

$$ {\eta}_j=\left\{\begin{array}{cc}\nu & \mathrm{if}\ \ell =2\ \mathrm{or}\ \ell =3,\\ {}1& \mathrm{if}\ \ell =\left\{1,2\right\}\ \mathrm{or}\ \ell =\left\{1,3\right\}\ \mathrm{or}\ \ell =\left\{1,2,3\right\},\\ {}1/\nu & \mathrm{if}\ \ell =1,\end{array}\right. $$

with ν ∈ (0, 1) being the biasing factor. At dose level x, the median survival time is \( {e}^{\nu \left({\beta}_0+{\beta}_1x+{\beta}_2{x}^2\right)}{\left(\log 2\right)}^b \) for a “sicker” patient, \( {e}^{\left({\beta}_0+{\beta}_1x+{\beta}_2{x}^2\right)/\nu }{\left(\log 2\right)}^b \) for a “healthier” patient, and \( {e}^{\beta_0+{\beta}_1x+{\beta}_2{x}^2}{\left(\log 2\right)}^b \) for a “normal” patient.
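A sketch of the biasing mechanism (our illustration), combining the minimum imbalance guess with the factor ηj of Eq. (11); N holds the current counts after j assignments:

```r
## Returns eta_j for the next patient: nu ("sicker"), 1/nu ("healthier"), or 1 ("normal")
bias_factor <- function(N, j, rho, nu = 0.5) {
  imb  <- N / j - rho                   # Imb_i, i = 1, 2, 3
  pred <- which(imb == min(imb))        # predicted arm(s); ties are possible
  if (length(pred) > 1 && 1 %in% pred) 1       # tie involving placebo: "normal"
  else if (identical(pred, 1L)) 1 / nu         # placebo predicted: "healthier"
  else nu                                      # active dose predicted: "sicker"
}
```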

We consider a two-stage adaptive design with n = 60 and n(1) = 30 in which data are generated according to the model in Eq. (11) with ν = 0.5 (selection bias present) and ν = 1 (no selection bias). We evaluate three two-stage design strategies: CRD → CRD, MaxEnt(η = 1) → MaxEnt(η = 1), and PBD → PBD. As before, we include Uniform PBD as a reference procedure.

Figure 6 shows the theoretical (red) and estimated (yellow) median time-to-event profiles, as well as the 25th and 75th quantile time-to-event curves, for the different designs. Clearly, the presence of selection bias (ν = 0.5) has a negative impact on the quality of estimation for all four design strategies considered here: the designs tend to systematically underestimate the dose–response curve at higher dose levels. The least affected design is PBD → PBD. The Uniform PBD has the worst performance. Interestingly, CRD → CRD did not provide the best protection against selection bias (as one might have conjectured based on results for equal (1:1) allocation, in which case CRD is known to be least susceptible to selection bias (22)).

Fig. 6. Estimated dose–response relationship for two-stage adaptive strategies without (bottom panels) and in the presence of (top panels) selection bias

DISCUSSION

In this paper, we evaluated the impact of randomization on statistical properties of adaptive optimal designs in a time-to-event dose–response study with a D-optimal allocation that involves possibly non-integer (irrational) allocation proportions. To our knowledge, this is the first paper to systematically investigate the choice of a randomization procedure in such a setting.

Optimal designs for dose–response studies with nonlinear models depend on the true model parameters, which are unknown in practice. A solution to this problem is to use adaptive optimal designs, which attempt to achieve the maximal incremental increase in information about the model at each step. Previous work has shown that such adaptive designs can successfully approximate true optimal designs in various dose–response settings (56,57,58,59,60). Many of these designs were developed for phase I dose escalation trials in which adaptations are applied sequentially, in a non-randomized manner. On the other hand, phase II dose–response trials use randomized parallel group designs and attempt to gain maximum information about the dose–response over a given dose range for a given sample size. A practical solution for a phase II dose–response study is a two-stage adaptive optimal design, for which data from a (pilot) first stage of the trial are used to obtain an initial estimate of the dose–response curve, and this information is then used to optimize the second stage of the trial. Two-stage adaptive designs have been shown to be highly efficient in various settings (34,61,62,63,64). With a two-stage design, an important question is how to implement it in practice. Randomization is a powerful tool that can be used to achieve a pre-determined treatment allocation ratio in each stage while protecting a study from bias and maintaining the validity of the trial results. The choice of the “best” randomization procedure for use in practice can be elusive due to the variety of available methods (22). Many studies do not go into detail on how randomization is implemented in practice (45). In the current paper, we provide an example of how different randomization options can be examined to select one for implementation in an adaptive dose–response trial.

We have shown that both the choice of an allocation design and a randomization procedure to implement the target allocation impact the quality of dose–response estimation, especially for small samples. The D-optimal allocation implemented by a randomization procedure with low variability leads to the most accurate estimation of the dose–response relationship. Our findings are consistent with the template of Hu and Rosenberger (46) which suggests that optimality of a fixed allocation design and variability of the randomization procedure are two major determinants of the performance of a randomization design in practice. From our simulation studies, we found that design optimality has a more profound impact on design performance than the randomization procedure. In other words, applying the “most balanced” randomization procedure such as PBD to target a non-optimal design is an inferior strategy to applying the “most random” CRD procedure to target the D-optimal design. We found that while CRD (applied to the D-optimal target) can incur some loss in efficiency in small samples, it becomes more efficient as the sample size increases, e.g., the median D-efficiency of CRD is ~ 99% for n = 60. Using more restrictive (and properly calibrated) randomization procedures (such as DBCD, GDLUD, etc.) can also be an attractive strategy.

For a two-stage design with a pre-determined total sample size, two important considerations are the timing of the interim analysis and the choice of randomization procedures for stages 1 and 2. If the stage 1 size is too small and a highly variable randomization procedure (such as CRD) is used to allocate patients to doses, then there is a substantial risk that the D-optimal design cannot be estimated after stage 1, thereby defeating the purpose of design adaptation. If the stage 1 size is too large, then an interim estimate of the D-optimal design can be readily obtained, yet the second stage may be too small to fully benefit from this interim knowledge. Our simulations show that an equal split of the total sample size between stages 1 and 2, together with a “well-balanced” randomization procedure to implement the target allocation in each stage (especially in stage 1), is an optimal strategy.

For a multi-stage adaptive design with early stopping, it is important that the first (pilot) cohort is randomized to doses according to a “well-balanced” procedure such as PBD or MaxEnt(η = 1). Thereafter, additional cohorts can be randomized (according to an updated D-optimal design) using different methods, including CRD. Again, design optimality has a profound effect—using a sub-optimal (uniform allocation) design requires much larger sample sizes to attain the same level of estimation accuracy as for the adaptive optimal designs.

In practice, the design performance may be affected by various experimental biases. It is increasingly common to evaluate the influence of potential selection bias and chronological bias on the test decision (type I error rate) (44,49,50,51,65). In the current paper, we investigated the potential impact of experimental bias on dose–response estimation using the recently published ERDO template (45). In particular, our simulations provide evidence that selection bias can have a detrimental impact on the quality of dose–response estimation. A striking finding is that a sub-optimal (uniform allocation) design can lead to very misleading conclusions, in scenarios both with and without selection bias. Without selection bias, the Uniform PBD can overestimate the true dose–response curve, whereas when selection bias is present, the design can grossly underestimate the curve and even yield a false impression that the dose–response is flat. In our example, a two-stage adaptive optimal design with PBD applied in both stages was more robust to selection bias (while still being affected) than the other designs.

We would also like to highlight several important problems for future research. In the current paper, we focused only on estimation of the dose–response. However, in many phase II clinical trials, the primary objective is to first test whether a dose–response is present and then estimate the dose–response curve. Testing for the presence of a dose–response in time-to-event settings may be challenging due to small sample sizes, censored data, and model uncertainty. Which test (parametric or nonparametric) should be used? The impact of randomization on the power of the test has been studied for response-adaptive randomized comparative studies (46), but not in the context of dose–response studies; this is one important open problem. Another problem is sample size justification for two-stage adaptive designs. How large should a study be? Is an equal split of the total sample size between stage 1 and stage 2 always optimal? Our simulations in the current paper suggest so, but a formal proof of this conjecture is yet to be provided.

In practice, historical data from previous studies may be available, in which case a Bayesian design may be a viable option, e.g., the first (pilot) stage may be implemented using Bayesian optimal design (which may be different from the uniform design), and subsequent adaptations can be implemented in a Bayesian manner (rather than using maximum likelihood updating). The impact of randomization on statistical properties of Bayesian adaptive dose–response designs certainly merits investigation.

The results in the current paper are based on the assumption that event times follow a quadratic Weibull regression model with four parameters. While such a model is quite flexible and can cover a broad variety of dose–response shapes (34), it may still be misspecified in a number of ways; e.g., a third- or higher-order polynomial model may be a better choice, and/or the time-to-event distribution may be other than Weibull (say, log-logistic, Gamma, etc.). Finding locally D-optimal designs under different models and constructing response-adaptive designs that converge to the “true” ones can be done using arguments similar to those in our previous work (34) and the current paper. More complex models may require a larger amount of data to estimate the underlying dose–response and implement the corresponding D-optimal designs. However, the main findings of the current work are likely to extend to such more complex models (provided that the functional form of the model and the event time distribution are chosen correctly). If the model form and/or the distribution of the event times are misspecified, then the statistical properties of response-adaptive optimal designs (constructed under different assumptions) may be affected. The impact of such misspecifications is another important open problem which we hope to pursue in future work.

In many time-to-event trials, there are important covariates (prognostic factors) that are correlated with the primary outcome. Rosenberger and Sverdlov (66) discuss strategies for handling covariates in the design of randomized comparative trials and advocate a class of covariate-adjusted response-adaptive (CARA) randomization designs. CARA randomization can be particularly attractive in trials for personalized medicine (67). An application of CARA randomization in time-to-event dose–response trials is yet another open problem.

Finally, we think that further theoretical and simulation studies are warranted to better understand the impact of chronological bias and selection bias on estimation and statistical tests following adaptive optimal designs. The ERDO template (45) is an excellent starting point to facilitate such an investigation.

CONCLUSION

The current paper provides a systematic study of adaptive randomization procedures to target D-optimal designs for dose–response trials with time-to-event outcomes. Simulation studies provide evidence that the choice of randomization to implement the D-optimal design does matter as far as quality of dose–response curve estimation is concerned. For best performance, an adaptive design with small cohort sizes should be implemented with a randomization procedure that ensures a “well-balanced” allocation according to the targeted D-optimal design at each stage. Using a sub-optimal design can lead to very misleading results, in both scenarios with and without selection bias. The results of the current work should help clinical investigators select an appropriate randomization procedure for their dose–response study.