In this section, I give some exposition of the theory of optimal control. I also illustrate each step of the procedure by applying it to a previously-published model of therapy for patients with CML [52, 53]. This example was selected because it illustrates the calculations needed for the most common cases.
What is control theory?
Control theory is the study of how to change a system's behavior in a desired way. A common setting for control theory is a system of ordinary differential equations (a dynamical system) that represents states we are interested in tracking and changing. We use controls to alter one or more states of the system, which will cause a change in the outcome.
For example, the states might be the three positional coordinates of the center of gravity of a rocket, (x, y, z), as governed by gravity and aerodynamics. The controls might be the direction and force produced by a combustion engine. Changing the controls allows us to change the position of the rocket, while it is still being governed by a system that includes the effects of gravity and aerodynamics. If we check the states (the coordinates) at certain times, and use that information to decide how to change the controls, this is called feedback control. So if unanticipated wind or debris have affected the position of the rocket, a measurement will reveal the effect, and we can adjust the firing of the engine to take that into account.
A simpler example of feedback control is a water storage tank with a float valve. When the water level rises, the float rises too, and shuts off an input valve. When the water level drops, the float drops too, and the input valve is opened, allowing more water to flow in. In this case, the feedback control is automatic. The water level is the state of interest, and the input water is the control.
Likewise, thermostats can be made with metals or gases that expand with heat. The thermostat is configured so that when the ambient temperature is high enough, the expansion causes the electrical circuit to open, which causes the furnace to stop running. When the ambient temperature drops enough, the material contracts, the circuit is completed again, and the electrical signal triggers the furnace to turn on. The temperature is the state, and the heat generated by the furnace is the control.
The artificial pancreas described above is another example of feedback control. The glucose level is the state, and the insulin input is the control.
What is optimal control?
Optimal control tries to find the controls (which may vary over time) that get the system as close as possible to a desired outcome. The desired outcome is quantified by an objective functional that is maximized or minimized. The word functional simply means the objective is a function of one or more functions. While the objective is still a function, the term functional is more precise, just as the term square is more precise (when applicable) than rectangle. Optimal control uses the same type of state and control functions that are used in control theory, but we add the objective functional and optimize it while the system behaves according to specified equations.
Semi-mechanistic dynamical systems models of diseases
The dynamical systems of interest in drug development are those that represent states related to diseases. For example, in the case of a cancer of the blood, the concentration of cancerous cells in a patient’s peripheral blood could be a state we are interested in. We can incorporate anti-cancer treatments as controls in the system. In the dynamics of cancer and therapy, there are host immune system cells that play important roles, and they would be included as states as well. The idea of a “minimal model” that captures the key characteristics of the state and control dynamics leads us to “semi-mechanistic models” [49, p. 38].
The model in [53] is semi-mechanistic and includes cancer cells, C(t), and two types of immune system cells: naive T cells, \(T_n(t)\), and effector T cells, \(T_e(t)\). Each of the cell types is dependent on time t, and time-dependent drug levels (controls) are denoted by \(u_1(t)\) and \(u_2(t)\). The relationships between the cell concentrations and the controls are represented in the differential equations shown here:
$$ \frac{dT_n}{dt} = s_n - u_2 d_n T_n - k_n T_n \left( \frac{C}{C + \eta }\right) $$
(1)
$$ \frac{dT_e}{dt} = \alpha _n k_n T_n \left( \frac{C}{C + \eta }\right) + \alpha _e T_e \left( \frac{C}{C + \eta }\right) - u_2 d_e T_e - \gamma _e C T_e $$
(2)
$$ \frac{dC}{dt} = (1 - u_1) r_c C \ln \left( \frac{C_{\max }}{C}\right) - u_2 d_c C - \gamma _c C T_e $$
(3)
where \(T_n(0)\), \(T_e(0)\), and C(0) are known. The parameters \(s_n\), \(d_n\), \(k_n\), \(\eta \), \(\alpha _n\), \(\alpha _e\), \(d_e\), \(\gamma _e\), \(r_c\), \(C_{\max }\), \(d_c\), \(\gamma _c\) are all assumed to be non-negative constants. More information about the system and the parameters is given in Moore and Li [52] and Nanda et al. [53].
Because the states of interest are \(T_n\), \(T_e\), and C, Eqs. (1)–(3) are called the state equations. In modeling a physical system, it is common that the known information is about local interactions. For example, the last term of Eq. (3) is the product of three factors: the constant parameter \(\gamma _c\); the concentration of cancer cells, C; and the concentration of effector T cells, \(T_e\). We used this mass action form because we were modeling cell contact-dependent killing of cancer cells by effector T cells. The parameter \(\gamma _c\) takes into account both the rate at which effector T cells and cancer cells have encounters, and the proportion of those encounters that lead to the loss of the cancer cell.
By modeling rates of local interactions and events, we get expressions for the rates of change, such as those represented in Eqs. (1)–(3). Solving the system of differential equations means solving for the cell populations whose rates of change we modeled. So we start with differential equations composed of local, instantaneous information, and then solve to obtain functions that describe the cell population levels over time. In the examples detailed in this work, fixed values are used for the parameters. The selected values are intended to represent a typical patient. Methods for handling differences and uncertainty in parameter values are included in the “Discussion” section.
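To make this concrete, here is a minimal computational sketch of the state system (1)–(3) in Python with SciPy. The parameter values, initial cell levels, constant drug levels, and treatment length below are illustrative placeholders, not the values reported in Moore and Li [52] or Nanda et al. [53].

```python
# A sketch of the state equations (1)-(3); all numerical values are placeholders.
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative parameter values (NOT the fitted values from [52, 53]).
p = dict(s_n=0.29, d_n=0.35, k_n=0.066, eta=140.0,
         alpha_n=0.39, alpha_e=0.65, d_e=0.40, gamma_e=0.079,
         r_c=0.011, C_max=2.3e5, d_c=0.68, gamma_c=0.047)

def state_rhs(t, y, u1, u2, p):
    """Right-hand sides of Eqs. (1)-(3); u1 and u2 are callables giving drug levels at time t."""
    Tn, Te, C = y
    frac = C / (C + p["eta"])                             # saturating term C/(C + eta)
    dTn = p["s_n"] - u2(t) * p["d_n"] * Tn - p["k_n"] * Tn * frac
    dTe = (p["alpha_n"] * p["k_n"] * Tn * frac + p["alpha_e"] * Te * frac
           - u2(t) * p["d_e"] * Te - p["gamma_e"] * C * Te)
    dC = ((1.0 - u1(t)) * p["r_c"] * C * np.log(p["C_max"] / C)
          - u2(t) * p["d_c"] * C - p["gamma_c"] * C * Te)
    return [dTn, dTe, dC]

# Forward solve over an illustrative treatment period with constant drug levels.
t_f = 30.0                                                # placeholder treatment length
y0 = [1510.0, 20.0, 1.0e4]                                # placeholder T_n(0), T_e(0), C(0)
sol = solve_ivp(state_rhs, (0.0, t_f), y0,
                args=(lambda t: 0.9, lambda t: 2.5, p), dense_output=True)
```

With the drug levels held fixed, `sol.sol(t)` returns the three cell concentrations at any time t in the treatment period.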
Objective functionals
In addition to a mathematical model for the system we wish to control, we also need a mathematical model for the treatment goal or objective. For a disease such as cancer, it could be important to minimize the cancer cell levels during and at the end of the treatment period. For the immune cells in the model, we may wish to keep their levels from being too low at the end of the treatment period. And therapies generally carry a risk of side effects, so we don’t want to use more drug than necessary during the treatment period. To put all of these goals together, we decide on a sign (positive or negative) and a relative weight for each goal and add the quantities we wish to minimize.
For example, for the system above, our treatment goal might be expressed as minimizing J, where
$$ J(u_1, u_2) = \int _0^{t_f} \left[ C(t)+ \frac{B_1}{2} u^2_1(t) + \frac{B_2}{2} u^2_2(t) \right] dt + B_3 C(t_f) - B_4 T_n(t_f) ,$$
(4)
where each \(B_i\), \(i = 1, 2, 3, 4\), is a positive constant that serves as a relative weight, and \(t_f\) is the end time of the treatment period. We wish to minimize terms in the objective J; since the naive T cells appear in a negative term, minimizing this term maximizes the naive T cell concentration at the end of the treatment period. The controls \(u_1\) and \(u_2\) appear inside the integral as quadratic or squared terms for convenience. This choice is less common these days, and is examined further in the “Discussion” section.
The sizes of the relative weights reflect the importance of the various terms in the therapeutic goal. Generally, we rely on disease knowledge to decide on values for the weights. Decision analysis is a formal approach to quantifying this knowledge [54]. Alternatively, ranges of values can be sampled for the weights, yielding qualitative information about patterns of optimal regimens. Marler and Arora [55] examine ways to decide on relative weights in the objective functional in this and more general settings.
Because the treatment goal depends on \(u_1\) and \(u_2\), which are functions of time, J is called a functional (recall this means it is a function of one or more functions). J also depends on C during the treatment period, but C is determined by the dynamical system given by the state equations (1)–(3). The functions \(u_1\) and \(u_2\) are the only quantities we can control, so they are the inputs, and we consider J to be a function of them, \(J(u_1, u_2)\). Once we have determined an expression for the objective functional J, we want to optimize it. In this example, we optimize by minimizing J. To do that, we will take derivatives and set them equal to zero. However, to maintain the underlying dynamical system at the same time, we need the theory of optimal control.
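As a sketch of how the objective can be evaluated numerically, the function below approximates Eq. (4) by applying the trapezoidal rule to sampled state and control trajectories. The weights in the usage example are placeholders, and `sol` and `t_f` refer to the forward solution from the earlier sketch.

```python
# A sketch of evaluating the objective functional J in Eq. (4) on a time grid.
import numpy as np
from scipy.integrate import trapezoid

def objective(t, C, Tn, u1, u2, B1, B2, B3, B4):
    """Trapezoidal approximation of Eq. (4) from sampled state and control trajectories."""
    integrand = C + 0.5 * B1 * u1**2 + 0.5 * B2 * u2**2   # terms inside the integral
    running_cost = trapezoid(integrand, t)                # integral from 0 to t_f
    terminal_cost = B3 * C[-1] - B4 * Tn[-1]              # terms evaluated at t = t_f
    return running_cost + terminal_cost

# Example usage with the forward solution from the earlier sketch
# (B1-B4 are placeholder weights, not recommended values).
t_grid = np.linspace(0.0, t_f, 301)
Tn, Te, C = sol.sol(t_grid)
J = objective(t_grid, C, Tn, np.full_like(t_grid, 0.9), np.full_like(t_grid, 2.5),
              B1=500.0, B2=500.0, B3=2.0, B4=1.0)
```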
Optimal control
The key idea behind optimal control is the way the dynamical system and the objective functional are tied together through the adjoint functions. To organize the necessary calculations for optimal control, we first form the Hamiltonian H (so-called because of its similarity to the Hamiltonian in classical mechanics; cf. [56]). The Hamiltonian is a functional that provides a convenient way to record and combine information about the objective functional and the underlying system dynamics. It combines the right-hand sides of the state equations with the integrand of the objective functional, using the adjoint functions to multiply the state equation components. The theory of optimal control that Pontryagin developed specifies what to do to H to obtain controls \(u_i\) that optimize the objective functional. In particular, certain derivatives of H define the adjoint functions through differential equations (the adjoint equations). See the book of Lenhart and Workman [48] for a readable beginner’s introduction to these ideas in optimal control applied to a general setting.
For concreteness, we show the Hamiltonian for the system and objective functional considered above, which demonstrates how to handle common forms:
$$\begin{aligned} H ={}& C + \frac{B_1}{2}u^2_1 + \frac{B_2}{2} u^2_2 + \lambda _1 \left( s_n - u_2 d_n T_n - k_n T_n \left( \frac{C}{C + \eta } \right) \right) \\ &+ \lambda _2 \left( \alpha _n k_n T_n \left( \frac{C}{C + \eta } \right) + \alpha _e T_e \left( \frac{C}{C + \eta } \right) - u_2 d_e T_e - \gamma _e C T_e \right) \\ &+ \lambda _3 \left( (1 - u_1) r_c C \ln \left( \frac{C_{\max }}{C} \right) - u_2 d_c C - \gamma _c C T_e \right) . \end{aligned}$$
(5)
The factors \(\lambda _1\), \(\lambda _2\), and \(\lambda _3\) are the adjoint functions, and they are functions of time t, as are the state functions \(T_n\), \(T_e\), and C and the control functions \(u_1\) and \(u_2\). The adjoint functions are used to bring the underlying system dynamics into the optimization (note that they are multiplied by the right-hand sides of the state equations (1)–(3)). The first three terms of H are the terms that are inside the integral in J. The other two terms of J that are not inside the integral contribute the additional transversality conditions that accompany the adjoint equations. Namely, they give the conditions \(\lambda _1(t_f) = -B_4\), \(\lambda _2 (t_f) = 0\), and \(\lambda _3(t_f) = B_3\). When the adjoint equations are combined with these final-time conditions, they specify the adjoint functions \(\lambda _i\) uniquely, just as the state equations and their initial conditions specify the state functions uniquely.
Thanks to the way H is defined, the adjoint equations can be expressed in terms of H:
$$ \frac{d \lambda _1}{dt} = -\frac{\partial H}{\partial T_n} \, , \, \, \frac{d \lambda _2}{dt} = -\frac{\partial H}{\partial T_e} \, , \, \, \frac{d \lambda _3}{dt} = -\frac{\partial H}{\partial C} , $$
(6)
where \(\frac{\partial H}{\partial V}\) denotes the partial derivative of H with respect to the variable V, for \(V = T_n\), \(T_e\), or C. (As a reminder, the partial derivative of H with respect to V is calculated by treating every parameter or variable except V as a constant, and then taking the derivative of H as usual with respect to V.) For the leukemia example considered here, the reader can check that computing the partial derivatives specified in (6) gives the following adjoint equations:
$$ \frac{d \lambda _1}{dt} = \lambda _1 \left( u_2 d_n + k_n \frac{C}{C + \eta } \right) - \lambda _2 \alpha _n k_n \frac{C}{C + \eta } , $$
(7)
$$ \frac{d \lambda _2}{dt} = \lambda _3 \gamma _c C - \lambda _2 \left( \alpha _e \frac{C}{C + \eta } - u_2 d_e - \gamma _e C \right) , $$
(8)
$$\begin{aligned} \frac{d \lambda _3}{dt} ={}& \lambda _1 k_n T_n \frac{\eta }{(C + \eta )^2} - 1 \\ &- \lambda _2 \left( \alpha _n k_n T_n \frac{\eta }{(C + \eta )^2} + \alpha _e T_e \frac{\eta }{(C + \eta )^2} - \gamma _e T_e \right) \\ &- \lambda _3 \left( (1 - u_1) r_c \left( \ln \left( \frac{C_{\max }}{C} \right) - 1 \right) - u_2 d_c - \gamma _c T_e \right) . \end{aligned}$$
(9)
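Continuing the computational sketch, the adjoint system (7)–(9) can be integrated backward in time from the transversality conditions stated above, \(\lambda _1(t_f) = -B_4\), \(\lambda _2(t_f) = 0\), \(\lambda _3(t_f) = B_3\). The code below assumes the objects `p`, `sol`, and `t_f` from the state-equation sketch are in scope and uses placeholder values for \(B_3\) and \(B_4\).

```python
# A sketch of the adjoint equations (7)-(9), integrated backward from t = t_f.
# Assumes p, sol, and t_f from the state-equation sketch are in scope.
import numpy as np
from scipy.integrate import solve_ivp

def adjoint_rhs(t, lam, states, u1, u2, p):
    """Right-hand sides of Eqs. (7)-(9); states(t) returns (T_n, T_e, C) at time t."""
    l1, l2, l3 = lam
    Tn, Te, C = states(t)
    frac = C / (C + p["eta"])
    dfrac = p["eta"] / (C + p["eta"])**2                  # derivative of C/(C + eta) w.r.t. C
    dl1 = l1 * (u2(t) * p["d_n"] + p["k_n"] * frac) - l2 * p["alpha_n"] * p["k_n"] * frac
    dl2 = (l3 * p["gamma_c"] * C
           - l2 * (p["alpha_e"] * frac - u2(t) * p["d_e"] - p["gamma_e"] * C))
    dl3 = (l1 * p["k_n"] * Tn * dfrac - 1.0
           - l2 * (p["alpha_n"] * p["k_n"] * Tn * dfrac + p["alpha_e"] * Te * dfrac
                   - p["gamma_e"] * Te)
           - l3 * ((1.0 - u1(t)) * p["r_c"] * (np.log(p["C_max"] / C) - 1.0)
                   - u2(t) * p["d_c"] - p["gamma_c"] * Te))
    return [dl1, dl2, dl3]

# Transversality conditions lambda_1(t_f) = -B4, lambda_2(t_f) = 0, lambda_3(t_f) = B3
# (placeholder weights), then integrate backward from t_f to 0.
B3, B4 = 2.0, 1.0
adj = solve_ivp(adjoint_rhs, (t_f, 0.0), [-B4, 0.0, B3],
                args=(sol.sol, lambda t: 0.9, lambda t: 2.5, p), dense_output=True)
```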
Now we have defined all the needed pieces and can state the problem fully. The problem is to find the drug levels \(u_1(t)\) and \(u_2(t)\) (which may vary over time) that minimize the objective functional J for the disease-therapy system governed by Eqs. (1)–(3). To achieve this, we take the partial derivatives of the Hamiltonian H with respect to \(u_1\) and \(u_2\) and set them equal to zero. That is, we compute the optimal regimens \(u_1\) and \(u_2\) for this system by setting \(\frac{\partial H}{\partial u_1}\) and \(\frac{\partial H}{\partial u_2}\) equal to zero and solving for \(u_1\) and \(u_2\). For our example leukemia model, these equations give:
$$ \frac{\partial H}{\partial u_1} = B_1 u_1 - \lambda _3 r_c C \ln \left( \frac{C_{\max }}{C } \right) = 0 , $$
(10)
$$ \frac{\partial H}{\partial u_2} = B_2 u_2 - \lambda _1 d_n T_n - \lambda _2 d_e T_e - \lambda _3 d_c C = 0 . $$
(11)
Solving Eqs. (10) and (11) for \(u_1\) and \(u_2\) gives:
$$ u_1 = \frac{\lambda _3 r_c C \ln (\frac{C_{\max }}{C })}{B_1} , $$
(12)
$$ u_2 = \frac{\lambda _1 d_n T_n + \lambda _2 d_e T_e + \lambda _3 d_c C}{B_2} . $$
(13)
We combine these solutions with any lower or upper bounds on \(u_1\) and \(u_2\) to obtain piecewise-defined functions for \(u_1\) and \(u_2\) in terms of the state and adjoint functions. Although it may look like we have explicit formulas for the controls (the \(u_1\) and \(u_2\) functions) and are done, in fact we need to know the state and adjoint functions over time. The state functions are given by Eqs. (1)–(3) and their initial values (at time \(t = 0\)). However, Eqs. (1)–(3) depend on the controls. The adjoint functions are given by Eqs. (7)–(9) and their final values (at time \(t = t_f\)), and Eqs. (7)–(9) depend on the state functions. With all of the interdependencies between the control, state, and adjoint functions, the optimal control solutions generally have to be computed using numerical approximation methods.
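In code, this bounded characterization amounts to evaluating Eqs. (12) and (13) and clipping the results to the control bounds. The sketch below uses illustrative bounds, which are not necessarily those used in [53].

```python
# A sketch of the control characterizations (12)-(13), clipped to assumed bounds.
import numpy as np

def update_controls(t, states, adjoints, p, B1, B2, u1_max=1.0, u2_max=5.0):
    """Evaluate Eqs. (12)-(13) and clip to [0, u1_max] and [0, u2_max] (illustrative bounds)."""
    Tn, Te, C = states(t)
    l1, l2, l3 = adjoints(t)
    u1 = l3 * p["r_c"] * C * np.log(p["C_max"] / C) / B1                     # Eq. (12)
    u2 = (l1 * p["d_n"] * Tn + l2 * p["d_e"] * Te + l3 * p["d_c"] * C) / B2  # Eq. (13)
    return np.clip(u1, 0.0, u1_max), np.clip(u2, 0.0, u2_max)
```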
One iterative approximation method starts with guesses for \(u_1\) and \(u_2\). For example, we might guess that both control functions are constant, with \(u_1=0.9\) and \(u_2=2.5\). We can then solve Eqs. (1)–(3) for the state functions, which allows us to solve Eqs. (7)–(9) for the adjoint functions. These state and adjoint functions can be used in Eqs. (12) and (13) to calculate the control functions. These updated controls can be used to start the process all over again. Once the iterative process results in no more changes in the controls (up to a specified tolerance), then we have found the optimal controls. We can plot these numerical solutions for \(u_1\) and \(u_2\) (see Fig. 3), as well as the cell levels \(T_n\), \(T_e\), and C over the treatment period [53].
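Below is a minimal sketch of this iterative scheme, often called a forward-backward sweep (see Lenhart and Workman [48]). It assumes the functions and objects defined in the earlier sketches (`state_rhs`, `adjoint_rhs`, `update_controls`, `p`, `y0`, and `t_f`) are in scope, and it damps each control update by averaging with the previous iterate, a common choice to aid convergence.

```python
# A sketch of the forward-backward sweep, assuming state_rhs, adjoint_rhs,
# update_controls, p, y0, and t_f from the earlier sketches are in scope.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.interpolate import interp1d

B1, B2, B3, B4 = 500.0, 500.0, 2.0, 1.0               # placeholder weights
t = np.linspace(0.0, t_f, 301)
u1 = np.full_like(t, 0.9)                             # constant initial guesses for the controls
u2 = np.full_like(t, 2.5)

for _ in range(100):
    u1_f = interp1d(t, u1, fill_value="extrapolate")
    u2_f = interp1d(t, u2, fill_value="extrapolate")
    # 1. Solve the state equations (1)-(3) forward in time with the current controls.
    sol = solve_ivp(state_rhs, (0.0, t_f), y0, args=(u1_f, u2_f, p), dense_output=True)
    # 2. Solve the adjoint equations (7)-(9) backward from the transversality conditions.
    adj = solve_ivp(adjoint_rhs, (t_f, 0.0), [-B4, 0.0, B3],
                    args=(sol.sol, u1_f, u2_f, p), dense_output=True)
    # 3. Update the controls from Eqs. (12)-(13), damping by averaging with the old iterate.
    u1_new, u2_new = update_controls(t, sol.sol, adj.sol, p, B1, B2)
    u1_next = 0.5 * (u1 + u1_new)
    u2_next = 0.5 * (u2 + u2_new)
    # 4. Stop when the controls no longer change (up to a tolerance).
    if max(np.max(np.abs(u1_next - u1)), np.max(np.abs(u2_next - u2))) < 1e-4:
        u1, u2 = u1_next, u2_next
        break
    u1, u2 = u1_next, u2_next

# u1 and u2 now approximate the optimal drug levels over the treatment period and can be plotted.
```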