Input estimation for drug discovery using optimal control and Markov chain Monte Carlo approaches
Abstract
Input estimation is employed in cases where it is desirable to recover the form of an input function which cannot be directly observed and for which there is no model for the generating process. In pharmacokinetic and pharmacodynamic modelling, input estimation in linear systems (deconvolution) is well established, while the nonlinear case is largely unexplored. In this paper, a rigorous definition of the input-estimation problem is given, and the choices involved in terms of modelling assumptions and estimation algorithms are discussed. In particular, the paper covers Maximum a Posteriori estimates using techniques from optimal control theory, and full Bayesian estimation using Markov chain Monte Carlo (MCMC) approaches. These techniques are implemented using the optimisation software CasADi, and applied to two example problems: one where the oral absorption rate and bioavailability of the drug eflornithine are estimated using pharmacokinetic data from rats, and one where energy intake is estimated from body-mass measurements of mice exposed to monoclonal antibodies targeting the fibroblast growth factor receptor (FGFR) 1c. The results from the analysis are used to highlight the strengths and weaknesses of the methods used when applied to sparsely sampled data. The presented methods for optimal control are fast and robust, and can be recommended for use in drug discovery. The MCMC-based methods can have long running times and require more expertise from the user. The rigorous definition together with the illustrative examples and suggestions for software serve as a highly promising starting point for application of input-estimation methods to problems in drug discovery.
Keywords
Input estimation · Deconvolution · Nonlinear dynamic systems · Optimal control · Markov chain Monte Carlo methods
Introduction
A typical example is to estimate the oral absorption rate of a drug, given measurements of the drug plasma concentration. This assumes that a model of drug distribution and elimination is available. Of particular interest is the oral bioavailability, the fraction of the drug that is absorbed [22]. The same applies to other routes of administration, e.g. subcutaneous. Another example is to estimate the energy intake of a subject given body-mass measurements. Such estimates are important in research on drugs aimed at reducing body mass, improving metabolic parameters, or both [11, 15].
One possible approach is to assume that the input function has a pre-specified functional form, parametrised with a small number of parameters. Examples of functions used for this purpose include exponentials [37] and inverse Gaussians [7]. In this paper, we consider nonparametric methods that do not make strong assumptions about the form of the input.
When the dynamics are linear, established estimation methods exist [8, 37]. These methods rely on being able to express the input–output relationships as convolution integrals, something that is possible for linear systems only. Additionally, in many cases the linear input-estimation problem can be reduced to solving a quadratic optimisation problem [37]. In contrast, estimation in nonlinear systems is a more difficult problem. While many potentially useful methods exist in the engineering and statistical literature, typical engineering applications have densely sampled data, and these methods are not necessarily a good fit for PKPD applications. Thus, their applicability needs to be assessed on a case-by-case basis.
In this paper, we give a rigorous definition of the input-estimation problem for nonlinear systems, and suggest methods and software to solve the problem. Two case studies are presented, using different choices of algorithms to illustrate the methods and serve as a starting point for discussions.
Problem specification

Choice of prior for \(\mathbf {u}(t)\), or equivalently, choice of regularisation term. This encodes the prior assumptions about the shape of the input function.

Functional representation. In practice, the input function \(\mathbf {u}(t)\) has to be represented using a finite set of parameters. Therefore, a choice of basis for the input function must be made.

Desired statistical quantities. Is a MAP estimate enough or are other quantities such as credible intervals desired?
Choice of prior for \(\mathbf {u}(t)\)
In this section, priors for scalar input functions are discussed. For vector-valued input functions, a prior can be assigned independently to each component, provided the components are assumed to be a priori independent. Assigning priors jointly over all components is not discussed in this paper.
In a Bayesian setting, the prior penalising the jth derivative can be interpreted as defining a special case of a Gaussian process [33]. This is a stochastic process whose finite-dimensional marginal distributions are Gaussian, and it is completely determined by a mean and a covariance function. The theory of Gaussian processes provides a flexible framework for defining priors over functions. However, this approach does not seem to have been employed in drug discovery modelling to any significant extent.
Functional representation of \(\mathbf {u}(t)\)
For notational convenience, this section discusses scalar-valued functions u(t) only. When the input function is vector-valued, each component can be represented using the techniques described below.
Perhaps the simplest choice of basis is to represent the function as a piecewise constant function, as done in [29]. These functions are very cheap to evaluate. On the other hand, the resulting staircase-like functions have to be defined on a relatively dense grid to represent the actual function well. This makes for a high-dimensional estimation problem which can cause computational difficulties.
In [28], the Karhunen–Loève expansion is given for penalisation of the first and second derivatives of the input function, and applied to determining the input to a linear system. For these priors, the basis functions do not depend on the regularisation parameter. This is important since the regularisation parameter is usually unknown and has to be estimated from the data.
The Karhunen–Loève expansion assumes that the input function starts at 0. When penalising the second derivative, the derivative too starts at 0. To allow functions to start at arbitrary values, a constant or a linear term can be added.
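For the first-derivative (random-walk) prior, the input is a priori a Brownian motion on, say, [0, 1], whose Karhunen–Loève expansion is known in closed form. The sketch below constructs this basis and checks that the truncated expansion reproduces the Brownian covariance min(s, t); the exact scaling of the basis used in [28] may differ, and a constant term would be added in practice to allow a nonzero starting value.

```python
import numpy as np

def kl_basis_brownian(ts, K):
    """Karhunen-Loeve basis of a Brownian-motion (first-derivative-penalty)
    prior on [0, 1]: phi_k(t) = sqrt(2) sin((k - 1/2) pi t), with
    eigenvalues lam_k = 1 / ((k - 1/2) pi)^2, for k = 1..K."""
    ks = np.arange(1, K + 1)
    freqs = (ks - 0.5) * np.pi
    Phi = np.sqrt(2.0) * np.sin(np.outer(ts, freqs))  # shape (len(ts), K)
    lam = 1.0 / freqs ** 2
    return Phi, lam

# The truncated expansion approximates the Brownian covariance min(s, t):
ts = np.array([0.3, 0.7])
Phi, lam = kl_basis_brownian(ts, 500)
cov_approx = (Phi * lam) @ Phi.T   # sum_k lam_k * phi_k(s) * phi_k(t)
# cov_approx[0, 1] is close to min(0.3, 0.7) = 0.3
```

Note that each basis function vanishes at t = 0, which is why an unpenalised constant (or, for the second-derivative prior, linear) term must be added separately.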
Alternatively, splines can be used [5]. Like piecewise constant functions, they are nonzero only on a limited interval. They can be computed efficiently, can be made differentiable to an arbitrary degree, and can represent realistic-looking functions using relatively few parameters.
Desired statistical quantities
In classical regularisation theory, the quantity of interest is the penalised maximum likelihood, given by Eq. (5). From a Bayesian perspective, this is the MAP estimate. One might also be interested in the mean input function, as well as pointwise 95 % credible intervals. The latter are important for quantifying the uncertainty of the estimate.
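When the posterior is explored by sampling, these pointwise summaries reduce to percentiles across the sampled input trajectories. A minimal sketch, using a hypothetical array with one sampled trajectory per row:

```python
import numpy as np

# Hypothetical MCMC output: 1000 sampled input trajectories, each
# evaluated on a grid of 50 time points (one trajectory per row).
rng = np.random.default_rng(3)
draws = rng.normal(loc=np.linspace(0.0, 1.0, 50), scale=0.2,
                   size=(1000, 50))

post_mean = draws.mean(axis=0)                        # pointwise mean
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)    # pointwise 95 % CI
```

The same percentile computation applies to derived scalar quantities such as bioavailability, using the corresponding column of samples.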
Estimation algorithms
MAP estimates can be obtained by optimising Eq. (5). Algorithms for optimisation in dynamical systems are studied in the field of optimal control.
When quantities other than the MAP estimate are of interest, inference methods such as Markov chain Monte Carlo (MCMC) approaches can be employed [6]. Although not explored here, it would also be possible to use other sampling methods such as Sequential Monte Carlo [9] or analytical approximations such as Variational Bayesian methods [3].
Another estimation decision is how to select the regularisation parameter \(\tau\). The discrepancy criterion suggests selecting \(\tau\) so that the sum of squared residuals is equal to the expected sum of squared distances between the true function and the measurements. In ordinary and generalised cross-validation, measurements are left out from the estimation procedure, and the ability to predict these left-out measurements is assessed. For linear Gaussian problems, it is possible to derive analytical maximum-likelihood criteria for \(\tau\) [35]. In the L-curve approach [18], a MAP estimate is calculated for a large number of values for \(\tau\), and the data and regularisation cost terms \(E_D\) and \(E_W\) are plotted against each other. For low \(\tau\), the data fit will be almost perfect and the data cost will be almost zero. Conversely, for high \(\tau\), the input function is forced to follow the regularisation criterion, and the regularisation cost will approach a minimum value. Between these extremes, there is a characteristic corner in the plot, where there is a reasonable trade-off between data fit and regularity. In the Bayesian paradigm, \(\tau\) can be treated as an additional parameter, and can be estimated together with the basis function coefficients.
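The L-curve construction can be sketched on a generic linear-Gaussian problem, where the MAP estimate has the closed-form ridge solution; the matrix A, data y and grid of \(\tau\) values below are hypothetical, and for a dynamic model the MAP estimate would come from an optimal-control solver instead.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))         # hypothetical linear model y = A u + e
u_true = rng.normal(size=10)
y = A @ u_true + 0.1 * rng.normal(size=30)

def map_costs(tau):
    """MAP estimate for ||y - A u||^2 + tau ||u||^2, returning the data
    misfit E_D and regularisation cost E_W at the optimum."""
    u = np.linalg.solve(A.T @ A + tau * np.eye(10), A.T @ y)
    E_D = np.sum((y - A @ u) ** 2)
    E_W = np.sum(u ** 2)
    return E_D, E_W

taus = np.logspace(-4, 4, 50)
curve = [map_costs(t) for t in taus]
# Plotting E_W against E_D (often on log axes) traces the L-curve:
# E_D grows and E_W shrinks monotonically as tau increases, with the
# characteristic corner marking a reasonable trade-off.
```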
Optimal control-based methods
The aim of optimal control is to select the input to a dynamical system that minimises some cost function. Here, the cost function is the negative log posterior. The optimisation problem can be formulated in multiple ways [4, 32].
In single shooting, only the parameters describing the input function are included as decision variables in the optimisation problem. The log posterior for a given input function is calculated by solving the system of differential equations using a numerical ODE solver. This is the most straightforward method.
In multiple shooting, the time course of the system is divided into a number of subintervals. For each such subinterval, the system of ODEs is solved numerically. The input function parameters as well as the state variables at the start of each interval are included as decision variables. Constraints are added to the problem to ensure that the resulting trajectories are continuous. This results in an optimisation problem that is larger than in single shooting, but it tends to be sparser and less nonlinear.
In collocation methods, no ODE solvers are used. Instead, the dynamic model is included in the form of equality constraints in the optimisation problem. The time course of the system is divided into a number of subintervals. In each subinterval, the state trajectories are approximated by loworder polynomials. A small number of collocation points are selected in each interval, and constraints are added to ensure that the solution of the system of ODEs is satisfied at these points. The input function parameters as well as the state variables at the start of each interval and at all collocation points are included as decision variables. Additional constraints are added to ensure that the state trajectory is continuous. This results in an even larger optimisation problem than multiple shooting, but it tends to be even sparser and less nonlinear.
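As a minimal illustration of the single-shooting formulation, the sketch below evaluates a penalised objective of the form in Eq. (5) for a hypothetical one-compartment model dA/dt = u(t) − k·A with a piecewise constant input; forward Euler stands in for a numerical ODE solver, and all rates, observations and weights are invented for illustration.

```python
import numpy as np

k_el = 0.1                                   # hypothetical elimination rate
t_obs = np.array([1.0, 2.0, 4.0, 8.0])       # observation times
y_obs = np.array([2.1, 3.0, 2.2, 0.9])       # hypothetical concentrations
sigma, tau = 0.1, 10.0                       # noise sd, regularisation weight
n_seg, t_end = 40, 10.0                      # piecewise-constant input grid

def simulate(u_segments):
    """Single shooting: integrate dA/dt = u(t) - k_el * A forward in time
    (forward Euler here; an ODE solver with sensitivities in practice)."""
    dt = t_end / (10 * n_seg)
    ts = np.arange(0.0, t_end + dt, dt)
    A = np.zeros(len(ts))
    for j in range(len(ts) - 1):
        seg = min(int(ts[j] / (t_end / n_seg)), n_seg - 1)
        A[j + 1] = A[j] + dt * (u_segments[seg] - k_el * A[j])
    return np.interp(t_obs, ts, A)           # model output at sample times

def neg_log_posterior(u_segments):
    """Data misfit plus a second-difference penalty on the input,
    mirroring the structure of Eqs. (5) and (7)."""
    E_D = np.sum((y_obs - simulate(u_segments)) ** 2) / (2.0 * sigma ** 2)
    E_W = tau * np.sum(np.diff(u_segments, n=2) ** 2)
    return E_D + E_W

cost = neg_log_posterior(np.ones(n_seg))
```

An optimiser would minimise `neg_log_posterior` over the segment heights; in multiple shooting and collocation the same objective is split across subintervals with continuity constraints, which only changes the decision variables and constraints, not the cost.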
Many optimisation methods rely on gradients and Hessians of the objective function. A straightforward way to compute these is to use finite differences. However, this can be inaccurate and slow, especially in high dimensions. A powerful alternative is to use automatic differentiation, where gradients are automatically computed by applying the chain rule of calculus to the objective function [31]. For numerical ODE solvers, it is also possible to solve the sensitivity equations to obtain gradients [2]. For inputestimation problems, a combination of these two methods can be used.
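The chain-rule mechanics of forward-mode automatic differentiation can be illustrated with a minimal dual-number class; tools such as CasADi implement the same idea far more generally and efficiently.

```python
import math

class Dual:
    """Minimal forward-mode automatic differentiation: carry a (value,
    derivative) pair and apply the chain rule at every operation."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule: (fg)' = f'g + fg'
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__
    def exp(self):
        e = math.exp(self.val)
        return Dual(e, e * self.der)   # chain rule: (e^f)' = e^f f'

# d/dx [x * exp(x)] at x = 1 is 2e, obtained to machine precision,
# unlike a finite-difference approximation.
x = Dual(1.0, 1.0)        # seed the derivative of the input with 1
y = x * x.exp()
```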
An advantage of optimal control-based methods is that they can be very fast, even for high-dimensional functional representations such as piecewise constant functions on a dense grid. It is easy to include extra constraints, which is helpful in avoiding unphysical answers, such as negative concentrations. The main disadvantage is that they can only provide MAP estimates. It is therefore hard to assess the uncertainty of the estimate.
Markov chain Monte Carlo (MCMC)
 1.
Propose a new sample \(\varvec{\theta }^\prime\) using an arbitrary proposal distribution \(p(\varvec{\theta }^\prime \mid \varvec{\theta }^{(i-1)})\).
 2.
Calculate the Metropolis–Hastings ratio A:
$$\begin{aligned} A = \min \left( 1, \frac{p(\varvec{\theta }^\prime \mid \mathbf {y})\, p(\varvec{\theta }^{(i-1)} \mid \varvec{\theta }^\prime )}{p(\varvec{\theta }^{(i-1)} \mid \mathbf {y})\, p(\varvec{\theta }^\prime \mid \varvec{\theta }^{(i-1)})} \right) \end{aligned}$$
 3.
With probability A, set \(\varvec{\theta }^{(i)} = \varvec{\theta }^\prime\). Otherwise, set \(\varvec{\theta }^{(i)} = \varvec{\theta }^{(i-1)}\).
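For a symmetric proposal such as a Gaussian random walk, the proposal densities cancel in the ratio, and the three steps above reduce to the following minimal sketch (the standard-normal target is only a sanity check, not a model from the paper):

```python
import numpy as np

def rwmh(log_post, theta0, n_samples, prop_sd, seed=0):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal;
    symmetry makes the proposal densities cancel in the ratio A."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    lp = log_post(theta)
    samples = np.empty((n_samples, theta.size))
    for i in range(n_samples):
        prop = theta + prop_sd * rng.normal(size=theta.size)   # step 1
        lp_prop = log_post(prop)
        # Steps 2-3: accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        samples[i] = theta
    return samples

# Sampling a standard normal as a sanity check:
s = rwmh(lambda th: -0.5 * np.sum(th ** 2), np.zeros(1), 20000, 1.0)
```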
In some cases, it is possible to sample some of the parameters directly from their distribution conditional on the other parameters. This typically happens when conjugate priors are used, i.e. priors that have the same functional form as the corresponding posteriors. As an example, when Karhunen–Loève basis functions are used, their coefficients are normally distributed with a precision equal to the regularisation parameter. If the regularisation parameter is assigned a Gamma prior, its distribution conditioned on the coefficients of the basis functions is also a Gamma distribution. Since efficient methods exist for sampling from Gamma and other standard distributions, this can be used to propose new parameter values. Inserting this proposal into the Metropolis–Hastings ratio shows that such proposals are always accepted. This sampling method is called Gibbs sampling [6].
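For this conjugate case, the Gibbs update of the regularisation parameter reduces to a single Gamma draw. A sketch, assuming coefficients with prior precision \(\tau\) and a Gamma(a₀, b₀) prior on \(\tau\):

```python
import numpy as np

def gibbs_tau(coeffs, a0=0.001, b0=0.001, rng=None):
    """Draw the regularisation parameter tau (the precision of the basis
    coefficients) from its Gamma full conditional:
        tau | c ~ Gamma(a0 + n/2, rate = b0 + sum(c_k^2) / 2).
    Such a draw is always accepted in the Metropolis-Hastings ratio."""
    rng = rng or np.random.default_rng()
    c = np.asarray(coeffs, dtype=float)
    shape = a0 + c.size / 2.0
    rate = b0 + np.sum(c ** 2) / 2.0
    return rng.gamma(shape, 1.0 / rate)   # NumPy parametrises by scale
```

In a full sampler this draw would alternate with updates of the basis coefficients, as in Case Study 2.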
MCMC methods have the advantage that they can handle arbitrary models and estimate any kind of statistical quantity, including credible intervals. The downside is that they can be slow, since a large number of samples may have to be generated. Their performance can depend critically on the choice of proposal distribution. For highdimensional functional representations, finding a proposal distribution that gives acceptable performance can sometimes be challenging.
It is also possible to mix proposals, for example by using different updating mechanisms for different parameters. This is exemplified here in Case Study 2.
Even though the samples from the Markov chain are drawn from the desired distribution asymptotically, the initial part of the chain may be non-representative and should be discarded. Various methods have been proposed to assess whether convergence to the desired distribution has been achieved [25]. In Geweke’s method, the mean and variance from different segments of the chain are compared [12]. The Gelman–Rubin method compares the within-chain and between-chain variance of several Markov chains initialised in different parts of the parameter space [10]. In the Raftery–Lewis method, the user can specify that quantiles of particular parameters should be estimated to a given accuracy. By analysing statistics of where the parameters exceed these quantiles, an estimate of the number of samples to discard can be obtained [30].
Assessing the number of samples required for an accurate estimate is nontrivial. Since the samples generated by the Markov chain are correlated, N samples from the chain give a less accurate estimate than N independent samples. The previously mentioned Raftery–Lewis method can provide an estimate for the required number of samples. The effective sample size (ESS) is another way to assess the quality of the samples. It is a rough estimate of the number of independent samples required to obtain the same approximation error as the samples from the Markov chain and can be calculated from the autocorrelation of the generated samples [14, Sect. 7.1].
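A simple version of this calculation, truncating the autocorrelation sum at the first negative lag (more sophisticated truncation rules exist, e.g. in [14, Sect. 7.1]), might look as follows:

```python
import numpy as np

def effective_sample_size(x):
    """ESS = N / (1 + 2 * sum_k rho_k), with autocorrelations rho_k
    summed until they first drop below zero (a simple truncation rule)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    # Empirical autocorrelation function for lags 0..n-1.
    acf = np.correlate(xc, xc, mode='full')[n - 1:]
    acf = acf / (np.arange(n, 0, -1) * x.var())
    s = 0.0
    for k in range(1, n):
        if acf[k] < 0:
            break
        s += acf[k]
    return n / (1.0 + 2.0 * s)

rng = np.random.default_rng(2)
iid = rng.normal(size=5000)       # independent draws: ESS close to N
ar1 = np.empty(5000)              # AR(1) with phi = 0.9: strongly
ar1[0] = 0.0                      # autocorrelated chain, far smaller ESS
for i in range(1, 5000):
    ar1[i] = 0.9 * ar1[i - 1] + rng.normal()
```

For the AR(1) chain, theory gives ESS ≈ N(1 − φ)/(1 + φ), i.e. roughly a twentieth of the nominal sample size at φ = 0.9, illustrating why correlated MCMC output needs many more draws than an independent sampler would.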
Previous work
Many previously used nonparametric inputestimation methods can be described using the framework presented above:
Verotta [37] gives a good overview of classical input-estimation (deconvolution) methods for linear systems. As a choice of prior, two approaches are suggested: either using the norm of the first or second derivative of the input function, or parametrising the input function with few enough parameters so that the problem becomes well-posed without the use of a prior. As basis functions, piecewise constant and spline functions are suggested. Obtaining point estimates using optimisation techniques is discussed. Suggested methods to assess uncertainty estimates include quadratic approximations around the MAP estimate and bootstrapping. The methods are tested on pharmacokinetic examples as well as an example involving estimating the secretion rate of luteinising hormone.
Magni et al. [26] employ MCMC techniques to do full Bayesian inference using piecewise constant basis functions and a prior penalising the first or second derivative, and use this to estimate insulin secretion rate after a glucose stimulus.
Pillonetto et al. [29] suggest penalising the first derivative of the logarithm of the input function to handle nonnegativity constraints. Piecewise constant basis functions are used, and full inference is done using MCMC. The method is applied to estimate luteinising hormone secretion rate as well as to pharmacokinetic problems.
Pillonetto and Bell [28] suggest using Karhunen–Loève basis functions with various priors. Since their examples are unconstrained linear Gaussian models, full inference can be done analytically without resorting to sampling methods. However, MCMC is used to estimate the regularisation parameter. The methods are tested on synthetic test functions.
Hattersley et al. [20] use an entropic prior together with piecewise constant basis functions. MAP estimates are obtained using a Sequential Quadratic Programming optimisation method. The method was used to estimate the production rate of free light chains in multiple myeloma patients.
Case studies
Here, various input-estimation approaches are illustrated using two case studies, fully specified in the supplement. Both optimal control-based and MCMC methods were implemented on top of CasADi, a framework for numeric optimisation [1]. This software can compute gradients and Hessians by automatic differentiation, and has interfaces to the ODE solver package SUNDIALS [21] as well as to the optimisation software Ipopt [38]. CasADi is implemented in C++, but at the time of writing, the recommended way to use it is to call it from Python. All the code for the case studies was written in Python 2.7, using CasADi version 2.4.1 for computing log posteriors, gradients and metric tensors as well as for performing optimisation with Ipopt. The code can be obtained at www2.warwick.ac.uk/fac/sci/eng/research/biomedical/impact/earlystageresearcher/magnustragardh/.
Case Study 1
In the original article, this was done by fitting a smoothing spline to the concentration measurements, and directly inverting the system of ODEs. While this approach yielded satisfactory results, it is somewhat simplistic. Since it enforces smoothness on the output function rather than on the input, it is difficult to determine what assumptions about the input function are actually being made. Additionally, no attempt was made to assess estimation uncertainty.
Penalisation of the second derivative, as in Eq. (7), was chosen as a prior. Two input parametrisations were investigated: cubic B-splines with breakpoints at the measurement times, and a piecewise constant function discretised to 100 uniformly distributed intervals. Both MAP estimates and pointwise means and 95 % credible intervals were sought. To this end, optimal control methods as well as MCMC were used.
The time series are characterised by an early phase, where the plasma concentration has a large peak and the sampling is relatively dense, and a late phase, where the plasma concentration has declined and sampling is sparser (Fig. 3). To capture the initial peak, the amount of regularisation cannot be too large. On the other hand, too little regularisation can cause unrealistically large uncertainty in the latter sparsely sampled part. One way to mitigate this is to apply the prior to the logarithm of the input function rather than to the input function itself. This has the added benefit of automatically ensuring that the input function is always nonnegative. In the sequel, this will be referred to as the “log-scale model”, while penalising the function itself will be referred to as the “linear-scale model”.
As a first step, a suitable value for the regularisation parameter \(\tau\) was determined using the L-curve approach [18]. To investigate the sensitivity to \(\tau\), the subsequent estimation was run using three different values: one at the “knee” of the curve, and one on either side of this. The resulting estimates were qualitatively similar for all three values. Fig. 4 shows a typical L-curve.

MAP estimation using the cubic spline model.

MAP estimation using the piecewise constant model.

Full Bayesian estimation by MCMC using the cubic spline model.
For the optimal controlbased methods, single shooting was used for the spline model, since this made it easy to reuse the code for the MCMC estimation. For the piecewise constant model, single shooting was too inefficient and was replaced by collocation.
For MCMC, a simple componentwise Gaussian random-walk Metropolis–Hastings algorithm was used. In this method, each parameter is updated individually, using a Gaussian proposal density centred on the current value. The variance of the proposal density was tuned during trial runs to maintain the acceptance rate between 0.2 and 0.5, and hence keep it reasonably close to optimal values [6, Sect. 4.2]. The tuning was performed by monitoring the acceptance rate every 100 iterations, and modifying the proposal variance if the acceptance rate was not in the range 0.2–0.5. The Markov chains were initialised to the MAP estimate given by the optimal-control techniques. 15,000 samples were drawn for each analysis.
Obtaining MAP estimates is in general computationally cheap. For the linear-scale model, even the 100-dimensional piecewise-constant model could be solved using collocation methods in a matter of seconds on an ordinary workstation. On the other hand, these methods were unable to find a solution to the log-scale version of the piecewise-constant model. Therefore, all results for the log-scale model were obtained using splines. MCMC is considerably more expensive: drawing 15,000 samples requires 20–30 min on the test machine. One reason for the long running time is that the parameters are updated one at a time, and the ODE solutions have to be recomputed for every update. Updating several parameters jointly would be more efficient, but finding good proposals for joint updates can be challenging. Methods for doing this are explored in Case Study 2.
The ESS, as defined in the “Estimation algorithms” section, was used as a measure of the quality of the samples. In particular, the ESS of the bioavailability was monitored, as it was considered to be the most important quantity. Typical ESS values ranged from 300 to 2000 samples. The notable exception was the linear-scale model with the low dose, where the quality of the samples from the Markov chain was very poor despite the large number of samples and extensive tuning. Here, the typical ESS was around 10 samples. The results from that analysis are therefore very uncertain.
From Fig. 3, it can be seen that for the high dose, the linear-scale model underpredicts the initial peak at around 350 min. During 900–1400 min, the sparse sampling causes the credible interval for the plasma concentration to extend below 0, clearly an unphysical result. The log-scale model appears to capture the peak better and keeps the uncertainty within reasonable bounds. For the low dose, the poor mixing of the Markov chain makes it hard to draw any conclusions about the linear-scale model. The log-scale model appears to capture the data well.
Case Study 2
The data in this case study come from a drug-discovery project where the effect of two optimised monoclonal antibodies targeting fibroblast growth factor receptor (FGFR) 1c (R1c mAb opt1 and R1c mAb opt2) was studied by measuring energy intake and body mass over time [11]. The parent R1c mAb has previously been shown to cause profound body weight and body fat loss due to decreased food intake (with energy expenditure unaltered), thereby improving glucose control in diet-induced obese (DIO) mice [23]. Thus, inhibiting R1c has become an attractive target for developing novel therapies against obesity and diabetes. However, different R1/R1c mAbs have been shown to decrease body weight solely due to hypophagia or via a combined effect on both food intake and energy expenditure [23, 36, 39], thus demonstrating the importance of taking both caloric intake and expenditure into account when defining mechanisms for weight-loss therapies. It is of great interest to be able to estimate energy intake without having to measure it directly, since methods for measuring energy intake can be unreliable, or expensive, or both [15]. The objective of this analysis was to investigate the possibility of estimating the energy intake from body-mass measurements alone.
The study consisted of seven groups of DIO mice: one vehicle group, and three groups of each of the two investigated substances, R1c mAb opt1 or R1c mAb opt2 administered as a single subcutaneous injection with doses of 0.3, 3 or 10 mg/kg. Each group comprised four mice, and energy intake was measured per group. Body mass was measured per individual, and group averages computed. Measurements were taken 9 days before treatment, and subsequently once per day or once every 2 days, up to 30 days after treatment. The analysis was performed using group means. All animal experiments were approved by the Gothenburg Ethics Committee for Experimental Animals.
As choice of prior, the first derivative of the energy intake was penalised, which is equivalent to modelling the energy intake as a random walk. For the body-mass measurements, a proportional 0.5 % measurement noise was assumed.
The input functions were represented using Karhunen–Loève basis functions [28]. Twenty basis functions were used, as it was found that adding more did not significantly influence the estimates. A constant term, which was not penalised, was added to allow the energy intake to start at a nonzero value. The regularisation parameter was treated as an unknown parameter, and estimated jointly with the basis function coefficients. It was assigned a Gamma distribution prior, which is a conjugate prior to the inverse variance of the basis function coefficients. This makes it possible to estimate it using Gibbs sampling. The Gamma distribution was assigned a shape parameter of 0.001 and a rate parameter of 0.001, in order to make the prior flat and thus to avoid making strong a priori assumptions about the parameter value.
 1.
A good starting point for MCMC sampling was determined. This was done by fixing the regularisation parameter to a high value, and optimising the log posterior with respect to the basis coefficients.
 2.
MCMC sampling was done by alternating between the following updates:
 (a)
Updating the regularisation parameter using Gibbs sampling.
 (b)
Jointly updating the coefficients of the basis functions using SMMALA.
Table 1 Running time and ESS for the time series in Case Study 2
Dose group  Method  Number of samples  Time (s)  Median ESS  Median ESS/s

Vehicle  SMMALA  5000  151.1  940.8  6.2 
R1c mAb opt1 (0.3 mg/kg)  SMMALA  5000  172.6  685.5  4.0 
R1c mAb opt1 (3 mg/kg)  SMMALA  5000  179.1  601.6  3.4 
R1c mAb opt1 (10 mg/kg)  SMMALA  5000  172.6  694.6  4.0 
R1c mAb opt2 (0.3 mg/kg)  SMMALA  5000  151.6  866.8  5.7 
R1c mAb opt2 (3 mg/kg)  SMMALA  5000  162.9  828.8  5.1 
R1c mAb opt2 (10 mg/kg)  SMMALA  5000  173.4  551.0  3.2 
R1c mAb opt1 (10 mg/kg)  RWMH  50,000  142.4  267.2  1.9 
For comparison, a similar sampling scheme was tested on a single time series, where the basis coefficients were jointly updated using the random-walk Metropolis–Hastings (RWMH) algorithm. The proposal covariance matrix was selected by computing the metric tensor at the MAP estimate, and scaling this to get an acceptance rate of around 20–30 %. RWMH can draw samples approximately 10 times faster than SMMALA, since it does not need to evaluate the Jacobian. However, in terms of effective samples per second, SMMALA gives better performance, since its samples are considerably less correlated (Fig. 7). Furthermore, several trial runs had to be made to find a good scaling factor for RWMH, something which has to be done separately for each time series. In contrast, SMMALA required no manual tuning. Running time and median effective sample size are shown in Table 1.
A weakness in the current work is that the measured energy intake was used to estimate the physical activity parameters \(\lambda _i\), \(i \in \{0, 1, 2, 3\}\). Since the purpose here was to evaluate the possibility of estimating the energy intake given that the system dynamics are known, this was considered acceptable. For the methods to be useful in experiments where the energy intake is not measured, it would be necessary to characterise the vehicle and drug effect on the physical activity by a generic model.
Discussion
Both case studies show that, in terms of computational speed, MAP estimates can be obtained quickly for these kinds of models. Obtaining full posteriors in a reasonable amount of time is considerably more difficult. Case Study 1 shows that naive application of MCMC methods does not perform well in certain cases. SMMALA can be a good default choice, since it can efficiently update several parameters jointly, and does not require any user-specified tuning. It could also be worthwhile to investigate more advanced MCMC proposals, such as Hamiltonian Monte Carlo and Riemannian Manifold Hamiltonian Monte Carlo methods [14].
An alternative way to obtain estimates is to make multiple MAP estimates using bootstrapping. Note, however, that this will give a frequentist estimate of estimator variance rather than of uncertainty in the Bayesian sense. Great care has to be taken when interpreting and comparing uncertainty estimates from different methods.
It may also be worthwhile to investigate other parametrisations. Although the Karhunen–Loève expansion has appealing theoretical properties, it has the disadvantage that all basis functions are nonzero at all time points, making it necessary to sum all of them when evaluating at a single time point. They also make it difficult to impose nonnegativity constraints.
The most principled and automated way to estimate the regularisation parameter is the Bayesian method of treating it as an additional random variable to be estimated from the data. However, for certain problems this method can have robustness issues. The L-curve method is an alternative, but it is far from ideal in terms of usability, since it requires the user to plot and manually select a value from the curve.
A disadvantage of using a linear-scale input model is that it does not rule out negative energy intakes. It is possible to impose nonnegativity constraints by using a log-scale input model, as in Case Study 1. However, such a model may be hard to justify in certain cases, and it makes it more difficult to find efficient sampling methods. Another way to impose constraints would be to simply reject all proposed input functions that drop below zero. This could be reasonably efficient when only a small portion of the unconstrained distribution is below zero. In other cases, it could lead to prohibitively large rejection rates. A pragmatic approach is to use the unconstrained results as is, acknowledging that they are only an approximation. This is what has been done in Case Study 2.
In the case studies, uncertainties in the system model are not taken into account. Adding this could improve the statistical soundness and give more realistic estimates. This could be done by including uncertain system parameters in the estimation problem, estimating them jointly with the input function. Although it may be computationally expensive, it would not require any conceptual changes to the methods. It may also be possible to embed these methods in a nonlinear mixed effects (NLME) framework.
CasADi has proved to be a valuable tool in that it allows the user to easily obtain gradients and Hessians for complicated functions, which can include calls to numerical ODE solvers. The details involved in formulating and solving the sensitivity equations are handled automatically by the software.
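The core mechanism that CasADi relies on, algorithmic differentiation [31], can be illustrated with a toy forward-mode implementation; the `Dual` class and the one-compartment example below are our own invention for illustration, not CasADi code. Each operation propagates a value and its derivative together, so derivatives are exact rather than finite-difference approximations.

```python
import math

class Dual:
    """Toy forward-mode AD value: carries a value and its derivative together."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def _lift(self, other):
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        other = self._lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):  # product rule
        other = self._lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__
    def __neg__(self):
        return Dual(-self.val, -self.der)

def exp(x):
    """exp with the chain rule applied to the derivative part."""
    v = math.exp(x.val)
    return Dual(v, v * x.der)

# Sensitivity of a one-compartment response C(t) = exp(-k*t) with respect to k,
# at k = 0.3 and t = 2 (numbers invented for illustration).
k = Dual(0.3, 1.0)      # seed the derivative dk/dk = 1
C = exp(-(k * 2.0))     # C.der is the exact analytic sensitivity dC/dk
```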
Since the performance of the methods may be problem-specific, it is important to evaluate them using multiple datasets and models; only then can any conclusions about general usefulness be drawn.
Conclusions and future work
Numerous drug discovery deconvolution applications have been reported over the years, and it is evident that useful methods are also needed for the more general nonlinear case, referred to as input estimation. This work serves as a highly promising starting point for applying input-estimation methods to problems in drug discovery: it gives a rigorous definition of the problem, it lists the main methods and how they can be implemented, and it discusses the application of these methods to realistic case studies. Additionally, the usefulness of CasADi for implementing these methods has been investigated and verified.
The presented optimal-control methods for input estimation can be recommended for use in drug discovery. The MCMC-based methods work well in certain cases, but they can have long running times, and care has to be taken to ensure that the parameter space is well explored. Further improvements would be desirable before they can be recommended for use by non-experts.
Suggestions for future work in this area include investigating the choice of prior and basis functions. It would also be of interest to evaluate additional criteria for choosing the regularisation parameter; note, however, that the regularisation parameter does not require any special treatment when Bayesian methods are used. Additionally, more advanced MCMC methods can be evaluated, and alternative criteria can be considered for determining the number of MCMC samples necessary to obtain an accurate estimate.
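One standard criterion of the kind mentioned above is the Gelman–Rubin potential scale reduction factor [10]; a compact pure-Python sketch is given below, with synthetic chains standing in for real MCMC output.

```python
import random
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor (Gelman and Rubin, 1992)
    for m parallel chains, each of length n."""
    n = len(chains[0])
    within = statistics.fmean(statistics.variance(c) for c in chains)
    between = n * statistics.variance([statistics.fmean(c) for c in chains])
    var_hat = (n - 1) / n * within + between / n
    return (var_hat / within) ** 0.5

random.seed(0)
# Two synthetic chains drawn from the same target stand in for real MCMC output;
# values close to 1 indicate the chains agree and the space is well explored.
chains = [[random.gauss(0.0, 1.0) for _ in range(5000)] for _ in range(2)]
rhat = gelman_rubin(chains)
```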

- Computational speed. The method should be fast enough to appeal to modellers working under time constraints.
- Usability. It should be possible for non-experts to use the method.
- Statistical soundness. All assumptions should be reasonable and explicitly stated. Sources of uncertainty should be accounted for.
- Usefulness. The algorithm should be applicable to as many input-estimation problems as possible.
Footnotes
1. www.python.org
Acknowledgments
This work is funded through the Marie Curie FP7 People ITN European Industrial Doctorate (EID) project, IMPACT (Innovative Modelling for Pharmacological Advances through Collaborative Training). Project Number: 316736.
Supplementary material
References
1. Andersson J (2013) A general-purpose software framework for dynamic optimization. Ph.D. Thesis, Arenberg Doctoral School, KU Leuven, Department of Electrical Engineering (ESAT/SCD) and Optimization in Engineering Center, Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
2. Bartlett R (2008) A derivation of forward and adjoint sensitivities for ODEs and DAEs. Tech. Rep. SAND2007-6699, Sandia National Laboratories
3. Beal MJ (2003) Variational algorithms for approximate Bayesian inference. Ph.D. Thesis, University of London
4. Betts JT (2010) Practical methods for optimal control and estimation using nonlinear programming. SIAM, Philadelphia
5. de Boor C (1986) B(asic)-spline basics. Tech. rep., DTIC Document
6. Brooks S, Gelman A, Jones G, Meng XL (2011) Handbook of Markov Chain Monte Carlo. CRC Press, Boca Raton
7. Csajka C, Drover D, Verotta D (2005) The use of a sum of inverse Gaussian functions to describe the absorption profile of drugs exhibiting complex absorption. Pharmaceut Res 22(8):1227–1235. doi:10.1007/s11095-005-5266-8
8. De Nicolao G, Sparacino G, Cobelli C (1997) Nonparametric input estimation in physiological systems: problems, methods, and case studies. Automatica 33(5):851–870. doi:10.1016/S0005-1098(96)00254-3
9. Del Moral P, Doucet A, Jasra A (2006) Sequential Monte Carlo samplers. J Roy Stat Soc B 68(3):411–436
10. Gelman A, Rubin DB (1992) Inference from iterative simulation using multiple sequences. Stat Sci 7(4):457–472. doi:10.1214/ss/1177011136
11. Gennemark P, Jansson-Löfmark R, Hyberg G, Wigstrand M, Kakol-Palm D, Håkansson P, Hovdal D, Brodin P, Fritsch-Fredin M, Antonsson M, Ploj K, Gabrielsson J (2013) A modeling approach for compounds affecting body composition. J Pharmacokinet Pharmacodyn 40(6):651–667
12. Geweke J (1992) Evaluating the accuracy of sampling-based approaches to calculating posterior moments. In: Bernardo JM, Berger J, Dawid AP, Smith JFM (eds) Bayesian statistics. Oxford University Press, Oxford, pp 169–193
13. Gilks WR, Spiegelhalter DJ, Richardson S (eds) (1996) Practical Markov Chain Monte Carlo. Chapman and Hall, London
14. Girolami M, Calderhead B (2011) Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J Roy Stat Soc B Met 73(2):123–214
15. Göbel B, Sanghvi A, Hall KD (2014) Quantifying energy intake changes during obesity pharmacotherapy. Obesity 22(10):2105–2108
16. Guo J, Hall KD (2009) Estimating the continuous-time dynamics of energy and fat metabolism in mice. PLoS Comput Biol 5(9):511
17. Guo J, Hall KD (2011) Predicting changes of body weight, body fat, energy expenditure and metabolic fuel selection in C57BL/6 mice. PLoS One 6(1):e15961
18. Hansen PC, O'Leary DP (1993) The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J Sci Comput 14(6):1487–1503
19. Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1):97–109
20. Hattersley J, Evans ND, Chappell M, Mead G, Hutchison C, Bradwell A (2008) Nonparametric prediction of free-light chain generation in multiple myeloma patients. In: 17th International Federation of Automatic Control World Congress, Seoul, pp 8091–8096
21. Hindmarsh AC, Brown PN, Grant KE, Lee SL, Serban R, Shumaker DE, Woodward CS (2005) SUNDIALS: suite of nonlinear and differential/algebraic equation solvers. ACM Trans Math Softw 31(3):363–396
22. Johansson CC, Gennemark P, Artursson P, Äbelö A, Ashton M, Jansson-Löfmark R (2013) Population pharmacokinetic modeling and deconvolution of enantioselective absorption of eflornithine in the rat. J Pharmacokinet Pharmacodyn 40(1):117–128
23. Lelliott CJ, Ahnmark A, Admyre T, Ahlstedt I, Irving L, Keyes F, Patterson L, Mumphrey MB, Bjursell M, Gorman T, Bohlooly-Y M, Buchanan A, Harrison P, Vaughan T, Berthoud HR, Lindén D (2014) Monoclonal antibody targeting of fibroblast growth factor receptor 1c ameliorates obesity and glucose intolerance via central mechanisms. PLoS One 9(11):e112109. doi:10.1371/journal.pone.0112109
24. Levy BC (2008) Principles of signal detection and parameter estimation. Springer Science & Business Media, Berlin
25. Magni P, Sparacino G (2014) Parameter estimation. In: Carson E, Cobelli C (eds) Modeling methodology for physiology and medicine, 2nd edn. Elsevier, Oxford, pp 83–110
26. Magni P, Bellazzi R, De Nicolao G (1998) Bayesian function learning using MCMC methods. IEEE Trans Pattern Anal Mach Intell 20(12):1319–1331
27. Metropolis N, Rosenbluth AW, Rosenbluth MN, Teller AH, Teller E (1953) Equation of state calculations by fast computing machines. J Chem Phys 21(6):1087–1092
28. Pillonetto G, Bell BM (2007) Bayes and empirical Bayes semi-blind deconvolution using eigenfunctions of a prior covariance. Automatica 43(10):1698–1712
29. Pillonetto G, Sparacino G, Cobelli C (2002) Handling non-negativity in deconvolution of physiological signals: a nonlinear stochastic approach. Ann Biomed Eng 30(8):1077–1087
30. Raftery AE, Lewis SM (1996) The number of iterations, convergence diagnostics and generic Metropolis algorithms. In: Gilks WR, Spiegelhalter DJ, Richardson S (eds) Practical Markov Chain Monte Carlo. Chapman and Hall, London, pp 115–130
31. Rall LB, Corliss GF (1996) An introduction to automatic differentiation. In: Computational differentiation: techniques, applications, and tools. SIAM, Philadelphia, pp 1–17
32. Rao A (2009) A survey of numerical methods for optimal control. Adv Astronaut Sci 135(1):497–528
33. Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
34. Robert C, Casella G (2013) Monte Carlo statistical methods. Springer Science & Business Media, New York
35. Sparacino G, De Nicolao G, Pillonetto G, Cobelli C (2014) Deconvolution. In: Carson C (ed) Modeling methodology for physiology and medicine, 2nd edn. Elsevier, Oxford, pp 45–68
36. Sun HD, Malabunga M, Tonra JR, DiRenzo R, Carrick FE, Zheng H, Berthoud HR, McGuinness OP, Shen J, Bohlen P et al (2007) Monoclonal antibody antagonists of hypothalamic FGFR1 cause potent but reversible hypophagia and weight loss in rodents and monkeys. Am J Physiol 292(3):E964–E976
37. Verotta D (1996) Concepts, properties, and applications of linear systems to describe distribution, identify input, and control endogenous substances and drugs in biological systems. Crit Rev Biomed Eng 24:2–3
38. Wächter A, Biegler LT (2006) On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math Program 106(1):25–57
39. Wu AL, Kolumam G, Stawicki S, Chen Y, Li J, Zavala-Solorio J, Phamluong K, Feng B, Li L, Marsters S et al (2011) Amelioration of type 2 diabetes by antibody-mediated activation of fibroblast growth factor receptor 1. Sci Transl Med 3:113ra126
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.