Abstract
Ordinary differential equations are arguably the most popular and useful mathematical tool for describing physical and biological processes in the real world. Often, these processes are observed with error, in which case the most natural way to model such data is via regression, with the mean function defined by an ordinary differential equation believed to describe the underlying process. These regression-based dynamical models are called differential equation models. Parameter inference for differential equation models poses computational challenges, mainly because analytic solutions to most differential equations are not available. In this paper, we propose an approximation method for obtaining the posterior distribution of parameters in differential equation models. The approximation proceeds in two steps. In the first step, the solution of a differential equation is approximated by the general one-step method, a class of numerical methods for ordinary differential equations that includes the Euler and Runge-Kutta procedures; in the second step, nuisance parameters are marginalized out using the Laplace approximation. The proposed Laplace approximated posterior provides a computationally fast alternative to full Bayesian computational schemes (such as Markov chain Monte Carlo) and produces more accurate and stable estimators than the popular frequentist smoothing methods (called collocation methods). As theoretical support for the proposed method, we prove that the Laplace approximated posterior converges to the actual posterior under certain conditions and analyze the relation between the order of the numerical error and that of its Laplace approximation. The proposed method is tested on simulated data sets and compared with other existing methods.
References
Alligood, K., Sauer, T., Yorke, J.: Chaos: An Introduction to Dynamical Systems. Springer, New York (1997)
Azevedo-Filho, A., Shachter, R.D.: Laplace's method approximations for probabilistic inference in belief networks with continuous variables. In: Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence (UAI'94), pp. 28–36. Morgan Kaufmann, San Francisco (1994)
Bacaër, N.: Verhulst and the logistic equation (1838). In: A Short History of Mathematical Population Dynamics, pp. 35–39. Springer, London (2011)
Barber, D., Wang, Y.: Gaussian processes for Bayesian estimation in ordinary differential equations. In: Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1485–1493 (2014)
Bard, Y.: Nonlinear parameter estimation. Academic Press, Cambridge (1974)
Calderhead, B., Girolami, M., Lawrence, N.D.: Accelerating Bayesian inference over nonlinear differential equations with Gaussian processes. Adv. Neural Inf. Process. Syst. 21, 217–224 (2009)
Campbell, D.: Bayesian collocation tempering and generalized profiling for estimation of parameters from differential equation models. PhD thesis, McGill University, Montreal (2007)
Cao, J., Ramsay, J.O.: Generalized profiling estimation for global and adaptive penalized spline smoothing. Comput. Stat. Data Anal. 53(7), 2550–2562 (2009)
Cao, J., Fussmann, G.F., Ramsay, J.O.: Estimating a predator-prey dynamical model with the parameter cascades method. Biometrics 64(3), 959–967 (2008)
Cao, J., Wang, L., Xu, J.: Robust estimation for ordinary differential equation models. Biometrics 67(4), 1305–1313 (2011)
Dondelinger, F., Filippone, M., Rogers, S., Husmeier, D.: ODE parameter inference using adaptive gradient matching with Gaussian processes. In: Proceedings of the 16th International Conference on Artificial Intelligence and Statistics, J. Mach. Learn. Res. Workshop and Conference Proceedings, vol. 31, pp. 216–228 (2013)
FitzHugh, R.: Impulses and physiological states in theoretical models of nerve membrane. Biophys. J. 1, 445–466 (1961)
Fussmann, G.F., Ellner, S.P., Shertzer, K.W., Hairston Jr., N.G.: Crossing the Hopf bifurcation in a live predator-prey system. Science 290(5495), 1358–1360 (2000)
Gelman, A., Bois, F., Jiang, J.: Physiological pharmacokinetic analysis using population modeling and informative prior distributions. J. Am. Stat. Assoc. 91, 1400–1412 (1996)
Geyer, C.J.: Markov Chain Monte Carlo Maximum Likelihood. Defense Technical Information Center (1992)
Haario, H., Saksman, E., Tamminen, J.: An adaptive metropolis algorithm. Bernoulli 7, 223–242 (2001)
Haario, H., Laine, M., Mira, A., Saksman, E.: DRAM: efficient adaptive MCMC. Stat. Comput. 16(4), 339–354 (2006)
Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)
Hodgkin, A., Huxley, A.: A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544 (1952)
Hooker, G.: Forcing function diagnostics for nonlinear dynamics. Biometrics 65(3), 928–936 (2009)
Hooker, G., Ellner, S.P., Roditi, L.D.V., Earn, D.J.: Parameterizing state–space models for infectious disease dynamics by generalized profiling: measles in Ontario. J. R. Soc. Interface 17, rsif20100412 (2010)
Hooker, G., Xiao, L.: CollocInfer: Collocation Inference for Dynamic Systems. R package version 1.0.1. http://CRAN.R-project.org/package=CollocInfer (2014)
Huang, Y., Liu, D., Wu, H.: Hierarchical Bayesian methods for estimation of parameters in a longitudinal HIV dynamic system. Biometrics 62(2), 413–423 (2006). doi:10.1111/j.1541-0420.2005.00447.x
Incropera, F.P.: Fundamentals of Heat and Mass Transfer. Wiley, New York (2006)
Joshi, C., Wilson, S.: Grid based Bayesian inference for stochastic differential equation models. Technical Paper, Trinity College Dublin (2011)
Kermack, W.O., McKendrick, A.: A contribution to the mathematical theory of epidemics. Proc. R. Soc. Lond. Ser. A 115, 700–721 (1927)
Law, R., Murrell, D.J., Dieckmann, U.: Population growth in space and time: spatial logistic equations. Ecology 84(1), 252–262 (2003)
Macdonald, B., Higham, C., Husmeier, D.: Controversy in mechanistic modelling with Gaussian processes. In: J. Mach. Learn. Res. Workshop and Conference Proceedings, vol. 37, pp. 1539–1547. Microtome Publishing (2015)
Mathews, J., Fink, K.: Numerical Methods Using MATLAB. Featured Titles for Numerical Analysis Series. Pearson Prentice Hall, Upper Saddle River (2004)
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21(6), 1087–1092 (1953)
Nagumo, J., Arimoto, S., Yoshizawa, S.: An active pulse transmission line simulating nerve axon. Proc. IRE 50, 2061–2070 (1962)
Palais, R.S., Palais, R.A.: Differential Equations, Mechanics, and Computation, vol. 51. American Mathematical Society, Providence (2009)
Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer Series in Statistics, 2nd edn. Springer, New York (2005)
Ramsay, J.O., Hooker, G., Campbell, D., Cao, J.: Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc. Ser. B 69(5), 741–796 (2007). doi:10.1111/j.1467-9868.2007.00610.x
Rue, H., Martino, S., Chopin, N.: Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B 71(2), 319–392 (2009)
Schmidt, L.D.: The Engineering of Chemical Reactions, 2nd edn. Topics in Chemical Engineering. Oxford University Press (2005)
Soetaert, K., Petzoldt, T.: Inverse modelling, sensitivity and monte carlo analysis in R using package FME. J. Stat. Softw. 33(3), 1–28 (2010). http://www.jstatsoft.org/v33/i03/
Süli, E.: Numerical solution of ordinary differential equations. Lecture notes, University of Oxford (2014)
Tierney, L., Kadane, J.B.: Accurate approximations for posterior moments and marginal densities. J. Am. Stat. Assoc. 81(393), 82–86 (1986)
Varah, J.M.: A spline least squares method for numerical parameter estimation in differential equations. SIAM J. Sci. Stat. Comput. 3(1), 28–46 (1982). doi:10.1137/0903003
Xue, H., Miao, H., Wu, H.: Sieve estimation of constant and time-varying coefficients in nonlinear ordinary differential equation models by considering both numerical error and measurement error. Ann. Stat. 38(4), 2351 (2010)
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2011-0030037). Research of Sarat C. Dass was partially supported by an ERGS grant from the Government of Malaysia (ERGS/1/2013/STG06/UTP/02/02).
Appendices
Appendix 1: Computation of \({\ddot{g}}_n(x_1)\).
Recall that
where \(x_i = x(t_i)\) for \(i=1,2,\ldots ,n\) and \(x(t) = (x_1(t), x_2(t),\ldots , x_p(t))^T\). For the following discussion, we use the following convention for vectors and matrices. Suppose we have an array of real numbers \(a_{ijk}\) with indices \(i=1,2,\ldots , I\), \(j=1,2,\ldots , J\) and \(k=1,2,\ldots , K\). Let \((a_{ijk})_{(i)}\) denote the column vector with dimension I,
$$ (a_{ijk})_{(i)} = (a_{1jk}, a_{2jk}, \ldots , a_{Ijk})^T, $$
and \((a_{ijk})_{(j,k)}\) denote the matrix with dimensions \(J \times K\) whose (j, k)th entry is \(a_{ijk}\).
The indices in the subscript with parentheses are the indices running in the vector or the matrix. An object with one running index is a column vector, while an object with two running indices is a matrix, where the first and second running indices index the rows and columns, respectively.
Note that
where \(g_{ni}(x_1) = y_i^T y_i - 2 x_i^T y_i + x_i^T x_i\). Thus, the (l, k)th element of \({\ddot{g}}_n(x_1)\) is
Note
and
The above equation can be written in a matrix form
Thus,
The derivatives of \(x_i\) with respect to \(x_1\) can be computed using the sensitivity equation for ODEs; see Hooker (2009). Let
be the sensitivity of the state \(x_{j}\) with respect to the initial value \(x_{1l}\). The sensitivity equation is given by
or in matrix notation,
with an initial condition \(Z(t_1) = I_p\). For given \(\theta \) and t, the coefficient \(\partial f_j(x, t ;\theta ) / \partial x_u(t)\) is easily calculated. This is a linear ODE problem whose initial condition is known, so (14) can be solved with a numerical method such as the Runge-Kutta method. The second derivatives \(\partial ^2 x_{ij} / (\partial x_{1l} \partial x_{1k})\) can be computed similarly.
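As a concrete illustration, the sensitivity equation can be solved alongside the state by augmenting the ODE system and applying the same one-step scheme. The sketch below is not the authors' code: the logistic drift \(f(x;\theta ) = \theta x(1-x)\), the parameter value, and all numerical settings are illustrative. It integrates a scalar state \(x(t)\) and its sensitivity \(z(t) = \partial x(t)/\partial x_1\) with the classical Runge-Kutta method, and checks the result against a central finite-difference derivative of the solution with respect to the initial value.

```python
import numpy as np

THETA = 1.5  # illustrative parameter value


def f(x):
    """Logistic drift f(x; theta) = theta * x * (1 - x)."""
    return THETA * x * (1.0 - x)


def dfdx(x):
    """Jacobian of the drift with respect to the state."""
    return THETA * (1.0 - 2.0 * x)


def augmented(t, v):
    """Augmented system: state x and sensitivity z = dx/dx1.

    The sensitivity equation is dz/dt = (df/dx) z with z(t1) = 1
    (the scalar analogue of Z(t1) = I_p).
    """
    x, z = v
    return np.array([f(x), dfdx(x) * z])


def rk4_solve(x1, t0=0.0, t1=1.0, steps=200):
    """Integrate the augmented system with classical 4th-order Runge-Kutta."""
    v = np.array([x1, 1.0])  # x(t0) = x1, z(t0) = 1
    h = (t1 - t0) / steps
    t = t0
    for _ in range(steps):
        k1 = augmented(t, v)
        k2 = augmented(t + h / 2, v + h / 2 * k1)
        k3 = augmented(t + h / 2, v + h / 2 * k2)
        k4 = augmented(t + h, v + h * k3)
        v = v + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return v  # (x at t1, sensitivity dx(t1)/dx1)


x1 = 0.2
x_end, z_sens = rk4_solve(x1)

# Finite-difference check of the computed sensitivity.
eps = 1e-6
fd_sens = (rk4_solve(x1 + eps)[0] - rk4_solve(x1 - eps)[0]) / (2 * eps)
```

For a p-dimensional state the same scheme carries the \(p \times p\) sensitivity matrix Z(t) of (14), with \(Z(t_1) = I_p\), in place of the scalar z.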
Appendix 2: Proof of Theorem 1
Proof
The results of Tierney and Kadane (1986) and Azevedo-Filho and Shachter (1994) assume several regularity conditions, such as the existence of a unique global maximum of the likelihood function and the existence of its higher order derivatives (up to sixth order). In particular, our method for approximating the ODE model works only under the assumption of a unique maximum of the likelihood function. Thus, we assume that the likelihood surface does not contain any ridges (that is, regions over which the likelihood is constant at its global maximum).
Using the result in Tierney and Kadane (1986) and Azevedo-Filho and Shachter (1994), we have
where \(c_m = \int L_m(\theta ,\tau ^2,x_1)\pi (\theta ,\tau ^2,x_1) dx_1 d\theta d\tau ^2\). Note the full likelihood \(L(\theta , \tau ^2, x_1)\) is
and \(L_m(\theta , \tau ^2, x_1)\) is the corresponding term with \(g_n\) replaced by \(g_n^m\). If \(L_m(\theta ,\tau ^2,x_1)\) converges to \(L(\theta ,\tau ^2,x_1)\) as \(m \rightarrow \infty \) for all \(\theta \in \varTheta , \tau ^2 >0, x_1 \in {\mathbb {R}}^p\) and \(\mathbf{y}_n\), by the dominated convergence theorem, \(c_m \longrightarrow c\) as \(m \rightarrow \infty \). Thus,
which is the desired result.
To complete the proof, we need to show that \(L_m(\theta ,\tau ^2,x_1) \longrightarrow L(\theta ,\tau ^2,x_1)\) as \(m\rightarrow \infty \), and for this it suffices to show that \(ng_n^m(x_1) \longrightarrow ng_n(x_1)\) as \(m\rightarrow \infty \). Since we assume the Lipschitz continuity of f, the ODE has a unique solution with initial condition \(x(t_1) = x_1\). Assumptions A1 and A3 imply
for some constant \(B >0\). The local errors of the Kth order numerical method are given by
for some \(B'>0\), which depends only on \(\sup _{ t} \Vert d^K f(x,t;\theta )/(dt^K) \Vert \le B\) (Palais and Palais 2009). Thus, the local errors are uniformly bounded, which implies that the global errors are also uniformly bounded:
for some constant \(C>0\).
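The constant C comes from the standard accumulation argument for one-step methods; writing \(x_i^m\) for the numerical approximation to \(x(t_i)\) and L for the Lipschitz constant of f, a sketch of the usual bound (see, e.g., Palais and Palais 2009) is

$$ \max _{1 \le i \le n} \Vert x(t_i) - x_i^m \Vert \;\le \; \frac{B'}{L} \left( e^{L(T_1 - T_0)} - 1 \right) \left( \frac{h}{m} \right) ^K \;=:\; C \left( \frac{h}{m} \right) ^K , $$

so C depends only on \(B'\), L and the length of the observation interval \([T_0, T_1]\).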
Thus,
$$ \bigl| n g_n(x_1) - n g_n^m(x_1) \bigr| \le \sum _{i=1}^n \Vert x_i - x_i^m \Vert \, \Vert 2 y_i - x_i - x_i^m \Vert \le n C \left( \frac{h}{m}\right) ^{K} \left( 2C_y + 2C_x + C \left( \frac{h}{m}\right) ^{K} \right) \longrightarrow 0 \quad \text{as } m \rightarrow \infty , $$
where \(x_i^m\) denotes the numerical solution at \(t_i\), \(\sup _{t \in [T_0,T_1]}\Vert y(t)\Vert < C_y < \infty \) and \(\sup _{t \in [T_0,T_1]}\Vert x(t)\Vert < C_x <\infty \). This completes the proof. \(\square \)
Appendix 3: Proof of Theorem 2
Proof
If \(\alpha > 5/(2K)\), then as n goes to infinity, \(n(h/m)^K = O(n^{1-\alpha K}) = O(n^{-3/2})\), which converges to zero. Under A1–A3, we showed in the proof of Theorem 1 that \(| ng_n(x_1) - ng_n^m(x_1)| = O(n (h/m)^K)\). For fixed \(\tau ^2>0\),
since \(e^{x} = 1+ O(x)\) for sufficiently small x. This implies
for sufficiently large n. Since \(\alpha > 5/(2K)\) gives \(n(h/m)^K = O(n^{-3/2})\), the product \( (1+ O(n^{-3/2})) \times (1+ O(n (h/m)^K))\) is \( (1+ O(n^{-3/2}))\). \(\square \)
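The expansion used here can be spelled out. Assuming, as in the Gaussian error model of the proof of Theorem 1, that \(L(\theta ,\tau ^2,x_1) \propto (\tau ^2)^{-np/2} \exp \{ -n g_n(x_1)/(2\tau ^2) \}\) (this proportionality form is our reading of the omitted display), the ratio of the approximated to the exact likelihood for fixed \(\tau ^2\) is

$$ \frac{L_m(\theta ,\tau ^2,x_1)}{L(\theta ,\tau ^2,x_1)} = \exp \left( - \frac{1}{2\tau ^2} \bigl( n g_n^m(x_1) - n g_n(x_1) \bigr) \right) = \exp \left( O\bigl( n (h/m)^K \bigr) \right) = 1 + O\bigl( n (h/m)^K \bigr) = 1 + O( n^{-3/2} ), $$

where the second equality uses the bound from Theorem 1 and the last uses \(\alpha > 5/(2K)\).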
Cite this article
Dass, S.C., Lee, J., Lee, K. et al. Laplace based approximate posterior inference for differential equation models. Stat Comput 27, 679–698 (2017). https://doi.org/10.1007/s11222-016-9647-0