Solution of the chemical master equation by radial basis functions approximation with interface tracking
Abstract
Background
The chemical master equation is the fundamental equation of stochastic chemical kinetics. This differential-difference equation describes the temporal evolution of the probability density function for states of a chemical system. A state of the system, usually encoded as a vector, represents the number of entities or copy numbers of interacting species, which change according to a list of possible reactions. It is often the case, especially when the state vector is high-dimensional, that the number of possible states the system may occupy is too large to be handled computationally. One way to get around this problem is to consider only those states that are associated with probabilities greater than a certain threshold level.
Results
We introduce an algorithm that significantly reduces computational resources and is especially powerful when dealing with multimodal distributions. The algorithm is built according to two key principles. Firstly, when performing time integration, the algorithm keeps track of the subset of states with significant probabilities (essential support). Secondly, the probability distribution that solves the equation is parametrised with a small number of coefficients using collocation on Gaussian radial basis functions. The system of basis functions is chosen in such a way that the solution is approximated only on the essential support instead of the whole state space.
Discussion
In order to demonstrate the effectiveness of the method, we consider four application examples: a) the self-regulating gene model, b) the 2-dimensional bistable toggle switch, c) a generalisation of the bistable switch to a 3-dimensional tristable problem, and d) a 3-dimensional cell differentiation model that, depending on parameter values, may operate in bistable or tristable modes. In all multidimensional examples the manifold containing the system states with significant probabilities undergoes drastic transformations over time. This fact makes the examples especially challenging for numerical methods.
Conclusions
The proposed method is a new numerical approach that permits the approximate solution of a wide range of problems that have been hard to tackle until now. A full representation of multidimensional distributions is recovered. The method is especially attractive when dealing with models that yield solutions of a complex structure, for instance, featuring multistability.
Keywords
CME, Adaptivity, Toggle switch, Multistability, Cell differentiation
Abbreviations
 CME
Chemical master equation
 dCME
Discrete chemical master equation
 DNA
Deoxyribonucleic acid
 FSP
Finite state projection
 GBF
Gaussian basis function
 MC
Monte Carlo
 mRNA
Messenger ribonucleic acid
 MSC
Mesenchymal stem cell
 OCS
Osteochondro switch
 PDF
Probability density function
 TR
Transcriptional regulator
 SSA
Stochastic simulation algorithm
Background
Temporal evolution of biological systems is often driven by the interaction between different types of particles which, depending on the application, can represent molecules, bacteria, animals, or other discrete units. In nature, in nearly every process, particle numbers are subject to random fluctuations caused by inherent stochastic noise. Simulations of such systems are usually based on Monte Carlo (MC) simulations of the underlying Markov jump processes, such as Gillespie’s stochastic simulation algorithm (SSA) [1]. These methods share some common disadvantages: there is always a sampling error that, in general, is difficult to estimate, and convergence can be quite slow. Even computing single realisations can be costly if many fast reactions are present; therefore approximate MC methods like τ-leaping [2], averaging approaches [3, 4], and deterministic-stochastic hybrid formulations [5, 6, 7] have been introduced. The applicability of these approaches depends on the existence of a permanent time-scale gap that makes it possible to clearly distinguish between fast and slow reactions.
An alternative approach is to directly compute the probability density function (PDF) as a solution of the chemical master equation (CME). Solving the CME numerically on a large state space with a huge number of unknowns is known to be difficult [8]. Various hybrid methods were proposed to cope with the curse of dimensionality, a phenomenon that refers to the rapidly increasing number of unknowns when parametrising a multidimensional system [9, 10, 11].
However, in many cases the probability distribution has ‘significant’ values only on a very small portion of the whole state space. Here, ‘significant’ means being distinguishable from zero and refers to a value that is larger than a predefined small tolerance. This fact has motivated the search for special numerical methods that exploit this feature. For example, Deuflhard et al. (two-dimensional case) [12] and Cotter et al. (three-dimensional case) [13] applied sophisticated adaptive finite element methods to solve the CME, but their approach is limited to low-dimensional problems. To cope with multidimensional problems, methods based on truncation of the CME to a finite state space have been developed, such as the Finite State Projection (FSP) method [14, 15, 16] or the finite buffer discrete chemical master equation (dCME) [17, 18]. Based on the FSP, Kazeev et al. used Quantized Tensor Trains for a direct solution of the CME [19].
A very different approach has been taken by Wolf et al. [20], who suggested an algorithm that defines a rectangular window in the state space enclosing the essential part of the distribution, so that the distribution can be parametrised with a small number of parameters. In the current paper we take the latter idea further. Firstly, we do not restrict ourselves to one or two dimensions but aim at a general approach. Secondly, we allow an arbitrary shape of the ‘window’ by considering a manifold that contains system states with probabilities greater than a predefined threshold. Thirdly, we employ the projection on Gaussian basis functions (GBF) to further reduce the computational costs.
The concept of GBF approximation has formerly been applied to various problems in polymer chemistry [21, 22, 23] and colloidal physics [24]. In this paper it is extended to the CME. To account for the fact that at a specific time point only a small part of the system states has to be considered, the system of basis functions is adapted at every time step. The idea behind the adaptation is that the unknown distribution is parametrised using only those GBFs that contribute to probability values significantly greater than zero. This procedure yields an approximation with a very low number of parameters even in the case of multidimensional distributions. The total number of parameters is not constant in time but changes according to the distribution’s complexity. This approach also makes it possible to capture multimodal cases where a time-dependent process leads to splitting or merging of a few disjoint parts of the distribution. One example of such a process is the genetic toggle switch model that typically leads to multistable solutions [25]. Even more complex behaviour can be observed in CMEs that model cell differentiation.
Mesenchymal stem cells (MSCs) are multipotent stromal cells that can differentiate into a variety of cell types, including osteoblasts (bone cells) and chondrocytes (cartilage cells). When derived from adults, one of their applications is related to transplantation, namely either to promote regeneration of diseased or damaged tissue or to rescue defective genes [26]. Foster et al. developed a mathematical model for cell differentiation that predicts the presence of multiple stable states for differentiated cells, bifurcations and switch-like transitions [27, 28]. Later, Schittler et al. expanded the model to include the progenitor state and studied the system of binary differentiation with respect to various stimuli [29]. Despite an advanced level of mathematical description, the models presented in Refs. [26, 29] recover single trajectories for the evolution of biological systems, while realistic systems of that kind are known to be composed of a whole population of cells. In theory, the transition from an ordinary differential equation model, which results in single trajectories, to a CME, which describes the evolution of the whole population of cells, is a matter of pure formalism. However, the complexity of the algorithms required to solve the equations numerically has kept researchers from the full, three-dimensional solution of the CME problem until now.
Methods
In Eq. (3), \(\nu_{i} \in \mathbb{Z}^{d}\) denotes the stoichiometric vector that defines jumps to new states \(x+\nu_{i}\) via the \(i^{\text{th}}\) reaction channel. The \(x\)-dependent coefficient \(a_{i}(x)\) is the propensity function of the \(i^{\text{th}}\) reaction.
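To illustrate how stoichiometric vectors and propensity functions determine the right-hand side of a CME, the following minimal sketch assembles the generator matrix of a one-species birth-death process on a truncated state space. The reaction set, the rate constants `k_birth` and `k_death`, the truncation size and the helper name `build_generator` are assumptions made for illustration only; they are not taken from the models studied in this paper.

```python
import numpy as np

def build_generator(n_states, stoich, propensities):
    """Assemble A such that du/dt = A u for a 1-D CME on {0, ..., n_states-1}.

    stoich[i] is the integer jump nu_i and propensities[i](x) is a_i(x).
    Jumps that would leave the truncated domain are dropped, so the
    probability mass stays inside the domain (an assumption of this sketch).
    """
    A = np.zeros((n_states, n_states))
    for nu, a in zip(stoich, propensities):
        for x in range(n_states):
            y = x + nu
            if 0 <= y < n_states:
                rate = a(x)
                A[y, x] += rate   # inflow into the state x + nu_i
                A[x, x] -= rate   # outflow from the state x
    return A

k_birth, k_death = 5.0, 0.5       # hypothetical rate constants
A = build_generator(
    50,
    stoich=[+1, -1],
    propensities=[lambda x: k_birth, lambda x: k_death * x],
)
```

Each column of `A` sums to zero, which expresses conservation of total probability on the truncated domain.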
Here, \(\nu_{i,k}\) denotes the \(k^{\text{th}}\) component of the stoichiometric vector \(\nu_{i}\), and \(W_{a}: L \rightarrow L\) is a multiplicative operator that takes the probability distribution \(u(x)\) to its weighted form \(a(x)u(x)\). The representation (6) is especially convenient when implementing the approximation technique.
Results
Even though matrix exponentiation is used in (10), there are sufficiently fast algorithms that compute a matrix exponential up to machine precision in a comparably short time, e.g. [30]. Consequently, the error is predominantly introduced by the choice of the discretisation nodes \(x^{i}\).
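As a small illustration of time propagation by the action of a matrix exponential, the sketch below advances a probability vector with a generator matrix. It uses uniformization, a simple alternative that is valid for generator matrices, rather than the Krylov-based routines of [30]; the function name `propagate_uniformization` and the two-state test system are assumptions chosen only because the latter has a closed-form solution.

```python
import numpy as np

def propagate_uniformization(A, p0, t, tol=1e-12, max_terms=100000):
    """Approximate exp(t*A) @ p0 for a generator A (columns sum to zero,
    non-negative off-diagonal entries) by uniformization.

    Note: exp(-lam*t) underflows for very large lam*t; this sketch does not
    guard against that case.
    """
    p0 = np.asarray(p0, dtype=float)
    lam = np.max(-np.diag(A))              # uniformization rate
    if lam == 0.0 or t == 0.0:
        return p0.copy()
    P = np.eye(A.shape[0]) + A / lam       # column-stochastic jump matrix
    weight = np.exp(-lam * t)              # Poisson weight for k = 0
    term = p0.copy()                       # holds P^k @ p0
    result = weight * term
    total = weight
    k = 0
    while 1.0 - total > tol and k < max_terms:
        k += 1
        term = P @ term
        weight *= lam * t / k              # Poisson weight for the next k
        result += weight * term
        total += weight
    return result

# hypothetical two-state switch: 0 -> 1 at rate 1, 1 -> 0 at rate 2
A = np.array([[-1.0, 2.0],
              [1.0, -2.0]])
p = propagate_uniformization(A, [1.0, 0.0], t=0.7)
```

For this two-state system the first component can be compared with the analytic expression \(2/3 + \tfrac{1}{3}e^{-3t}\).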
where \(\mu :\mathbb{R}^{d}\rightarrow [0,\infty)\) is the Lebesgue measure. Although for a fixed system of basis functions the matrix exponentiation (10) provides the exact solution without any time discretisation, the condition (14) motivates considering the CME on sequential time steps, not as a means of time approximation, but as a way of economising computational resources. Indeed, if the system of basis functions (7) is chosen to correspond to the current location of the essential support, the total number of discretisation coefficients \(\alpha_{i}(t)\) will be small.
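The saving in coefficients can be illustrated with a one-dimensional sketch in which a bimodal density is approximated by Gaussian basis functions centred only near its essential support. The test density, the basis width, the centre spacing, the margin `gamma` and the value of `p_threshold` are hypothetical choices; the exact form of the basis (7) used in the paper may differ.

```python
import numpy as np

x = np.arange(300)                                    # 1-D state space

def gauss(x, mu, s):
    return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi))

u = 0.5 * gauss(x, 100, 6.0) + 0.5 * gauss(x, 200, 6.0)   # bimodal test density

p_threshold, gamma, spacing, width = 1e-6, 8, 4, 4.0  # hypothetical settings

# essential support and its gamma-neighbourhood
support = x[u > p_threshold]
near = np.min(np.abs(x[:, None] - support[None, :]), axis=1) <= gamma

# centres: coarse-grid points that lie inside the extended support
centres = x[near & (x % spacing == 0)]

def phi(points, centres):
    """Matrix of Gaussian basis functions evaluated at the given points."""
    return np.exp(-0.5 * ((points[:, None] - centres[None, :]) / width) ** 2)

# collocation: the approximation must match u at the centres
alpha = np.linalg.solve(phi(centres, centres), u[centres])
u_tilde = phi(x, centres) @ alpha

rel_err = np.max(np.abs(u_tilde - u)) / np.max(u)
```

Here `centres.size` is the number of coefficients, which is much smaller than the number of states `x.size`, and the centres form two disjoint groups, one around each mode.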
Equations (15) and (16) allow the usual numerical toolbox, which is well defined on functions, to be applied to essential support sets. Various transformations of the signed distance function lead to changes in the set of points. For instance, let \(d_{1}(x)\) and \(d_{2}(x)\) be signed distance functions; then \(d(x)=\min(d_{1}(x),d_{2}(x))\) is the signed distance function representing the union of the interior regions [32].
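The union rule can be checked with a short sketch for two discs in the plane. The sign convention (negative inside, positive outside) and the helper names `circle_sdf` and `union` are assumptions of this example.

```python
import math

def circle_sdf(center, radius):
    """Signed distance to a circle: negative inside, positive outside."""
    cx, cy = center
    return lambda x, y: math.hypot(x - cx, y - cy) - radius

def union(d1, d2):
    """Pointwise minimum represents the union of the two interiors."""
    return lambda x, y: min(d1(x, y), d2(x, y))

d1 = circle_sdf((0.0, 0.0), 1.0)
d2 = circle_sdf((3.0, 0.0), 1.0)
d = union(d1, d2)

inside_first = d(0.2, 0.0) < 0     # point inside the first disc
inside_second = d(3.0, 0.5) < 0    # point inside the second disc
in_gap = d(1.5, 0.0) < 0           # point between the two discs
```

Because the two discs are disjoint, the union is a set with two separate parts, and no special handling is needed to represent it.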
Here the parameter \(\gamma>0\) extends the subdomain, as there should be a sufficient layer of basis functions around the essential support. This is necessary to interpolate the density \(u(x,t)\) on the boundary. Having the approximation \(\tilde{u}(x,t_{i+1})\) in turn makes it possible to compute \(\text{essup}\{\tilde{u}(x,t_{i+1})\}\) and to evaluate the reliability of the prior estimate \(S'\). On the basis of this information we can accept the results at \(t_{i+1}\) or decide to use a smaller time step.

1. compute the essential support \(S_{0}=\text{essup}\{u_{0}(x)\}\) for the initial condition \(u_{0}(x)\); choose a system of basis functions with centres \(x^{i}\in D^{-1}(D S_{0}-\gamma)\) that provides a sufficient approximation, \(\|\tilde u_{0}(x)-u_{0}(x)\|<p_{\text{threshold}}\);

2. using (10), integrate the approximation to \(u(x,t)\) on a small interval \([0,t_{1}]\) and compute the new essential support \(S_{1}\); set \(i=1\);

3. if \(t_{i}<t_{\text{end}}\), choose \(t_{i+1}=t_{i}+h\); using (22), extrapolate the value for \(S_{i+1}\) and compute the corresponding basis; integrate the system up to \(t_{i+1}\); validate \(S_{i+1}\) by computing \(\|D(\text{essup}\{\tilde u(x, t_{i+1})\})-D(S_{i+1})\|\); if the choice of \(S_{i+1}\) is satisfactory, increase \(i\) by one and repeat the step, otherwise repeat the step with a smaller value of \(h\).
Here, the essential support threshold \(p_{\text{threshold}}\), the initial time step \(h\), and the density of the basis coverage \(\gamma\) are parameters of the method. The parameter of the α-hull has the grid step \(h\) as its lower bound and is chosen to be \(2h\) in the numerical examples that follow.
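The predict, integrate and validate loop described above can be sketched in a strongly simplified one-dimensional form. This sketch departs from the method of the paper in several labelled ways: the distribution is stored state by state instead of in a Gaussian basis, the support of the next step is predicted from the current support rather than extrapolated with (22), a plain index dilation replaces the signed-distance construction, and the matrix exponential is a basic scaling-and-squaring Taylor routine. All function names and the birth-death test problem are hypothetical.

```python
import numpy as np

def expm(M, terms=18):
    """Dense matrix exponential by scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    X = M / 2.0 ** s
    E = np.eye(M.shape[0])
    T = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        T = T @ X / k
        E = E + T
    for _ in range(s):
        E = E @ E
    return E

def essup(p, threshold):
    """Boolean mask of states whose probability exceeds the threshold."""
    return p > threshold

def dilate(mask, gamma):
    """Extend a 1-D support mask by gamma grid points on each side."""
    out = np.zeros_like(mask)
    for i in np.flatnonzero(mask):
        out[max(0, i - gamma): i + gamma + 1] = True
    return out

def track(A, p0, t_end, h, threshold, gamma):
    p, t, sizes = p0.copy(), 0.0, []
    while t < t_end - 1e-12:
        step = min(h, t_end - t)
        S = dilate(essup(p, threshold), gamma)     # predicted support
        idx = np.flatnonzero(S)
        sub = A[np.ix_(idx, idx)]                  # CME restricted to S
        while True:
            q = np.zeros_like(p)
            q[idx] = expm(step * sub) @ p[idx]
            # validate: new essential support keeps a one-point margin inside S
            if not np.any(dilate(essup(q, threshold), 1) & ~S):
                break
            step *= 0.5                            # reject, retry with smaller step
            if step < 1e-9:
                raise RuntimeError("step size underflow")
        p, t = q, t + step
        sizes.append(idx.size)
    return p, sizes

# hypothetical birth-death test problem on 120 states
n = 120
A = np.zeros((n, n))
for x in range(n):
    if x + 1 < n:                  # birth at constant rate 10
        A[x + 1, x] += 10.0
        A[x, x] -= 10.0
    if x >= 1:                     # death at rate 0.5 * x
        A[x - 1, x] += 0.5 * x
        A[x, x] -= 0.5 * x
p0 = np.zeros(n)
p0[0] = 1.0

p_adapt, sizes = track(A, p0, t_end=10.0, h=0.5, threshold=1e-8, gamma=5)
p_full = expm(10.0 * A) @ p0       # reference solution on the full state space
```

The list `sizes` records how many states are active in each accepted step, and `p_full` allows the tracked solution to be compared with the solution on the full state space.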
Discussion
Self-regulating gene
Bistable toggle switch
Should the deviation increase above a certain level, the value of \(p_{\text{threshold}}\) should be lowered. As depicted in the right panel of Fig. 3, both \(p_{\text{threshold}}\) and the number of basis functions, \(n\), have a direct influence on the approximation error. Decreasing the value of \(p_{\text{threshold}}\) lowers the approximation error up to a certain saturation point; further improvement is possible only by increasing the number of basis functions.
Comparison with Gillespie SSA
Tristable toggle switch
Stem cell differentiation problem
Parameter set used for simulations of the cell differentiation model, as suggested in Ref. [29]
Cell type  Parameter  Value  Description
*  \(\beta\)  2  Hill coefficient
Progenitor (P)  \(a_{p}\)  0.2  Autoactivation
Progenitor (P)  \(b_{p}\)  0.5  Basal activity
Progenitor (P)  \(m_{p}\)  {8, 10}  Inflection point
Progenitor (P)  \(c_{pp}\)  0.1  Self-inhibition strength
Progenitor (P)  \(c_{op}\), \(c_{cp}\)  0.5  Inhibition strength on \(x_{o}\), \(x_{c}\)
Progenitor (P)  \(k_{p}\)  0.1  Decay rate
Osteoblast and chondrocyte (O, C)  \(a_{o}\), \(a_{c}\)  0.1  Autoactivation
Osteoblast and chondrocyte (O, C)  \(b_{o}\), \(b_{c}\)  1  Basal activity
Osteoblast and chondrocyte (O, C)  \(m_{o}\), \(m_{c}\)  1  Inflection point
Osteoblast and chondrocyte (O, C)  \(c_{oo}\), \(c_{cc}\)  0.1  Self-inhibition strength
Osteoblast and chondrocyte (O, C)  \(c_{oc}\), \(c_{co}\)  0.1  Mutual inhibition strength
Osteoblast and chondrocyte (O, C)  \(k_{o}\), \(k_{c}\)  0.1  Decay rate
Conclusions
We proposed a numerical method for the approximation of the solution to a wide range of CME-based problems that have been hard to tackle until now. The fact that the method recovers a full representation of multidimensional distributions makes it especially attractive for cases of multistability.
In order to reduce the amount of computational resources, the unknown distribution is sought as a linear combination of Gaussian radial basis functions. The efficiency of the method is improved even further by predicting, in every time step, a manifold containing states with probabilities greater than a certain significance threshold. The prediction is made on the basis of information available from previous time steps. This keeps the number of degrees of freedom of the approximation very close to the optimal value corresponding to the significance threshold.

A bistable genetic toggle switch describing two competing species. This problem constitutes an important case: when a normal distribution is taken as the initial condition, the manifold containing highly probable states undergoes drastic transformations. Its topology changes from a simply connected to a two-connected domain. Since the exact solution is known for this problem, the approximation error can be evaluated.

A tristable toggle switch, yielding a three-dimensional symmetric solution, is introduced as a generalisation of the previous problem. Although this case demonstrates a possible mechanism for three competing species and constitutes an interesting test for the algorithm, it remains a theoretical problem.

A cell differentiation model, describing cell fate determination of osteochondro progenitor cells. The model considers two final cell types, osteoblasts and chondrocytes, and is a special case of the previous example. It has been shown how variations of some important parameters affect the stationary solution. It has also been studied how a pro-osteogenic stimulus leads to a non-symmetrical solution.
Besides the CME, the method has additionally been applied to a system of master equations describing a self-regulating gene.
We expect that the method can be applied to other CME problems including those that have no a priori information available on the shape, location, or upper bound of the domain that contains states with significant probabilities. The domain is constructed and tracked in time using ideas from level set methods. The advantage of the level set approach is that one can perform numerical computations involving surfaces on a fixed Cartesian grid without having to parameterise these objects. In addition, the level set method makes it very easy to follow shapes that change topology, for example when a shape splits into two, develops holes, or the reverse of these operations.
Although the method features many advantages for multistable systems or systems where rare events are important, high-dimensional cases (d > 4) are hard to tackle with the current implementation. In future work, we plan to relax the condition that radial basis function centres are selected from a predefined grid in order to reach the optimal number of degrees of freedom in the approximation and to extend the algorithm to high-dimensional cases.
Notes
Acknowledgements
The authors would like to thank Steffen Waldherr for helpful discussions and suggestions regarding cell differentiation modelling. IK acknowledges financial support from the Marie Curie Actions (PITN-GA-2009-238700).
Supplementary material
References
 1. Gillespie DT. Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977; 81:2340–361.
 2. Gillespie DT. Approximate accelerated stochastic simulation of chemically reacting systems. J Chem Phys. 2001; 115:1716–1733.
 3. Rao CV, Arkin AP. Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. J Chem Phys. 2003; 118:4999–5010.
 4. E W, Liu D, Vanden-Eijnden E. Nested stochastic simulation algorithms for chemical kinetic systems with multiple time scales. J Comput Phys. 2007; 221:158–80.
 5. Haseltine EL, Rawlings JB. Approximate simulation of coupled fast and slow reactions for stochastic chemical kinetics. J Chem Phys. 2002; 117:6959–969.
 6. Takahashi K, Kaizu K, Hu B, Tomita M. A multi-algorithm, multi-timescale method for cell simulation. Bioinformatics. 2004; 20:538–46.
 7. Alfonsi A, Cancès E, Turinici G, Ventura BD, Huisinga W. Adaptive simulation of hybrid stochastic and deterministic models for biochemical systems. ESAIM Proc. 2005; 14:1–13.
 8. Jahnke T, Udrescu T. Solving chemical master equations by adaptive wavelet compression. J Comput Phys. 2010; 229(16):5724–741.
 9. Hellander A, Lötstedt P. Hybrid method for the chemical master equation. J Comput Phys. 2007; 227(1):100–22.
 10. Erban R, Chapman SJ, Kevrekidis IG, Vejchodský T. Analysis of a stochastic chemical system close to a SNIPER bifurcation of its mean-field model. SIAM J Appl Math. 2009; 70(3):984–1016.
 11. Menz S, Latorre J, Schütte C, Huisinga W. Hybrid stochastic–deterministic solution of the chemical master equation. Multiscale Model Simul. 2012; 10(4):1232–1262.
 12. Deuflhard P, Huisinga W, Jahnke T, Wulkow M. Adaptive discrete Galerkin methods applied to the chemical master equation. SIAM J Sci Comput. 2008; 30(6):2990–3011.
 13. Cotter SL, Vejchodský T, Erban R. Adaptive finite element method assisted by stochastic simulation of chemical systems. SIAM J Sci Comput. 2013; 35(1):107–31.
 14. Munsky B, Khammash M. The finite state projection algorithm for the solution of the chemical master equation. J Chem Phys. 2006; 124(4):044104.
 15. Peleš S, Munsky B, Khammash M. Reduction and solution of the chemical master equation using time scale separation and finite state projection. J Chem Phys. 2006; 125(20):204104.
 16. Munsky B, Khammash M. Transient analysis of stochastic switches and trajectories with applications to gene regulatory networks. IET Syst Biol. 2008; 2(5):323–33.
 17. Cao Y, Liang J. Optimal enumeration of state space of finitely buffered stochastic molecular networks and exact computation of steady state landscape probability. BMC Syst Biol. 2008; 2(1):30.
 18. Cao Y, Lu HM, Liang J. Probability landscape of heritable and robust epigenetic state of lysogeny in phage lambda. Proc Natl Acad Sci. 2010; 107(43):18445–18450.
 19. Kazeev V, Khammash M, Nip M, Schwab C. Direct solution of the chemical master equation using quantized tensor trains. PLoS Comput Biol. 2014; 10(3):e1003359.
 20. Wolf V, Goel R, Mateescu M, Henzinger TA. Solving the chemical master equation using sliding windows. BMC Syst Biol. 2010; 4(1):42.
 21. Kryven I, Iedema PD. Transition into the gel regime for free radical crosslinking polymerisation in a batch reactor. Polymer. 2014; 55(16):3475–489.
 22. Kryven I, Iedema PD. Topology evolution in polymer modification. Macromol Theory Simul. 2014; 23(1):7–14.
 23. Kryven I, Iedema PD. Deterministic modelling of copolymer microstructure: composition drift and sequence patterns. Macromol React Eng. 2015; 9:285–306.
 24. Kryven I, Lazzari S, Storti G. Population balance modeling of aggregation and coalescence in colloidal systems. Macromol Theory Simul. 2014; 23(3):170–81.
 25. Gardner TS, Cantor CR, Collins JJ. Construction of a genetic toggle switch in Escherichia coli. Nature. 2000; 403(6767):339–42.
 26. Baksh D, Song L, Tuan R. Adult mesenchymal stem cells: characterization, differentiation, and application in cell and gene therapy. J Cell Mol Med. 2004; 8(3):301–16.
 27. Roeder I, Glauche I. Towards an understanding of lineage specification in hematopoietic stem cells: a mathematical model for the interaction of transcription factors GATA-1 and PU.1. J Theor Biol. 2006; 241(4):852–65.
 28. Foster DV, Foster JG, Huang S, Kauffman SA. A model of sequential branching in hierarchical cell fate determination. J Theor Biol. 2009; 260(4):589–97.
 29. Schittler D, Hasenauer J, Allgöwer F, Waldherr S. Cell differentiation modeled via a coupled two-switch regulatory network. Chaos: An Interdisciplinary Journal of Nonlinear Science. 2010; 20(4):045121.
 30. Sidje RB. EXPOKIT: A software package for computing matrix exponentials. ACM Trans Math Softw. 1998; 24(1):130–56.
 31. Edelsbrunner H, Kirkpatrick D, Seidel R. On the shape of a set of points in the plane. IEEE Trans Inf Theory. 1983; 29(4):551–9.
 32. Osher S, Fedkiw R. Level set methods and dynamic implicit surfaces. Applied Mathematical Sciences. Vol. 153. New York: Springer; 2006.
 33. Hornos J, Schultz D, Innocentini G, Wang J, Walczak A, Onuchic J, et al. Self-regulating gene: an exact solution. Phys Rev E. 2005; 72(5):051907.
 34. Schultz D, Onuchic JN, Wolynes PG. Understanding stochastic simulations of the smallest genetic networks. J Chem Phys. 2007; 126(24):245102.
 35. Sjöberg P, Lötstedt P, Elf J. Fokker–Planck approximation of the master equation in molecular biology. Comput Vis Sci. 2009; 12(1):37–50.
 36. Engblom S. Galerkin spectral method applied to the chemical master equation. Commun Comput Phys. 2009; 5:871–96.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.