Matching Models Across Abstraction Levels with Gaussian Processes

  • Conference paper
Computational Methods in Systems Biology (CMSB 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9859)

Abstract

Biological systems are often modelled at different levels of abstraction depending on the particular aims/resources of a study. Such different models often provide qualitatively concordant predictions over specific parametrisations, but it is generally unclear whether model predictions are quantitatively in agreement, and whether such agreement holds for different parametrisations. Here we present a generally applicable statistical machine learning methodology to automatically reconcile the predictions of different models across abstraction levels. Our approach is based on defining a correction map, a random function which modifies the output of a model in order to match the statistics of the output of a different model of the same system. We use two biological examples to give a proof-of-principle demonstration of the methodology, and discuss its advantages and potential further applications.

GC and GS gratefully acknowledge support from the European Research Council under grant MLCS306999. LB acknowledges partial support from the EU project QUANTICOL, 600708, and by FRA-UniTS. We thank Dimitris Milios for useful discussions and for providing us with the MATLAB code for heteroscedastic regression.

Notes

  1.

    \(\mathsf{{M}}\) could be hard to analyze either because of its structure (e.g., it might have many variables) or because of numerical hurdles (e.g., a high degree of non-linearity or parameter stiffness). For similar reasons, we do not care whether \(\mathsf{{m}}\) has been derived from independent domain knowledge or by automatic techniques.

  2.

    In principle, \(\mathsf{{m}}\) might also have its own set of free variables with respect to \(\mathsf{{M}}\). However, since we have full control over that model, we could fix a parametrization of such variables, and everything that follows would be equivalent.

  3.

    In this work, we use the classic Gaussian (squared-exponential) kernel and fix its hyperparameters by type-II maximum likelihood, i.e., by maximising the marginal likelihood; see [12].
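To illustrate the setting of note 3, the following Python sketch fits a Gaussian kernel by type-II maximum likelihood, i.e., by minimising the negative log marginal likelihood of the training data under a zero-mean GP. It is not the authors' MATLAB code: the function names, the log-space parametrisation, the choice of SciPy's L-BFGS-B optimiser and the search bounds are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize


def gaussian_kernel(a, b, lengthscale, signal_var):
    """Gaussian (squared-exponential) kernel between two 1-D input arrays."""
    sqdist = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * sqdist / lengthscale**2)


def neg_log_marginal_likelihood(log_params, x, y):
    """Negative log marginal (type-II) likelihood of a zero-mean GP.

    log_params holds log(lengthscale), log(signal variance), log(noise variance).
    """
    lengthscale, signal_var, noise_var = np.exp(log_params)
    n = len(x)
    K = gaussian_kernel(x, x, lengthscale, signal_var) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)                             # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # alpha = K^{-1} y
    return (0.5 * y @ alpha                  # data-fit term
            + np.sum(np.log(np.diag(L)))     # equals 0.5 * log det K
            + 0.5 * n * np.log(2.0 * np.pi))


def fit_hyperparameters(x, y, init=(1.0, 1.0, 0.1)):
    """Type-II maximum likelihood: minimise the negative log marginal likelihood.

    The log-space bounds (a hypothetical choice) keep the noise variance away
    from zero, so K stays positive definite during the search.
    """
    result = minimize(neg_log_marginal_likelihood, np.log(init), args=(x, y),
                      method="L-BFGS-B", bounds=[(-5.0, 5.0)] * 3)
    return np.exp(result.x), result.fun
```

Optimising in log space keeps the lengthscale, signal variance and noise variance positive. The returned triple can then be plugged into the kernel used for the correction-map regression.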

References

  1. Aitken, S., Alexander, R.D., Beggs, J.D.: A rule-based kinetic model of RNA polymerase II C-terminal domain phosphorylation. J. R. Soc. Interface 10(86), 20130438 (2013)

  2. Alur, R., Feder, T., Henzinger, T.A.: The benefits of relaxing punctuality. J. ACM 43(1), 116–146 (1996)

  3. Barber, D.: Bayesian Reasoning and Machine Learning. Cambridge University Press, Cambridge (2012)

  4. Bortolussi, L., Milios, D., Sanguinetti, G.: Smoothed model checking for uncertain continuous-time Markov chains. Inf. Comput. 247, 235–253 (2016)

  5. Bortolussi, L., Sanguinetti, G.: Learning and designing stochastic processes from logical constraints. In: Joshi, K., Siegle, M., Stoelinga, M., D’Argenio, P.R. (eds.) QEST 2013. LNCS, vol. 8054, pp. 89–105. Springer, Heidelberg (2013)

  6. Caravagna, G.: Formal modeling and simulation of biological systems with delays. Ph.D. thesis, University of Pisa (2011)

  7. Cressie, N., Wikle, C.K.: Statistics for Spatio-Temporal Data. Wiley, New York (2015)

  8. Hoyle, D.C., Rattray, M., Jupp, R., Brass, A.: Making sense of microarray data distributions. Bioinformatics 18(4), 576–584 (2002)

  9. Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. Roy. Stat. Soc.: Ser. B (Stat. Methodol.) 63(3), 425–464 (2001)

  10. Lawrence, N.D., Sanguinetti, G., Rattray, M.: Modelling transcriptional regulation using Gaussian processes. In: Advances in Neural Information Processing Systems, pp. 785–792 (2006)

  11. Noble, D.: Modeling the heart-from genes to cells to the whole organ. Science 295(5560), 1678–1682 (2002)

  12. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)

Author information

Correspondence to Giulio Caravagna.

A Appendix

All the code that replicates these analyses is available at the corresponding author's webpage and hosted on GitHub (repository GP-correction-maps).

A.1 Further Details on the Examples

The two models from Sect. 5.1 correspond to the following systems of differential equations:

[Figure a: the two systems of differential equations]

We solved them in MATLAB with the ode45 routine, setting all solver options (InitialStep, MaxStep, RelTol and AbsTol) to 0.01.
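For readers working in Python, a comparable solver configuration can be sketched with SciPy's solve_ivp: its RK45 method, like MATLAB's ode45, uses a Dormand-Prince 5(4) pair. The right-hand side below is a hypothetical production-degradation model standing in for the equations of Sect. 5.1, and the mapping of InitialStep and MaxStep onto SciPy's first_step and max_step is an assumption, not a statement about the authors' setup.

```python
import numpy as np
from scipy.integrate import solve_ivp


def rhs(t, y, k_prod, k_deg):
    """Hypothetical production-degradation ODE: dy/dt = k_prod - k_deg * y."""
    return [k_prod - k_deg * y[0]]


def simulate(t_end=10.0, y0=0.0, k_prod=2.0, k_deg=0.5):
    """Integrate with RK45, all four step/tolerance options set to 0.01,
    analogous to ode45 with InitialStep, MaxStep, RelTol and AbsTol = 0.01."""
    return solve_ivp(rhs, (0.0, t_end), [y0], method="RK45",
                     args=(k_prod, k_deg),
                     first_step=0.01, max_step=0.01, rtol=0.01, atol=0.01)
```

With the default arguments, sol = simulate() yields a final value sol.y[0, -1] that should agree closely with the closed-form solution k_prod/k_deg + (y0 - k_prod/k_deg) exp(-k_deg t_end), approximately 3.973.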

Concerning the Protein Translation Network (PTN) in Sect. 5.2, the reactions and propensity functions from which a continuous-time Markov chain (CTMC) model of the network can be derived are the following. Here \(\varvec{x}\) denotes a generic state of the system and, for instance, \(\varvec{x}_\mathsf{{mRNA}}{}\) the number of mRNA copies in \(\varvec{x}\).

[Figure b: PTN reactions and their propensity functions]

The reduced PTN model is a special case of this reaction set in which transcription and mRNA decay are omitted. We used StochPy (see http://stochpy.sourceforge.net/) to simulate both models and generate the input data for regression; data sampling exploits Python parallelism to reduce execution times.
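A CTMC specified by reactions and propensity functions can be simulated exactly with Gillespie's direct method, one of the stochastic simulation algorithms StochPy provides. The sketch below is not the PTN of Sect. 5.2: it uses a hypothetical four-reaction network (transcription, mRNA decay, translation, protein decay) with mass-action propensities, and the reduced flag, a hypothetical parameter, drops transcription and mRNA decay to mimic how the reduced model is obtained.

```python
import numpy as np


def reactions(rates, reduced=False):
    """Hypothetical reaction set over the state [mRNA, protein].

    Each entry is (state-change vector, propensity function of the state x).
    """
    k_tx, d_m, k_tl, d_p = rates
    rs = [
        (np.array([0, 1]), lambda x: k_tl * x[0]),   # translation: mRNA -> mRNA + P
        (np.array([0, -1]), lambda x: d_p * x[1]),   # protein decay: P -> 0
    ]
    if not reduced:
        rs += [
            (np.array([1, 0]), lambda x: k_tx),         # transcription: 0 -> mRNA
            (np.array([-1, 0]), lambda x: d_m * x[0]),  # mRNA decay: mRNA -> 0
        ]
    return rs


def gillespie(x0, rates, t_end, reduced=False, rng=None):
    """Gillespie direct method: exponential waiting times, propensity-weighted choice."""
    rng = np.random.default_rng() if rng is None else rng
    rs = reactions(rates, reduced)
    t, x = 0.0, np.array(x0, dtype=int)
    times, states = [t], [x.copy()]
    while True:
        props = np.array([a(x) for _, a in rs], dtype=float)
        total = props.sum()
        if total <= 0:                  # no reaction can fire any more
            break
        t += rng.exponential(1.0 / total)
        if t > t_end:
            break
        j = rng.choice(len(rs), p=props / total)
        x = x + rs[j][0]
        times.append(t)
        states.append(x.copy())
    return np.array(times), np.array(states)
```

In the reduced variant no reaction changes the mRNA count, so it stays at its initial value, while protein copies are still produced and degraded.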

For fixed-variance regression we used the Gaussian Processes for Machine Learning (GPML) toolbox (see http://www.gaussianprocess.org/gpml/code/matlab/doc/), and a custom implementation for the other forms of regression.

Fig. 8.

Data generated to compute the satisfaction probability of the linear logic formula \(\eta _2\) in Eq. (9). For each model, 100 independent simulations are used to estimate the expected satisfaction probability. The regression input space is the same as that used to compute \(\eta _1\), but the models are simulated with a single inactive gene in the initial state. In the heteroscedastic regression, the variance is computed as the variance of the correction of the expected satisfaction probability (point-wise \(\sigma \)-estimator); in the fixed-variance regression, it is estimated from the data (empirical \(\overline{\sigma }\)-estimator).
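The two variance choices in the caption above can be sketched as follows, assuming the simulation output is arranged as k correction samples per regression input. The point-wise \(\sigma\)-estimator is taken to be the per-input sample variance; the empirical \(\overline{\sigma}\)-estimator is assumed here to be the average of those variances, which may differ from the authors' exact definition. Both are then used as the noise term of a GP with a Gaussian kernel whose hyperparameters are fixed rather than fitted. All function names are hypothetical.

```python
import numpy as np


def rbf(a, b, lengthscale=1.0, signal_var=1.0):
    """Gaussian kernel between two 1-D input arrays."""
    return signal_var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / lengthscale**2)


def variance_estimators(samples):
    """samples: array of shape (n_inputs, k), k correction samples per input."""
    means = samples.mean(axis=1)
    pointwise_var = samples.var(axis=1, ddof=1)  # point-wise sigma-estimator
    pooled_var = pointwise_var.mean()            # assumed form of the sigma-bar estimator
    return means, pointwise_var, pooled_var


def gp_posterior(x_train, y_train, noise_var, x_test, **kern):
    """GP posterior mean and variance at x_test.

    noise_var is a scalar (fixed-variance regression) or a vector with one
    entry per training input (heteroscedastic regression).
    """
    noise = np.broadcast_to(noise_var, x_train.shape)
    K = rbf(x_train, x_train, **kern) + np.diag(noise)
    Ks = rbf(x_test, x_train, **kern)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = np.diag(rbf(x_test, x_test, **kern)) - np.sum(v**2, axis=0)
    return mean, var
```

Passing pointwise_var gives larger noise, and hence less weight, to inputs whose correction is more variable; passing pooled_var treats all inputs alike.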

A.2 Proofs

Proof of Theorem 1

Both the empirical and the nested estimators rely on unbiased estimators of the mean and variance; hence, as \(k\rightarrow \infty \) (i.e., as we sample all possible values of the free variables), we would obtain the true \(\overline{y}\) and \(\sigma \). Consequently, for each value sampled from \(\varTheta \), even the simplest \(\overline{\sigma }\)-estimator is equivalent, in expectation, to marginalising the free variables. Combined with the properties of Gaussian process regression (i.e., convergence to the true model in the limit of infinitely many training points), this suffices to conclude that the overall approach yields an unbiased estimator of the correction map.
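For reference, the standard unbiased estimators that this argument relies on are, for \(k\) i.i.d. samples \(y_1, \dots, y_k\) with mean \(\mu\) and finite variance \(\sigma^2\), the following textbook definitions (the specific empirical and nested estimators of the paper are not restated here):

```latex
\hat{\mu}_k = \frac{1}{k}\sum_{j=1}^{k} y_j,
\qquad
\hat{\sigma}^2_k = \frac{1}{k-1}\sum_{j=1}^{k}\bigl(y_j - \hat{\mu}_k\bigr)^2,
\qquad
\mathbb{E}[\hat{\mu}_k] = \mu,
\qquad
\mathbb{E}[\hat{\sigma}^2_k] = \sigma^2 .
```

Both estimators are also consistent, so they converge to \(\mu\) and \(\sigma^2\) as \(k\rightarrow \infty\), which is the limit used in the proof.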

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Caravagna, G., Bortolussi, L., Sanguinetti, G. (2016). Matching Models Across Abstraction Levels with Gaussian Processes. In: Bartocci, E., Lio, P., Paoletti, N. (eds) Computational Methods in Systems Biology. CMSB 2016. Lecture Notes in Computer Science, vol 9859. Springer, Cham. https://doi.org/10.1007/978-3-319-45177-0_4

  • DOI: https://doi.org/10.1007/978-3-319-45177-0_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45176-3

  • Online ISBN: 978-3-319-45177-0

  • eBook Packages: Computer Science, Computer Science (R0)
