Mathematics of Planet Earth, pp. 129–173

# Data-Informed Modeling in the Health Sciences

## Abstract

The adoption of automation and technology by health professionals is triggering an explosion of databases and data streams in that sector. The emergence of this data torrent creates a pressing need to mine it for value, which in turn requires investment in the development of modeling and analysis tools. In view of this, dynamicists are presented with a terrific opportunity to enrich their discipline by supplying it with new tools, expanding its scope, and elevating its social impact. This chapter is written in that spirit, examining three concrete case studies encountered *in the field*: quantifying the salmonellosis risk posed by distinct food sources, assimilating genetic data into a dynamical model for avian influenza transmission, and statistically decontaminating gas chromatography/mass spectroscopy time series. We review available prototypical models and build on them guided by data and mathematical abstraction, demonstrating in the process how to root a model in data. This takes us quite naturally into the realm of probabilistic and statistical modeling and reopens a decades-old discussion on the role of discrete models in applied mathematics. We also touch briefly on the timely subject of mathematicians being employed as such outside math departments and attempt a short outlook on their prospects and opportunities.

## Keywords

Probabilistic and data-driven modeling; Parameter inference; Extramural mathematics; Infection source attribution; Mathematical epidemiology; Data decontamination; Bayesian hierarchical models

## Notes

### Acknowledgements

The work in Sect. 6.3 was initiated and supervised by Gert-Jan Boender and Thomas Hagenaars (Bacteriology and Epidemiology, Wageningen University and Research). The work in Sect. 6.4 was initiated by and done in collaboration with Rob de Boer (Theoretical Biology and Bioinformatics, Utrecht University), José Borghans (University Medical Center Utrecht), Ad Koets, and Lars Ravesloot (Bacteriology and Epidemiology, Wageningen University and Research). The author thanks them dearly for opening up a world of scientific opportunity and scholarship to him.
