Conformal Prediction for Ecotoxicology and Implications for Regulatory Decision-Making

  • Fredrik Svensson
  • Ulf Norinder (corresponding author)
Part of the Methods in Pharmacology and Toxicology book series (MIPT)


Computational methods can be valuable tools for predicting the safety of chemicals and have the potential to play a role in regulatory decision-making if their results are transparent and reliable. In this chapter, we discuss a type of confidence predictor called conformal prediction that can be used to generate predictions with a guaranteed error rate. We describe the underlying theory in an informal fashion and exemplify the method on a dataset of chronic toxicity of compounds to Daphnia magna and Pseudokirchneriella subcapitata.
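As a rough illustration of how a conformal predictor turns model outputs into prediction sets at a chosen error rate, the sketch below computes conformal p-values from nonconformity scores and keeps every label whose p-value exceeds the significance level. It is a minimal sketch, not the chapter's implementation: the function names, the class labels, the per-class calibration lists, and all score values are hypothetical, and the nonconformity score is assumed to be something like one minus the predicted probability of the class.

```python
from typing import Dict, Sequence, Set


def conformal_p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Fraction of scores (calibration plus the test score itself)
    that are at least as nonconforming as the test score."""
    n_at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (n_at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(
    calibration_by_class: Dict[str, Sequence[float]],
    test_scores_by_class: Dict[str, float],
    significance: float,
) -> Set[str]:
    """Keep every label whose p-value exceeds the significance level."""
    return {
        label
        for label, score in test_scores_by_class.items()
        if conformal_p_value(calibration_by_class[label], score) > significance
    }


# Hypothetical nonconformity scores of calibration compounds, kept per class
# (an assumption made here so that each class is calibrated separately).
calibration = {
    "toxic": [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90],
    "nontoxic": [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85],
}

# Hypothetical nonconformity of one new compound under each tentative label.
new_compound = {"toxic": 0.15, "nontoxic": 0.95}

labels = prediction_set(calibration, new_compound, significance=0.2)
print(labels)
```

With these made-up scores, the new compound receives a single-label prediction at a significance level of 0.2, while a stricter level of 0.05 returns both labels. A lower tolerated error rate is paid for with less informative prediction sets.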

Key words

QSAR, Confidence, Uncertainty, Conformal prediction



The ARUK UCL Drug Discovery Institute is core funded by Alzheimer’s Research UK (registered charity No. 1077089 and SC042474). The Francis Crick Institute receives its core funding from Cancer Research UK (FC001002), the UK Medical Research Council (FC001002), and the Wellcome Trust (FC001002).



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Alzheimer’s Research UK UCL Drug Discovery Institute, University College London, London, UK
  2. The Francis Crick Institute, London, UK
  3. Department of Computer and Systems Sciences, Stockholm University, Kista, Sweden
