Skip to main content

Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling

  • Protocol
Computational Toxicology

Part of the book series: Methods in Molecular Biology (MIMB, volume 1800)

Abstract

In human safety assessment through quantitative structure–activity relationship (QSAR) modeling, the concept of the applicability domain (AD) plays a central role. Among its principles for the validation of QSAR models, the Organisation for Economic Co-operation and Development (OECD) lists “a defined domain of applicability” as Principle 3 for a predictive QSAR model. Studying the AD makes it possible to estimate the uncertainty of the prediction for a particular molecule from how similar that molecule is to the training compounds used to develop the model. AD is currently an active research topic, and many methods have been designed to estimate the competence of a model and the confidence in its output for a given prediction task. Characterizing the interpolation space of the training set is key to defining the AD. The reported AD methods were constructed from different hypotheses and algorithms; this multiplicity of methodologies confuses end users and makes comparing the AD of different models difficult. In this chapter, we summarize the important concepts of AD, including details of the available methods for computing the AD together with their thresholds and criteria for estimating the AD through training-set interpolation in descriptor space. The ideas of the transparent domain and the decision domain are also discussed. To help readers determine the AD in their own projects, practical examples are provided together with available open-source software tools.
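The abstract describes estimating prediction uncertainty from a compound's similarity to the training set, using thresholds defined by interpolation in descriptor space. As a hedged illustration, not necessarily the procedure used in this chapter, the sketch below implements two widely used criteria with NumPy: the leverage approach with the conventional warning leverage h* = 3(p + 1)/n, and a distance-based check on the mean Euclidean distance to the k nearest training compounds. The function names, the defaults k = 3 and z = 0.5, and the synthetic data are assumptions chosen for illustration.

```python
import numpy as np


def leverages(X_train: np.ndarray, X_query: np.ndarray) -> np.ndarray:
    """Leverage h_i = x_i^T (X^T X)^{-1} x_i of each query row.

    Both matrices are augmented with a column of ones (intercept term),
    so a query located at the training centroid gets leverage 1/n.
    """
    A = np.column_stack([np.ones(len(X_train)), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    core = np.linalg.pinv(A.T @ A)  # pseudo-inverse tolerates a near-singular X^T X
    return np.einsum("ij,jk,ik->i", Q, core, Q)  # row-wise quadratic form


def leverage_domain(X_train: np.ndarray, X_query: np.ndarray):
    """Inside the AD if h <= h* = 3(p + 1)/n (p descriptors, n training compounds)."""
    n, p = X_train.shape
    h_star = 3.0 * (p + 1) / n
    h = leverages(X_train, X_query)
    return h <= h_star, h, h_star


def knn_domain(X_train: np.ndarray, X_query: np.ndarray, k: int = 3, z: float = 0.5):
    """Inside the AD if the mean distance to the k nearest training compounds
    is at most D_T = <d> + z * s, where <d> and s are the mean and standard
    deviation of the same statistic over the training set (self excluded).
    k and z are illustrative defaults (assumptions), not values from the chapter.
    """
    # Autoscale with training statistics so no single descriptor dominates
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # constant descriptors: avoid division by zero
    T = (X_train - mu) / sd
    Q = (X_query - mu) / sd

    def pairwise(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # Training reference: skip sorted column 0, which is each compound's distance to itself
    d_train = np.sort(pairwise(T, T), axis=1)[:, 1:k + 1].mean(axis=1)
    d_query = np.sort(pairwise(Q, T), axis=1)[:, :k].mean(axis=1)
    threshold = d_train.mean() + z * d_train.std(ddof=1)
    return d_query <= threshold, d_query, threshold


# Usage on synthetic descriptors (hypothetical data, 50 compounds x 3 descriptors)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 3))
X_query = np.array([[0.0, 0.0, 0.0],   # near the training centroid
                    [8.0, 8.0, 8.0]])  # far outside the training cloud
inside_h, h, h_star = leverage_domain(X_train, X_query)
inside_d, d, d_t = knn_domain(X_train, X_query)
print("leverage:", h.round(3), "h* =", round(h_star, 3), inside_h)
print("k-NN distance:", d.round(3), "D_T =", round(d_t, 3), inside_d)
```

The two criteria can disagree: leverage measures how far a compound lies from the training centroid relative to the spread of the descriptors, whereas the k-NN check can also flag compounds that fall in sparsely populated regions inside the overall descriptor range.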



Acknowledgments

S.K. and J.L. thank the National Science Foundation (NSF/CREST HRD-1547754 and NSF/RISE HRD-1547836) for financial support. K.R. thanks the UGC, New Delhi, for financial assistance under the UPE II scheme.

Author information

Correspondence to Kunal Roy.


Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this protocol


Cite this protocol

Kar, S., Roy, K., Leszczynski, J. (2018). Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling. In: Nicolotti, O. (ed.) Computational Toxicology. Methods in Molecular Biology, vol 1800. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7899-1_6


  • DOI: https://doi.org/10.1007/978-1-4939-7899-1_6


  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-7898-4

  • Online ISBN: 978-1-4939-7899-1

  • eBook Packages: Springer Protocols
