Abstract
In human safety assessment through quantitative structure–activity relationship (QSAR) modeling, the applicability domain (AD) plays a central role. Principle 3 of the Organisation for Economic Co-operation and Development (OECD) principles for QSAR model validation requires that a predictive QSAR model have “a defined domain of applicability.” Studying the AD allows the uncertainty in the prediction for a particular molecule to be estimated from how similar that molecule is to the training compounds used to develop the model. AD is an active research topic, and many methods have been designed to estimate the competence of a model and the confidence in its outcome for a given prediction task. Characterizing the interpolation space is therefore significant in defining the AD. The reported AD methods are diverse and were built on different hypotheses and algorithms. This multiplicity of methodologies confuses end users and makes comparing the AD of different models a complex issue. In this chapter we summarize the important concepts of AD, including details of the available methods for computing the AD, along with their thresholds and criteria for estimating AD through training set interpolation in the descriptor space. The ideas of the transparent domain and the decision domain are also discussed. To help readers determine the AD in their own projects, practical examples and available open source software tools are provided.
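As a rough illustration of what "training set interpolation in the descriptor space" with a threshold can look like, the sketch below implements two commonly used checks: a bounding-box (descriptor range) test and a leverage test with the conventional warning threshold h* = 3(p + 1)/n, where p is the number of descriptors and n the number of training compounds. The choice of these two checks, the function names, and the toy descriptor values are assumptions made for illustration only; the abstract does not state which methods the chapter actually covers.

```python
import numpy as np


def bounding_box_in_domain(X_train, X_query):
    """Range check: True where every descriptor of a query compound lies
    within the min/max of that descriptor in the training set."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return np.all((X_query >= lo) & (X_query <= hi), axis=1)


def leverage_in_domain(X_train, X_query):
    """Leverage check: h = x^T (X^T X)^-1 x, computed with an intercept column.
    Returns (leverages, h_star, in_domain) where h_star = 3 * (p + 1) / n."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    n, p = X_train.shape
    # Prepend a column of ones so the intercept counts as a model parameter.
    A = np.column_stack([np.ones(n), X_train])
    Q = np.column_stack([np.ones(len(X_query)), X_query])
    # Pseudo-inverse keeps the sketch usable with collinear descriptors.
    xtx_inv = np.linalg.pinv(A.T @ A)
    # Row-wise quadratic form q_i^T (X^T X)^-1 q_i.
    h = np.einsum("ij,jk,ik->i", Q, xtx_inv, Q)
    h_star = 3.0 * (p + 1) / n
    return h, h_star, h <= h_star


if __name__ == "__main__":
    # Hypothetical two-descriptor training set (values are made up).
    X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5],
               [0.2, 0.8], [0.8, 0.2], [0.3, 0.3], [0.7, 0.7], [0.5, 0.1]]
    X_query = [[0.5, 0.5], [5.0, 5.0]]
    print(bounding_box_in_domain(X_train, X_query))
    print(leverage_in_domain(X_train, X_query))
```

A query compound flagged as outside the domain by either check is an extrapolation relative to the training descriptors, so its prediction would be treated with less confidence. Both checks look only at descriptor values and say nothing about how densely the training set covers the region around the query.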
Acknowledgments
S.K. and J.L. thank the National Science Foundation (NSF/CREST HRD-1547754, and NSF/RISE HRD-1547836) for financial support. K.R. is thankful to the UGC, New Delhi for financial assistance under the UPE II scheme.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Kar, S., Roy, K., Leszczynski, J. (2018). Applicability Domain: A Step Toward Confident Predictions and Decidability for QSAR Modeling. In: Nicolotti, O. (eds) Computational Toxicology. Methods in Molecular Biology, vol 1800. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-7899-1_6
DOI: https://doi.org/10.1007/978-1-4939-7899-1_6
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-7898-4
Online ISBN: 978-1-4939-7899-1
eBook Packages: Springer Protocols