Abstract
The Naïve Bayesian Classifier, together with related classification and regression approaches based on Bayes’ theorem, has received increasing attention in cheminformatics in recent years. In this contribution, we first review the mathematical framework on which Bayes’ methods are built. We then discuss the implications of this framework, along with practical experience of the conditions under which Bayes’ methods perform best in virtual screening. Finally, we survey applications of Bayes’ methods in both virtual screening and chemical biology, ranging from bridging the phenotypic and mechanistic spaces of drug action to predicting ligand–target interactions.
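To make the core idea concrete, the sketch below shows one common form of a naive Bayesian classifier applied to virtual screening: a Bernoulli model over binary fingerprint features, trained on known actives and inactives and used to rank a library by log posterior odds of activity. It is a minimal illustration under stated assumptions, not the method or parameterization used in this chapter. The feature IDs, the function names `train_naive_bayes`, `score`, and `rank`, and the Laplace pseudocount `alpha=1.0` are all hypothetical choices made for the example.

```python
import math
from collections import Counter


def train_naive_bayes(actives, inactives, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on binary fingerprints.

    actives, inactives: lists of sets of feature IDs (hypothetical
    fingerprint bits, e.g. substructure keys). alpha: Laplace
    pseudocount (an assumed default) that keeps every estimated
    probability strictly between 0 and 1.
    """
    vocab = set().union(*actives, *inactives)
    n_act, n_inact = len(actives), len(inactives)
    count_act = Counter(f for mol in actives for f in mol)
    count_inact = Counter(f for mol in inactives for f in mol)
    model = {}
    for f in vocab:
        # Smoothed estimates of P(feature present | active) and P(feature present | inactive)
        p_a = (count_act[f] + alpha) / (n_act + 2 * alpha)
        p_i = (count_inact[f] + alpha) / (n_inact + 2 * alpha)
        model[f] = (p_a, p_i)
    # Log prior odds of activity, estimated from the training-set class sizes
    log_prior = math.log(n_act / n_inact)
    return model, log_prior


def score(model, log_prior, mol):
    """Log posterior odds log[P(active | x) / P(inactive | x)].

    The "naive" assumption: features are conditionally independent given
    the class, so per-feature log-likelihood ratios simply add up.
    Features of `mol` never seen in training are ignored.
    """
    s = log_prior
    for f, (p_a, p_i) in model.items():
        if f in mol:
            s += math.log(p_a / p_i)
        else:
            s += math.log((1 - p_a) / (1 - p_i))
    return s


def rank(model, log_prior, library):
    """Sort (name, feature_set) pairs from most to least likely active."""
    return sorted(library, key=lambda item: score(model, log_prior, item[1]), reverse=True)


if __name__ == "__main__":
    # Toy data: integers stand in for hypothetical fingerprint bits.
    actives = [{1, 2}, {1, 3}]
    inactives = [{3, 4}, {4}, {2, 4}]
    model, log_prior = train_naive_bayes(actives, inactives)
    library = [("cpd_B", {4}), ("cpd_A", {1, 2})]
    for name, feats in rank(model, log_prior, library):
        print(name, round(score(model, log_prior, feats), 3))
```

Because each feature contributes an independent additive term, training and scoring scale linearly with the number of features, which is one reason this model is attractive for large screening libraries. Variants described in the literature, such as Laplacian-modified naive Bayes, weight features differently; this sketch does not reproduce them.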
Copyright information
© 2010 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Bender, A. (2010). Bayesian Methods in Virtual Screening and Chemical Biology. In: Bajorath, J. (ed.) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_7
DOI: https://doi.org/10.1007/978-1-60761-839-3_7
Published:
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60761-838-6
Online ISBN: 978-1-60761-839-3
eBook Packages: Springer Protocols