Bayesian Methods in Virtual Screening and Chemical Biology

Part of the Methods in Molecular Biology book series (MIMB, volume 672)


The naïve Bayesian classifier, together with related classification and regression approaches based on Bayes’ theorem, has received increasing attention in cheminformatics in recent years. In this contribution, we first review the mathematical framework on which Bayesian methods are built, and then discuss the implications of this framework, as well as practical experience regarding the conditions under which Bayesian methods perform best in virtual screening settings. Finally, we present an overview of applications of Bayesian methods in both virtual screening and chemical biology, ranging from bridging the phenotypic and mechanistic spaces of drug action to the prediction of ligand–target interactions.

Key words

Bayes Classifier; Virtual screening; Structure-activity relationships; Mode of action analysis; Target prediction; Adverse drug reactions



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Gorlaeus Laboratories, Center for Drug Research, Medicinal Chemistry, Universiteit Leiden/Amsterdam, Leiden, The Netherlands
