Bayesian Methods in Virtual Screening and Chemical Biology

Protocol in: Chemoinformatics and Computational Chemical Biology

Part of the book series: Methods in Molecular Biology ((MIMB,volume 672))

Abstract

The naïve Bayesian classifier and related classification and regression approaches based on Bayes’ theorem have received increasing attention in cheminformatics in recent years. In this contribution, we first review the mathematical framework on which Bayesian methods are built. We then discuss the implications of this framework, together with practical experience regarding the conditions under which Bayesian methods perform best in virtual screening settings. Finally, we present an overview of applications of Bayesian methods in both virtual screening and chemical biology, ranging from bridging the phenotypic and mechanistic spaces of drug action to the prediction of ligand–target interactions.
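
As an illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of a naïve Bayesian classifier on binary fingerprints. It is not the chapter's implementation: the function names, the toy fingerprints, the class labels, and the add-one smoothing parameter `alpha` are all assumptions made for this example. Each bit is treated as conditionally independent given the class, and the class with the highest log posterior score is returned.

```python
import math


def train_naive_bayes(fingerprints, labels, alpha=1.0):
    """Estimate class priors and per-bit Bernoulli likelihoods.

    alpha is an assumed add-alpha smoothing constant that keeps every
    estimated probability strictly between 0 and 1.
    """
    n_bits = len(fingerprints[0])
    model = {}
    for cls in sorted(set(labels)):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        prior = len(rows) / len(labels)
        # Smoothed estimate of P(bit j = 1 | class)
        p_on = [
            (sum(fp[j] for fp in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(n_bits)
        ]
        model[cls] = (prior, p_on)
    return model


def log_posterior_scores(model, fp):
    """Return log P(class) + sum_j log P(bit_j | class) for each class."""
    scores = {}
    for cls, (prior, p_on) in model.items():
        score = math.log(prior)
        for bit, p in zip(fp, p_on):
            # Naive assumption: bits are independent given the class
            score += math.log(p if bit else 1.0 - p)
        scores[cls] = score
    return scores


def predict(model, fp):
    """Pick the class with the highest (unnormalised) log posterior."""
    scores = log_posterior_scores(model, fp)
    return max(scores, key=scores.get)


# Hypothetical 4-bit fingerprints, invented purely for illustration
train_fps = [
    [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 0],
    [0, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1],
]
train_labels = ["active"] * 3 + ["inactive"] * 3

model = train_naive_bayes(train_fps, train_labels)
print(predict(model, [1, 1, 0, 0]))
print(predict(model, [0, 0, 0, 1]))
```

Working in log space avoids numerical underflow when fingerprints have many bits, and the smoothing term prevents a single unseen bit from driving a class probability to zero.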



© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Bender, A. (2010). Bayesian Methods in Virtual Screening and Chemical Biology. In: Bajorath, J. (ed.) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_7

  • DOI: https://doi.org/10.1007/978-1-60761-839-3_7

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-838-6

  • Online ISBN: 978-1-60761-839-3

  • eBook Packages: Springer Protocols
