Predicting the Performance of Fingerprint Similarity Searching

  • Protocol
Chemoinformatics and Computational Chemical Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 672)

Abstract

Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening, and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. Predicting the performance of fingerprint similarity searching is therefore of considerable interest and practical relevance. A quantitative estimate of how likely a fingerprint search is to retrieve active compounds, if such compounds are present in the screening database, would substantially help in selecting the fingerprint type best suited to a given search problem. The method presented herein uses concepts from information theory to relate the fingerprint feature distributions of reference compounds to those of screening libraries. If these feature distributions do not differ sufficiently, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the “background.” By quantifying the difference in feature distributions with the Kullback–Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
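The central quantity, a Kullback–Leibler divergence between the bit distributions of reference compounds and of a screening database, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: fingerprints are taken to be equal-length lists of 0/1 values, each bit is modeled as an independent Bernoulli variable (so the divergence is a sum of per-bit terms), and a pseudocount is added to the bit frequencies so that no estimate is exactly 0 or 1. The function names `bit_frequencies` and `kl_divergence`, the pseudocount value, and the toy data are hypothetical.

```python
import math
from typing import List, Sequence


def bit_frequencies(fingerprints: Sequence[Sequence[int]], pseudocount: float = 0.5) -> List[float]:
    """Return the smoothed fraction of compounds that set each bit.

    The pseudocount (an assumption of this sketch, must be > 0) keeps every
    estimate strictly between 0 and 1 so the logarithms below stay finite.
    """
    if not fingerprints:
        raise ValueError("need at least one fingerprint")
    n_bits = len(fingerprints[0])
    if any(len(fp) != n_bits for fp in fingerprints):
        raise ValueError("all fingerprints must have the same length")
    n = len(fingerprints)
    return [
        (sum(fp[i] for fp in fingerprints) + pseudocount) / (n + 2 * pseudocount)
        for i in range(n_bits)
    ]


def kl_divergence(p_ref: Sequence[float], p_db: Sequence[float]) -> float:
    """KL divergence D(reference || database) in nats.

    Assumes each bit is an independent Bernoulli variable, so the total
    divergence is the sum of the per-bit Bernoulli divergences.
    """
    if len(p_ref) != len(p_db):
        raise ValueError("frequency vectors must have the same length")
    total = 0.0
    for p, q in zip(p_ref, p_db):
        total += p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return total


if __name__ == "__main__":
    # Toy 4-bit fingerprints, made up purely for illustration.
    reference = [[1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0]]
    database = [[0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 1, 0], [0, 1, 0, 1]]
    d = kl_divergence(bit_frequencies(reference), bit_frequencies(database))
    print(f"D(reference || database) = {d:.3f} nats")
```

A larger divergence means the bit frequencies of the reference set stand out more clearly from the database "background"; identical distributions give a divergence of zero. In the approach summarized above, such divergence values are then related to the recovery rates observed for benchmark classes; how that relationship is modeled is not specified here.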

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Vogt, M., Bajorath, J. (2010). Predicting the Performance of Fingerprint Similarity Searching. In: Bajorath, J. (ed.) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_6

  • DOI: https://doi.org/10.1007/978-1-60761-839-3_6

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-838-6

  • Online ISBN: 978-1-60761-839-3

  • eBook Packages: Springer Protocols