Predicting the Performance of Fingerprint Similarity Searching

Vogt, Martin; Bajorath, Jürgen

doi:10.1007/978-1-60761-839-3_6

Part of the book series: Methods in Molecular Biology (MIMB, volume 672)

Abstract

Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are utilized in virtual screening and their search performance essentially depends on three parameters: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. It is of considerable interest and practical relevance to predict the performance of fingerprint similarity searching. A quantitative assessment of the potential that a fingerprint search might successfully retrieve active compounds, if available in the screening database, would substantially help to select the type of fingerprint most suitable for a given search problem. The method presented herein utilizes concepts from information theory to relate the fingerprint feature distributions of reference compounds to screening libraries. If these feature distributions do not sufficiently differ, active database compounds that are similar to reference molecules cannot be retrieved because they disappear in the “background.” By quantifying the difference in feature distribution using the Kullback–Leibler divergence and relating the divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be quantitatively predicted.
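
As a rough illustration of the idea described in the abstract, the following Python sketch estimates per-bit frequencies for a reference set and a screening database and computes the Kullback–Leibler divergence between them. It rests on assumptions that are not stated in the abstract: fingerprint bits are treated as independent Bernoulli variables, frequencies are smoothed with a pseudocount to avoid division by zero, and natural logarithms are used. The function names and toy fingerprints are hypothetical, and the step that relates the divergence to compound recovery rates on benchmark classes is not shown.

```python
import math


def bit_frequencies(fingerprints, pseudocount=1.0):
    """Smoothed relative frequency with which each bit is set in a compound set.

    The pseudocount (an assumption of this sketch) keeps every frequency
    strictly between 0 and 1 so that the logarithms below are defined.
    """
    n_compounds = len(fingerprints)
    n_bits = len(fingerprints[0])
    counts = [sum(fp[i] for fp in fingerprints) for i in range(n_bits)]
    return [(c + pseudocount) / (n_compounds + 2.0 * pseudocount) for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) for independent Bernoulli bits.

    p[i] and q[i] are the probabilities that bit i is set in the reference
    compounds and in the screening database, respectively.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi * math.log(pi / qi)
        total += (1.0 - pi) * math.log((1.0 - pi) / (1.0 - qi))
    return total


# Toy 4-bit fingerprints (hypothetical data, for illustration only).
reference = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0]]
database = [
    [0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0],
    [0, 1, 0, 1], [1, 0, 0, 0], [0, 1, 1, 1],
]

p = bit_frequencies(reference)
q = bit_frequencies(database)
divergence = kl_divergence(p, q)
print(f"KL divergence (reference || database): {divergence:.3f}")
```

Under these assumptions, a divergence close to zero means the reference compounds have nearly the same bit distribution as the database, which corresponds to the situation in which active compounds disappear in the background. Larger values indicate a feature distribution that stands out from the database.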

Author information

Authors and Affiliations

Department of Life Science Informatics, B-IT, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
Martin Vogt
Department of Life Science Informatics, B-IT, LIMES, Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
Jürgen Bajorath

Editor information

Editors and Affiliations

Department of Life Science Informatics, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, Bonn, 53113, Germany
Jürgen Bajorath

About this protocol

Cite this protocol

Vogt, M., Bajorath, J. (2010). Predicting the Performance of Fingerprint Similarity Searching. In: Bajorath, J. (ed.) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_6

DOI: https://doi.org/10.1007/978-1-60761-839-3_6
Published: 28 August 2010
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60761-838-6
Online ISBN: 978-1-60761-839-3
eBook Packages: Springer Protocols
