Abstract
Fingerprints are bit string representations of molecular structure that typically encode structural fragments, topological features, or pharmacophore patterns. Various fingerprint designs are used in virtual screening, and their search performance depends essentially on three factors: the nature of the fingerprint, the active compounds serving as reference molecules, and the composition of the screening database. Predicting the performance of fingerprint similarity searching is of considerable interest and practical relevance. A quantitative estimate of the likelihood that a fingerprint search will retrieve active compounds, if any are present in the screening database, would substantially help in selecting the type of fingerprint most suitable for a given search problem. The method presented herein uses concepts from information theory to relate the fingerprint feature distributions of reference compounds to those of screening libraries. If these feature distributions do not differ sufficiently, active database compounds that are similar to the reference molecules cannot be retrieved because they disappear in the “background.” By quantifying the difference in feature distributions using the Kullback–Leibler divergence and relating this divergence to compound recovery rates obtained for different benchmark classes, fingerprint search performance can be predicted quantitatively.
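The central quantity named in the abstract, the Kullback–Leibler divergence between the feature distribution of the reference compounds and that of the screening database, can be illustrated with a minimal Python sketch. The abstract does not give the exact formulation, so the following rests on assumptions: fingerprint bits are treated as independent Bernoulli variables, bit frequencies are smoothed with a pseudocount so that no probability is exactly 0 or 1, and the function names and toy fingerprints are hypothetical.

```python
import math


def bit_frequencies(fingerprints, pseudocount=1.0):
    """Estimate the probability that each bit is set.

    A pseudocount (Laplace-style smoothing, an assumption of this sketch)
    keeps every estimate strictly between 0 and 1.
    """
    n = len(fingerprints)
    n_bits = len(fingerprints[0])
    counts = [0] * n_bits
    for fp in fingerprints:
        for i, bit in enumerate(fp):
            counts[i] += bit
    return [(c + pseudocount) / (n + 2.0 * pseudocount) for c in counts]


def kl_divergence_bits(p, q):
    """KL divergence D(P || Q) in nats, assuming independent bits.

    For independent Bernoulli variables the divergence is the sum of the
    per-bit divergences.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        total += pi * math.log(pi / qi)
        total += (1.0 - pi) * math.log((1.0 - pi) / (1.0 - qi))
    return total


# Toy 4-bit fingerprints (hypothetical data, for illustration only).
reference = [[1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 0, 0]]
database = [[0, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [0, 1, 0, 1]]

p = bit_frequencies(reference)   # reference (active) compounds
q = bit_frequencies(database)    # screening database ("background")
divergence = kl_divergence_bits(p, q)
print(f"KL divergence (nats): {divergence:.3f}")
```

A divergence close to zero corresponds to the situation described above, in which reference compounds are indistinguishable from the database background. Relating the divergence to expected recovery rates requires calibration against benchmark classes, which this sketch does not attempt.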
Copyright information
© 2010 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Vogt, M., Bajorath, J. (2010). Predicting the Performance of Fingerprint Similarity Searching. In: Bajorath, J. (ed.) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_6
DOI: https://doi.org/10.1007/978-1-60761-839-3_6
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-60761-838-6
Online ISBN: 978-1-60761-839-3
eBook Packages: Springer Protocols