Abstract
This chapter summarizes and evaluates some of the most common chemoinformatic methods applied to the analysis of high-throughput screening data. It briefly describes current high-throughput screening practices and stresses that the major constraint on the application of chemoinformatics is often the quality of the screening data. To highlight this point, the chapter discusses the NCI dataset and how it differs from most high-throughput screening datasets.
Copyright information
© 2004 Humana Press Inc.
About this protocol
Cite this protocol
Parker, C.N., Schreyer, S.K. (2004). Application of Chemoinformatics to High-Throughput Screening. In: Bajorath, J. (eds) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:085
DOI: https://doi.org/10.1385/1-59259-802-1:085
Publisher Name: Humana Press
Print ISBN: 978-1-58829-261-2
Online ISBN: 978-1-59259-802-1
eBook Packages: Springer Protocols