Comparison of Methods Based on Diversity and Similarity for Molecule Selection and the Analysis of Drug Discovery Data

Lam, Raymond L.H.; Welch, William J.

doi:10.1385/1-59259-802-1:301

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 275)

Abstract

The concepts of diversity and similarity of molecules are widely used in quantitative methods for designing (selecting) a representative set of molecules and for analyzing the relationship between chemical structure and biological activity. We review methods and algorithms for design of a diverse set of molecules in the chemical space using clustering, cell-based partitioning, or other distance-based approaches. Analogous cell-based and clustering methods are described for analyzing drug-discovery data to predict activity in virtual screening. Some performance comparisons are made. The choice of descriptor variables to characterize chemical structure is also included in the comparative study. We find that the diversity of a selected set is quite sensitive to both the statistical selection method and the choice of molecular descriptors and that, for the dataset used in this study, random selection works surprisingly well in providing a set of data for analysis.
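
As a rough illustration of the distance-based selection idea mentioned above, the following sketch greedily picks a subset whose members are far apart (a maximin-style heuristic) and compares it with a random pick of the same size. It is not the chapter's algorithm: the function names, the centroid-based seeding rule, and the synthetic two-dimensional "descriptor" vectors are all assumptions made for the example.

```python
import math
import random


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points (a simple diversity score)."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def greedy_maximin_select(points, k):
    """Pick k indices, each time adding the point farthest from those already chosen."""
    # Seed with the point farthest from the centroid (an arbitrary, hypothetical rule).
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    first = max(range(len(points)), key=lambda i: math.dist(points[i], centroid))
    chosen = [first]
    # nearest[i] = distance from point i to its closest already-chosen point.
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


rng = random.Random(0)
# Synthetic 2-D vectors standing in for real molecular descriptors (assumption).
molecules = [(rng.random(), rng.random()) for _ in range(200)]

diverse_idx = greedy_maximin_select(molecules, 10)
random_idx = rng.sample(range(len(molecules)), 10)

diverse_score = min_pairwise_distance([molecules[i] for i in diverse_idx])
random_score = min_pairwise_distance([molecules[i] for i in random_idx])
print(f"maximin selection, minimum pairwise distance: {diverse_score:.3f}")
print(f"random selection,  minimum pairwise distance: {random_score:.3f}")
```

The minimum pairwise distance is only one possible diversity score. A well-spread set under this score is not necessarily the most useful set for later activity modeling, which is consistent with the abstract's remark that random selection can perform well for analysis.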

Author information

Authors and Affiliations

Department of Data Exploration Sciences, GlaxoSmithKline, King of Prussia, Pennsylvania, USA
Raymond L.H. Lam
Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
William J. Welch
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada
William J. Welch

Editor information

Editors and Affiliations

Albany Molecular Research Inc., Bothell Research Center, Bothell, WA
Jürgen Bajorath
University of Washington, Seattle, WA
Jürgen Bajorath

About this protocol

Cite this protocol

Lam, R.L., Welch, W.J. (2004). Comparison of Methods Based on Diversity and Similarity for Molecule Selection and the Analysis of Drug Discovery Data. In: Bajorath, J. (eds) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:301

DOI: https://doi.org/10.1385/1-59259-802-1:301
Publisher Name: Humana Press
Print ISBN: 978-1-58829-261-2
Online ISBN: 978-1-59259-802-1
eBook Packages: Springer Protocols
