Comparison of Methods Based on Diversity and Similarity for Molecule Selection and the Analysis of Drug Discovery Data

  • Protocol
Chemoinformatics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 275)

Abstract

The concepts of diversity and similarity of molecules are widely used in quantitative methods for designing (selecting) a representative set of molecules and for analyzing the relationship between chemical structure and biological activity. We review methods and algorithms for designing a diverse set of molecules in chemical space using clustering, cell-based partitioning, or other distance-based approaches. Analogous cell-based and clustering methods are described for analyzing drug-discovery data to predict activity in virtual screening. The performance of several of these methods is compared, and the comparative study also covers the choice of descriptor variables used to characterize chemical structure. We find that the diversity of a selected set is quite sensitive to both the statistical selection method and the choice of molecular descriptors and that, for the dataset used in this study, random selection works surprisingly well in providing a set of data for analysis.
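To make the contrast between a distance-based design and random selection concrete, here is a minimal sketch under stated assumptions. It assumes each molecule is already represented by a vector of descriptors scaled to [0, 1], uses Euclidean distance, and implements a greedy maximin selection (repeatedly pick the candidate farthest from everything already chosen) alongside a simple cell-based coverage score on a regular grid. The function names, the synthetic library, the grid of 4 bins per axis, and both diversity scores are illustrative assumptions, not the chapter's exact algorithms or criteria.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_pairwise_distance(points):
    """Maximin-style diversity score: the smallest distance between any two selected molecules."""
    return min(
        euclidean(points[i], points[j])
        for i in range(len(points))
        for j in range(i + 1, len(points))
    )


def maximin_select(candidates, k, start=0):
    """Greedy maximin selection: repeatedly add the candidate whose distance to
    its nearest already-selected molecule is largest."""
    selected = [start]
    # nearest[i]: distance from candidate i to its closest selected molecule.
    nearest = [euclidean(c, candidates[start]) for c in candidates]
    while len(selected) < k:
        chosen = set(selected)
        best = max(
            (i for i in range(len(candidates)) if i not in chosen),
            key=lambda i: nearest[i],
        )
        selected.append(best)
        # Update nearest-selected distances with the newly added molecule.
        for i, c in enumerate(candidates):
            nearest[i] = min(nearest[i], euclidean(c, candidates[best]))
    return selected


def cell_of(point, bins):
    """Cell index of a descriptor vector in [0, 1]^d on a grid with `bins` slices per axis."""
    return tuple(min(int(x * bins), bins - 1) for x in point)


def cell_coverage(candidates, chosen, bins):
    """Fraction of the library's occupied cells that the chosen subset hits."""
    occupied = {cell_of(p, bins) for p in candidates}
    hit = {cell_of(candidates[i], bins) for i in chosen}
    return len(hit) / len(occupied)


# Hypothetical library: 500 molecules with 3 synthetic descriptors scaled to [0, 1).
rng = random.Random(0)
library = [[rng.random() for _ in range(3)] for _ in range(500)]
k = 30
designed = maximin_select(library, k)
randomly_chosen = rng.sample(range(len(library)), k)

for name, idx in [("maximin", designed), ("random", randomly_chosen)]:
    points = [library[i] for i in idx]
    print(
        f"{name:8s} min distance = {min_pairwise_distance(points):.3f}, "
        f"cell coverage (4 bins/axis) = {cell_coverage(library, idx, 4):.3f}"
    )
```

Running the same comparison on real descriptor data, and with different descriptor sets, is one way to probe the sensitivity described in the abstract. The greedy step costs O(nk) distance evaluations for n candidates and k picks, which keeps it practical where an exhaustive search over all k-subsets would not be.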

Copyright information

© 2004 Humana Press Inc.

About this protocol

Cite this protocol

Lam, R.L., Welch, W.J. (2004). Comparison of Methods Based on Diversity and Similarity for Molecule Selection and the Analysis of Drug Discovery Data. In: Bajorath, J. (eds) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:301

  • DOI: https://doi.org/10.1385/1-59259-802-1:301

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-261-2

  • Online ISBN: 978-1-59259-802-1

  • eBook Packages: Springer Protocols