Molecular Diversity, Volume 10, Issue 3, pp 333–339

JEDA: Joint entropy diversity analysis. An information-theoretic method for choosing diverse and representative subsets from combinatorial libraries

  • Melissa R. Landon
  • Scott E. Schaus
Full-length Paper


The joint entropy-based diversity analysis (JEDA) program is a new method for selecting representative subsets of compounds from combinatorial libraries. As in other cell-based diversity analyses, a set of chemical descriptors is used to partition the chemical space of a compound library. Unlike other methods, which apply a metric to choose one compound from each partition, JEDA determines a representative subset with a Shannon entropy-based scoring function implemented in a probabilistic search algorithm. This approach selects compounds that are not only diverse but also reflect the densities of chemical space occupied by the original library. JEDA also lets the user set the size of the subset, so that constraints on time and chemical reagents can be taken into account. Subsets created by JEDA are compared with subsets obtained using two other partition-based diversity analyses, principal components analysis and median partitioning, on a combinatorial library derived from the Comprehensive Medicinal Chemistry dataset.
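The general idea of scoring a candidate subset by the Shannon entropy of its cell occupancy, and improving it by random swaps, can be sketched as follows. This is only an illustration under stated assumptions, not the JEDA implementation: the abstract does not specify the binning scheme, the scoring function, or the search procedure, so the median split, the entropy-maximising objective, and the names `cell_ids`, `shannon_entropy`, and `select_subset` are all hypothetical stand-ins.

```python
import math
import random
from collections import Counter


def cell_ids(descriptors):
    """Assign each compound to a cell by splitting every descriptor at its median.

    Assumption: binary (upper-)median splits stand in for whatever binning
    the actual program uses.
    """
    n_desc = len(descriptors[0])
    medians = []
    for j in range(n_desc):
        column = sorted(row[j] for row in descriptors)
        medians.append(column[len(column) // 2])
    # A cell is the tuple of per-descriptor bins, i.e. a joint state.
    return [tuple(int(row[j] >= medians[j]) for j in range(n_desc))
            for row in descriptors]


def shannon_entropy(cells):
    """Shannon entropy (in bits) of the distribution of compounds over cells."""
    counts = Counter(cells)
    total = len(cells)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_subset(descriptors, size, steps=2000, seed=0):
    """Stochastic swap search for a subset of the requested size.

    Hypothetical objective: maximise the entropy of the subset's joint cell
    occupancy. The published scoring function may differ.
    """
    rng = random.Random(seed)
    cells = cell_ids(descriptors)
    chosen = rng.sample(range(len(cells)), size)
    best = shannon_entropy([cells[i] for i in chosen])
    for _ in range(steps):
        pos = rng.randrange(size)
        candidate = rng.randrange(len(cells))
        if candidate in chosen:
            continue
        trial = chosen.copy()
        trial[pos] = candidate
        score = shannon_entropy([cells[i] for i in trial])
        if score >= best:  # accept ties so the search can move across plateaus
            chosen, best = trial, score
    return sorted(chosen), best
```

Calling `select_subset(descriptors, size=4)` on a list of descriptor rows returns the indices of the chosen compounds and the entropy of their cell occupancy. The `size` argument mirrors the user-defined subset size described above. A score that also rewards matching the densities of the full library would need a different objective than the one assumed here.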

Key words

chemical diversity, Shannon entropy, representative subset selection





Copyright information

© Springer Science + Business Media, Inc. 2006

Authors and Affiliations

  1. Graduate Program in Bioinformatics and Systems Biology, Boston, U.S.A.
  2. Center for Chemical Methodology and Library Development, Boston, U.S.A.
  3. Department of Chemistry, Boston University, Boston, U.S.A.
