Abstract
Here we describe a statistically based partitioning method called median partitioning (MP), which involves the transformation of value distributions of molecular property descriptors into a binary classification scheme. The MP approach differs fundamentally from partitioning approaches that involve dimension reduction of chemical spaces, such as cell-based partitioning, because MP operates directly in the original, albeit simplified, chemical space. Modified versions of the MP algorithm have been implemented and successfully applied in diversity selection, compound classification, and virtual screening. These findings demonstrate that dimension reduction techniques, although elegant in their design, are not necessarily required for effective partitioning of molecular datasets. An attractive feature of statistical partitioning approaches such as decision tree methods or MP is their computational efficiency, which is becoming an important criterion for the analysis of compound databases containing millions of molecules.
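The binary transformation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each descriptor is split at its median over the dataset, that each compound receives one bit per descriptor, and that compounds sharing the same bit string fall into the same partition. The function name `median_partition`, the tie rule (values equal to the median are coded 0), and the toy descriptor values are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_partition(descriptors):
    """Group compounds by a bit string with one bit per descriptor.

    descriptors: dict mapping compound id -> list of descriptor values
    (all lists the same length, at least one compound).
    A bit is "1" when the value lies above that descriptor's median and
    "0" otherwise. Coding ties as "0" is an assumption of this sketch.
    """
    ids = list(descriptors)
    n_desc = len(descriptors[ids[0]])

    # One median per descriptor, computed over the whole dataset.
    medians = [median(descriptors[i][d] for i in ids) for d in range(n_desc)]

    # Compounds with identical bit strings share a partition.
    partitions = defaultdict(list)
    for i in ids:
        code = "".join(
            "1" if value > m else "0"
            for value, m in zip(descriptors[i], medians)
        )
        partitions[code].append(i)
    return dict(partitions)


# Hypothetical toy data: two made-up descriptor values per compound.
toy = {
    "cpd_a": [180.2, 1.2],
    "cpd_b": [310.4, 3.5],
    "cpd_c": [250.3, 0.4],
    "cpd_d": [420.5, 2.8],
}
partitions = median_partition(toy)
```

With n descriptors this coding yields at most 2^n partitions, and no projection into a lower-dimensional space is needed, which is consistent with the efficiency argument made in the abstract. How the published method selects descriptors and handles ties is not stated here, so those choices remain assumptions of the sketch.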
Copyright information
© 2004 Humana Press Inc.
About this protocol
Cite this protocol
Godden, J.W., Bajorath, J. (2004). Partitioning in Binary-Transformed Chemical Descriptor Spaces. In: Bajorath, J. (ed.) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:291
DOI: https://doi.org/10.1385/1-59259-802-1:291
Publisher Name: Humana Press
Print ISBN: 978-1-58829-261-2
Online ISBN: 978-1-59259-802-1
eBook Packages: Springer Protocols