Ranking and Clustering of Chemical Structure Databases

  • Peter Willett

Abstract

This paper summarises an extended research programme to investigate the use of fragment-based measures of inter-molecular similarity in chemical information systems, with particular reference to structure-property correlation. Comparative studies are reported of structural similarity measures and of clustering methods for chemical structure databases. The methods are most appropriate when very sparse data matrices are available; in such cases, a very fast nearest neighbour searching algorithm can be used for the calculation of the requisite similarities.
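As an illustration only, the kind of calculation the abstract refers to can be sketched as follows. The sketch assumes that each molecule is represented by a set of fragment labels, that similarity is measured by the Tanimoto coefficient (named in the keywords, though the abstract does not say which coefficient the paper prefers), and that the fast nearest neighbour search uses an inverted file so that only molecules sharing at least one fragment with the query are scored. The fragment labels, molecule identifiers and function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def tanimoto(a, b):
    """Tanimoto coefficient between two fragment sets."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def build_inverted_file(database):
    """Map each fragment to the set of molecule ids that contain it."""
    inverted = defaultdict(set)
    for mol_id, fragments in database.items():
        for frag in fragments:
            inverted[frag].add(mol_id)
    return inverted


def nearest_neighbours(query, database, inverted, k=3):
    """Return up to k (molecule id, similarity) pairs, most similar first.

    Only molecules sharing at least one fragment with the query are
    touched, which is what makes the search cheap for sparse data.
    """
    common = defaultdict(int)
    for frag in query:
        for mol_id in inverted.get(frag, ()):
            common[mol_id] += 1
    scored = []
    for mol_id, c in common.items():
        union = len(query) + len(database[mol_id]) - c
        scored.append((c / union, mol_id))
    # Sort by descending similarity, then by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(mol_id, score) for score, mol_id in scored[:k]]


# Hypothetical toy database: molecule id -> set of fragment labels.
database = {
    "mol_a": {"C-C", "C-O", "C=O"},
    "mol_b": {"C-C", "C-N"},
    "mol_c": {"N-H", "C-N"},
}
inverted = build_inverted_file(database)
query = {"C-C", "C-O"}
ranking = nearest_neighbours(query, database, inverted)
```

Here `ranking` lists the database molecules in decreasing order of similarity to the query; molecules with no fragment in common are never scored, so the cost depends on the number of shared fragments rather than on the size of the database.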

Keywords

Weighting Scheme; Similarity Coefficient; Neighbour Search; Property Prediction; Tanimoto Coefficient
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 1988

Authors and Affiliations

  • Peter Willett, Department of Information Studies, University of Sheffield, Sheffield, UK
