The Use of Similarity and Clustering Techniques for the Prediction of Molecular Properties

  • Geoffrey M. Downs
  • Peter Willett
Part of the Eurocourses: Chemical and Environmental Science book series (EUCE, volume 2)


The fine chemicals industry makes extensive use of systems for the storage and manipulation of chemical structure information. The primary function of these systems is to provide facilities for storage and retrieval, but the close relationship that is known to exist between the structure of a molecule and its physical, chemical and biological properties has led to increasing interest in the use of chemical structure databases for the prediction of molecular properties.


Keywords: Similarity Coefficient; Property Prediction; Tanimoto Coefficient; Inverted File; Connection Table





Copyright information

© Springer Science+Business Media Dordrecht 1991

Authors and Affiliations

  • Geoffrey M. Downs (1)
  • Peter Willett (1)
  1. Department of Information Studies, University of Sheffield, Sheffield, UK
