Molecular Similarity Measures

Part of the book series: Methods in Molecular Biology (MIMB, volume 672)

Abstract

Molecular similarity is a pervasive concept in chemistry. It is essential to many aspects of chemical reasoning and analysis and is perhaps the fundamental assumption underlying medicinal chemistry. Dissimilarity, the complement of similarity, also plays a major role in a growing number of applications of molecular diversity in combinatorial chemistry, high-throughput screening, and related fields. How molecular information is represented, known as the representation problem, is important to the type of molecular similarity analysis (MSA) that can be carried out in any given situation. In this work, four types of mathematical structure are used to represent molecular information: sets, graphs, vectors, and functions. Molecular similarity is a pairwise relationship that induces structure on sets of molecules, giving rise to the concept of chemical space. Although all three concepts (molecular similarity, molecular representation, and chemical space) are treated in this chapter, the emphasis is on molecular similarity measures. Similarity measures, also called similarity coefficients or indices, are functions that map pairs of compatible molecular representations, that is, representations of the same mathematical form, into real numbers that usually, but not always, lie on the unit interval. This chapter presents a somewhat pedagogical discussion of many types of molecular similarity measures, their strengths and limitations, and their relationships to one another. An expanded account of the material on chemical spaces presented in the first edition of this book is also provided. It includes a discussion of the topography of activity landscapes and the role that activity cliffs in these landscapes play in structure–activity studies.
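
The idea of a similarity measure as a function from pairs of like representations to real numbers can be illustrated with a minimal Python sketch. The abstract does not name any particular coefficient, so the Tanimoto (Jaccard) coefficient for sets and the cosine coefficient for vectors are used here only as common, illustrative choices; the function names, the empty-set convention, and the feature labels are assumptions made for this example and are not taken from the chapter.

```python
from math import sqrt


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity of two feature sets; lies in [0, 1]."""
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def dissimilarity(a: set, b: set) -> float:
    """Dissimilarity taken as the complement of Tanimoto similarity."""
    return 1.0 - tanimoto(a, b)


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors; lies in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical feature sets; the labels are made up for illustration.
mol_a = {"aromatic_ring", "hydroxyl", "carbonyl"}
mol_b = {"aromatic_ring", "hydroxyl", "amine"}

print(tanimoto(mol_a, mol_b))           # 2 shared / 4 distinct features = 0.5
print(dissimilarity(mol_a, mol_b))      # 0.5
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # -1.0, outside the unit interval
```

The set-based value always falls on the unit interval, whereas the vector-based value can be negative when components have opposite signs, which is one way a similarity measure can fall outside [0, 1]. Both functions accept only pairs of the same kind of representation (two sets or two equal-length vectors), mirroring the compatibility requirement described above.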

Acknowledgments

The authors thank Tom Doman for his constructive comments on the original version of this manuscript, and Mark Johnson, Mic Lajiness, John Van Drie, and Tudor Oprea for helpful discussions. Special thanks go to Jurgen Bajorath and Jose Medina-Franco for providing several figures and for their helpful comments.

Copyright information

© 2011 Humana Press

About this protocol

Cite this protocol

Maggiora, G.M., Shanmugasundaram, V. (2011). Molecular Similarity Measures. In: Bajorath, J. (ed.) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_2

  • DOI: https://doi.org/10.1007/978-1-60761-839-3_2

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-838-6

  • Online ISBN: 978-1-60761-839-3

  • eBook Packages: Springer Protocols
