Molecular Similarity in Computational Toxicology

  • Matteo Floris
  • Stefania Olla
Part of the Methods in Molecular Biology book series (MIMB, volume 1800)


Chemical similarity is applied throughout cheminformatics. One common use rests on the principle that similar molecules have similar properties: in the read-across approach, a specific endpoint for a chemical is estimated from experimental data available for highly similar compounds.
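The read-across idea can be illustrated with a minimal sketch. It assumes each molecule is already encoded as a binary fingerprint (represented here as the set of its "on" bit positions), uses the Tanimoto coefficient as the similarity measure, and estimates the endpoint as a similarity-weighted mean over the k most similar reference compounds. The function names, the parameters `k` and `min_similarity`, and the toy data are illustrative assumptions, not the procedure implemented in CAESAR or VEGA.

```python
from typing import FrozenSet, List, Tuple

# A binary fingerprint, stored as the indices of the bits set to 1.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared on-bits divided by on-bits in either."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


def read_across(
    target: Fingerprint,
    references: List[Tuple[Fingerprint, float]],
    k: int = 3,
    min_similarity: float = 0.0,
) -> float:
    """Estimate an endpoint for `target` from its k most similar references.

    `references` pairs each reference fingerprint with its experimental value.
    The estimate is the similarity-weighted mean of the neighbours' values
    (an illustrative choice, not the chapter's model).
    """
    scored = [(tanimoto(target, fp), value) for fp, value in references]
    # Discard references with no overlap or below the similarity threshold.
    scored = [(s, v) for s, v in scored if s > 0.0 and s >= min_similarity]
    neighbours = sorted(scored, key=lambda sv: sv[0], reverse=True)[:k]
    if not neighbours:
        raise ValueError("no sufficiently similar reference compound")
    total = sum(s for s, _ in neighbours)
    return sum(s * v for s, v in neighbours) / total


# Toy data (hypothetical bit positions and endpoint values).
target_fp = frozenset({1, 2, 3, 4})
reference_set = [
    (frozenset({1, 2, 3, 4, 5}), 2.0),
    (frozenset({1, 2, 3}), 3.0),
    (frozenset({7, 8}), 10.0),  # shares no bits, so it is ignored
]
estimate = read_across(target_fp, reference_set, k=2)
```

Raising an error when no neighbour passes the threshold, rather than returning a default value, reflects the fact that a read-across estimate is only meaningful when sufficiently similar compounds exist.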

This chapter reports an implementation of chemical similarity and the analysis of multiple combinations of binary fingerprints and similarity metrics in the context of the read-across technique.
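As a rough illustration of what a combination of binary fingerprints and similarity metrics means in practice, the sketch below scores one pair of molecules under every fingerprint/metric pairing. It assumes the fingerprints have already been computed by a cheminformatics toolkit and are given as sets of on-bit positions. The fingerprint labels, the choice of three coefficients (Tanimoto, Dice, cosine), and the helper `similarity_grid` are assumptions made for this example, not the set analysed in the chapter.

```python
import math
from typing import Callable, Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set to 1
Metric = Callable[[Fingerprint, Fingerprint], float]


def tanimoto(x: Fingerprint, y: Fingerprint) -> float:
    """c / (a + b - c), with a, b the on-bit counts and c the shared bits."""
    c = len(x & y)
    denom = len(x) + len(y) - c
    return c / denom if denom else 0.0


def dice(x: Fingerprint, y: Fingerprint) -> float:
    """2c / (a + b)."""
    denom = len(x) + len(y)
    return 2 * len(x & y) / denom if denom else 0.0


def cosine(x: Fingerprint, y: Fingerprint) -> float:
    """c / sqrt(a * b)."""
    denom = math.sqrt(len(x) * len(y))
    return len(x & y) / denom if denom else 0.0


METRICS: Dict[str, Metric] = {"tanimoto": tanimoto, "dice": dice, "cosine": cosine}


def similarity_grid(
    fps_a: Dict[str, Fingerprint],
    fps_b: Dict[str, Fingerprint],
) -> Dict[Tuple[str, str], float]:
    """Score molecules A and B for every (fingerprint, metric) combination.

    `fps_a` and `fps_b` map a fingerprint label to that molecule's bit set;
    both must use the same labels.
    """
    return {
        (fp_name, metric_name): metric(fps_a[fp_name], fps_b[fp_name])
        for fp_name in fps_a
        for metric_name, metric in METRICS.items()
    }


# Toy bit sets under two hypothetical fingerprint types.
mol_a = {"path_based": frozenset({1, 2, 3, 4}), "keys": frozenset({10, 11})}
mol_b = {"path_based": frozenset({3, 4, 5, 6, 7, 8}), "keys": frozenset({10, 11})}
grid = similarity_grid(mol_a, mol_b)
```

The same pair of molecules can receive noticeably different scores depending on the pairing, which is why the choice of fingerprint and metric matters for read-across.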

This analysis demonstrates that the classical similarity measurements can be improved with a generalizable model of similarity. The approach presented here has been implemented in two open-source software tools for computational toxicology (CAESAR and VEGA).

Key words

Chemical similarity; QSAR; Toxicity prediction; Similarity searching; Read-across



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Matteo Floris (1, 2)
  • Stefania Olla (1)
  1. Department of Biomedical Sciences, University of Sassari, Sassari, Italy
  2. IRGB – CNR, National Research Council, Institute of Genetics and Biomedical Research, Monserrato, Italy
