alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints

  • Andrea Mauri
Part of the Methods in Pharmacology and Toxicology book series (MIPT)


In this chapter we present alvaDesc, a software tool for calculating and analyzing molecular descriptors and fingerprints.

Molecular descriptors and fingerprints play an essential role in quantitative structure-activity relationships (QSAR): they are the mathematical representation of chemicals, and they serve as the input for the data analysis methods used to build QSAR models.

The increasing number of newly proposed molecular descriptors and fingerprints and, more generally, the attention paid by the scientific community to the development of novel methodologies to represent chemical structures are evidence of the relevance of these representations in the prediction of chemical properties.

Despite the complexity of dealing with a high number of variables, different types of molecular descriptors and fingerprints can highlight specific traits of molecular structures. These aspects, together with the increased availability of chemical data and methods for data analysis, are some of the challenges that researchers face in the development of QSAR models.
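One common way to cope with a high number of variables is to discard descriptors that are almost perfectly correlated with one already retained. The following is only a minimal sketch of such a pairwise-correlation filter, not the procedure implemented in alvaDesc; the function names, the 0.95 threshold, and the small descriptor table are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Constant column: correlation is undefined; treated as 0 in this sketch.
        return 0.0
    return sxy / sqrt(sxx * syy)


def drop_correlated(columns, threshold=0.95):
    """Keep a descriptor only if |r| with every already-kept one is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table: four molecules, three made-up descriptors.
descriptors = {
    "mol_weight": [46.0, 60.0, 74.0, 88.0],
    "atom_count": [3.0, 4.0, 5.0, 6.0],  # linear function of mol_weight
    "ring_count": [0.0, 1.0, 0.0, 1.0],
}

selected = drop_correlated(descriptors)
print(selected)
```

In this toy table, atom_count is a linear function of mol_weight and is therefore filtered out, while ring_count is retained. The result of such a greedy filter depends on the order in which the descriptors are visited, which is one reason more elaborate variable reduction methods exist.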

Key words

Molecular descriptors; Molecular fingerprints; MACCS keys; Data analysis; Principal component analysis; Correlation analysis; Variable reduction; Software



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. Alvascience srl, Lecco, Italy
