BCL::Mol2D—a robust atom environment descriptor for QSAR modeling and lead optimization

  • Oanh Vu
  • Jeffrey Mendenhall
  • Doaa Altarawy
  • Jens Meiler


Comparing fragment-based molecular fingerprints of drug-like molecules is one of the most robust and frequently used approaches in computer-assisted drug discovery. Molprint2D, a popular atom environment (AE) descriptor, yielded the best enrichment of active compounds across a diverse set of targets in a recent large-scale study. We present here BCL::Mol2D descriptors, which outperformed Molprint2D on nine PubChem datasets spanning a wide range of protein classes. Because BCL::Mol2D records the number of AEs from a universal AE library, a novel aspect of BCL::Mol2D relative to Molprint2D is its reversibility. This property enables decomposition of predictions from machine learning models into contributions from particular molecular substructures. Artificial neural networks with dropout, when trained on BCL::Mol2D descriptors, outperform those trained on Molprint2D descriptors by up to 26% in the logAUC metric. When combined with the Reduced Short Range descriptor set, our previously published set of descriptors optimized for QSARs, BCL::Mol2D yields a modest improvement. Finally, we demonstrate how the reversibility of BCL::Mol2D enables visualization of a ‘pharmacophore map’ that could guide lead optimization for serine/threonine kinase 33 inhibitors.
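The idea of counting AEs against a fixed library, and why that makes the descriptor reversible, can be illustrated with a minimal Python sketch. This is not the BCL implementation: the toy graph format, the environment string encoding, and the names `atom_environments`, `count_fingerprint`, and `LIBRARY` are all assumptions made for illustration, and only height-1 environments built from element symbols are considered.

```python
from collections import Counter

# Toy molecular graph for the heavy atoms of ethanol (C-C-O).
# The input format is hypothetical and chosen only for this sketch.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]


def atom_environments(atoms, bonds):
    """Return one height-1 environment string per atom: centre plus sorted neighbours."""
    neighbours = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        neighbours[a].append(atoms[b])
        neighbours[b].append(atoms[a])
    return [f"{atoms[i]}({''.join(sorted(neighbours[i]))})" for i in range(len(atoms))]


def count_fingerprint(atoms, bonds, library):
    """Count how often each library environment occurs in the molecule."""
    counts = Counter(atom_environments(atoms, bonds))
    return [counts.get(env, 0) for env in library]


# A fixed, ordered library shared by every molecule (illustrative entries only).
LIBRARY = ["C(C)", "C(CO)", "O(C)", "N(C)"]

fingerprint = count_fingerprint(ATOMS, BONDS, LIBRARY)

# Reversibility: position i of the vector always corresponds to LIBRARY[i],
# so any non-zero entry can be traced back to a concrete substructure.
present = [LIBRARY[i] for i, n in enumerate(fingerprint) if n > 0]
```

Because every molecule is described against the same ordered library, a model's sensitivity to one input column can be attributed to the environment stored at that library position, which is the property the abstract refers to as reversibility.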


Keywords: QSAR, Molecular descriptor, Sensitivity analysis, Cheminformatics, Pharmacophore mapping



AE: Atom environment
ANN: Artificial neural network
AUC: Area under curve
BCL: BioChemical Library
CADD: Computer aided drug discovery
LB-CADD: Ligand based computer aided drug discovery
QSAR: Quantitative structure–activity relationship
RSR: Reduced short range



This study is funded by a Molecular Sciences Software Institute (MolSSI) [30, 31] Fellowship and the NIH. MolSSI is funded by NSF Grant ACI-1547580. Work in the Meiler laboratory is supported through NIH (R01 GM099842, R01 DK097376) and NSF (CHE 1305874). The authors would like to thank Dr. Francois Berenger for discussions regarding descriptor design.

Author contributions

OV, JM and JM designed the study. OV implemented the descriptor, performed the benchmark and analysis, and wrote the manuscript. JM and DA supervised the project. JM, DA and JM edited the manuscript. All authors read and approved the final manuscript.

Supplementary material

10822_2019_199_MOESM1_ESM.pdf (205 kb)
Supplementary material 1 (PDF 205 KB)
10822_2019_199_MOESM2_ESM.txt (30 kb)
Supplementary material 2 (TXT 29 KB)
10822_2019_199_MOESM3_ESM.txt (4 kb)
Supplementary material 3 (TXT 4 KB)
10822_2019_199_MOESM4_ESM.pdf (482 kb)
Supplementary material 4: Mapping partial contributions of AEs to the ANN prediction output of STK33 inhibitors using BCL::Mol2D (Atom type, height = 1) (Fig. S5). (PDF 481 KB)
10822_2019_199_MOESM5_ESM.pdf (14 kb)
Supplementary material 5: Linear regression between increment and decrement derivatives of 16 STK inhibitors used in the sensitivity analysis. The formula and the value are denoted in red next to the red trend line (Fig. S6). (PDF 14 KB)


  1. Kim KH, Kim ND, Seong BL (2010) Pharmacophore-based virtual screening: a review of recent applications. Expert Opin Drug Discov 5(3):205–222
  2. Carlsson L, Helgee EA, Boyer S (2009) Interpretation of nonlinear QSAR models applied to Ames mutagenicity data. J Chem Inf Model 49(11):2551–2558
  3. Cramer RD (2012) The inevitable QSAR renaissance. J Comput Aided Mol Des 26(1):35–38
  4. Sliwoski G, Kothiwale S, Meiler J, Lowe EW Jr (2014) Computational methods in drug discovery. Pharmacol Rev 66(1):334–395
  5. Bender A, Mussa HY, Glen RC, Reiling S (2004) Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance. J Chem Inf Comput Sci 44(5):1708–1718
  6. Sastry M, Lowrie JF, Dixon SL, Sherman W (2010) Large-scale systematic analysis of 2D fingerprint methods and parameters to improve virtual screening enrichments. J Chem Inf Model 50(5):771–784
  7. Gasteiger J, Marsili M (1980) Iterative partial equalization of orbital electronegativity—a rapid access to atomic charges. Tetrahedron 36(22):3219–3228
  8. Montañez-Godínez N, Martínez-Olguín AC, Deeb O, Garduño-Juárez R, Ramírez-Galicia G (2015) QSAR/QSPR as an application of artificial neural networks. In: Cartwright H (ed) Artificial neural networks. Springer, New York, pp 319–333
  9. Mendenhall J, Meiler J (2016) Improving quantitative structure–activity relationship models using artificial neural networks trained with dropout. J Comput Aided Mol Des 30(2):177–189
  10. Cherkasov A, Muratov EN, Fourches D, Varnek A, Baskin II, Cronin M et al (2014) QSAR modeling: where have you been? Where are you going to? J Med Chem 57(12):4977–5010
  11. Tetko IV, Tanchuk VY, Chentsova NP, Antonenko SV, Poda GI, Kukhar VP et al (1994) HIV-1 reverse transcriptase inhibitor design using artificial neural networks. J Med Chem 37(16):2520–2526
  12. Tetko IV, Villa AE, Livingstone DJ (1996) Neural network studies. 2. Variable selection. J Chem Inf Comput Sci 36(4):794–803
  13. Guha R, Stanton DT, Jurs PC (2005) Interpreting computational neural network quantitative structure-activity relationship models: a detailed interpretation of the weights and biases. J Chem Inf Model 45(4):1109–1121
  14. Guha R, Jurs PC (2005) Interpreting computational neural network QSAR models: a measure of descriptor importance. J Chem Inf Model 45(3):800–806
  15. Marcou G, Horvath D, Solov’ev V, Arrault A, Vayer P, Varnek A (2012) Interpretability of SAR/QSAR models of any complexity by atomic contributions. Mol Inform 31(9):639–642
  16. Srivastava N, Hinton G, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
  17. Butkiewicz M, Lowe EW, Meiler J (2012) BCL::ChemInfo—qualitative analysis of machine learning models for activation of HSD involved in Alzheimer’s Disease. In: 2012 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 9–12 May 2012
  18. Butkiewicz M, Lowe EW Jr, Mueller R, Mendenhall JL, Teixeira PL, Weaver CD et al (2013) Benchmarking ligand-based virtual high-throughput screening with the PubChem database. Molecules 18(1):735–756
  19. Rogers DJ, Tanimoto TT (1960) A computer program for classifying plants. Science 132(3434):1115–1118
  20. Baskin II, Ait AO, Halberstam NM, Palyulin VA, Zefirov NS (2002) An approach to the interpretation of backpropagation neural network models in QSAR studies. SAR QSAR Environ Res 13(1):35–41
  21. Meiler J, Will M (2002) Genius: a genetic algorithm for automated structure elucidation from 13C NMR spectra. J Am Chem Soc 124(9):1868–1870
  22. Zheng W, Cho SJ, Tropsha A (1998) Rational combinatorial library design. 1. Focus-2D: a new approach to the design of targeted combinatorial chemical libraries. J Chem Inf Comput Sci 38(2):251–258
  23. Sliwoski G, Mendenhall J, Meiler J (2016) Autocorrelation descriptor improvements for QSAR: 2DA_Sign and 3DA_Sign. J Comput Aided Mol Des 30(3):209–217
  24. Butkiewicz M, Bryant SH, Lowe EW Jr, David C, Meiler J (2017) High-throughput screening assay datasets from the PubChem database. Chem Inform 3(1):1
  25. Gasteiger J, Teckentrup A, Terfloth L, Spycher S (2003) Neural networks as data mining tools in drug design. J Phys Org Chem 16(4):232–245
  26. Broto P, Moreau G, Vandycke C (1984) Molecular structures: perception, autocorrelation descriptor and SAR studies. Autocorrelation descriptor. Eur J Med Chem 19(1):66–70
  27. Mysinger MM, Shoichet BK (2010) Rapid context-dependent ligand desolvation in molecular docking. J Chem Inf Model 50(9):1561–1573
  28. Weisstein E (2000) Normal sum distribution. Wolfram Research, Inc.
  29. Liao Z, Thibaut L, Jobson A, Pommier Y (2006) Inhibition of human tyrosyl-DNA phosphodiesterase by aminoglycoside antibiotics and ribosome inhibitors. Mol Pharmacol 70(1):366
  30. Krylov A, Windus TL, Barnes T, Marin-Rimoldi E, Nash JA, Pritchard B et al (2018) Perspective: computational chemistry software and its advancement as illustrated through three grand challenge cases for molecular science. J Chem Phys 149(18):180901
  31. Wilkins-Diehr N, Crawford TD (2018) NSF’s inaugural software institutes: the Science Gateways Community Institute and the Molecular Sciences Software Institute. Comput Sci Eng 20(5):26–38

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville, USA
  2. The Molecular Sciences Software Institute (MolSSI), Blacksburg, USA
  3. Department of Computer and Systems Engineering, Alexandria University, Alexandria, Egypt
