BESTox: A Convolutional Neural Network Regression Model Based on Binary-Encoded SMILES for Acute Oral Toxicity Prediction of Chemical Compounds

  • Jiarui Chen
  • Hong-Hin Cheong
  • Shirley Weng In Siu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12099)


Compound toxicity prediction is a challenging and critical task in drug discovery and design. Traditionally, cell- or animal-based experiments are required to confirm the acute oral toxicity of chemical compounds. However, these methods are often limited by the availability of experimental facilities, long experimentation times, and high cost. In this paper, we propose a novel convolutional neural network regression model, named BESTox, to predict the acute oral toxicity (\(LD_{50}\)) of chemical compounds. The model learns the compositional and chemical properties of compounds from their two-dimensional binary matrices. Each matrix encodes the occurrences of certain atom types, number of bonded hydrogens, atom charge, valence, ring, degree, aromaticity, chirality, and hybridization along the SMILES string of a given compound. In a benchmark experiment on a dataset of 7413 observations (5931 for training, 1482 for testing), BESTox achieved a squared correlation coefficient (\(R^2\)) of 0.619, a root-mean-squared error (RMSE) of 0.603, and a mean absolute error (MAE) of 0.433. Despite its shallow architecture and simple molecular descriptors, our method performs comparably to two recently published models.
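The idea of a binary matrix built along a SMILES string can be illustrated with a small, dependency-free sketch. It rests on simplifying assumptions that are not taken from the paper: one row per SMILES token (bracket atoms, two-digit ring closures and the two-letter halogens Cl and Br are kept whole), and a hypothetical set of columns that can be read from the string alone (atom type, aromaticity, attached hydrogens, charge sign, ring-closure digits, chirality). Valence, degree and hybridization need a cheminformatics toolkit such as RDKit or Open Babel and are omitted, so this is not the BESTox feature set. The names `FEATURES`, `encode_smiles` and `max_len` are illustrative only.

```python
import re

# Hypothetical, text-only feature columns (NOT the exact BESTox features).
ATOM_TYPES = ["C", "N", "O", "S", "P", "F", "Cl", "Br", "I"]
FEATURES = ATOM_TYPES + ["other_atom", "aromatic", "has_H",
                         "charge_pos", "charge_neg", "ring_digit", "chiral"]
COL = {name: i for i, name in enumerate(FEATURES)}

# Bracket atoms, two-digit ring closures, two-letter halogens, then any char.
TOKEN_RE = re.compile(r"\[[^\]]+\]|%\d{2}|Cl|Br|.")
# Optional isotope number, then the element symbol inside a bracket atom.
BRACKET_RE = re.compile(r"\[\d*([A-Z][a-z]?|[a-z][a-z]?)")
AROMATIC_ORGANIC = {"b", "c", "n", "o", "p", "s"}


def set_atom_type(row, symbol):
    """Set one atom-type bit; symbols outside ATOM_TYPES map to 'other_atom'."""
    row[COL[symbol] if symbol in ATOM_TYPES else COL["other_atom"]] = 1


def encode_token(tok):
    """Return one binary row for a single SMILES token."""
    row = [0] * len(FEATURES)
    if tok.startswith("["):
        m = BRACKET_RE.match(tok)
        if m is None:  # e.g. "[*]": no recognizable element symbol
            row[COL["other_atom"]] = 1
            return row
        symbol, rest = m.group(1), tok[m.end():-1]  # "[C@@H]" -> "C", "@@H"
        set_atom_type(row, symbol.capitalize())
        row[COL["aromatic"]] = int(symbol[0].islower())
        row[COL["has_H"]] = int("H" in rest)
        row[COL["charge_pos"]] = int("+" in rest)
        row[COL["charge_neg"]] = int("-" in rest)
        row[COL["chiral"]] = int("@" in rest)
    elif tok in ATOM_TYPES or tok == "B":
        set_atom_type(row, tok)
    elif tok in AROMATIC_ORGANIC:
        set_atom_type(row, tok.upper())
        row[COL["aromatic"]] = 1
    elif tok.isdigit() or tok.startswith("%"):
        row[COL["ring_digit"]] = 1
    # Bonds, branches and other punctuation are left as all-zero rows.
    return row


def encode_smiles(smiles, max_len=120):
    """Encode SMILES as a max_len x len(FEATURES) binary matrix (zero-padded)."""
    rows = [encode_token(t) for t in TOKEN_RE.findall(smiles)]
    if len(rows) > max_len:
        raise ValueError(f"SMILES has {len(rows)} tokens; max_len is {max_len}")
    rows += [[0] * len(FEATURES) for _ in range(max_len - len(rows))]
    return rows
```

Padding every compound to the same `max_len` yields fixed-size inputs that a convolutional layer can consume; `max_len=120` is an arbitrary placeholder, not a value reported for BESTox.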
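The three reported metrics can be computed from observed and predicted \(LD_{50}\) values with a short plain-Python sketch. It takes \(R^2\) literally as the squared Pearson correlation coefficient, as the abstract words it; some QSAR studies instead report the coefficient of determination, 1 - SS_res/SS_tot, which can give a different value for the same predictions. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (r2, rmse, mae) for observed vs. predicted values.

    r2 is the squared Pearson correlation, not 1 - SS_res/SS_tot.
    """
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant inputs")
    r2 = cov ** 2 / (var_t * var_p)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, rmse, mae


if __name__ == "__main__":
    # r2 = 16/17.5 (about 0.914), rmse = sqrt(0.125) (about 0.354), mae = 0.25
    print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

A prediction that is perfectly correlated with the observations but systematically off (for example, every value doubled) still scores `r2 = 1.0` under this definition, which is one reason RMSE and MAE are worth reporting alongside it.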


Keywords: Drug design, Machine learning, Acute oral toxicity, Toxicity prediction, SMILES, Convolutional neural network



This work was supported by the University of Macau (Grant no. MYRG2017-00146-FST).



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Jiarui Chen (1), corresponding author
  • Hong-Hin Cheong (1)
  • Shirley Weng In Siu (1)
  1. Department of Computer and Information Science, University of Macau, Taipa, China
