Learned Feature Generation for Molecules

  • Conference paper
Advances in Intelligent Data Analysis XVII (IDA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11191)

Abstract

When classifying molecules for virtual screening, the molecular structure must first be converted into meaningful features before a classifier can be trained. The most common methods perform this feature generation with a static algorithm designed from domain knowledge. We propose an approach in which the conversion is instead learned by a convolutional neural network, which finds features that are useful for the task at hand based on the available data. Preliminary results indicate that our current approach already produces features that perform about as well as those of common methods. Since the approach does not yet use any chemical properties, future versions may improve on these results.
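To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of feature generation, not the authors' implementation. It assumes, purely for illustration, that molecules are given as SMILES strings over a small toy alphabet; the abstract does not state the input representation. All names (`ALPHABET`, `static_features`, `conv_features`, `motif_filter`) are hypothetical. The static generator applies a fixed hashing rule, while the convolutional generator computes features from filter weights that a network would adjust during training.

```python
# Toy alphabet for SMILES strings (an assumption, not taken from the paper).
ALPHABET = "CNO()=1"


def one_hot(smiles):
    """Encode each character as a one-hot vector over ALPHABET."""
    return [[1.0 if ch == a else 0.0 for a in ALPHABET] for ch in smiles]


def static_features(smiles, n_bits=16, width=2):
    """Static generation: a fixed rule that hashes substrings to bits."""
    bits = [0] * n_bits
    for i in range(len(smiles) - width + 1):
        chunk = smiles[i:i + width]
        # Deterministic toy hash (the built-in hash() is salted per process).
        h = sum(ord(c) * 31 ** k for k, c in enumerate(chunk))
        bits[h % n_bits] = 1
    return bits


def conv_features(smiles, filters):
    """Learnable generation: 1D convolution, ReLU, global max pooling.

    filters[f][k][a] is the weight of filter f at offset k for symbol a.
    In a trained network these weights would be fitted to the task.
    """
    x = one_hot(smiles)
    feats = []
    for filt in filters:
        width = len(filt)
        best = 0.0  # starting at 0 acts as the ReLU floor
        for i in range(len(x) - width + 1):
            act = sum(filt[k][a] * x[i + k][a]
                      for k in range(width)
                      for a in range(len(ALPHABET)))
            best = max(best, act)
        feats.append(best)
    return feats


def motif_filter(motif):
    """Hand-set filter that responds most strongly to an exact motif."""
    return [[1.0 if ch == a else 0.0 for a in ALPHABET] for ch in motif]


if __name__ == "__main__":
    carbonyl = motif_filter("C=O")
    print(conv_features("CC=O", [carbonyl]))  # [3.0]
    print(conv_features("CCN", [carbonyl]))   # [1.0]
    print(static_features("CC=O"))
```

Here the filter is set by hand to show what a single learned feature could respond to; in a learned setting the weights would start random and be optimized jointly with the classifier.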

Acknowledgement

This work was partially funded by the Konstanz Research School Chemical Biology and KNIME AG.

Author information

Corresponding author

Correspondence to Patrick Winter.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Winter, P., Borgelt, C., Berthold, M.R. (2018). Learned Feature Generation for Molecules. In: Duivesteijn, W., Siebes, A., Ukkonen, A. (eds) Advances in Intelligent Data Analysis XVII. IDA 2018. Lecture Notes in Computer Science, vol 11191. Springer, Cham. https://doi.org/10.1007/978-3-030-01768-2_31

  • DOI: https://doi.org/10.1007/978-3-030-01768-2_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01767-5

  • Online ISBN: 978-3-030-01768-2

  • eBook Packages: Computer Science, Computer Science (R0)
