Skip to main content

Deep Semi-supervised Learning for Virtual Screening Based on Big Data Analytics

  • Conference paper
  • First Online:
Big Data, Cloud and Applications (BDCA 2018)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 872))

Included in the following conference series:

Abstract

Nowadays, scientists and researchers, are facing the problem of massive data processing, which consumes relatively too much time and cost. That is why researchers have turned to Deep Learning (DL) techniques based on Big Data Analytics. On the other hand, the ever-increasing size of unlabelled data combined with the difficulty of obtaining class labels has made semi-supervised learning an interesting alternative of significant practical importance in modern data analysis. In the same context, drug discovery has reached a state and complexity that we can no longer avoid using Deep Semi-Supervised Learning and Big Data Processing Systems. Virtual Screening (VS) is a computationally intensive process which plays a major role in the early phase of drug discovery process. The VS has to be made as fast as possible to efficiently dock the ligands from huge databases to a selected protein receptor. For these reasons, we propose a deep semi-supervised learning-based algorithmic framework named DeepSSL-VS for pre-filtering the huge set of ligands to effectively do virtual screening for the breast cancer protein receptor. The latter combines stacked autoencoders and deep neural network and is implemented using the Spark-H2O platform. The proposed technique has been compared to twenty-four different machine learning algorithms applied all on the same reference datasets, and preliminary performance assessment results have shown that our approach outperforms these techniques with an overall accuracy performance more than 99%.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Agrawal, A., Choudhary, A.: Perspective: materials informatics and big data: realization of the “fourth paradigm” of science in materials science. Apl Mater. 4(5), 053208 (2016)

    Article  Google Scholar 

  2. Aliper, A., Plis, S., Artemov, A., Ulloa, A., Mamoshina, P., Zhavoronkov, A.: Deep learning applications for predicting pharmacological properties of drugs and drug repurposing using transcriptomic data. Mol. Pharm. 13(7), 2524–2530 (2016)

    Article  Google Scholar 

  3. Byvatov, E., Fechner, U., Sadowski, J., Schneider, G.: Comparison of support vector machine and artificial neural network systems for drug/nondrug classification. J. Chem. Inf. Comput. Sci. 43(6), 1882–1889 (2003)

    Article  Google Scholar 

  4. Candel, A., Parmar, V., LeDell, E., Arora, A.: Deep learning with H2O. H2O. ai Inc. (2016)

    Google Scholar 

  5. Cook, D.: Practical Machine Learning with H2O: Powerful Scalable Techniques for Deep Learning and AI. O’Reilly Media, Beijing (2016)

    Google Scholar 

  6. ZINC Database: Chembridge full library (2011). http://zinc.docking.org/

  7. Erhan, D., Bengio, Y., Courville, A., Manzagol, P.A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? J. Mach. Learn. Res. 11(Feb), 625–660 (2010)

    MathSciNet  MATH  Google Scholar 

  8. Fitriawan, A., Wasito, I., Syafiandini, A.F., Azminah, A., Amien, M., Yanuar, A.: Deep belief networks for ligand-based virtual screening of drug design. In: Proceedings of 2016 6th International Workshop on Computer Science and Engineering (WCSE 2016) Tokyo, Japan, pp. 655–659 (2016)

    Google Scholar 

  9. García-Sosa, A.T., Oja, M., Hetényi, C., Maran, U.: Druglogit: logistic discrimination between drugs and nondrugs including disease-specificity by assigning probabilities based on molecular properties. J. Chem. Inf. Model. 52(8), 2165–2180 (2012)

    Article  Google Scholar 

  10. Gertrudes, J., Maltarollo, V., Silva, R., Oliveira, P., Honorio, K., Da Silva, A.: Machine learning techniques and drug design. Curr. Med. Chem. 19(25), 4289–4297 (2012)

    Article  Google Scholar 

  11. Howard, A.D., McAllister, G., Feighner, S.D., Liu, Q., Nargund, R.P., Van der Ploeg, L.H., Patchett, A.A.: Orphan G-protein-coupled receptors and natural ligand discovery. Trends Pharmacol. Sci. 22(3), 132–140 (2001)

    Article  Google Scholar 

  12. Irwin, J.J., Sterling, T., Mysinger, M.M., Bolstad, E.S., Coleman, R.G.: Zinc: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 52(7), 1757–1768 (2012)

    Article  Google Scholar 

  13. Korkmaz, S., Zararsiz, G., Goksuluk, D.: Drug/nondrug classification using support vector machines with various feature selection strategies. Comput. Methods Programs Biomed. 117(2), 51–60 (2014)

    Article  Google Scholar 

  14. Korkmaz, S., Zararsiz, G., Goksuluk, D.: MLVis: a web tool for machine learning-based virtual screening in early-phase of drug discovery and development. PloS One 10(4), e0124600 (2015)

    Article  Google Scholar 

  15. Lavecchia, A.: Machine-learning approaches in drug discovery: methods and applications. Drug Discov. Today 20(3), 318–331 (2015)

    Article  Google Scholar 

  16. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436 (2015)

    Article  Google Scholar 

  17. Lowe, R., Mussa, H.Y., Nigsch, F., Glen, R.C., Mitchell, J.B.: Predicting the mechanism of phospholipidosis. J. Cheminform. 4(1), 2 (2012)

    Article  Google Scholar 

  18. Mannhold, R., Kubinyi, H., Folkers, G.: Virtual Screening: Principles, Challenges, and Practical Guidelines, vol. 48. Wiley, Hoboken (2011)

    Google Scholar 

  19. Min, S., Lee, B., Yoon, S.: Deep learning in bioinformatics. Br. Bioinform. 18(5), 851–869 (2017)

    Google Scholar 

  20. Mohamed, B., Kamel, Z., Meriem, B., Amira, K., Anouar, B.: An efficient compound classification technique based on multiple kernel learning for virtual screening. In: Proceedings of The Thirteenth International Conference on Computational Intelligence methods for Bioinformatics and Biostatistics (CIBB2016) Stirling, UK (2016)

    Google Scholar 

  21. Pérez-Sianes, J., Pérez-Sánchez, H., Díaz, F.: Virtual screening: a challenge for deep learning. In: Saberi Mohamad, M., Fdez-Riverola, F., Domínguez Mayo, F., De Paz, J. (eds.) 10th International Conference on Practical Applications of Computational Biology & Bioinformatics, pp. 13–22. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-40126-3_2

    Chapter  Google Scholar 

  22. Rusiecki, A., Kordos, M., et al.: Effectiveness of unsupervised training in deep learning neural networks. Schedae Inform. 24(2015), 41–51 (2016)

    Google Scholar 

  23. Senanayake, U., Prabuddha, R., Ragel, R.: Machine learning based search space optimisation for drug discovery. In: 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), pp. 68–75. IEEE (2013)

    Google Scholar 

  24. Zhou, Y., Arpit, D., Nwogu, I., Govindaraju, V.: Is joint training better for deep auto-encoders? arXiv preprint arXiv:1405.1380 (2014)

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Meriem Bahi .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Bahi, M., Batouche, M. (2018). Deep Semi-supervised Learning for Virtual Screening Based on Big Data Analytics. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-96292-4_14

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-96291-7

  • Online ISBN: 978-3-319-96292-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics