Deep Semi-supervised Learning for Virtual Screening Based on Big Data Analytics

Bahi, Meriem; Batouche, Mohamed

doi:10.1007/978-3-319-96292-4_14

Part of the book series: Communications in Computer and Information Science (CCIS, volume 872)

Included in the following conference series:

International Conference on Big Data, Cloud and Applications

Abstract

Scientists and researchers now face the problem of processing massive volumes of data, which is costly in both time and money. This has led researchers to turn to Deep Learning (DL) techniques based on Big Data Analytics. At the same time, the ever-increasing volume of unlabelled data, combined with the difficulty of obtaining class labels, has made semi-supervised learning an alternative of significant practical importance in modern data analysis. Drug discovery, likewise, has reached a scale and complexity at which Deep Semi-Supervised Learning and Big Data processing systems can no longer be avoided. Virtual Screening (VS) is a computationally intensive process that plays a major role in the early phase of drug discovery. VS must run as fast as possible in order to dock the ligands from huge databases to a selected protein receptor efficiently. For these reasons, we propose DeepSSL-VS, a deep semi-supervised learning-based algorithmic framework that pre-filters the huge set of ligands so that virtual screening for the breast cancer protein receptor can be carried out effectively. The framework combines stacked autoencoders with a deep neural network and is implemented on the Spark-H2O platform. The proposed technique was compared with twenty-four different machine learning algorithms, all applied to the same reference datasets, and preliminary performance assessment results show that our approach outperforms these techniques, with an overall accuracy of more than 99%.
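
The abstract names the building blocks (stacked autoencoders, a neural network classifier, a pre-filtering step) but not their sizes or training details. The following is a minimal sketch of that general idea, not the authors' DeepSSL-VS implementation: it uses NumPy instead of Spark-H2O, synthetic data in place of ligand descriptors, and a frozen encoder with a logistic output layer in place of full fine-tuning. All layer sizes, learning rates, the 0.5 threshold, and every function name are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=200, lr=0.5):
    """Fit one autoencoder layer (sigmoid encoder, linear decoder) by
    full-batch gradient descent on the mean squared reconstruction error."""
    n_features = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, (n_features, n_hidden))
    b_enc = np.zeros(n_hidden)
    W_dec = rng.normal(0.0, 0.1, (n_hidden, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W_enc + b_enc)             # encode
        err = (H @ W_dec + b_dec) - X              # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        g_out = 2.0 * err / err.size               # dLoss/dReconstruction
        g_hid = (g_out @ W_dec.T) * H * (1.0 - H)  # back through the sigmoid
        W_dec -= lr * (H.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_hid)
        b_enc -= lr * g_hid.sum(axis=0)
    return (W_enc, b_enc), losses


def encode(X, layers):
    """Pass X through every pretrained encoder layer in order."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X


def train_classifier(Z, y, epochs=500, lr=0.5):
    """Logistic output layer trained on labelled codes only (encoder frozen)."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        grad = (sigmoid(Z @ w + b) - y) / len(y)
        w -= lr * (Z.T @ grad)
        b -= lr * grad.sum()
    return w, b


# Synthetic stand-in for ligand descriptors: 200 "ligands", 8 features each.
X_all = rng.random((200, 8))
y_all = (X_all[:, 0] + X_all[:, 1] > 1.0).astype(float)
labelled = np.arange(40)  # assumption: only the first 40 rows carry labels

# 1) Unsupervised, greedy layer-wise pretraining on ALL rows (labels unused).
layer1, losses1 = train_autoencoder(X_all, n_hidden=6)
layer2, losses2 = train_autoencoder(encode(X_all, [layer1]), n_hidden=4)
stack = [layer1, layer2]

# 2) Supervised step on the labelled subset only.
w, b = train_classifier(encode(X_all[labelled], stack), y_all[labelled])

# 3) Pre-filter: keep only ligands predicted active before any docking.
probs = sigmoid(encode(X_all, stack) @ w + b)
kept = np.flatnonzero(probs >= 0.5)
print(f"kept {kept.size} of {len(X_all)} ligands for docking")
```

The point of the split is that step 1 can use every available compound, labelled or not, while step 2 needs only the small labelled subset. Step 3 is where the saving would come from, since only the retained ligands would be passed on to the expensive docking stage.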

Author information

Authors and Affiliations

Computer Science Department, Faculty of NTIC, University Constantine 2 - Abdelhamid Mehri, Biotechnology Research Center (CRBt) & CERIST, Constantine, Algeria
Meriem Bahi & Mohamed Batouche

Corresponding author

Correspondence to Meriem Bahi.

Editor information

Editors and Affiliations

Abdelmalek Essaâdi University, Tétouan, Morocco
Youness Tabii
Abdelmalek Essaâdi University, Tétouan, Morocco
Mohamed Lazaar
Abdelmalek Essaâdi University, Tétouan, Morocco
Mohammed Al Achhab
Université Ibn-Tofail, Tétouan, Morocco
Nourddine Enneya

Cite this paper

Bahi, M., Batouche, M. (2018). Deep Semi-supervised Learning for Virtual Screening Based on Big Data Analytics. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_14

DOI: https://doi.org/10.1007/978-3-319-96292-4_14
Published: 14 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science, Computer Science (R0)