Abstract
Scientists and researchers increasingly face the problem of processing massive amounts of data, which is costly in both time and resources. This has led researchers to turn to Deep Learning (DL) techniques built on Big Data Analytics. At the same time, the ever-increasing volume of unlabelled data, combined with the difficulty of obtaining class labels, has made semi-supervised learning an alternative of significant practical importance in modern data analysis. Drug discovery, likewise, has reached a scale and complexity at which Deep Semi-Supervised Learning and Big Data Processing Systems can no longer be avoided. Virtual Screening (VS) is a computationally intensive process that plays a major role in the early phase of the drug discovery process. VS must run as fast as possible in order to dock ligands from huge databases to a selected protein receptor efficiently. For these reasons, we propose a deep semi-supervised learning-based algorithmic framework, named DeepSSL-VS, that pre-filters the huge set of ligands so that virtual screening for the breast cancer protein receptor can be performed effectively. The framework combines stacked autoencoders with a deep neural network and is implemented on the Spark-H2O platform. The proposed technique was compared with twenty-four different machine learning algorithms, all applied to the same reference datasets, and preliminary performance assessment results show that our approach outperforms these techniques, with an overall accuracy of more than 99%.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Bahi, M., Batouche, M. (2018). Deep Semi-supervised Learning for Virtual Screening Based on Big Data Analytics. In: Tabii, Y., Lazaar, M., Al Achhab, M., Enneya, N. (eds) Big Data, Cloud and Applications. BDCA 2018. Communications in Computer and Information Science, vol 872. Springer, Cham. https://doi.org/10.1007/978-3-319-96292-4_14
DOI: https://doi.org/10.1007/978-3-319-96292-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96291-7
Online ISBN: 978-3-319-96292-4
eBook Packages: Computer Science (R0)