Abstract
Recent progress in scientific libraries for machine learning has paved the way for integrated computational tools that predict ligand-binding affinity from the atomic coordinates of protein-ligand complexes. These tools make it possible to apply a broad spectrum of machine-learning techniques to the study of protein-ligand interactions. The essential step in these approaches is to train a new computational model; the most commonly applied methods include supervised machine-learning techniques, convolutional neural networks, and random forests. In this chapter, we focus on supervised machine-learning techniques and their application to the development of protein-targeted scoring functions for the prediction of binding affinity. We discuss the development of the program SAnDReS and its use in building machine-learning models to predict inhibition of cyclin-dependent kinase and HIV-1 protease. We also describe the scoring function space and how to use it to explain the development of targeted scoring functions.
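As a rough illustration of what training a targeted scoring function by supervised learning can look like, the sketch below fits a ridge-regularized linear model that maps a few energy-like terms to a binding affinity and then checks the ranking with the Spearman correlation. It is not the SAnDReS implementation: the three descriptors, the weights in `TRUE_WEIGHTS`, the noise level, and every function name are hypothetical, and the data are synthetic. Only the Python standard library is used so that the example stays self-contained.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation updates both sides at once.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_ridge(features, targets, alpha=1.0):
    """Return [intercept, w1, ..., wk] from the ridge normal equations."""
    rows = [[1.0] + list(f) for f in features]  # leading 1.0 models the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    for i in range(1, k):  # penalise the term weights, not the intercept
        xtx[i][i] += alpha
    return solve_linear_system(xtx, xty)


def predict(weights, features):
    """Score each complex as intercept + weighted sum of its terms."""
    return [weights[0] + sum(w * v for w, v in zip(weights[1:], f)) for f in features]


def spearman(a, b):
    """Spearman rank correlation, without tie correction."""
    def ranks(values):
        order = sorted(range(len(values)), key=values.__getitem__)
        r = [0] * len(values)
        for rank, idx in enumerate(order):
            r[idx] = rank
        return r

    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical "true" model: intercept plus three energy-like terms.
TRUE_WEIGHTS = [-6.0, 0.8, 0.3, -0.5]
rng = random.Random(0)  # fixed seed so the synthetic data are reproducible


def make_complexes(n):
    """Generate synthetic descriptors and noisy affinities (not real data)."""
    feats = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    affinities = [
        TRUE_WEIGHTS[0]
        + sum(w * v for w, v in zip(TRUE_WEIGHTS[1:], f))
        + rng.gauss(0, 0.3)
        for f in feats
    ]
    return feats, affinities


train_x, train_y = make_complexes(80)
test_x, test_y = make_complexes(20)

weights = fit_ridge(train_x, train_y, alpha=1.0)
rho = spearman(predict(weights, test_x), test_y)
print("fitted weights:", [round(w, 2) for w in weights])
print("Spearman rho on held-out complexes:", round(rho, 2))
```

In practice the descriptors would come from the structures of a single protein system, which is what makes the resulting function targeted, and the held-out set is what guards against reporting an over-fitted correlation.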
Acknowledgments
This work was supported by grants from CNPq (Brazil) (308883/2014-4). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nivel Superior—Brasil (CAPES)—Finance Code 001. GB-F acknowledges support from PUCRS/BPA fellowship. WFA is a senior researcher for CNPq (Brazil) (Process Numbers: 308883/2014-4 and 309029/2018-0).
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Bitencourt-Ferreira, G., de Azevedo, W.F. (2019). Machine Learning to Predict Binding Affinity. In: de Azevedo Jr., W. (eds) Docking Screens for Drug Discovery. Methods in Molecular Biology, vol 2053. Humana, New York, NY. https://doi.org/10.1007/978-1-4939-9752-7_16
DOI: https://doi.org/10.1007/978-1-4939-9752-7_16
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-4939-9751-0
Online ISBN: 978-1-4939-9752-7
eBook Packages: Springer Protocols