Science China Chemistry

Volume 62, Issue 4, pp 506–514

Screen efficiency comparisons of decision tree and neural network algorithms in machine learning assisted drug design

  • Qiumei Pu
  • Yinghao Li
  • Hong Zhang
  • Haodong Yao
  • Bo Zhang
  • Bingji Hou
  • Lin Li
  • Yuliang Zhao
  • Lina Zhao (corresponding author)


In view of the huge search space in drug design, machine learning has become a powerful method for predicting the affinity between a small-molecule drug and its target protein, driven by the development of artificial intelligence technology. However, the variety of machine learning algorithms, each with many tunable parameters, makes choosing a prediction framework quite difficult. In this work, we took a recent drug design competition (hosted by the XtalPi company on the DataCastle platform) as a typical case to find the optimal parameters for different machine learning algorithms and to identify the most effective algorithm. After parameter optimization, we compared typical decision-tree methods (XGBoost, LightGBM) and artificial neural networks (MLP, CNN) using root-mean-square error (RMSE) and the coefficient of determination (R²). As a result, the decision-tree methods were more effective than the neural networks, ranking LightGBM > XGBoost > CNN > MLP for affinity prediction on this specific drug design problem with ~160000 samples. For a much larger screening task in a more complicated drug design study, a sophisticated neural network model may surpass the decision-tree algorithms once its generalization is enhanced and overfitting is reduced. Such advanced machine learning methods can extract more information about protein-ligand binding than traditional approaches and improve the screening efficiency of drug design by a factor of 200–1000.
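The two evaluation metrics named above have standard definitions, which a minimal stdlib-Python sketch can make concrete (the function names and example values here are illustrative, not from the paper):

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2)).

    Lower is better; 0 means perfect prediction.
    """
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 means perfect prediction; 0.0 means no better than
    always predicting the mean of y_true.
    """
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

In a comparison like the one described, each trained model (e.g. a LightGBM regressor or an MLP) would produce `y_pred` for a held-out affinity set, and the model with the lowest RMSE and highest R² is ranked best.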


Keywords: drug design, affinity prediction, protein-ligand binding, machine learning





This work was supported by the National Natural Science Foundation of China (31571026, 21727817).

Supplementary material

11426_2018_9412_MOESM1_ESM.pdf (313 kb)
Screen efficiency comparisons of decision tree and neural network algorithms in machine learning assisted drug design



Copyright information

© Science China Press and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Qiumei Pu (1, 2)
  • Yinghao Li (1, 2)
  • Hong Zhang (2)
  • Haodong Yao (1, 3)
  • Bo Zhang (4)
  • Bingji Hou (3)
  • Lin Li (4)
  • Yuliang Zhao (1, 3)
  • Lina Zhao (1, 3) (corresponding author)

  1. CAS Key Laboratory for Biomedical Effects of Nanomaterials and Nanosafety, Institute of High Energy Physics, Chinese Academy of Sciences, Beijing, China
  2. School of Information Engineering, Minzu University of China, Beijing, China
  3. University of Chinese Academy of Sciences, Beijing, China
  4. School of Computer, Beijing Institute of Technology, Beijing, China
