Machine Learning Scoring Functions Based on Random Forest and Support Vector Regression
Accurately predicting the binding affinities of large sets of diverse molecules against a range of macromolecular targets is an extremely challenging task. Scoring functions that attempt such predictions from structural data are essential for analysing the outputs of molecular docking, which is in turn an important technique for drug discovery, chemical biology and structural biology. Conventional scoring functions assume a predetermined, theory-inspired functional form for the relationship between the variables that characterise a complex and its predicted binding affinity. The inherent problem with this approach lies in the difficulty of explicitly modelling the various contributions of intermolecular interactions to binding affinity.
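To make the "predetermined functional form" concrete, the sketch below fits a hypothetical empirical scoring function that assumes binding affinity is a linear, additive sum of interaction terms. The descriptor names (`h_bonds`, `hydrophobic_contacts`, `rotatable_bonds`) and the synthetic data are illustrative assumptions, not taken from any specific published scoring function. Only the weights are learned; the additive form is fixed in advance, so an effect that the form does not encode (here a synthetic cross-term) remains as residual error.

```python
import numpy as np

# Hypothetical interaction descriptors for a conventional, empirical scoring function.
TERMS = ["h_bonds", "hydrophobic_contacts", "rotatable_bonds"]

def fit_linear_scoring_function(X, y):
    """Fit pKd ~ w0 + sum_i w_i * x_i by ordinary least squares.

    The additive linear form is assumed in advance; only the weights are fitted.
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    weights, *_ = np.linalg.lstsq(A, y, rcond=None)
    return weights

def predict(weights, X):
    """Apply the fitted linear scoring function to descriptor rows."""
    return weights[0] + X @ weights[1:]

rng = np.random.default_rng(0)
X = rng.integers(0, 10, size=(50, len(TERMS))).astype(float)

# Synthetic affinities containing a cross-term that the linear form cannot represent.
y = 2.0 + 0.5 * X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2] + 0.05 * X[:, 0] * X[:, 1]

w = fit_linear_scoring_function(X, y)
residual = y - predict(w, X)
rmse = float(np.sqrt(np.mean(residual ** 2)))
print("weights:", np.round(w, 2))
print("RMSE of the fixed linear form:", round(rmse, 3))
```

If the synthetic affinities were purely linear in the descriptors, the fit would recover the generating weights exactly; the non-zero RMSE here comes entirely from the interaction the assumed form leaves out.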
Recently, a new family of 3D structure-based regression models for binding affinity prediction has been introduced that circumvents the need for such modelling assumptions. These machine learning scoring functions have been shown to outperform conventional scoring functions by a wide margin. However, no direct comparison among machine learning scoring functions has been made to date. Here, the performance of the two most popular machine learning scoring functions for this task, based on Random Forest and Support Vector Regression, is analysed under exactly the same experimental conditions.
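A minimal sketch of what "exactly the same experimental conditions" can mean in practice: both regressors receive the same features, the same train/test split and fixed random seeds, and are scored with the same metrics. This uses scikit-learn's `RandomForestRegressor` and `SVR` as stand-ins; the synthetic descriptors, the hyperparameter values and the choice of Pearson r and RMSE as metrics are assumptions for illustration, not the study's actual data, settings or tooling.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

def evaluate(model, X_train, X_test, y_train, y_test):
    """Train on the shared training set and score on the shared test set."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    r = float(np.corrcoef(y_test, pred)[0, 1])              # Pearson correlation
    rmse = float(np.sqrt(mean_squared_error(y_test, pred)))  # root-mean-square error
    return r, rmse

rng = np.random.default_rng(42)
# Synthetic stand-in for per-complex descriptors and measured affinities (hypothetical).
X = rng.random((300, 6))
y = 4.0 + 3.0 * X[:, 0] * X[:, 1] + np.sin(3.0 * X[:, 2]) + rng.normal(0.0, 0.2, 300)

# One split, reused for both models, so differences come from the learners alone.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Illustrative hyperparameters only; in a real comparison each would be tuned
# by the same protocol (e.g. cross-validation on the training set).
models = {
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "SVR": SVR(kernel="rbf", C=10.0, epsilon=0.1, gamma="scale"),
}

results = {name: evaluate(m, X_train, X_test, y_train, y_test) for name, m in models.items()}
for name, (r, rmse) in results.items():
    print(f"{name}: Pearson r = {r:.3f}, RMSE = {rmse:.3f}")
```

Keeping the split, features and evaluation function shared is the design point: any gap in r or RMSE can then be attributed to the regression technique rather than to differences in data handling.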
Keywords: molecular docking; scoring functions; machine learning; chemical informatics; structural bioinformatics