Stacking Multiple Molecular Fingerprints for Improving Ligand-Based Virtual Screening
Most machine-learning-based virtual screening methods currently rely on a single molecular fingerprint. Numerous fingerprints have been proposed for various purposes, and the best-performing fingerprint is known to differ from target to target, which makes it difficult to select the most suitable one. To overcome this problem, we propose a new technique that uses multiple fingerprints for drug activity prediction. The method rests on the assumption that each molecular fingerprint captures different features of a compound, so that predictions based on different fingerprints yield different results. We apply an ensemble learning technique to integrate the predictions based on multiple fingerprints: the method builds prediction models on eight major molecular fingerprints and then integrates the prediction results from those models. In our performance evaluation, the proposed method improved prediction performance compared with prediction models that use a single molecular fingerprint.
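The two-level idea described above (one prediction model per fingerprint, then a second model that integrates their outputs) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the base model is a Tanimoto k-nearest-neighbour scorer, the integrating model is a small logistic regression, the fingerprint names `fp_a`, `fp_b`, and `fp_c` are hypothetical, and the fingerprints themselves are synthetic sets of on-bit indices rather than any of the eight fingerprints used by the authors.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_score(query, train_fps, train_labels, k=3):
    """Base model: fraction of actives among the k most similar training compounds."""
    ranked = sorted(zip(train_fps, train_labels),
                    key=lambda pair: tanimoto(query, pair[0]), reverse=True)
    top = ranked[:k]
    return sum(label for _, label in top) / len(top)


def out_of_fold_scores(fps, labels, n_folds=5, k=3):
    """Score every compound with a base model that did not see it in training."""
    n = len(fps)
    scores = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        train_fps = [fps[i] for i in train]
        train_labels = [labels[i] for i in train]
        for i in range(fold, n, n_folds):
            scores[i] = knn_score(fps[i], train_fps, train_labels, k)
    return scores


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Meta-model: logistic regression fitted by batch gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - y
            grad_w = [g + err * x for g, x in zip(grad_w, row)]
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def stack_predict(weights, bias, base_scores):
    """Combine per-fingerprint base scores into one activity probability."""
    return sigmoid(sum(w * s for w, s in zip(weights, base_scores)) + bias)


# Synthetic stand-in data (assumption): three hypothetical fingerprint types.
rng = random.Random(0)


def make_fp(active, signal_bits, n_bits=64, n_on=8):
    """Random on-bits, plus a fixed set of signal bits for active compounds."""
    bits = set(rng.sample(range(n_bits), n_on))
    return bits | set(signal_bits) if active else bits


labels = [i % 2 for i in range(40)]
fingerprints = {
    "fp_a": [make_fp(y, [60, 61, 62]) for y in labels],
    "fp_b": [make_fp(y, [10, 11]) for y in labels],
    "fp_c": [make_fp(y, []) for y in labels],  # carries no activity signal
}

# Level 0: out-of-fold scores from one base model per fingerprint.
meta_rows = list(zip(*(out_of_fold_scores(fps, labels)
                       for fps in fingerprints.values())))
# Level 1: the meta-model learns how much weight each fingerprint deserves.
weights, bias = fit_logistic(meta_rows, labels)

# Predict a new compound: score it with each base model, then stack.
query = {"fp_a": make_fp(1, [60, 61, 62]),
         "fp_b": make_fp(1, [10, 11]),
         "fp_c": make_fp(1, [])}
base_scores = [knn_score(query[name], fps, labels)
               for name, fps in fingerprints.items()]
probability = stack_predict(weights, bias, base_scores)
print(dict(zip(fingerprints, weights)), round(probability, 3))
```

The out-of-fold step matters in this kind of design: if the integrating model were trained on scores that the base models produced for their own training compounds, it would learn from overly optimistic inputs. In practice the synthetic bit sets would be replaced by real fingerprints computed with a cheminformatics toolkit, and the base and integrating models by whatever learners suit the target.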
Keywords: Ligand-based virtual screening; Molecular fingerprint; Ensemble learning