Stacking Multiple Molecular Fingerprints for Improving Ligand-Based Virtual Screening

  • Yusuke Matsuyama
  • Takashi Ishida
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10955)


Most current machine-learning-based virtual screening methods rely on a molecular fingerprint. Numerous fingerprints have been proposed for various purposes, and the best fingerprint is known to differ from target to target, which makes it difficult to select the most suitable one. To overcome this problem, we propose a new technique that uses multiple fingerprints for drug activity prediction. The method assumes that each molecular fingerprint extracts different features of a compound, so that predictions based on different fingerprints return different results. We applied ensemble learning to integrate the predictions based on multiple fingerprints. The method builds prediction models based on 8 major molecular fingerprints and then integrates the prediction results from those models. In our performance evaluation, the proposed method improved prediction performance compared with prediction models that use a single molecular fingerprint.
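The idea of one prediction model per fingerprint, followed by an integration step, can be illustrated with a minimal stacking sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two synthetic fingerprint types "A" and "B" stand in for the 8 real fingerprints, the base model is a simple Tanimoto-similarity score, and the meta-learner is a small logistic regression. Base-model scores are computed out of fold, so the meta-learner is trained only on predictions for compounds the base model did not see.

```python
import math
import random


def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


def base_score(query, train_fps, train_labels):
    """Toy base model: mean similarity to actives minus mean similarity to inactives."""
    to_actives = [tanimoto(query, fp) for fp, y in zip(train_fps, train_labels) if y == 1]
    to_inactives = [tanimoto(query, fp) for fp, y in zip(train_fps, train_labels) if y == 0]
    return mean(to_actives) - mean(to_inactives)


def out_of_fold_scores(fps, labels, n_folds=5):
    """Score each compound with a base model fitted on the other folds only."""
    n = len(fps)
    scores = [0.0] * n
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        train_fps = [fps[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        for i in range(fold, n, n_folds):
            scores[i] = base_score(fps[i], train_fps, train_labels)
    return scores


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=500):
    """Meta-learner: logistic regression fitted by batch gradient descent."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, y in zip(rows, labels):
            err = sigmoid(sum(w * x for w, x in zip(weights, row)) + bias) - y
            grad_w = [g + err * x for g, x in zip(grad_w, row)]
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_proba(weights, bias, row):
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)


def make_fingerprint(rng, label, n_signal, n_noise):
    """Synthetic fingerprint: class-specific signal bits plus random noise bits."""
    start = 0 if label == 1 else n_signal
    signal = set(range(start, start + n_signal))
    return signal | set(rng.sample(range(2 * n_signal, 64), n_noise))


rng = random.Random(0)
labels = [i % 2 for i in range(40)]  # 1 = active, 0 = inactive (synthetic)
# Two hypothetical fingerprint types: "A" carries a strong signal, "B" a weak one.
fingerprints = {
    "A": [make_fingerprint(rng, y, n_signal=10, n_noise=5) for y in labels],
    "B": [make_fingerprint(rng, y, n_signal=4, n_noise=8) for y in labels],
}

# Level-1 features: one out-of-fold score per fingerprint type and compound.
columns = [out_of_fold_scores(fingerprints[name], labels) for name in sorted(fingerprints)]
level1 = [list(row) for row in zip(*columns)]

# The meta-learner combines the per-fingerprint scores into one activity probability.
weights, bias = fit_logistic(level1, labels)
probabilities = [predict_proba(weights, bias, row) for row in level1]
```

In a realistic setting the fingerprints would come from a cheminformatics toolkit and the base models would be proper classifiers; the sketch only shows how per-fingerprint predictions become the input features of a second-level model, whose learned weights indicate how much each fingerprint contributes.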


Ligand-based virtual screening · Molecular fingerprint · Ensemble learning



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Graduate School of Information Science and Engineering, Tokyo Institute of Technology, Tokyo, Japan
  2. Department of Computer Science, School of Computing, Tokyo Institute of Technology, Tokyo, Japan
  3. Education Academy of Computational Life Sciences, Tokyo Institute of Technology, Tokyo, Japan
