Consensus of Sample-Balanced Classifiers for Identifying Ligand-Binding Residue by Co-evolutionary Physicochemical Characteristics of Amino Acids

  • Peng Chen
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 375)


Protein-ligand binding is an important mechanism by which some proteins perform their functions, and the binding sites are the residues of a protein that physically bind to the ligand. Current state-of-the-art methods search for known structures similar to the query and predict binding sites from those solved structures. However, such structural information is often unavailable. In this paper, we propose a sequence-based approach to identifying protein-ligand binding residues. Because ligand-binding sites are heavily outnumbered by non-ligand-binding sites, we constructed several balanced data sets and trained a random forest (RF) classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Experimental results on CASP9 targets show that our method compares favorably with state-of-the-art methods.
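The sample-balancing and consensus steps described above can be illustrated with a minimal sketch. The abstract does not say how the balanced data sets were drawn or how the classifiers were combined, so random undersampling of the majority class and a simple majority vote are assumptions here. All function names are hypothetical, and the nearest-centroid learner is only a stand-in that keeps the example dependency-free; it is not the random forest used in the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]      # (feature vector, label); 1 = binding residue
Classifier = Callable[[Sequence[float]], int]


def make_balanced_subsets(samples: List[Sample], n_subsets: int,
                          seed: int = 0) -> List[List[Sample]]:
    """Pair every minority-class sample with an equally sized random draw
    from the majority class (assumed scheme: random undersampling)."""
    rng = random.Random(seed)
    pos = [s for s in samples if s[1] == 1]
    neg = [s for s in samples if s[1] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    subsets = []
    for _ in range(n_subsets):
        subset = minority + rng.sample(majority, len(minority))
        rng.shuffle(subset)
        subsets.append(subset)
    return subsets


def train_centroid_classifier(subset: List[Sample]) -> Classifier:
    """Stand-in base learner (NOT a random forest): nearest class centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in subset if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    c0, c1 = centroid(0), centroid(1)

    def predict(x: Sequence[float]) -> int:
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1 if d1 < d0 else 0

    return predict


def consensus_predict(classifiers: List[Classifier], x: Sequence[float]) -> int:
    """Majority vote over the per-subset classifiers (assumed consensus rule)."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0


# Toy data: 5 binding residues against 50 non-binding ones (a 1:10 imbalance).
toy = [((0.9 + 0.02 * i,), 1) for i in range(5)] + \
      [((0.004 * i,), 0) for i in range(50)]
ensemble = [train_centroid_classifier(s) for s in make_balanced_subsets(toy, 7)]
```

Using an odd number of subsets avoids tied votes. In practice each balanced subset would be passed to a random forest implementation in place of `train_centroid_classifier`.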


Protein-ligand binding, Random forest, Co-evolutionary encoding





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Peng Chen (1)
  1. Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
