Consensus of Sample-Balanced Classifiers for Identifying Ligand-Binding Residue by Co-evolutionary Physicochemical Characteristics of Amino Acids
Protein-ligand binding is an important mechanism by which some proteins perform their functions; the binding sites are the protein residues that physically bind to ligands. Current state-of-the-art methods search for known structures similar to the query and predict the binding sites from those solved structures. However, such structural information is often unavailable. In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. Because ligand-binding sites are heavily outnumbered by non-ligand-binding sites, we constructed several balanced data sets and trained a random forest (RF)-based classifier on each of them. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Experimental results on CASP9 targets demonstrate that our method compares favorably with the state of the art.
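The balanced-ensemble idea described above can be illustrated with a minimal sketch. The abstract does not say how the balanced data sets were drawn or how the individual classifier outputs were combined, so the random undersampling of the majority class and the majority vote below are assumptions, as are all function names. The trainer `train_centroid` is a deliberately simple stand-in so that the example runs with the standard library only; it is not a random forest, and an RF trainer would be passed as `train_fn` instead.

```python
import random

# A sample is (feature_vector, label); label 1 = binding, 0 = non-binding.

def make_balanced_subsets(samples, n_subsets, seed=0):
    """Pair all positives with an equally sized random draw of negatives.

    Assumption: random undersampling without replacement within each subset.
    """
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] == 0]
    if not positives or len(negatives) < len(positives):
        raise ValueError("need positives, and at least as many negatives")
    return [positives + rng.sample(negatives, len(positives))
            for _ in range(n_subsets)]

def train_centroid(subset):
    """Stand-in trainer (NOT a random forest): nearest class centroid."""
    def centroid(label):
        rows = [x for x, y in subset if y == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    c0, c1 = centroid(0), centroid(1)
    return lambda x: 1 if sq_dist(x, c1) < sq_dist(x, c0) else 0

def train_ensemble(samples, train_fn, n_subsets=5, seed=0):
    """Train one classifier per balanced subset."""
    subsets = make_balanced_subsets(samples, n_subsets, seed)
    return [train_fn(subset) for subset in subsets]

def consensus_predict(classifiers, x):
    """Assumed consensus rule: strict majority vote over the ensemble."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if 2 * votes > len(classifiers) else 0

# Toy data: 2 binding residues versus 20 non-binding residues.
samples = [([0.9], 1), ([0.8], 1)] + [([i / 100], 0) for i in range(20)]
ensemble = train_ensemble(samples, train_centroid, n_subsets=3, seed=1)
print(consensus_predict(ensemble, [0.85]), consensus_predict(ensemble, [0.05]))
```

Each subset keeps every binding residue and sees a different slice of the non-binding residues, so no single classifier is trained on the skewed class ratio while the ensemble as a whole still draws on much of the majority class.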
Keywords: Protein-ligand binding; Random forest; Co-evolutionary encoding