Accurate Prediction of Hot Spots with Greedy Gradient Boosting Decision Tree
Hot spot residues play a crucial role in protein-protein interactions, and identifying them supports drug discovery and rational drug design. Only a few amino acid residues contribute most of the binding free energy at a protein interface; these residues are called hot spots. This work predicts hot spot residues with an ensemble machine learning method, Gradient Boosting Decision Tree (GBDT), on the Alanine Scanning Energetics Database (ASEdb) and the Structural Kinetic and Energetic database of Mutant Protein Interactions (SKEMPI). Using features derived from the properties of each amino acid and of the protein complex chain in which it resides, we design a greedy program that, in every iteration, discards the feature ranked least important by GBDT and continues until the last such feature has been discarded. In our comparison, the greedy GBDT method predicts hot spot residues more accurately; the F-score, one of the evaluation criteria, reaches 0.808 on the ASEdb dataset.
Keywords: Hot spot residues; Greedy Gradient Boosting Decision Tree; Amino acid properties; Protein complex chain
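The greedy elimination loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny stump-based booster, the squared-loss objective, the gain-based importance, the hold-out split, and every function name (`fit_stump`, `fit_gbdt`, `greedy_gbdt`, and so on) are assumptions made for illustration, and the data are synthetic rather than drawn from ASEdb or SKEMPI.

```python
import random

def fit_stump(X, r, feats):
    """Find the one-split tree (stump) that best fits the residuals r."""
    n, total = len(r), sum(r)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for f in feats:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += r[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # cannot split between equal values
            right_sum = total - left_sum
            # Reduction in squared error relative to a single leaf.
            gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best

def fit_gbdt(X, y, feats, rounds=10, lr=0.5):
    """Toy squared-loss gradient boosting; importance = summed split gain."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps, importance = [], {f: 0.0 for f in feats}
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        stump = fit_stump(X, resid, feats)
        if stump is None:
            break
        gain, f, thr, left, right = stump
        importance[f] += gain
        stumps.append((f, thr, lr * left, lr * right))
        pred = [p + (lr * left if row[f] <= thr else lr * right)
                for p, row in zip(pred, X)]
    return base, stumps, importance

def predict(model, row):
    base, stumps, _ = model
    score = base + sum(l if row[f] <= t else r for f, t, l, r in stumps)
    return 1 if score >= 0.5 else 0

def f_score(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def greedy_gbdt(X_tr, y_tr, X_va, y_va):
    """Refit, score, drop the least important feature, repeat; keep the best subset."""
    feats = list(range(len(X_tr[0])))
    history = []
    while feats:
        model = fit_gbdt(X_tr, y_tr, feats)
        score = f_score(y_va, [predict(model, row) for row in X_va])
        history.append((score, list(feats)))
        if len(feats) == 1:
            break
        feats.remove(min(feats, key=lambda f: model[2][f]))
    return max(history, key=lambda h: h[0]), history

# Synthetic data (assumption): only feature 0 determines the label.
rng = random.Random(0)
X = [[rng.random() for _ in range(4)] for _ in range(80)]
y = [1 if row[0] > 0.5 else 0 for row in X]
(best_f, best_feats), history = greedy_gbdt(X[::2], y[::2], X[1::2], y[1::2])
print(best_f, best_feats)
```

Scoring each candidate subset on held-out rows rather than the training rows is a design choice of this sketch; the abstract does not state how the candidate feature sets were compared.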
The authors thank the members of the Machine Learning and Artificial Intelligence Laboratory, School of Computer Science and Technology, Wuhan University of Science and Technology, for their helpful discussions in seminars. This work is supported by the National Natural Science Foundation of China (No. 61702385).