Accurate Prediction of Hot Spots with Greedy Gradient Boosting Decision Tree

  • Haomin Gan (corresponding author)
  • Jing Hu
  • Xiaolong Zhang
  • Qianqian Huang
  • Jiafu Zhao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10955)


Hot spot residues play a crucial role in protein-protein interactions, and identifying them supports drug discovery and rational drug design. Only a few amino acid residues provide most of the binding free energy at a protein interface; these residues are called hot spots. This work predicts hot spot residues with an ensemble machine learning method, Gradient Boosting Decision Tree (GBDT), on the Alanine Scanning Energetics Database (ASEdb) and the Structural Kinetic and Energetic database of Mutant Protein Interactions (SKEMPI). Using features derived from the properties of each amino acid and of the protein complex chain that contains it, we design a greedy program that, in every iteration, discards the feature GBDT ranks as least important, and that does not stop until the last such feature has been discarded. In a comparison of results, the greedy GBDT method predicts hot spot residues better; one of the evaluation criteria, the F-score, reaches 0.808 on the ASEdb dataset.
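The greedy elimination loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: GBDT training is replaced by two pluggable callables (`importance_fn` and `score_fn`, both hypothetical names), the feature names and toy numbers are invented for the demo, and keeping the best-scoring subset seen during elimination is an assumption about how the final feature set is chosen.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callables: in real use, importance_fn would train a GBDT on the
# given feature subset and return its per-feature importances, and score_fn
# would return a validation metric such as the F-score for that subset.
ImportanceFn = Callable[[List[str]], Dict[str, float]]
ScoreFn = Callable[[List[str]], float]


def greedy_eliminate(
    features: List[str], importance_fn: ImportanceFn, score_fn: ScoreFn
) -> Tuple[List[str], float, List[Tuple[List[str], float]]]:
    """Drop the least important feature each round; return the best subset seen."""
    current = list(features)
    best_subset, best_score = list(current), score_fn(current)
    history = [(list(current), best_score)]
    # Assumption: stop once a single feature remains.
    while len(current) > 1:
        importances = importance_fn(current)
        weakest = min(current, key=lambda name: importances[name])
        current.remove(weakest)
        score = score_fn(current)
        history.append((list(current), score))
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score, history


# Toy stand-ins (invented values, for illustration only).
TOY_WEIGHTS = {"asa": 0.40, "depth": 0.30, "protrusion": 0.20,
               "noise_a": 0.06, "noise_b": 0.04}
INFORMATIVE = {"asa", "depth", "protrusion"}


def toy_importance(subset: List[str]) -> Dict[str, float]:
    # Stand-in for importances read from a fitted GBDT model.
    return {name: TOY_WEIGHTS[name] for name in subset}


def toy_score(subset: List[str]) -> float:
    # Stand-in for an F-score: informative features help, noise features hurt.
    kept = sum(1 for name in subset if name in INFORMATIVE)
    noise = len(subset) - kept
    return 0.6 + 0.07 * kept - 0.05 * noise


best_subset, best_score, history = greedy_eliminate(
    list(TOY_WEIGHTS), toy_importance, toy_score
)
print(best_subset, round(best_score, 3))
```

Passing the model as callables keeps the loop independent of any particular GBDT library; a real `importance_fn` would refit the model on each reduced feature set, since importances can shift once a feature is removed.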


Keywords: Hot spot residues; Greedy Gradient Boosting Decision Tree; Amino acid properties; Protein complex chain



The authors thank the members of the Machine Learning and Artificial Intelligence Laboratory, School of Computer Science and Technology, Wuhan University of Science and Technology, for their helpful discussions in seminars. This work is supported by the National Natural Science Foundation of China (No. 61702385).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Haomin Gan (1, 2), corresponding author
  • Jing Hu (1, 2)
  • Xiaolong Zhang (1, 2)
  • Qianqian Huang (1, 2)
  • Jiafu Zhao (1, 2)
  1. School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China
  2. Hubei Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China
