Protein binding hot spots prediction from sequence only by a new ensemble learning method

  • Original Article
Amino Acids

Abstract

Hot spots are the core regions of protein binding interfaces and have been used as targets in drug design. Locating hot spots experimentally is costly in both time and money. Recently, in silico computational methods have been widely used to predict hot spots from sequence or structure characteristics. Because protein structures are not always solved, identifying hot spots from amino acid sequences alone is more useful in real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers built with the IBk (instance-based k-nearest neighbours) algorithm, where instances are encoded by important properties extracted from the 544 properties in the AAindex1 (Amino Acid Index) database. The top-performing classifiers are then selected and combined into an ensemble by majority voting. The ensemble classifier outperforms state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set. Availability: http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm.
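The pipeline summarized in the abstract (property-based encoding, many instance-based classifiers, selection of the best ones, majority voting, F1 evaluation) can be sketched as below. This is a minimal, stdlib-only Python illustration under stated assumptions, not the authors' implementation: `encode_residue`, `KNN`, `select_top` and `majority_vote` are hypothetical names; averaging each property scale over a sliding window and appending an RSA value is an assumed encoding; `k=3` is an assumed default; and `KNN` is a plain Euclidean k-nearest-neighbour classifier in the spirit of Weka's IBk rather than Weka itself. Each member sees a different feature subset, standing in for the paper's 83 property-based classifiers.

```python
import math
from collections import Counter


def encode_residue(seq, pos, scales, rsa, half_window=1):
    """Hypothetical encoding: average each property scale (a dict mapping
    one-letter amino acid codes to values) over a window centred on `pos`,
    then append the residue's relative accessible surface area (RSA)."""
    window = seq[max(0, pos - half_window): pos + half_window + 1]
    feats = [sum(scale[aa] for aa in window) / len(window) for scale in scales]
    feats.append(rsa[pos])
    return feats


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class KNN:
    """Instance-based k-nearest-neighbour classifier (IBk-style sketch).

    `features` lists the indices of the feature subset this member uses,
    so different members can be built on different property subsets."""

    def __init__(self, features, k=3):
        self.features = features
        self.k = k

    def _project(self, x):
        return [x[i] for i in self.features]

    def fit(self, X, y):
        # Lazy learner: simply memorise the projected training instances.
        self.X = [self._project(x) for x in X]
        self.y = list(y)
        return self

    def predict_one(self, x):
        # Majority label among the k training instances closest to x.
        q = self._project(x)
        order = sorted(range(len(self.X)), key=lambda i: euclidean(self.X[i], q))
        votes = Counter(self.y[i] for i in order[: self.k])
        return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def select_top(members, X_val, y_val, n):
    """Keep the n members with the highest F1 on a validation set."""
    def score(m):
        return f1_score(y_val, [m.predict_one(x) for x in X_val])
    return sorted(members, key=score, reverse=True)[:n]


def majority_vote(members, x):
    """Ensemble prediction: the label predicted by most members."""
    votes = Counter(m.predict_one(x) for m in members)
    return votes.most_common(1)[0][0]
```

In a real setting the scales would be drawn from AAindex1 and the members trained and validated on hot spot data such as BID; with an odd number of members and binary labels (hot spot vs. non-hot spot), majority voting can never tie.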

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61672035, 61300058, 61472282, 61271098 and 61374181).

Author information

Author contributions

SH and PC conceived the study; SH participated in the experimental design; SH and PC carried it out and drafted the manuscript. All authors revised the manuscript critically. JL and PC approved the final manuscript.

Corresponding author

Correspondence to Peng Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interests.

Ethical statement

The authors declare that their manuscript complies with the Ethical Rules applicable for this journal.

Additional information

Handling Editor: L. Taher.

Electronic supplementary material

Below are the links to the electronic supplementary material.

Supplementary material 1 (pdf 6 KB)

Supplementary material 2 (pdf 6 KB)

About this article

Cite this article

Hu, SS., Chen, P., Wang, B. et al. Protein binding hot spots prediction from sequence only by a new ensemble learning method. Amino Acids 49, 1773–1785 (2017). https://doi.org/10.1007/s00726-017-2474-6

  • DOI: https://doi.org/10.1007/s00726-017-2474-6
