A Logistic Regression Approach for Identifying Hot Spots in Protein Interfaces

  • Peipei Li
  • Keun Ho Ryu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9267)


Protein–protein interactions occur when two or more proteins bind together, often to carry out their biological function. A small fraction of the interface residues on the protein surface contribute most of the binding free energy; these residues are referred to as hot spots. Identifying hot spots is important for examining the actions and properties occurring around the binding sites. However, experimental studies require significant effort, and computational methods still have limitations in prediction performance and feature interpretation.

In this paper we describe a hot spot residue prediction method that provides a significant improvement over existing methods. Logistic regression is used to derive a prediction model that combines 8 features derived from accessibility, sequence conservation, inter-residue potentials, computational alanine scanning, small-world structure characteristics, phi-psi interaction, and contact number. To demonstrate its effectiveness, the proposed method is applied to ASEdb. Our prediction model achieves an accuracy of 0.819 and an F1 score of 0.743. Experimental results show that the additional features improve prediction performance; in particular, phi-psi interaction is found to make an important contribution. We then perform an exhaustive comparison of our method with various machine learning based methods and with previously published prediction models in the literature. Empirical studies show that our method yields significantly better prediction performance.
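As a rough illustration of the kind of model described above, the following minimal sketch fits a logistic regression classifier on 8 numeric features and reports accuracy and F1 score. It is not the authors' implementation: the data are synthetic (not ASEdb), the feature values carry no biological meaning, and all names (`fit_logistic`, `make_synthetic_residues`, and so on) are hypothetical, introduced only for this example.

```python
import math
import random

N_FEATURES = 8  # the abstract combines 8 features; the values used here are synthetic


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one residue: p(hot spot) minus true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b, threshold=0.5):
    """Label a residue as a hot spot (1) when its probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 score for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def make_synthetic_residues(n_per_class, seed=0):
    """Hypothetical data: hot spots and non-hot spots drawn from shifted Gaussians."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(mean, 1.0) for _ in range(N_FEATURES)])
            y.append(label)
    return X, y


X, y = make_synthetic_residues(100)
w, b = fit_logistic(X, y)
acc, f1 = accuracy_and_f1(y, predict(X, w, b))
print(f"training accuracy={acc:.3f}, F1={f1:.3f}")
```

The scores printed here are measured on the synthetic training data itself, so they say nothing about the accuracy and F1 score reported above; a real evaluation would use held-out or cross-validated residues with experimentally determined labels.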


Protein–protein interactions, Binding sites, Protein hot spots prediction, Logistic regression



This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. 2013R1A2A2A01068923) and by the ITRC (Information Technology Research Center) support program (NIPA-2014-H0301-14-1002).



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Database/Bioinformatics Laboratory, Chungbuk National University, Cheongju, South Korea
