Feature Design for Protein Interface Hotspots Using KFC2 and Rosetta

  • Franziska Seeger
  • Anna Little
  • Yang Chen
  • Tina Woolf
  • Haiyan Cheng
  • Julie C. Mitchell (corresponding author)
Part of the Association for Women in Mathematics Series book series (AWMS, volume 17)


Protein–protein interactions regulate many essential biological processes and play an important role in health and disease. Experimentally characterizing the residues that contribute most to protein–protein interaction affinity and specificity is laborious. Models that accurately identify hotspots at protein–protein interfaces can therefore provide important information about how to inhibit therapeutically relevant interactions. During the ICERM WiSDM workshop in 2017, we combined the KFC2a protein–protein interaction hotspot prediction features with Rosetta scoring function terms and interface filter metrics. We trained support vector machine classifiers using two-way and three-way forward selection strategies as well as a reverse feature elimination strategy. From these results, we identified subsets of combined KFC2a and Rosetta features that show improved performance over KFC2a features alone.
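The selection strategies named above can be illustrated with a minimal sketch. It is not the authors' code: it assumes that "two-way" and "three-way" forward selection mean greedily adding two or three candidate features per round, and that reverse elimination drops one feature per round. The data are synthetic, columns 0 and 1 are hypothetical stand-ins for KFC2a features, columns 2 to 9 are hypothetical stand-ins for Rosetta terms, and the RBF kernel, F1 scoring, and 5-fold cross-validation are illustrative choices only.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def cv_score(X, y, cols):
    """Mean 5-fold cross-validated F1 of an RBF SVM restricted to `cols`."""
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    return cross_val_score(model, X[:, list(cols)], y, cv=5, scoring="f1").mean()


def forward_select(X, y, start, candidates, group_size=2, max_rounds=3):
    """Greedily add `group_size` candidates per round while the score improves."""
    selected = list(start)
    best = cv_score(X, y, selected)
    remaining = [c for c in candidates if c not in selected]
    for _ in range(max_rounds):
        if len(remaining) < group_size:
            break
        scored = [(cv_score(X, y, selected + list(g)), g)
                  for g in combinations(remaining, group_size)]
        score, group = max(scored, key=lambda t: t[0])
        if score <= best:
            break  # no group of this size improves the classifier
        best = score
        selected += list(group)
        remaining = [c for c in remaining if c not in group]
    return selected, best


def backward_eliminate(X, y, start, min_features=1):
    """Greedily drop one feature per round while the score does not get worse."""
    selected = list(start)
    best = cv_score(X, y, selected)
    while len(selected) > min_features:
        scored = [(cv_score(X, y, [c for c in selected if c != d]), d)
                  for d in selected]
        score, drop = max(scored, key=lambda t: t[0])
        if score < best:
            break  # every possible removal hurts performance
        best = score
        selected.remove(drop)
    return selected, best


# Synthetic stand-in data (assumption): 200 residues, 10 features, binary hotspot label.
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=0, shuffle=False, random_state=0)
base = [0, 1]                      # hypothetical "KFC2a" columns
extra = list(range(2, 10))         # hypothetical "Rosetta" columns

baseline = cv_score(X, y, base)
fwd_cols, fwd_score = forward_select(X, y, base, extra, group_size=2)
full_score = cv_score(X, y, range(10))
bwd_cols, bwd_score = backward_eliminate(X, y, range(10))

print("baseline F1:", round(baseline, 3))
print("forward selection:", fwd_cols, round(fwd_score, 3))
print("reverse elimination:", bwd_cols, round(bwd_score, 3))
```

Setting `group_size=3` gives the three-way variant under the same assumption. Because both routines score subsets by cross-validation on the same data used for selection, a held-out test set would be needed for an unbiased performance estimate.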



The feature table and feature selection code are available by email to the corresponding author. We thank the Association for Women in Mathematics (AWM) and the Brown University Institute for Computational and Experimental Research in Mathematics (ICERM) for hosting the Women in Data Science and Mathematics (WiSDM) workshop. The Brown University Center for Computation and Visualization (CCV) and the Institute for Protein Design at the University of Washington provided computational resources used for this project. Participation by JM was sponsored by the National Science Foundation [NSF DMS 1160360]. The AWM Advance Program supported participation by FS, AL, YC, TW, and HC. Participation by TW was also supported by DIMACS. FS is generously funded by the Washington Research Foundation Institute for Protein Design Postdoctoral Innovation Fellowship.



Copyright information

© The Author(s) and the Association for Women in Mathematics 2019

Authors and Affiliations

  • Franziska Seeger (1)
  • Anna Little (2)
  • Yang Chen (3)
  • Tina Woolf (4)
  • Haiyan Cheng (5)
  • Julie C. Mitchell (6, 7), corresponding author

  1. University of Washington, Institute for Protein Design, Seattle, USA
  2. Michigan State University, East Lansing, USA
  3. University of Michigan, Ann Arbor, USA
  4. Jet Propulsion Laboratory, Pasadena, USA
  5. Willamette University, Salem, USA
  6. Oak Ridge National Laboratory, Knoxville, USA
  7. University of Wisconsin - Madison, Madison, USA
