Minimization-Aware Recursive \(K^{*}\) (\({ MARK}^{*}\)): A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape

  • Jonathan D. Jou
  • Graham T. Holt
  • Anna U. Lowegard
  • Bruce R. Donald (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11467)


Protein design algorithms that model continuous sidechain flexibility and conformational ensembles better approximate the in vitro and in vivo behavior of proteins. The previous state of the art, iMinDEE-\(A^*\)-\(K^*\), computes provable \(\varepsilon \)-approximations to partition functions of protein states (e.g., bound vs. unbound) by computing provable, admissible pairwise-minimized energy lower bounds on protein conformations and using the \(A^*\) enumeration algorithm to return a gap-free list of lowest-energy conformations. iMinDEE-\(A^*\)-\(K^*\) runs in time sublinear in the number of conformations, but can be trapped in loosely-bounded, low-energy conformational wells containing many conformations with highly similar energies. That is, iMinDEE-\(A^*\)-\(K^*\) is unable to exploit the correlation between protein conformation and energy: similar conformations often have similar energy. We introduce two new concepts that exploit this correlation: Minimization-Aware Enumeration and Recursive \(K^{*}\). We combine these two insights into a novel algorithm, Minimization-Aware Recursive \(K^{*}\) (\({ MARK}^{*}\)), that tightens bounds not on single conformations, but instead on distinct regions of the conformation space. We compare the performance of iMinDEE-\(A^*\)-\(K^*\) vs. \({ MARK}^{*}\) by running the \(BBK^*\) algorithm, which provably returns sequences in order of decreasing \(K^{*}\) score, using either iMinDEE-\(A^*\)-\(K^*\) or \({ MARK}^{*}\) to approximate partition functions. We show on 200 design problems that \({ MARK}^{*}\) not only enumerates and minimizes vastly fewer conformations than the previous state of the art, but also runs up to two orders of magnitude faster. Finally, we show that \({ MARK}^{*}\) not only efficiently approximates the partition function, but also provably approximates the energy landscape. To our knowledge, \({ MARK}^{*}\) is the first algorithm to do so.
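The idea of an \(\varepsilon \)-approximation to a partition function from energy lower bounds can be illustrated with a toy sketch. This is not the iMinDEE-\(A^*\)-\(K^*\) or \({ MARK}^{*}\) implementation: the function names, the flat list of conformations, the value of `RT`, and the `minimize` callback are all assumptions made for illustration, and a sorted list stands in for gap-free \(A^*\) enumeration. The sketch minimizes conformations in order of increasing lower bound and stops once the gap between the running lower and upper bounds on the partition function is at most \(\varepsilon \) times the upper bound.

```python
import math

RT = 0.5922  # kcal/mol near 298 K (assumed temperature for this sketch)


def boltzmann(energy):
    """Boltzmann weight of a conformation with the given energy (kcal/mol)."""
    return math.exp(-energy / RT)


def approximate_partition_function(conformations, epsilon, minimize):
    """Toy epsilon-approximation of Z = sum_c exp(-E(c) / RT).

    conformations: list of (name, lower_bound) pairs, where lower_bound is an
        admissible lower bound on the minimized energy of that conformation.
    minimize: hypothetical callback, name -> minimized energy (>= lower_bound).
    Returns (lower, upper, n_minimized) with lower <= Z <= upper.
    """
    # Sorting by lower bound stands in for gap-free A* enumeration.
    pending = sorted(conformations, key=lambda c: c[1])
    lower = 0.0  # weights of conformations that have been minimized
    upper_pending = sum(boltzmann(lb) for _, lb in pending)
    n_minimized = 0
    for name, lb in pending:
        upper = lower + max(upper_pending, 0.0)
        # Stop when lower >= (1 - epsilon) * upper >= (1 - epsilon) * Z.
        if upper > 0.0 and (upper - lower) <= epsilon * upper:
            break
        energy = minimize(name)
        upper_pending -= boltzmann(lb)  # replace the bound ...
        lower += boltzmann(energy)      # ... with the minimized weight
        n_minimized += 1
    return lower, lower + max(upper_pending, 0.0), n_minimized


if __name__ == "__main__":
    # Hypothetical data: 20 conformations whose true energies sit 0.3 kcal/mol
    # above their lower bounds.
    confs = [("c%d" % i, float(i)) for i in range(20)]
    true_energy = {name: lb + 0.3 for name, lb in confs}
    lo, hi, n = approximate_partition_function(confs, 0.1, true_energy.get)
    print(lo, hi, n)
```

Because every unminimized conformation contributes at most the Boltzmann weight of its lower bound, the returned interval always contains the true partition function, and only the lowest-energy conformations need to be minimized when the remaining bounds are small.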
We use \({ MARK}^{*}\) to analyze the change in energy landscape of the bound and unbound states of the HIV-1 capsid protein C-terminal domain in complex with camelid V\(_{\mathrm{{H}}}\)H, and measure the change in conformational entropy induced by binding. Thus, \({ MARK}^{*}\) both accelerates existing designs and offers new capabilities not possible with previous algorithms.
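As a rough illustration of what a change in conformational entropy means for a Boltzmann-weighted ensemble, the sketch below computes \(S = -R \sum_i p_i \ln p_i\) over a discrete list of conformation energies and takes the difference between a bound and an unbound state. It assumes fully enumerated conformations with known energies, which is a simplification; the paper's own procedure works from provable bounds, and the function names and example energies here are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)


def conformational_entropy(energies, temperature=298.15):
    """Entropy -R * sum(p ln p) of a Boltzmann ensemble over discrete energies."""
    rt = R * temperature
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    probs = [w / z for w in weights]
    # Skip zero-probability terms, whose contribution p ln p tends to 0.
    return -R * sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_change_on_binding(bound_energies, unbound_energies, temperature=298.15):
    """S(bound) - S(unbound); negative when binding narrows the ensemble."""
    return (conformational_entropy(bound_energies, temperature)
            - conformational_entropy(unbound_energies, temperature))


if __name__ == "__main__":
    # Hypothetical energies (kcal/mol): binding leaves one dominant conformation.
    unbound = [0.0, 0.1, 0.2, 0.3]
    bound = [0.0, 2.0, 2.5, 3.0]
    print(entropy_change_on_binding(bound, unbound))
```

A uniform ensemble of \(n\) conformations gives the maximum value \(R \ln n\), and a single conformation gives zero, so the sign of the difference indicates whether binding restricts or broadens the ensemble.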



We thank Goke Ojewole, Mark Hallen, Jeffrey Martin, Marcel Frenkel, Terrence Oas, Jane and Dave Richardson, Hong Niu, and all members of the lab for helpful discussions; Jeffrey Martin for software optimizations; and the NIH (R01-GM078031 and R01-GM118543 to BRD) for funding.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jonathan D. Jou (1)
  • Graham T. Holt (1, 2)
  • Anna U. Lowegard (1, 2)
  • Bruce R. Donald (1, 3, 4), corresponding author
  1. Department of Computer Science, Duke University, Durham, USA
  2. Computational Biology and Bioinformatics Program, Duke University, Durham, USA
  3. Department of Biochemistry, Duke University Medical Center, Durham, USA
  4. Department of Chemistry, Duke University, Durham, USA
