Abstract
Protein design algorithms that model continuous sidechain flexibility and conformational ensembles better approximate the in vitro and in vivo behavior of proteins. The previous state of the art, iMinDEE-\(A^*\)-\(K^*\), computes provable \(\varepsilon \)-approximations to partition functions of protein states (e.g., bound vs. unbound) by computing provable, admissible pairwise-minimized energy lower bounds on protein conformations and using the \(A^*\) enumeration algorithm to return a gap-free list of lowest-energy conformations. iMinDEE-\(A^*\)-\(K^*\) runs in time sublinear in the number of conformations, but can be trapped in loosely bounded, low-energy conformational wells containing many conformations with highly similar energies. That is, iMinDEE-\(A^*\)-\(K^*\) is unable to exploit the correlation between protein conformation and energy: similar conformations often have similar energies. We introduce two new concepts that exploit this correlation: Minimization-Aware Enumeration and Recursive \(K^{*}\). We combine these two insights into a novel algorithm, Minimization-Aware Recursive \(K^{*}\) (\({ MARK}^{*}\)), that tightens bounds not on single conformations, but instead on distinct regions of the conformation space. We compare the performance of iMinDEE-\(A^*\)-\(K^*\) vs. \({ MARK}^{*}\) by running the \(BBK^*\) algorithm, which provably returns sequences in order of decreasing \(K^{*}\) score, using either iMinDEE-\(A^*\)-\(K^*\) or \({ MARK}^{*}\) to approximate partition functions. We show on 200 design problems that \({ MARK}^{*}\) not only enumerates and minimizes vastly fewer conformations than the previous state of the art, but also runs up to two orders of magnitude faster. Finally, we show that \({ MARK}^{*}\) not only efficiently approximates the partition function, but also provably approximates the energy landscape. To our knowledge, \({ MARK}^{*}\) is the first algorithm to do so.
We use \({ MARK}^{*}\) to analyze the change in energy landscape of the bound and unbound states of the HIV-1 capsid protein C-terminal domain in complex with camelid V\(_{\mathrm{H}}\)H, and measure the change in conformational entropy induced by binding. Thus, \({ MARK}^{*}\) both accelerates existing designs and offers new capabilities not possible with previous algorithms.
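To make the notion of an \(\varepsilon \)-approximation to a partition function concrete, the following is a minimal Python sketch, not the \({ MARK}^{*}\) algorithm itself. It assumes the standard Boltzmann-weighted partition function \(Z = \sum _c \exp (-E_c/RT)\) and treats bounds \(Z_{lower} \le Z \le Z_{upper}\) as an \(\varepsilon \)-approximation once \(Z_{upper} - Z_{lower} \le \varepsilon Z_{upper}\). The function names, the greedy "minimize the largest gap first" rule, the value of `RT`, and the toy energies are all illustrative assumptions rather than details taken from the paper.

```python
import math

RT = 0.5922  # kcal/mol near 298 K; assumed constant for this toy example


def weight(energy):
    """Boltzmann weight of a conformation with the given energy."""
    return math.exp(-energy / RT)


def bound_partition_function(bounds, minimized_energy, epsilon):
    """Tighten bounds on Z = sum_c weight(E_c) until they are within epsilon.

    bounds: list of [lower, upper] energy bounds, one pair per conformation
            (hypothetical inputs; lower <= true energy <= upper).
    minimized_energy: callable mapping a conformation index to its true
            (minimized) energy; stands in for an expensive minimizer.
    Returns (z_lower, z_upper, number_of_minimizations).
    """
    bounds = [list(b) for b in bounds]  # copy so the caller's data is untouched
    minimized = 0
    while True:
        # A lower energy bound yields an upper bound on the weight, and vice versa.
        z_upper = sum(weight(lo) for lo, _ in bounds)
        z_lower = sum(weight(hi) for _, hi in bounds)
        if z_upper - z_lower <= epsilon * z_upper:
            return z_lower, z_upper, minimized
        # Greedy rule (an assumption): minimize the conformation whose
        # bounds contribute the largest share of the remaining gap.
        worst = max(range(len(bounds)),
                    key=lambda i: weight(bounds[i][0]) - weight(bounds[i][1]))
        energy = minimized_energy(worst)
        bounds[worst] = [energy, energy]
        minimized += 1


# Toy data (made up for illustration): true energies and loose bounds around them.
true_energies = [-10.0, -9.5, -3.0, -1.0]
toy_bounds = [[-11.0, -9.0], [-10.0, -9.0], [-4.0, -2.0], [-2.0, 0.0]]
z_lo, z_hi, n_min = bound_partition_function(
    toy_bounds, lambda i: true_energies[i], epsilon=0.1)
```

In this sketch only the conformations that dominate the gap between the bounds are ever minimized; high-energy conformations keep their loose bounds because their Boltzmann weights are too small to matter for the requested \(\varepsilon \).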
J. D. Jou and G. T. Holt—These authors contributed equally to the work.
Acknowledgements
We thank Goke Ojewole, Mark Hallen, Jeffrey Martin, Marcel Frenkel, Terrence Oas, Jane and Dave Richardson, Hong Niu, and all members of the lab for helpful discussions; Jeffrey Martin for software optimizations; and the NIH (R01-GM078031 and R01-GM118543 to BRD) for funding.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Jou, J.D., Holt, G.T., Lowegard, A.U., Donald, B.R. (2019). Minimization-Aware Recursive \(K^{*}\) (\({ MARK}^{*}\)): A Novel, Provable Algorithm that Accelerates Ensemble-Based Protein Design and Provably Approximates the Energy Landscape. In: Cowen, L. (ed.) Research in Computational Molecular Biology. RECOMB 2019. Lecture Notes in Computer Science, vol 11467. Springer, Cham. https://doi.org/10.1007/978-3-030-17083-7_7
DOI: https://doi.org/10.1007/978-3-030-17083-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17082-0
Online ISBN: 978-3-030-17083-7
eBook Packages: Computer Science, Computer Science (R0)