Testing automatic methods to predict free binding energy of host–guest complexes in SAMPL7 challenge

Serillon, Dylan; Bo, Carles; Barril, Xavier

doi:10.1007/s10822-020-00370-6

Testing automatic methods to predict free binding energy of host–guest complexes in SAMPL7 challenge

Open access
Published: 19 January 2021

Volume 35, pages 209–222, (2021)
Cite this article

Download PDF

You have full access to this open access article

Journal of Computer-Aided Molecular Design Aims and scope Submit manuscript

Testing automatic methods to predict free binding energy of host–guest complexes in SAMPL7 challenge

Download PDF

3015 Accesses
7 Citations
6 Altmetric
1 Mention
Explore all metrics

Abstract

The design of new host–guest complexes represents a fundamental challenge in supramolecular chemistry. At the same time, it opens new opportunities in material sciences or biotechnological applications. A computational tool capable of automatically predicting the binding free energy of any host–guest complex would be a great aid in the design of new host systems, or to identify new guest molecules for a given host. We aim to build such a platform and have used the SAMPL7 challenge to test several methods and design a specific computational pipeline. Predictions will be based on machine learning (when previous knowledge is available) or a physics-based method (otherwise). The formerly delivered predictions with an RMSE of 1.67 kcal/mol but will require further work to identify when a specific system is outside of the scope of the model. The latter is combines the semiempirical GFN2B functional, with docking, molecular mechanics, and molecular dynamics. Correct predictions (RMSE of 1.45 kcal/mol) are contingent on the identification of the correct binding mode, which can be very challenging for host–guest systems with a large number of degrees of freedom. Participation in the blind SAMPL7 challenge provided fundamental direction to the project. More advanced versions of the pipeline will be tested against future SAMPL challenges.

Overview of the SAMPL6 host–guest binding affinity prediction challenge

Article 01 October 2018

Blinded predictions of host-guest standard free energies of binding in the SAMPL5 challenge

Article 08 August 2016

The SAMPL5 host–guest challenge: computing binding free energies and enthalpies from explicit solvent simulations by the attach-pull-release (APR) method

Article 16 September 2016

Introduction

Supramolecular chemistry has experienced enormous growth in recent years. Supramolecular processes, and host–guest systems in particular, are studied both from a fundamental perspective and for their possible applications [1,2,3,4]. By improving the stability or modifying the properties of an encapsulated compound, or even by enhancing binding selectivity, we foresee a wide range of opportunities that span from industrial processes [5] to the medical field [6], such as drug delivery targeting cancer cells [7]. At the moment, breakthrough discoveries in supramolecular chemistry are hampered by the complexity of the thermodynamic and kinetic characterization of the inclusion/release processes, which make it difficult to generate useful predictions about molecular encapsulation [8]. Quantitative predictions of binding free energies are particularly difficult, but essential to guide the synthetic efforts, leading to more efficient design and discovery of host–guest systems with the desired activities. In this context, some of the tools currently used in computer-aided drug design (CADD) could be as useful for this endeavor, in the same way as they are for drug discovery [9,10,11,12]. At the same time, host–guest systems are orders of magnitude simpler (in terms of degrees of freedom) than biomolecular complexes and, because they are chemically stable, can be studied in a variety of well-controlled environmental conditions. This offers an opportunity to test and validate computational methods before they become part of the CADD arsenal.

Several years of SAMPL (Statistical Assessment of the Modeling of Proteins and Ligands) challenge have shown interesting approaches to compute the binding free energy of host–guest complexes, with a relatively large range of methods and performances [3,4,5,6,7,8]. In the context of the NOAH European Network, our interest is to develop automated methods for the prediction of arbitrary host–guest systems under different solvation and other environmental conditions. To fulfill this task, our focus is on automation (to enable users without a computational chemistry background), throughput (to deliver fast predictions and enable multiple concurrent users), and accuracy (to deliver useful predictions). We are not tied to any particular method, but must be able to generate predictions for a wide range of systems, including both novel and previously-described host systems.

In any computational strategy for structure-based design, an important step is the prediction of probable conformations of the receptor-ligand complex [13,14,15]. As a compromise between performance and accuracy, here we explore molecular docking (optionally refined with molecular mechanics) and molecular dynamics to solve the docking problem. Then, the complex can be scored with a variety of methods [16,17,18,19]. We explore the use of machine learning, which can be accurate and very efficient, but is limited by the amount of pre-existing data. As a complementary tool that can be applied to any arbitrary host–guest system, we also implement a physics-based method. In particular, recent progress in semi-empirical methods lead us to build MMGBSA-like methodology [19], but using the xtb software and GFN2B basis set [20] instead of a molecular mechanics forcefield.

At this stage, more important than the actual predictions, participation in the SAMPL7 was extremely useful to test several ideas in blind mode, resulting in the design of an automatic pipeline that will be evaluated in subsequent SAMPL challenges and eventually become publicly available.

Methods

Docking

Docking is used to generating a first guess of the Host–Guest structure. We chose ADV for our assessment for several reasons. It (i) is faster and generally performs better than AutoDock itself, (ii) is freely available and competitive with commercial tools [21, 22].

Docking was performed using AutoDockVina v. 1.1.2 [23]. Input comprises the host system, guest, and docking box, while the output is a list of poses ranked by ΔG_bind, the predicted binding energy in kcal/mol (‘score’ = − ΔG_bind). To obtain the maximum number of poses, we set num_modes to 20. Three different box sizes are used, all of them cubic and centered in the host cavity: one big box (edge length 10 Å) that allows completely blind docking, and two small boxes (7 Å and 5 Å, respectively) restricted to the expected binding site. For each box, the top-scored solution is extracted. Additionally, for the smallest box, the extraction is followed by a steepest descent [24] and conjugated gradient [25] minimization to correct any ligand distortion caused by the small size box.

Molecular modeling/molecular dynamic

All molecular dynamic (MD) simulations were set up in two steps:

(1)
Host and guest preparation The antechamber and tleap programs from the AMBER 18 package [26] are used to parameterize and solvate the system, respectively. Charges are derived with the AM1-BCC method [27]. For the spontaneous association simulations (SaMD), the host and the guest are placed in the same box, but not in direct contact, to observe their interaction preferences over time. Each system was solvated with ~ 2000 TIP3P water [28] in a cubic box whose dimensions were defined by a distance of 12 Å between the complex and the edges.
(2)
Minimization The system is minimized with GROMACS [29,30,31,32,33,34] using the steepest descent algorithm, then equilibrated for 100 ps with the leapfrog integrator.

Production simulations were run also with GROMACS in the NPT ensemble with temperature control using V-rescale thermostat [35] at 300 K and with tau_t = 0.1 and pressure control provided by Berendsen barostat [36]. Note that the Nosé-Hoover or Parrinello-Rahman barostats [37] are considered a better option for simulations at equilibrium, but as the MD is used to sample conformations (rather than extracting thermodynamic properties) this choice does not hamper the quality of the results. The Verlet-cutoff-scheme [38] is used and the frequency to update the neighbor list is initially set at 10, whereas long-range electrostatics were handled with the PME [39] method with PME order set at 4.0. 500 ns of simulation are realized for the systems, saving a snapshot every 0.1 ns for a total of 5000 frames for each simulation.

Semi-empirical calculations

We used the xtb program package (version 6.1) [20] to calculate both the energy and the enthalpic and entropic corrections. It uses the GFN2B parametrization on an extended semiempirical tight-binding model, which has shown to be efficient for determining structures and noncovalent interaction energies for large molecular systems (in the order of 1000 atoms) [40,41,42,43]. Water solvent effects were included through a Generalized Born (GBSA) model. The convergence criteria thresholds were set as extreme. Optimization, followed by hessian calculations were performed. The resulting geometries were verified as true minima by checking that no imaginary vibrational frequencies remained. The temperature was set to 298.15 K for assessing the thermostatistical corrections.

Machine learning

For the GDCC-7 dataset in SAMPL7, we decided to test a machine learning approach, taking advantage of pre-existing data. All OA or TEMOA host–guest systems from previous SAMPL challenges (SAMPL4 to SAMPL6) were collected, reconstructed from 2D to 3D, and optimized with the GFN2B method. Then, we used the CORINA web-platform [44] to compute 200 2D and 3D molecular descriptors for each system.

The descriptors of the dataset are reduced using the R software [45] with different approaches: (a) deleting the descriptors that have a near-zero variance; (b) deleting the most correlated descriptors using Caret package [46]; (c) using principal component analysis (PCA) [47] to combine descriptors that explain the most the variability.

In order to predict the binding free energy, several machine learning models using regression are used: neural network [48], knn [48], polynomial SVM [49], and random forest [50]. By modifying the parameters on those ML models, hundreds of different models are generated. In all cases we use a data partition of 30/70, resulting in a set of 26 cases for training and 8 cases for the test set. Our best model used to make predictions on SAMPL7 is a neural network using the "nnet" function, which provided an RMSE of 0.92 kcal/mol and with MAE about 0.85 kcal/mol, suggesting that the prediction is not excessively biased by overtraining.

Results and discussion

Thermodynamic-based approach

The Gibbs free energies of the optimized geometries were calculated as the sum of the Electronic Energy (E), which includes the D4 dispersion correction, thermostatistical corrections (G_RRHOT) calculated following a coupled rigid-rotor-harmonic-oscillator approach, and the solvation contribution (G_solv) calculated by the implicit solvation model GBSA.

$$\Delta G=E+{G}_{RRHOT}+{G}_{solv}$$

with

$${\Delta G}_{solv}={\Delta G}_{born}+{\Delta G}_{sasa}+{\Delta G}_{hb}+{\Delta G}_{shift}$$

The association Gibbs free energy is calculated from the difference of the free energies from the complex, host, and guest molecules, each on their respective conformational minimum.

$${\Delta G}_{bind}={\Delta G}_{complex}-{\Delta G}_{host}-{\Delta G}_{guest}$$

Considering the complexity of the conformational energy landscape of the complex and host molecule, we used multiple geometries of the unbound host system as starting points for minimization, thus increasing the probability of finding the absolute minimum. To do so, we extract approximately 15 structures from the classical molecular dynamics simulations and carry out a geometric optimization at a semi-empirical level, followed up by calculation of the hessian to confirm that the final energy is a true minimum (i.e. all vibrational frequencies are positive). The variation in free energy was as large as 10 kcal/mol for the different geometries, which confirmed the importance of conformational sampling. The overall lowest energy structure was defined as a reference for free energy calculation. Though the degrees of freedom of the guests are much reduced, we use a similar protocol for consistency (Fig. 1).

Retrospective analysis of trimertrip

As a proof of concept for our methodology, we used the data from the trimertrip set in the SAMPL3 challenge. This host is similar but simpler than the one in SAMPL7. Docking with a large box (15 Å³) produced complexes with negative binding energy (scoring), but the guest only formed surface interactions with the host. This led us to test two additional docking conditions where the docking space is progressively reduced. The resulting docking geometries have positive scores, indicative of conformational clashes, but in this case, the guest inserts into the host cavity. Three to five different binding modes were selected for each docking protocol. Minimization with Chimera (see “Methods”) allowed the system to relax before minimization and free energy calculation with xtb-GFN2B. Interestingly, the lowest-energy binding mode originated from the most restrictive docking protocol.

As shown in Fig. 2, the predicted binding free energies are in excellent agreement with the experiment [RMSE = 1.16 kcal/mol; MAE = 0.87 kcal/mol; Pearson’s correlation [51] (r) = 0.90; Spearman’s rank [52] correlation (ρ) = 0.75, Kendall’s tau correlation [53] = 0.62(τ)]. In fact, in four out of the seven test cases we obtain quantitative agreement. In one case the error is below 1 kcal/mol and in the two remaining cases, the errors are 1.6 kcal/mol and 2.2 kcal/mol. This led us to believe that, given the correct binding mode, the GFN2B semiempirical method could provide QM-level results at a small fraction of the computational cost (minimization plus calculation of the vibrational frequencies takes 1 to 2 h per geometry on a desktop computer).

For that specific SAMPL3 dataset, retrospective analysis of the results shows very accurate results compared to the ones that have been published initially [54].

SAMPL7 trimer-trip binding mode generation

As in the test systems above, host–guest interactions were predicted by molecular docking considering different docking volumes in order to obtain a variety of binding modes, including somewhere the guest is fully inserted into the host. In the most restrained volume (which forces the guest to be located inside the host but yields positive score values), a molecular mechanics (MM) minimization of the docking solution is performed with MOE and CHIMERA, thus removing any potential clash between host and guest. For some particular systems (G08 and G10) the MM minimization was deemed insufficient to attain a relaxed complex. In those cases, docking was followed by 200 ns of MD simulations. Even then, it failed to generate any binding mode where the guest is embedded into the cavity of the cyclic host. Further adding to our problems, the sulfonate groups tended to form unrealistic interactions after minimization with xtb. In some cases, the sulfonates were even inserted into the host pocket, which is largely hydrophobic, instead of remaining solvent-exposed, as expected for a negatively charged group (Fig. 3). This indicated that the implicit solvation model in xtb underestimates the desolvation cost of ionic groups.

Contrary to what was observed with the trimer-trip host–guest systems of previous editions, we had to conclude that a better method was necessary to generate correct binding modes for the SAMPL7 test set. Our method should allow for host flexibility in order to allow guest embedding with reasonable geometries. On the other hand, it was clear that the implicit solvation model implemented in xtb was falling short for ionic systems, and explicit solvation would be necessary for the conformational sampling stage. Both requisites pointed to MD simulations as an optimal solution, which we proceeded to implement and test.

In what we refer as Spontaneous association MD (SaMD), the host and the guest are simulated in a TIP3P water box, starting from a dissociated configuration (distance ≥ 8 Å), extending the simulation until binding is observed. For the linear guests G01, G02, and G05, SaMD successfully completed the inclusion process, which proceeded in two steps: (i) rapid formation of surface contact between host and guest, leading to stable interactions; and (ii) a small opening of the host system, enabling the entry of the guest into the host cavity and formation of a stable complex (Fig. 4). The second step is the bottleneck in the process. It occurs in a simulation time of 50 ns to 500 ns for the G01 compound, but for systems with longer alkyl chains (more degrees of freedom) takes a much longer time. In G05, for instance, the simulation had to be extended to 1 µs to observe a single association event (ca. 700 ns). The application of the same methodology to the cyclic guest (i.e. G06, G07, G08, G09, G10, G11, G18, G19) failed to produce correct binding modes. While the compounds form stable surface interactions, they do not enter the host. This is in line with the above observation that the host opening to admit the guest is the bottleneck in the association process. The bulkier nature of the cyclic guests implies that the host must (transitorily) adopt a wide-open conformation that is energetically unfavorable and cannot be sampled in the relatively short timescale of the MD simulations. To confirm this hypothesis, for the cyclic guest G07 we carried out an MD simulation starting from a fully open host system (generated by geometrical optimization in vacuum). The guest rapidly proceeds to interact with the (now exposed) interior of the host, forming a stable but dynamic binding mode. After approximately 100 ns, the host folds, trapping the guest in its interior (Fig. 4).

This result indicates that starting from metastable host conformations may be a general strategy to accelerate SaMD and generate valid host–guest geometries.

Notably, the binding mode of the guests inside the host is very dynamic, with fast rotations and frequent sliding movements that are only limited by the resistance of the charged group of the guest to enter the hydrophobic core of the host. As expected, the ionic groups rarely form direct contacts. Instead, they preserve their solvation shells. Overall, these results suggest that SaMD is an optimal and feasible strategy not only to obtain a bound conformation of the host–guest complex but also to capture the rich conformational diversity of the bound state. Unfortunately, between the setting up and testing of this protocol and the computational cost of the MD simulations, it was impossible to complete all these calculations by the challenge deadline. Posterior analysis confirms that correct identification of the binding mode through SaMD improves the quality of the binding free energy predictions (see next section).

SAMPL7 trimer-trip free energy prediction

For each complex, we extract 5 to 10 different binding modes generated with the above-described protocols. These geometries are then individually minimized at the xtb-GFN2B semi-empirical level, and only those yielding a true minimum (i.e. all vibrational frequencies are positive) are considered. The lowest energy complex is considered as the true minimum, except for a few cases where visual inspection identified issues with the corresponding geometry, always related to inadequate screening of charges by the implicit solvation method, such as those shown in Fig. 3. Predictions for each system are shown in Table 1.

Table 1 Final results with experimental, calculated free binding energy and the error related

Full size table

For guest G18 and G19 we could not find a correct binding mode SaMD, and the docking results gave positive binding energy. As both protocols failed for these two cyclic guests (presumably due to their large volumes) we desisted from making predictions for them.

We can see in Fig. 5a, three different zones in the graphics: The first zone corresponds to the five host–guest systems that have been predicted well. Concerning these systems G01, G02, and G07 are extracted from the SaMD protocol. While G08 and G10 are the two cyclic host from where interaction outside the cavity have been extracted from MD-docking. The second zone corresponds to five Host–Guest system where our prediction was incorrect, but still within a range from the experimental values (3 to 5 kcal/mol errors). These complex (G03, G05, G15, G16, G17), are mainly linear and the results originate from docking poses with the exception of G05, which originates from SaMD (result obtained after the submission deadline). The third zone corresponds to the six host–guest with large errors, including the G18 and G19 (for which none negative binding energy has been found). Most of them are cyclic and the errors can be attributed to our inability to find reasonable binding modes in the timeline of the challenge.

In Fig. 5b, we show that for the complexes where SaMD delivers a correct binding mode, the binding free energy predictions are far superior to the results obtained from docking poses. In fact, most cases (G01, G02, G05, G07) are in quantitative agreement with the experiment (± 1 kcal mol) and the overall performance statistics are excellent: for RMSE = 1.45 kcal/mol; MAE = 0.96 kcal/mol; Pearson’s correlation (r) = 0.86; Spearman’s rank correlation (ρ) = 0.94, Kendall’s rank correlation = 0.91(τ). Compared to SaMD, the results from docking underestimate the binding free energy, which suggests that lower-energy conformations of the Host–Guest complex can be sampled with MD, but not with the MM protocols.

Knowledge-based approach

For GDCC prediction, as there was an important amount of pre-existing data from previous challenges, we decided to try an orthogonal approach-based ML. The dataset includes 35 compounds in total, belonging to three classes of host systems that are similar in structure and chemical composition: OA, TEMOA, and exoOA (Fig. 6). The binding free energy values range between − 3.73 and − 8.38 kcal/mol. The final model (see Methods) is a neural network, using 90 CORINA descriptors (60 describing the guest and 30 describing the host system). As expected, the predictions for the training set are very accurate, with RMSE = 0.92 kcal/mol and all the predicted values within a 1 kcal/mol range from the experimental values (Fig. 7a). For the test set, all the predicted values are close to the experimental one, with maximum and minimum errors of -1.49 kcal/mol and + 0.22 kcal/mol, respectively.

The GDCC-7 dataset to be predicted this year consisted of 8 guest compounds (4 charged and 4 non-charged) binding to two related host systems. After the model has been optimized, it takes only 10 s to calculate the free binding energy of the 8 guests in the 2 hosts. With RMSE and MAE values of 1.67 kcal/mol and 1.21 kcal/mol, respectively, the overall performance is rather satisfactory, especially by comparison with the thermodynamic-based approach. It is worth noting that the four negative guests are not predicting well, which can be explained by the limits of the model imposed by the composition of the training set: since the least favorable binding free energy value is − 3.73 kcal/mol, the model can’t predict more positive values. Even then, the hierarchy between the guest values is respected (G4 < G3 < G2). There is no experimental value for G1, so it has not been considered for this analysis. If we apply the same analysis to every subgroup (based on the positive or negative charge and the host they are interacting with) we obtain an almost perfect hierarchical prediction. The only exception is the OA-G7 complex, which was predicted lower than OA-G6 due to the fact that OA-G7 has been underestimated (− 5.67 kcal/mol instead of − 6.98 kcal/mol) while OA-G6 have been predicted very close to his experimental values (− 5.92 for − 5.83 experimental values).

In fact, all systems, except for the four negative compounds interacting with exo-OA, are predicted within 1 kcal/mol of the experimental values (Fig. 8). For the complexes involving the OA system, which features prominently in the training set, the predictions are better still, with MAE = 0.55 kcal/mol and RMSE = 0.85 kcal/mol.

Conclusions

The participation in SAMPL7 allowed us to test two orthogonal approaches to calculate host–guest binding free energies, identifying in each case strengths and limitations that will be considered for the final design of an automated platform.

The thermodynamic-based approach is absolutely general and can be used, in principle, on any host–guest system. The use of an advanced semiempirical basis set (GFN2B) to calculate energies and thermostatistical corrections offers increased performance relative to MM approaches with a moderate computational cost (1–2 h on a single CPU) and eliminates the dependency on small-molecule force-fields, which are often inaccurate [55, 56]. However, we have identified two critical aspects that can lead to incorrect predictions. The first one is a critical dependency on the structure of the host–guest complex used to generate the prediction (the binding mode). For systems with significant host flexibility, rigid receptor docking can be inappropriate, and host conformational sampling is necessary. Direct observation of the host–guest pair formation through molecular dynamics with explicit solvent is an optimal solution in terms of quality of the binding free energy predictions but can be unpractical due to the long simulation times, which increase with the number of degrees of freedom of the system. Future implementations of the platform will consider modifications of the MD parameters to increase efficiency, including parallelization, GPU-based implementations, and/or various MD software. In the trimertrip case, we identified a slow transition between the closed and open conformation of the host as the bottleneck in the association process. For such cases, starting the SaMD simulations with open host conformations can yield excellent results at a fraction of the simulation cost. The second limitation of our approach is the implicit solvation method (GBSA) which can underestimate the desolvation cost of ionic species in aqueous solvation, leading to the formation of ionic pairs whose contribution is overvalued. Other reports have observed a systematic bias with implicit solvation models [57]. We do not observe such systematic bias, but the implicit solvation model remains one of the weaknesses of the approach. More recent xtb versions have replaced the GB formalism for an analytically linearized Poisson-Boltzmann (ALPB) model. It will be interesting to check the performance of ALPB in future SAMPL editions. In any case, the explicit solvation in MD simulations is better suited to preserve the solvation shells around the solute’s ionic groups. Thus, the use of MD snapshots as input geometries in xtb-GFN2B calculations seems to provide better results than exhaustive conformational sampling with implicit solvation.

The use of knowledge-based methods can be highly advantageous when there is sufficient pre-existing data. Contrary to protein–ligand complexes, where a large body of data exists, host–guest systems cannot benefit from massive training sets. Thus, we were particularly interested in examining the suitability of machine learning approaches, with a particular concern on the risk of overfitting. The results obtained on the GDCC system are really encouraging and motivate us to build a database of host–guest systems, with their corresponding binding free energies, and train both general and host-specific models. Two critical aspects that will be explored are the use of other molecular descriptors to improve the predictions and the introduction of selection criteria to decide when a particular system is within the scope of the model.

Overall, the participation in SAMPL7 has allowed us to design an automatic pipeline to compute binding free energies for any Host–Guest system. We currently implementing and improving the protocol, that will be tested in subsequent SAMPL editions.

Results overview

See Table 1.

Statistical analysis

See Table 2.

Table 2 Statistical analysis of SAMPL3 calculation, SAMPL7 TRIMERTRIP and SAMPL7 GDCC prediction

Full size table

References

Wenz G (2000) An overview of host-guest chemistry and its application to nonsteroidal anti-inflammatory drugs. Clin Drug Investig 19:21–25. https://doi.org/10.2165/00044011-200019002-00003
Article CAS Google Scholar
Ma X, Zhao Y (2015) Biomedical applications of supramolecular systems based on host-guest interactions. Chem Rev 115:7794–7839. https://doi.org/10.1021/cr500392w
Article CAS PubMed Google Scholar
Yu G, Chen X (2019) Host-guest chemistry in supramolecular theranostics. Theranostics 9:3041–3074. https://doi.org/10.7150/thno.31653
Article CAS PubMed PubMed Central Google Scholar
Zhang J, Ma PX (2010) Host–guest interactions mediated nano-assemblies using cyclodextrin-containing hydrophilic polymers and their biomedical applications. Nano Today 5:337–350. https://doi.org/10.1016/j.nantod.2010.06.011
Article CAS PubMed PubMed Central Google Scholar
Mantooth SM, Munoz-Robles BG, Webber MJ (2019) Dynamic hydrogels from host-guest supramolecular interactions. Macromol Biosci 19:1800281. https://doi.org/10.1002/mabi.201800281
Article CAS Google Scholar
Webber MJ, Langer R (2017) Drug delivery by supramolecular design. Chem Soc Rev 46:6600–6620. https://doi.org/10.1039/C7CS00391A
Article CAS PubMed Google Scholar
Yi JW, Barry NPE, Furrer MA et al (2012) Delivery of floxuridine derivatives to cancer cells by water-soluble organometallic cages. Bioconjugate Chem 23:461–471. https://doi.org/10.1021/bc200472n
Article CAS Google Scholar
Muddana HS, Fenley AT, Mobley DL, Gilson MK (2014) The SAMPL4 host–guest blind prediction challenge: an overview. J Comput Aided Mol Des 28:305–317. https://doi.org/10.1007/s10822-014-9735-1
Article CAS PubMed PubMed Central Google Scholar
Yin J, Henriksen NM, Slochower DR et al (2017) Overview of the SAMPL5 host–guest challenge: are we doing better? J Comput Aided Mol Des 31:1–19. https://doi.org/10.1007/s10822-016-9974-4
Article CAS PubMed Google Scholar
Rizzi A, Murkli S, McNeill JN et al (2018) Overview of the SAMPL6 host–guest binding affinity prediction challenge. J Comput Aided Mol Des 32:937–963. https://doi.org/10.1007/s10822-018-0170-6
Article CAS PubMed PubMed Central Google Scholar
Guthrie JP (2009) A blind challenge for computational solvation free energies: introduction and overview. J Phys Chem B 113:4501–4507. https://doi.org/10.1021/jp806724u
Article CAS PubMed Google Scholar
Geballe MT, Skillman AG, Nicholls A et al (2010) The SAMPL2 blind prediction challenge: introduction and overview. J Comput Aided Mol Des 24:259–279. https://doi.org/10.1007/s10822-010-9350-8
Article CAS PubMed Google Scholar
Mobley DL, Gilson MK (2017) Predicting binding free energies: Frontiers and benchmarks. Annu Rev Biophys 46:531–558. https://doi.org/10.1146/annurev-biophys-070816-033654
Article CAS PubMed PubMed Central Google Scholar
García MA, Meurs EF, Esteban M (2007) The dsRNA protein kinase PKR: virus and cell control. Biochimie 89:799–811. https://doi.org/10.1016/j.biochi.2007.03.001
Article CAS PubMed Google Scholar
Ganesan A, Coote ML, Barakat K (2017) Molecular dynamics-driven drug discovery: leaping forward with confidence. Drug Discov Today 22:249–269. https://doi.org/10.1016/j.drudis.2016.11.001
Article CAS PubMed Google Scholar
Raha K, Merz KM (2004) A quantum mechanics-based scoring function: study of zinc ion-mediated ligand binding. J Am Chem Soc 126:1020–1021. https://doi.org/10.1021/ja038496i
Article CAS PubMed Google Scholar
Eldridge MD, Murray CW, Auton TR et al (1997) Empirical scoring functions: I. The development of a fast empirical scoring function to estimate the binding affinity of ligands in receptor complexes. J Comput Aided Mol Des 11:425–445. https://doi.org/10.1023/A:1007996124545
Article CAS PubMed Google Scholar
Wójcikowski M, Ballester PJ, Siedlecki P (2017) Performance of machine-learning scoring functions in structure-based virtual screening. Sci Rep 7:46710. https://doi.org/10.1038/srep46710
Article PubMed PubMed Central Google Scholar
Genheden S, Ryde U (2015) The MM/PBSA and MM/GBSA methods to estimate ligand-binding affinities. Expert Opin Drug Discov 10:449–461. https://doi.org/10.1517/17460441.2015.1032936
Article CAS PubMed PubMed Central Google Scholar
Bannwarth C, Ehlert S, Grimme S (2019) GFN2-xTB—an accurate and broadly parametrized self-consistent tight-binding quantum chemical method with multipole electrostatics and density-dependent dispersion contributions. J Chem Theory Comput 15:1652–1671. https://doi.org/10.1021/acs.jctc.8b01176
Article CAS PubMed Google Scholar
Huang N, Shoichet BK, Irwin JJ (2006) Benchmarking sets for molecular docking. J Med Chem 49:6789–6801. https://doi.org/10.1021/jm0608356
Article CAS PubMed PubMed Central Google Scholar
Cross JB, Thompson DC, Rai BK et al (2009) Comparison of several molecular docking programs: pose prediction and virtual screening accuracy. J Chem Inf Model 49:1455–1474. https://doi.org/10.1021/ci900056c
Article CAS PubMed Google Scholar
Trott O, Olson AJ (2009) AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J Comput Chem. https://doi.org/10.1002/jcc.21334
Article Google Scholar
Haug EJ, Arora JS, Matsui K (1976) A steepest-descent method for optimization of mechanical systems. J Optim Theory Appl 19:401–424. https://doi.org/10.1007/BF00941484
Article Google Scholar
Shewchuk JR (1994) An introduction to the conjugate gradient method without the agonizing pain
Case DA, Belfon K, Ben-Shalom IY, Brozell SR, Cerutti DS, Cheatham TE III, Cruzeiro VWD, Darden TA, Duke RE, Giambasu G, Gilson MK, Gohlke H, Goetz AW, Harris R, Izadi S, Iz-mailov SA, Kasavajhala K, Kovalenko A, Krasny R, Kurtzman T, Lee TS, LeGrand S, Li P, Lin C, Liu J, Luchko T, Luo R, Man V, Merz KM, Miao Y, Mikhailovskii O, Monard G, Nguyen H, Onufriev A, Pan F, Pantano S, Qi R, Roe DR, Roitberg A, Sagui C, Schott-Verdugo S, Shen J, Simmerling CL, Skrynnikov NR, Smith J, Swails J, Walker RC, Wang J, Wilson L, Wolf RM, Wu X, Xiong Y, Xue Y, York DM, Kollman PA (2020) AMBER 2020. University of California, San Francisco
Google Scholar
Jakalian A, Jack DB, Bayly CI (2002) Fast, efficient generation of high-quality atomic charges. AM1-BCC model: II. Parameterization and validation. J Comput Chem 23:1623–1641. https://doi.org/10.1002/jcc.10128
Article CAS PubMed Google Scholar
Mark P, Nilsson L (2001) Structure and dynamics of the TIP3P, SPC, and SPC/E water models at 298 K. J Phys Chem A 105:9954–9960. https://doi.org/10.1021/jp003020w
Article CAS Google Scholar
Lindahl E, Hess B, van der Spoel D (2001) GROMACS 3.0: a package for molecular simulation and trajectory analysis. J Mol Model 7:306–317. https://doi.org/10.1007/s008940100045
Article CAS Google Scholar
Hess B, Kutzner C, van der Spoel D, Lindahl E (2008) GROMACS 4: algorithms for highly efficient, load-balanced, and scalable molecular simulation. J Chem Theory Comput 4:435–447. https://doi.org/10.1021/ct700301q
Article CAS PubMed Google Scholar
Pronk S, Páll S, Schulz R et al (2013) GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29:845–854. https://doi.org/10.1093/bioinformatics/btt055
Article CAS PubMed PubMed Central Google Scholar
Berendsen HJC, van der Spoel D, van Drunen R (1995) GROMACS: a message-passing parallel molecular dynamics implementation. Comput Phys Commun 91:43–56. https://doi.org/10.1016/0010-4655(95)00042-E
Article CAS Google Scholar
Van Der Spoel D, Lindahl E, Hess B et al (2005) GROMACS: fast, flexible, and free. J Comput Chem 26:1701–1718. https://doi.org/10.1002/jcc.20291
Article CAS Google Scholar
Abraham MJ, Murtola T, Schulz R et al (2015) GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1–2:19–25. https://doi.org/10.1016/j.softx.2015.06.001
Article Google Scholar
Bussi G, Donadio D, Parrinello M (2007) Canonical sampling through velocity rescaling. J Chem Phys 126:014101. https://doi.org/10.1063/1.2408420
Article CAS PubMed Google Scholar
Berendsen HJC, Postma JPM, van Gunsteren WF et al (1984) Molecular dynamics with coupling to an external bath. J Chem Phys 81:3684–3690. https://doi.org/10.1063/1.448118
Article CAS Google Scholar
Bussi G, Zykova-Timan T, Parrinello M (2009) Isothermal-isobaric molecular dynamics using stochastic velocity rescaling. J Chem Phys 130:074101. https://doi.org/10.1063/1.3073889
Article CAS PubMed Google Scholar
Levesque D, Verlet L (1993) Molecular dynamics and time reversibility. J Stat Phys 72:519–537. https://doi.org/10.1007/BF01048022
Article Google Scholar
Wang J, Wolf RM, Caldwell JW et al (2004) Development and testing of a general amber force field. J Comput Chem 25:1157–1174. https://doi.org/10.1002/jcc.20035
Article CAS PubMed Google Scholar
Caldeweyher E, Ehlert S, Hansen A et al (2019) A generally applicable atomic-charge dependent London dispersion correction. J Chem Phys 150:154122. https://doi.org/10.1063/1.5090222
Article CAS PubMed Google Scholar
Grimme S, Bannwarth C, Shushkov P (2017) A robust and accurate tight-binding quantum chemical method for structures, vibrational frequencies, and noncovalent interactions of large molecular systems parametrized for all spd-block elements ( Z = 1–86). J Chem Theory Comput 13:1989–2009. https://doi.org/10.1021/acs.jctc.7b00118
Article CAS PubMed Google Scholar
Bohle F, Grimme S (2019) Efficient structural and energetic screening of fullerene encapsulation in a large supramolecular double decker macrocycle. J Serb Chem Soc 84:837–844. https://doi.org/10.2298/JSC190701079B
Article CAS Google Scholar
Bursch M, Neugebauer H, Grimme S (2019) Structure optimisation of large transition-metal complexes with extended tight-binding methods. Angew Chem Int Ed 58:11078–11087. https://doi.org/10.1002/anie.201904021
Article CAS Google Scholar
Chemoinformatics ProgramPackage CORINA Symphony, developed and distributed by Molecular Networks GmbH, Nuremberg, Germany and Altamira LLC, Columbus, OH, USA. www.mn-am.com
R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Kuhn M. Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang and Can Candan. (2016). caret: classification and regression training. R package version 6.0–71. https://CRAN.R-project.org/package=caret
Kassambara A, Mundt F (2017) factoextra: extract and visualize the results of multivariate data analyses. R package version 1.0.5. https://CRAN.R-project.org/package=factoextra. Accessed June 2020
Venables WN, Ripley BD (2002) Modern applied statistics with S, 4th edn. Springer, New York. Accessed June 2020
Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2019). e1071: misc functions of the department of statistics, probability theory group (formerly: E1071), TU Wien. R package version 1.7–2. https://CRAN.R-project.org/package=e1071. Accessed June 2020
Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
Google Scholar
Kirch W (2008) Pearson’s correlation coefficient. Encyclopedia of public health. Springer, Dordrecht, pp 1090–1091. Accessed June 2020
Dodge Y (2008) Spearman rank correlation coefficient. The concise encyclopedia of statistics. Springer, New York, pp 502–505
Google Scholar
Puka L (2011) Kendall’s Tau. In: Lovric M (ed) International encyclopedia of statistical science. Springer, Berlin, pp 713–715. Accessed June 2020
Muddana HS, Daniel Varnado C, Bielawski CW et al (2012) Blind prediction of host–guest binding affinities: a new SAMPL3 challenge. J Comput Aided Mol Des 26:475–487. https://doi.org/10.1007/s10822-012-9554-1
Article CAS PubMed PubMed Central Google Scholar
Butler KT, Luque FJ, Barril X (2009) Toward accurate relative energy predictions of the bioactive conformation of drugs. J Comput Chem 30:601–610. https://doi.org/10.1002/jcc.21087
Article CAS PubMed Google Scholar
Tirado-Rives J, Jorgensen WL (2006) Contribution of conformer focusing to the uncertainty in predicting free energies for protein−ligand binding. J Med Chem 49:5880–5884. https://doi.org/10.1021/jm060763i
Article CAS PubMed Google Scholar
Huai Z, Yang H, Li X, Sun Z SAMPL7 TrimerTrip host-guest binding affinities from extensive alchemical and end-point free energy calculations. 27

Download references

Acknowledgements

This work has been supported by the European Union through the NOAH project (H2020-MSCA-ITN project Ref. 765297).

Author information

Authors and Affiliations

Laboratoire de Synthèse des Assemblages Moléculaires Multifonctionnels, Institut de Chimie de Strasbourg, CNRS/UMR 7177, Université de Strasbourg, Strasbourg, France
Dylan Serillon
Institut de Biomedicina de la Universitat de Barcelona (IBUB) and Facultat de Farmacia, Universitat de Barcelona, Av. Joan XXIII 27-31, 08028, Barcelona, Spain
Dylan Serillon & Xavier Barril
Institut Català d’Investigació Química (ICIQ), The Barcelona Institute of Science and Technology, Av. Països Catalans, 17, 43007, Tarragona, Spain
Carles Bo
Catalan Institution for Research and Advanced Studies (ICREA), Passeig Lluis Companys 23, 08010, Barcelona, Spain
Xavier Barril

Authors

Dylan Serillon
View author publications
You can also search for this author in PubMed Google Scholar
Carles Bo
View author publications
You can also search for this author in PubMed Google Scholar
Xavier Barril
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Dylan Serillon.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Serillon, D., Bo, C. & Barril, X. Testing automatic methods to predict free binding energy of host–guest complexes in SAMPL7 challenge. J Comput Aided Mol Des 35, 209–222 (2021). https://doi.org/10.1007/s10822-020-00370-6

Download citation

Received: 04 August 2020
Accepted: 21 December 2020
Published: 19 January 2021
Issue Date: February 2021
DOI: https://doi.org/10.1007/s10822-020-00370-6

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Testing automatic methods to predict free binding energy of host–guest complexes in SAMPL7 challenge

Abstract

Similar content being viewed by others

Overview of the SAMPL6 host–guest binding affinity prediction challenge

Blinded predictions of host-guest standard free energies of binding in the SAMPL5 challenge

The SAMPL5 host–guest challenge: computing binding free energies and enthalpies from explicit solvent simulations by the attach-pull-release (APR) method

Introduction