Vytautas
Gapsys§
a,
Laura
Pérez-Benito§
b,
Matteo
Aldeghi
a,
Daniel
Seeliger
c,
Herman
van Vlijmen
b,
Gary
Tresadern
*b and
Bert L.
de Groot
*a
aComputational Biomolecular Dynamics Group, Department of Theoretical and Computational Biophysics, Max Planck Institute for Biophysical Chemistry, D-37077 Göttingen, Germany. E-mail: bgroot@gwdg.de
bComputational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., Turnhoutseweg 30, B-2340 Beerse, Belgium. E-mail: gtresade@its.jnj.com
cMedicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Strasse 65, D-88397 Biberach a.d. Riss, Germany
First published on 2nd December 2019
Ligand binding affinity calculations based on molecular dynamics (MD) simulations and non-physical (alchemical) thermodynamic cycles have shown great promise for structure-based drug design. However, their broad uptake and impact is held back by the notoriously complex setup of the calculations. Only a few tools other than the free energy perturbation approach by Schrödinger Inc. (referred to as FEP+) currently enable end-to-end application. Here, we present for the first time an approach based on the open-source software pmx that allows to easily set up and run alchemical calculations for diverse sets of small molecules using the GROMACS MD engine. The method relies on theoretically rigorous non-equilibrium thermodynamic integration (TI) foundations, and its flexibility allows calculations with multiple force fields. In this study, results from the Amber and Charmm force fields were combined to yield a consensus outcome performing on par with the commercial FEP+ approach. A large dataset of 482 perturbations from 13 different protein–ligand datasets led to an average unsigned error (AUE) of 3.64 ± 0.14 kJ mol−1, equivalent to Schrödinger's FEP+ AUE of 3.66 ± 0.14 kJ mol−1. For the first time, a setup is presented for overall high precision and high accuracy relative protein–ligand alchemical free energy calculations based on open-source software.
Free energy perturbation (FEP)8,9 and thermodynamic integration (TI)10 are popular methods used for alchemical RBFE estimation. The application of FEP in alchemical calculations dates back several decades and it typically uses molecular dynamics (MD) or Monte Carlo simulations to compute the free-energy difference between two structurally related ligands, making it ideal for LO.11–15 Equilibrium FEP is arguably the most common implementation of alchemical calculations and involves many distinct equilibrium MD simulations for all states along a λ coordinate that alchemically modifies the first ligand into the second. It is common to use 12, 15 or more so-called λ intermediates wherein atoms that need to appear, disappear, or mutate between the two ligands are represented by a linear combination of end-state Hamiltonians. During alchemical transformations, van der Waals and sometimes electrostatic interactions are softened to avoid singularities and numerical instabilities.16–18 Various methods exist to calculate the free energy associated with a change of the λ coordinate, but a requisite for convergence is an overlap in conformational space between neighboring simulations along the λ path. TI differs from FEP in the way free energy difference is calculated as a function of λ: integration of the derivative of the Hamiltonian with respect to λ results in the free energy difference between end states. For FEP, if 1, 5 or 10 ns trajectories are required per λ window in both solvent and complex the computation becomes expensive when performing hundreds or thousands of perturbations in a drug discovery LO program. Recently, however, this cost has been dramatically reduced by using graphics processing units (GPU) or massively parallel resources.19–21 For instance, Schrödinger's FEP+3 implementation uses the GPU-enabled MD-engine Desmond.22 This has led to an explosion of interest in this approach. In turn, application of FEP to a vast range of protein–ligand systems revealed that the method can indeed deliver accurate relative binding affinity predictions with an error of <1 kcal mol−1 with respect to experiment.23–36 However, the application of FEP using most MD software remains challenging, preventing its widescale uptake.
Contrary to naturally-occurring amino acids, small molecules cover an almost infinite chemical space. Hence, deriving appropriate force field parameters for ligands can itself be challenging, and several recent reports address this.37–39 The challenge in the RBFE calculations setup is to automatically recognize the structural differences between the ligands and prepare a sensible hybrid topology for MD simulations. Several programs that help with this40–43 and other steps in the process44,45 have been reported. Work from the de Groot lab has led to the development of pmx,46,47 a tool to prepare inputs for alchemical free energy calculations48 in GROMACS.49 So far, pmx has delivered accurate results for the prediction of the effect of protein mutations on thermodynamic stabilities,27,35,50,51 changes in protein–protein interaction free energies,27 shifts in the equilibria between protein conformational substates,52 as well as DNA nucleotide mutations.29 In this report, we demonstrate the first application of pmx to relative protein–ligand binding free energies.
In our approach, pmx is used to identify optimal mappings between ligand atoms and generate hybrid structures and topologies for subsequent GROMACS-based free energy calculations. In contrast to the typical FEP approach based on equilibrium sampling described above, we estimate free energy differences with alchemical non-equilibrium transitions using a TI approach. Equilibrium simulations are first performed on the ligand-bound and -unbound states; then, short non-equilibrium simulations are used to perturb the ligands. Hundreds of short perturbations can be performed in the forward and backward direction, starting from snapshots covering the conformational space sampled from the equilibrated end states. The resulting free energy difference is derived from the overlap of work distributions associated with the forward and backward transitions using the Crooks Fluctuation Theorem.53
A primary feature that discriminates between equilibrium and non-equilibrium alchemical approaches is the amount of sampling performed at the physical end states. Equilibrium FEP employs a number of intermediate non-physical simulations along the alchemical path and only two simulations sample the physical end states. The free energy difference of interest is, however, solely defined by the end states – in fact, the role of the intermediate states is merely to ensure a converged ΔG estimate. The non-equilibrium approach, in contrast, invests more sampling time in the end states, as only very short simulations in alchemical space are performed to connect the physical end states. In a few studies, the efficiency of the non-equilibrium approaches was compared to that of equilibrium methods. However, which of the two approaches is more efficient in practice is yet to be determined conclusively. For example, Ytreberg et al.54 and Goette and Grubmüller55 found bi-directional non-equilibrium approaches to be more efficient than equilibrium FEP. In contrast, Yildirim et al.56 found equilibrium FEP to be more efficient; however, criticism of this study with respect to how efficiency was defined was expressed.57 Notwithstanding the lack of consensus in the scientific community on this matter, our non-equilibrium protocols58,59 have already provided high-accuracy predictions in a number of applications involving amino acid and nucleotide mutations.27,29,34,35,59,60
Here, we use pmx to calculate the difference in binding free energy for 482 ligand perturbations across 13 different ligand–protein activity datasets in two contemporary force fields. The calculated free energy differences were combined into a consensus estimate from the results of both force fields providing further increase in accuracy. In this case the consensus approach consists of a simple averaging, but future extensions may also involve more sophisticated schemes, e.g. employing machine learning approaches to assign different weights to force fields.27 We also used the commercial FEP+ implementation from Schrödinger as a state-of-the-art comparison. This is one of the largest protein ligand relative free energy calculation studies to date, and amongst the first providing a large-scale comparison of implementations on different MD-engine software.61 The overall average unsigned error (AUE) of the predicted ΔΔG was 3.64 ± 0.14 kJ mol−1 with pmx and 3.66 ± 0.14 kJ mol−1 with FEP+. The pmx tool is freely available at https://github.com/deGrootLab/pmx.
Having parameterized the ligands, hybrid structures and topologies for the ligand pairs were generated using pmx. A mapping between the atoms of two molecules was established following a predefined set of rules to ensure minimal perturbation and system stability during the simulations. pmx follows a sequential, dual mapping approach. In the first step, pmx identifies the maximum common substructure between the two molecules and proposes this as a basis for mapping. In the second step, pmx superimposes the molecules and suggests a mapping based on the inter-atomic distances. Finally, the mapping with more atoms identified for direct morphing between the ligands is selected. Additionally, pmx ensures that no ring breaking and disconnected fragments in the mapping occur. The obtained mapping is used to create hybrid structures and topologies following a single topology approach.
The simulation systems for the solvated ligands and ligand–protein complexes were prepared by placing the molecules in dodecahedral boxes with at least 1.5 nm distance to the box walls. The TIP3P water model76 was used to solvate the molecules. Sodium and chloride ions were added to neutralize the systems and reach 150 mM salt concentration. Proteins were parameterized in two different force fields: Amber99sb*ILDN77–79 and CHARMM36m.80 Ion parameters by Joung & Cheatham81 were used for simulations in Amber/GAFF force field; Charmm/CGenFF simulations were performed with the default Charmm ion parameters.
For every pair of ligands, the prepared systems were simulated in their physical state A and state B, representing ligand 1 and ligand 2, respectively. Firstly, the systems were energy minimized, followed by a 10 ps equilibration in the NVT ensemble at a temperature of 298 K. Afterwards, the production runs were performed for 6 ns in the NPT ensemble at 298 K and a pressure of 1 bar. Subsequently, 80 snapshots were extracted equidistantly from each of the trajectories generated, after discarding the first 2 ns accounting for the system equilibration. From each extracted configuration, an alchemical transition was started (from A to B and vice versa). Every transition was performed in 50 ps. This procedure adds up to 20 ns of simulation time invested to calculate one free energy difference for the ligand in its bound/solvated state. We used 3 replicas of every ΔΔG calculation, in total investing 60 ns for one ΔG estimate, which is equivalent to the simulation time employed by one repeat of the FEP+ approach.
The temperature in the simulations was controlled by the velocity rescaling thermostat82 with a time constant of 0.1 ps. The pressure was kept at 1 bar by means of the Parrinello–Rahman barostat83 with a time constant of 5 ps. All bond lengths were constrained using the LINCS algorithm.84 Particle Mesh Ewald (PME)85,86 was used to treat long-range electrostatics: a direct space cutoff of 1.1 nm and a Fourier grid spacing of 0.12 nm were used, and the relative strength of the interactions at the cutoff was set to 10−5. The van der Waals interactions were smoothly switched off between 1.0 and 1.1 nm. A dispersion correction for energy and pressure was used. For the alchemical transitions the non-bonded interactions were treated with a modified soft-core potential.18
For every transition, the derivatives of the Hamiltonian with respect to the λ parameter were recorded and subsequently used to obtain the work associated with each transition. The maximum likelihood estimator87 based on the Crooks Fluctuation Theorem53 was used to relate the non-equilibrium work distributions to the equilibrium free energy differences. The standard errors of the ΔG estimates were obtained by bootstrap. These were propagated when calculating the ΔΔG values for the individual and consensus force field results. The consensus approach comprises averaging the estimated ΔΔG values from different force fields and multiple replicas, where every replica encompasses the full free energy calculation procedure including equilibration, production and transition runs.
The double free energy differences (ΔΔG) were compared to experimental measurements by calculating average unsigned errors (AUE), Pearson correlation coefficients, and the percentage of estimates deviating from experiment by less than 1 kcal mol−1 (4.184 kJ mol−1). The errors for these observables were bootstrapped and reflect the variability in the datasets analyzed.
Fig. 1 Average unsigned errors (AUE, upper plots) and correlations (lower plots) between the calculated and experimentally measured double free energy differences. In the FEP+ panels, the dark red circles represent the three separate replica calculations, and the dark red square the results when the ΔΔG values per ligand are averaged over the three replicas. For the pmx GAFF and CGenFF panels, the circle symbols denote results averaged over three replicas (60 ns per ΔG in total). In the consensus panel, the results were averaged to correspond to 60 ns (circle) and 2 × 60 ns (square) of sampling time per ΔG estimate. (A) Averaging performed over all the investigated protein–ligand complexes; 482 ligand modifications in total. (B) Subset of systems analyzed by Wang et al.;3 330 ligand modifications. The light red circle in this panel corresponds to the result reported by Wang et al. (C) Subset of systems added in this work; 152 ligand modifications. |
The comparisons described above took into consideration all the simulations performed for each approach, i.e. a total of 3 × 60 ns for every ΔG estimate with FEP+, and 2 × 60 ns (i.e. 60 ns for each force field, GAFF and CGenFF: in total, 6 replicas of 20 ns each were combined for a ΔG estimate) for the pmx-based free energy calculations. When considering the equivalent time of 60 ns per ΔG value, the accuracies obtained by both approaches are nearly identical: FEP+ returns an AUE of 3.72 ± 0.15 kJ mol−1 and a correlation of 0.68 ± 0.03; the consensus force field pmx calculations yield an AUE of 3.72 ± 0.15 kJ mol−1 and a correlation of 0.63 ± 0.03.
The dataset analyzed in Fig. 1A can be decomposed into two subsets: the set of 8 protein–ligand systems (330 mutations) assembled and analyzed by Wang et al.3 and an additional set of 5 protein–ligand systems comprising 152 mutations. Wang et al. used an earlier version of the OPLS force field (v2.1) to investigate the subset of 330 mutations, thus, it is interesting to compare the evolution of the FEP+ method and force field with the accuracy of the open-source pmx-based calculations (Fig. 1B). Wang et al. reported an AUE of 3.87 ± 0.17 kJ mol−1. Subsequently, Harder et al. reported an improved AUE of 3.36 ± 0.15 kJ mol−1 for the same 8 protein–ligand systems with the OPLSv3 force field.88 However, the simulation time used for obtaining the latter result is not reported, complicating a direct comparison. In the current work, using FEP+ with the OPLSv3 force field and combining free energy estimates from three independent FEP+ runs resulted in an AUE of 3.66 ± 0.14 kJ mol−1. The non-equilibrium free energy calculations performed comparably to FEP+ and reached an AUE of 3.70 ± 0.17 kJ mol−1 when using 60 ns per ΔG estimate, and 3.58 ± 0.18 kJ mol−1 when using 2 × 60 ns. In terms of correlation, the newer OPLSv3 shows improvement over OPLSv2.1: 0.65 ± 0.04 versus 0.59 ± 0.03. The pmx-based calculations show slightly lower correlation of 0.55 ± 0.04. In a recent study, the Wang et al. dataset was investigated with equilibrium TI calculations using the Amber18 simulation package.89 The authors reported substantially worse performance than obtained in the current work: AUE of 4.9 kJ mol−1 and correlation of 0.48, investing 74 ns per ΔG estimate.
For the dataset of 152 mutations (Fig. 1C) assembled from the literature for this study, both FEP+ and non-equilibrium calculations reach similar correlation: 0.79 ± 0.04 and 0.76 ± 0.04, respectively. Interestingly, for this subset the AUE of the FEP+ predictions is lower than that of the consensus approach by 0.57 ± 0.33 kJ mol−1 (3.2 ± 0.21 and 3.77 ± 0.25 kJ mol−1 for FEP+ and pmx, respectively). These observations suggest the accuracy is dependent on the particular protein–ligand system studied. It is also important to note that the number of data points varies among the datasets, ranging from 7 in the case of galectin, to 71 in the case of MCL1. This emphasizes the importance of using large datasets for reliable method comparison.
For all the sets depicted in Fig. 1, the GAFF force field outperforms CGenFF. Combining the results of both into a consensus estimate consistently yields a higher, or at least equivalent, accuracy compared to the GAFF force field. Increasing the simulation time invested to obtain a ΔΔG estimate has only a marginal effect on the results, given the time scales considered (at least 60 ns per ΔG). We have also probed the effect of simulation length on FEP+ accuracy by running 1 ns per λ window and using 3 replicas, resulting in 36 ns per ΔG estimate, as opposed to the standard protocol using 5 ns per λ window (180 ns per ΔG). Also in this case, the accuracy was only marginally affected by the shorter simulations: AUE of 3.88 ± 0.15 (1 ns) and 3.66 ± 0.14 kJ mol−1 (5 ns), and correlation 0.68 ± 0.03 and 0.69 ± 0.03, respectively.
To further assess the sensitivity of the GROMACS calculations to the invested sampling time, we estimated ΔΔG values after discarding half of the simulation time. Such a protocol resulted in a setup using 3 replicas of 10 ns, which closely matches the 1 ns FEP+ protocol (1 ns × 12 λ-windows × 3 replicas). The AUE of the GAFF calculations was 4.03 ± 0.16 kJ mol−1 and the Pearson correlation 0.59 ± 0.03. The CGenFF calculations had an AUE of 4.7 ± 0.19 kJ mol−1 and correlation of 0.53 ± 0.04. The modest decrease in accuracy matches well with the similar effect observed for the FEP+ calculations. It appears that even the shorter investigated sampling times are sufficient to explore the local minima in the vicinity of the starting structure to obtain a converged free energy estimate. This is corroborated by our earlier explorations of sampling strategies applied in drug resistance mutation studies, where investing 54 ns per ΔG value yielded converged results.35
The scatter plots of the calculated and experimental double free energy differences provide an intuitive understanding of the ranges spanned by the datasets and the calculated values (Fig. 2). While taken separately the GAFF and CGenFF force fields produce more outliers than FEP+ with OPLSv3 (Fig. 2 and S1‡), the consensus results reduce the number of outliers. The proportion of estimates falling within 1 kcal mol−1 (4.184 kJ mol−1) of experiment is 68 ± 2% and 66 ± 2% for the FEP+ and the pmx-based consensus force field approach respectively. The overall range spanned by the estimated ΔΔG values is comparable between the methods and force fields as well as similar to the distribution of experimental values (Fig. S2‡). The consensus non-equilibrium estimates were more accurate than FEP+ for the perturbations associated with a small ΔΔG (Fig. S4‡), whereas FEP+ was more accurate for larger ΔΔG perturbations.
A notable difference between the results of the methods is the magnitude of estimated errors (Fig. 2 and S3‡): the non-equilibrium free energy estimates have larger associated errors than those predicted by FEP+. It is important to note that error estimates for the individual ΔΔG values comprise both the uncertainty of the estimator and the standard error of the estimates coming from the different simulation replicas. Furthermore, the consensus approach increases the errors because the GAFF and CGenFF estimates may differ from each other more than the estimates obtained with individual force fields. While this feature allows for an increased prediction accuracy, it also increases the uncertainty associated with an estimate.
The improved accuracy due to combination of results from different force fields may seem counterintuitive. In fact, if both force fields yield ΔΔG estimates deviating from experiment in the same direction, the consensus approach would yield only an intermediate quality prediction falling in between the two individual force fields. Such an outcome would still be preferable in a prospective study, since relying on a single force field might lead the investigation in a wrong direction. In the current work, however, employing a consensus approach generally resulted in an improved prediction accuracy over any of the single force fields. This is only possible because in 33% of all the calculated double free energy differences the values obtained by GAFF and CGenFF force fields were pointing in opposite directions from the experimental measurement (see also Fig. S10‡ for a graphical depiction of the signed deviations from experiment for both force fields plotted one against the other).
The variable performance of calculated ΔΔG for individual protein–ligand complexes can be seen from the scatter plots in Fig. 4 (for the FEP+ estimates see Fig. S6‡). In the majority of cases, the estimates fall within 1 kcal mol−1 (4.184 kJ mol−1) of the experimental measurement. This indicates that the accuracy is mainly reduced by a small number of outliers. The latter observation holds for both the consensus approach based on the non-equilibrium calculations (Fig. 4) and FEP+ using the OPLSv3 force field (Fig. S6‡). Interestingly, both approaches have difficulties with the MCL1 dataset where only half of the estimates fall within 1 kcal mol−1 (4.184 kJ mol−1) of the experimental measurement. 45% of the non-equilibrium estimates fell outside this range for the BACE set of Hunt et al.64 and for the cMET set. FEP+ had comparable difficulties with the BACE set of Cumming et al.90 and PDE2.33
The range spanned by the ΔΔG values also has an influence on the prediction accuracy (Fig. S5‡). An illustrative example for this effect is a set of thrombin inhibitors. The experimental range of the double free energy differences is narrow. The non-equilibrium approach captured the ΔΔG values very accurately in terms of AUE (2.23 ± 0.57 kJ mol−1). However, no correlation for the small differences between the ligands was observed. In contrast, FEP+ had a significantly larger AUE of 4.51 ± 0.82 kJ mol−1. However, it was able to achieve moderate correlation (0.45 ± 0.18). In general, FEP+ obtained higher correlations: the averaged correlation coefficient was higher for 9 out of 13 datasets (Fig. S7‡). In terms of AUE, on average, FEP+ was more accurate than the pmx-based calculations for 7 out of 13 datasets. When compared to the previous generation of the OPLS force field (v2.1),3 the consensus force field approach performs better in 6 out of 8 cases in terms of AUE and 3/8 in terms of correlation. It is worth noting that the earlier FEP+ results reported by Wang et al. were more accurate for BACE and thrombin than those obtained here with the newer OPLS version.
Fig. 5 Detail of a PTP1B structure (PDB ID: 2qbs) depicting the close proximity of the thiol group of Cys215 to the carboxyl group of the co-crystallized inhibitor. |
We further probed whether the ligand's carboxyl group or Cys215 is more likely to be protonated. The empirical pKa predictor PROPKA 3.194,95 suggested that the pKa for the carboxyl group is less than 5.0 for every ligand in the set. The low carboxyl pKa was also confirmed by the ChemAxon's predictor.96 In contrast, the pKa for Cys215 in the complexed system was predicted to be between 9.8 and 10.5, depending on the inhibitor. Taken together, these observations suggest that Cys215 ought to be protonated for the inhibitor set synthesized by Wilson et al.93
Wang et al.,3 however, modeled a deprotonated variant of Cys215 in their free energy calculations, whilst also keeping the ligand's carboxyl group deprotonated. Although the carboxyl groups of the ligand are not modified in the alchemical simulations it is plausible that structurally diverse inhibitors may be affected differently by the two negative charges nearby. Using the Wang et al. setup with the deprotonated Cys215 we obtained similar quality free energy estimates (Fig. 6). Briefly, the Cys(−1) results from Wang et al. had an AUE and correlation of 3.87 ± 0.52 kJ mol−1 and 0.64 ± 0.06, respectively, compared to 3.66 ± 0.56 kJ mol−1 and 0.61 ± 0.08 from the pmx consensus predictions, also with similar outliers as seen in the correlation plots. Interestingly, the FEP+ calculations performed here using OPLSv3 showed substantially better accuracy (AUE of 2.8 ± 0.27 kJ mol−1 and correlation of 0.91 ± 0.03), suggesting the newer force field includes updates that have an improved representation of interactions between the deprotonated thiol and carboxyl group for the investigated set of ligands. Since our empirical prediction suggests that Cys215 could be protonated we have also calculated free energy differences with this protonation state. Interestingly, upon protonation of Cys215 the quality of FEP+ OPLSv3 prediction drops (Fig. 6): AUE 3.68 ± 0.49 kJ mol−1, correlation 0.8 ± 0.07.
Fig. 6 Details of the ΔΔG calculations for the PTP1B protein–ligand system. The top row depicts the experimental ΔΔG values plotted against the calculated results. The two bottom panels summarize these calculations in terms of AUE and Pearson correlation. From left to right: Wang et al.3 calculations using deprotonated Cys215; FEP+ with OPLS v3 using deprotonated Cys215; FEP+ with OPLS v3 with protonated Cys215; pmx-based consensus force field approach with deprotonated Cys215; pmx-based consensus force field approach with deprotonated, but neutral Cys215; pmx-based consensus force field approach with protonated Cys215. |
The pmx calculations using the consensus force field approach follow a different trend. When Cys215 is deprotonated and turned into a neutral residue (by redistributing charges on the side-chain atoms), the agreement with experiment increases. This artificially constructed cysteine residue should not be interpreted in physical terms (e.g. as a radical). It rather represents a convenient intermediate step between the negative deprotonated cysteine in the active site of PTP1B and the properly protonated neutral Cys215. Agreement with experiment further improves when Cys215 is protonated (Fig. 6): AUE 3.23 ± 0.42 kJ mol−1, correlation 0.74 ± 0.06. The increased accuracy when protonating Cys215 could be an artifact of a deficient parameterization of the thiolate group in Amber and CHARMM force fields.97 On the other hand, it may also suggest that the thiol group of the cysteine residue is protonated upon binding of the ligands from the investigated set of PTP1B inhibitors.
It is also important to note that here we only analyzed the effects of protonation changes of the cysteine's thiol group, while the protonation state of the ligands was kept fixed. For a complete picture of the free energy landscape underlying the affinity differences for this PTP1B ligand set it might be necessary to include alternative protonation states for the ligands98,99 and potentially allow the molecules to change their protonation upon binding. Although in the current analysis we verified ligand protonation states by means of empirical predictors, future systematic free energy calculations including ligand protonation effects may improve estimation accuracy.
The set of galectin inhibitors contains only 8 ligands connected by 7 perturbations. OPLSv3 performed particularly well in this case: AUE of 1.2 ± 0.5 kJ mol−1, correlation of 0.98 (Fig. 3). Both GAFF and CGenFF force fields show a lower accuracy in terms of AUE (3.0 ± 1.2 and 2.2 ± 0.4 kJ mol−1, respectively). In terms of correlation, GAFF has a below-average agreement with experiment and a large associated uncertainty (0.41 ± 0.4). A closer look into the ΔΔG estimates obtained with the GAFF force field highlights a peculiar case of possible error cancellation in the free energy estimates (Fig. 7). A large AUE for the perturbation transforming a methylamino group (–NHMe) to methoxy (–OMe) suggests that the parameterization of one or both of these moieties might be imperfect. However, perturbations of these groups into more chemically similar substituents gave more accurate ΔΔG estimates: methylamine to di-methylamine; methoxy to hydroxyl. The parameterization errors pertaining to a specific chemical group cancel out until transformations involving different chemistry (with different parameterization errors) are introduced: e.g. free energy differences within the group of ligands containing methylamine in the current case are represented correctly. Similarly, the free energy differences within the group of compounds containing methoxy and hydroxyl groups are accurately estimated. However, the free energy difference between these two sets of ligands containing different chemical groups is not captured accurately (at least not with the sampling time used in the current study).
A closer look at the major discrepancies between force fields reveals some peculiar trends that could be useful for further force field fine tuning. For example, in all four transformations with compound 4200_15, GAFF overestimates the binding affinity of this ligand in comparison to both CGenFF and experiment. Similarly, compound 400_10 is consistently (3 transformations) predicted by GAFF to be a higher affinity binder than determined experimentally. In contrast, all 6 transformations involving ligand 5300_8 with GAFF suggest the inhibitor to be a far worse binder than measured experimentally. Although pinpointing the exact force field parameters that are responsible for these inaccuracies is not trivial, the trends observed for certain chemical groups suggest likely candidates for re-parameterization. Similarly, we can envisage future work using large-scale scans of such calculated thermodynamic properties of biomolecular complexes to aid force field development.
The significant difference in standard errors obtained from repeated calculations represents an interesting difference between the FEP+ and non-equilibrium TI based free energy results. With an average standard error of 0.57 kJ mol−1 per ΔΔG value, FEP+ provides predictions with high precision. That is, the ΔΔG estimates from FEP+ converge to highly similar values, with little spread in the results. This might be a consequence of the enhanced sampling technique (REST65) ensuring convergence of the FEP+ simulations. pmx-based non-equilibrium calculations, on the other hand, come with higher uncertainty: 2.36 kJ mol−1 on average for the consensus results. The larger spread of the calculated ΔΔG values, in comparison to FEP+, suggests that the non-equilibrium calculations could still benefit from an increased convergence: longer simulations or an enhanced sampling approach present a compelling direction for further investigation. Considering that both FEP+ and pmx-based calculations have, on average, a similar AUE of ∼3.7 kJ mol−1, the high precision associated with FEP+ indicates that the method is highly precise even for those predictions that are substantially different from experiment. The pmx-based calculations give a larger prediction uncertainty, thus encompassing the experimental observation within the confidence interval of the estimate. It remains to be explored whether increased precision of the pmx-based calculations (using longer simulations or an enhanced sampling technique) will have an effect on the accuracy of free energy estimates.
The success of combining results from GAFF and CGenFF indicates differences in the force field parameterization. Naturally, the simplistic forms of the potential energy functions used by the classical molecular mechanics force fields cannot capture the full complexity of molecular interactions, for which a more complex representation would be required, e.g., polarizability.100,101 Force field parameterization based on a large number of quantum chemical calculations is helpful, as illustrated by the high accuracy achieved by FEP+ with the OPLSv3 force field. However, the simplified description of the potential energy leads to unavoidable, inherent limitations. Thus, at this time, combining estimates from different force fields may be an attractive avenue to pursue. Given that parameterization of different force fields relies on different theoretical premises, combining their results may indirectly capture features of molecular interactions that are inaccessible to a single force field. Finally, the significant prediction differences obtained when altering the protonation state of a single amino acid sidechain highlight the sensitivity of alchemical methods to the simulation setup and force field parameterization details. Furthermore, this example emphasizes the need for transparent and open-source force field parameters akin to those put forward by the Open Force Field Consortium.102
Footnotes |
† The input files used for simulations, as well as the calculated ΔΔG values are available on the pmx git repository: https://github.com/deGrootLab/pmx. |
‡ Electronic supplementary information (ESI) available. See DOI: 10.1039/c9sc03754c |
§ Contributed equally to the manuscript. |
This journal is © The Royal Society of Chemistry 2020 |