Yong
Wang
,
João Miguel
Martins
and
Kresten
Lindorff-Larsen
*
Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark. E-mail: lindorff@bio.ku.dk
First published on 12th July 2017
The behaviour of biomolecular systems is governed by their thermodynamic and kinetic properties. It is thus important to be able to calculate, for example, both the affinity and rate of binding and dissociation of a protein–ligand complex, or the populations and exchange rates between distinct conformational states. Because these are typically rare events, calculating these properties from long molecular dynamics simulations remains extremely difficult. Instead, one often adopts a divide-and-conquer strategy in which equilibrium free-energy differences and the fastest state-to-state transition (e.g. ligand association or minor-to-major state conversion) are combined to estimate the slow rate (e.g. ligand dissociation) using a two-state assumption. Here we instead address these problems by using a previously developed method to calculate both the forward and backward rates directly from simulations. We then estimate the thermodynamics from the rates, and validate these values by independent means. We applied the approach to three systems of increasing complexity, including the association and dissociation of benzene to a fully buried cavity inside the L99A mutant variant of T4 lysozyme. In particular, we were able to determine both millisecond association and dissociation rates, and the affinity, of the protein–ligand system by directly observing dozens of rare events in atomic detail. Our approach both sheds light on the precision of methods for calculating kinetics and further provides a generally useful test for the internal consistency of kinetics and thermodynamics. We also expect our route to be useful for obtaining both the kinetics and thermodynamics at the same time in more challenging cases.
Similarly to ligand binding, thermodynamics and kinetics are also linked in biomolecular conformational transitions, for example in protein folding. Indeed, one of the standard tests for whether protein folding conforms to a two-state model involves comparing the free energy difference estimated by equilibrium measurements with the value from kinetics.3
In principle, molecular dynamics (MD) simulations can provide a one-shot solution for calculating all of these properties. MD may, however, be limited by its inability to sample the long-timescale biological processes and the whole conformational space. Thus, although it has been possible to calculate rates (kfolding and kunfolding) and thermodynamics (ΔGfolding) from a single, long trajectory of protein folding,4 this has only been possible at temperatures where ΔG ≈ 0. To our knowledge, the same problem has so far not been solved for ligand binding or for cases where the two rates differ substantially in magnitude.
Instead of pursuing a one-shot solution, one may take a divide-and-conquer strategy in which the thermodynamics and kinetics are calculated separately using different, but complementary techniques.5–8 For the calculation of thermodynamics, several different methods have been developed. These methodologies can be categorized roughly into two types: enhanced sampling methods9 and alchemical methods.10,11 The enhanced sampling methods, including umbrella sampling (US),12 non-equilibrium approaches13 and metadynamics,14 are based on the idea that an external bias is added to the original force field so as to make rare events occur more rapidly, but in a way that the unbiased free energy difference can be calculated by reweighting techniques.15 The alchemical methods, including free-energy perturbation (FEP)16 and thermodynamic integration,17 are based on the idea that free energy differences can be calculated via a computationally convenient but not necessarily physical path because the free energy is a state function and path-independent.
While the free energy can often be calculated without knowing or studying the physical pathways that connect the reactant and product states (e.g. the bound and unbound states), the calculation of kinetics is generally more time-consuming and computationally expensive because the full physical reaction pathways have to be explored. It is still challenging but now possible to perform MD simulations up to a few milliseconds.18 Focusing on ligand-binding, it has recently been demonstrated that it is possible to observe spontaneous binding of ligands to their targets, e.g. the cancer drug dasatinib to Src kinase,19 the ‘beta blocker’ drugs propranolol and alprenolol to the β2-adrenergic receptor,6 a transition state analogue to the purine nucleoside phosphorylase enzyme8 and benzamidine to the serine protease trypsin.20–22 These observations demonstrate that the overall drug-binding process can successfully be observed starting from unbound states either in a few long trajectories6,8,19,20 or multiple shorter trajectories (analysed e.g. with the help of Markov state models or the weighted ensemble path sampling).21,23,24 The success of these examples results, in part, from the fact that the binding rate (1/τon = kon[L]) can be accelerated by simply increasing the ligand concentration [L]. However, to the best of our knowledge, unbiased MD simulations have yet not been used to determine the mechanism and kinetics of ligand escape for a tight protein–drug complex. Although it is potentially possible to determine the off rates from the height of the free energy barriers, ΔG‡off, this requires good reaction coordinates that represent the entire set of slowly varying degrees of freedom and also a good estimate of the pre-exponential factor Aoff to convert barrier height to a rate. Finding good reaction coordinates and calculating the pre-exponential factor are other challenging tasks. Thus, koff has usually been estimated from ΔGbinding and kon, while assuming two-state kinetics.6,8
Inspired by the work of Grubmüller25 and Voter,26 Tiwary and Parrinello recently developed an enhanced sampling method, coined ‘infrequent metadynamics’,9,27 to calculate the rate of a state-to-state transition. Metadynamics is an enhanced sampling method that discourages the system from sampling already visited conformational regions by continuously adding an external history-dependent repulsive potential at the present value of a small number of slowly changing order parameters, called collective variables (CVs).14 The traditional objective of metadynamics simulations is to reconstruct the underlying free energy surface, or potential of mean force (PMF), as a function of the CVs used in the simulations or other order parameters. In standard metadynamics simulations it is very difficult to obtain kinetic properties.28,29 The key idea of infrequent metadynamics is to bias the system with a frequency slow enough so that the transition state regions are not substantially biased; therefore the sequence of basin-to-basin transitions are unaffected and rates can be estimated by applying transition state theory. This method has recently been used to reproduce the kinetics of conformational change of a mutant of T4 lysozyme,30 unbinding of the inhibitor benzamidine from trypsin,31 the unfolding of the villin head-piece,32 and state-to-state transitions of other model systems.7,27,33
Here we use MD simulations with the aid of infrequent metadynamics to address the relationship between thermodynamics and kinetics in three systems of increasing complexity. In systems of slow conformational transitions and ligand association/dissociation, we show that infrequent metadynamics provides the necessary accuracy and efficiency to determine the kinetic and thermodynamic properties. To distinguish the thermodynamic properties obtained from different sources, we term the free energies from enhanced sampling methods (e.g. metadynamics) or alchemical methods (e.g. FEP) as ‘thermodynamic ΔG’, while the free energies calculated from the forward and backward rates we call ‘kinetic ΔG’. We in particular show that thermodynamic and kinetic ΔG values are generally consistent, even in cases where the rates are estimated from biased simulations or when the free energy landscape contains multiple intermediate states. In total, we provide a novel and feasible route to calculate kinetics and thermodynamics at the same time, and a useful consistency check for the calculated kinetic and thermodynamic parameters.
We estimated the free energy difference ΔGPMFα–β between the two states to be 2.6 ± 0.1 kcal mol−1 by summing the populations on the both sides of the barrier. To calculate the kinetic ΔGkineα–β, we obtained the transition times, τα→β and τβ→α for both directions (α to β and β to α) using the statistics from 40 independent unbiased MD simulations (Table 1). Due to the low computational cost to sample this model system, these simulations could be performed without enhanced-sampling techniques. From the calculated rates, we estimated ΔGkineα–β to be kBTln(τβ→α/τα→β) = 2.8 ± 0.2 kcal mol−1, in excellent agreement with thermodynamic ΔGkineα–β obtained from the potential of mean force.
τ α→β (ns) | τ β→α (ns) | ΔGkineα–β (kcal mol−1) | ΔGPMFα–β (kcal mol−1) |
---|---|---|---|
2.3 ± 0.6 | 231 ± 56 | 2.8 ± 0.2 | 2.6 ± 0.1 |
Fig. 2 Potential of mean force of Ace-Ala3-Nme at 300 K in implicit solvent. Representative conformations are shown next to the free energy basins. The potential of mean force as a function of the optimized collective variable allows us to estimate the free energy differences ΔGPMF between these four basins. The convergence properties of free energy differences are shown in Fig. S3.† |
Despite its simplicity, this model system displays a more complicated free energy landscape with multiple minima, and may thus serve as a relevant model for some of the complexities that one would expect to encounter in larger biological systems. In particular, this system violates the two-state assumption, allowing us to test the practical utility of our approach and the accuracy of free energy differences estimated from the kinetics in such cases. In particular, we asked the question: can we estimate all six free energy differences between states from the transition rates between states? If so, how precisely?
We obtained each transition time (e.g. τA→B) from 40 independent unbiased MD simulations starting from the corresponding reactant state (e.g. state A) and ending with the corresponding productive state (e.g. state B) (Table 2). In all cases but one, we observed complete state-to-state transitions for all 40 simulations. In the case of A-to-D transitions we observed only 28 successful events among 50 trials and 300 μs simulation time in total, and used a maximum likelihood method37 to estimate of τAD.
τ A→D | τ D→A | ΔGkineAD | ΔGPMFAD |
---|---|---|---|
10.7 ± 2.0 μs | 7.7 ± 1.0 ns | 4.3 ± 0.2 | 5.6 ± 0.2 |
τ A→C | τ C→A | ΔGkineAC | ΔGPMFAC |
---|---|---|---|
722.4 ± 136.6 ns | 6.9 ± 0.9 ns | 2.8 ± 0.2 | 3.5 ± 0.2 |
τ A→B | τ B→A | ΔGkineAB | ΔGPMFAB |
---|---|---|---|
27.5 ± 5.1 ns | 1.4 ± 0.5 ns | 1.8 ± 0.3 | 1.9 ± 0.1 |
τ B→C | τ C→B | ΔGkineBC | ΔGPMFBC |
---|---|---|---|
23.3 ± 4.7 ns | 3.0 ± 0.6 ns | 1.2 ± 0.2 | 1.6 ± 0.1 |
τ B→D | τ D→B | ΔGkineBD | ΔGPMFBD |
---|---|---|---|
709.7 ± 130.0 ns | 6.1 ± 1.6 ns | 2.9 ± 0.2 | 3.7 ± 0.1 |
τ C→D | τ D→C | ΔGkineCD | ΔGPMFCD |
---|---|---|---|
8.1 ± 1.4 ns | 0.5 ± 0.1 ns | 1.7 ± 0.2 | 2.0 ± 0.1 |
The twelve transition times were subsequently used to estimate the six kinetic ΔGkine values (Table 2). As in the case of the dipeptide, we find here also that the values are overall in very good agreement with ΔGPMF estimated from the PMF (Fig. 3). The overall calculation of kinetic ΔG values achieved a mean absolute error (MAE) of 0.6 kcal mol−1 and a Pearson correlation coefficient of 0.99 with respect to ΔGPMF from the converged free energy landscape.
Fig. 3 Comparison between kinetic ΔGkine and thermodynamic ΔGPMF values in the state-to-state transitions of the five-residue peptide. |
We note that although we find a strong, linear correlation between ΔGPMF and ΔGkine, there appears to be a systematic deviation from a unity slope. This cannot easily be explained by lack of convergence of the free energy landscape (Fig. S3†) nor by how we estimate the rates (Table S1†). Instead we note that the deviations are only observed for the largest free energy differences which also correspond to state-to-state transitions that cross one or two metastable states. In such cases, the processes will in general be multi-exponential and the dominant barrier might differ for the forward and backward reaction. Encouragingly, however, our results show that ΔGkine still provides a rather accurate estimate of the free energy difference with an acceptable overall MAE, in the current case involving two metastable states. Overall, our results suggest that in practice it is possible to estimate the ΔG even when the free energy landscape and the intermediates are not fully known.
τ A→D | p-Value | Cost | |
---|---|---|---|
Unbiased MD | 10.7 ± 2.0 μs | 300 μs | |
InMetaD | 16.4 ± 4.5 μs | 0.41 ± 0.26 | 3 μs |
τ A→C (μs) | p-Value | Cost | |
---|---|---|---|
Unbiased MD | 722 ± 137 ns | 0.34 ± 0.24 | 26 μs |
InMetaD | 553 ± 114 ns | 0.50 ± 0.28 | 1 μs |
τ B→D (μs) | p-Value | Cost | |
---|---|---|---|
Unbiased MD | 709 ± 126 ns | 0.48 ± 0.25 | 25 μs |
InMetaD | 765 ± 200 ns | 0.34 ± 0.23 | 1 μs |
Despite intensive research, several questions pertaining to the mechanism of ligand binding however remain open, in particular with the conformational dynamics underlying access to the internal site remaining unclear. Previous NMR experiments have shown that L99A T4L exchanges between a ground state and an alternative, higher-energy conformational state, but further structural studies showed that both of states are sterically inaccessible to incoming ligands.41,45 This poses a long-standing question of how the ligands access the internal cavity buried in the protein core.47–49 By using metadynamics simulations to explore the native state free energy landscape of L99A T4L in the absence of ligands, we recently discovered transiently formed tunnels that connect the interior binding site to the solvent,30 and suggested these to be relevant for ligand binding.
Here we make a step forward by using enhanced sampling simulations to observe multiple events during which the ligand (benzene) either enters into or escapes from the buried cavity in L99A T4L. This not only provides us with valuable information on the ligand binding mechanism, but also allows us to determine ligand association and dissociation rates as well as binding thermodynamics. Here we focus on the kinetics and thermodynamics and analyse and describe the mechanistic aspects in future studies.
We used a total of ∼12 μs infrequent metadynamics simulations to collect 20 ligand association and 20 dissociation events. At the ligand concentration (5 mM) used in the simulation, we find τon and τoff to be 9 ± 5 ms and 168 ± 59 ms, respectively (Fig. S4†). From these values we in turn determined on- and off-rates and compared these to experiments (Table 4). Also in this case, the KS test shows compatibility of the data with a Poisson process, though the relatively lower reliability for τon estimation suggests that binding events are more difficult to sample than the unbinding events.
Simulation | Experiment | |
---|---|---|
a p-Value of τon is 0.10 ± 0.12. b p-Value of τoff is 0.38 ± 0.26. c Dissociation constant of 0.8 ± 0.12 mM is from ref. 39. d Dissociation constant of 0.2 ± 0.04 mM is from ref. 50. | ||
k on | 3.5 ± 2 × 104 M−1 s−1 | 8 × 105 M−1 s−1 |
k off | 7 ± 2 s−1 | 800 ± 200 s−1 |
K d | 0.3 ± 0.1 mM | 0.8 ± 0.12c/0.2 ± 0.04d mM |
ΔGbinding | −5.0 ± 0.6 kcal mol−1 | −4.2 ± 0.1c/−5.2 ± 0.2d kcal mol−1 |
From the calculated rates we also calculated the binding affinity, which we find to be in good agreement with the two experimental estimates from either calorimetric analysis50 or NMR39 (Table 4). To obtain an independent computational estimate of the binding affinity, we also performed an alchemical free energy calculation on the L99A T4L–benzene complex using the same force field. The result obtained (ΔGFEPbinding = −4.9 ± 0.1 kcal mol−1) is within error the same as that obtained from the ligand kinetics, demonstrating consistence between these two completely different approaches. Finally, we note that even in a relatively long metadynamics simulation we were not able to obtain a converged equilibrium free energy landscape, and so could not estimate ΔGPMFbinding in this case. The fact that we can estimate the free energy difference (from the kinetics) even in the case where equilibrium sampling is not possible, suggests that the approach could be useful in cases where other free energy methods are more difficult to apply such as when charged ligand alchemical modifications are needed or exact binding poses are not well known.
Although we obtained very good agreement between experiment and simulation for the ligand-binding thermodynamics, we note that both the on- and off-rates are one-to-two orders of magnitude too slow. Because of the internal consistency between the calculated rates and thermodynamics we suspect that the error is due to remaining force field discrepancies that manifest themselves in the rates more than in thermodynamics. Indeed, when parameterizing force fields one often focuses accuracy on the populated (free) energy minima.51 Interestingly we also note a similar discrepancy between experimental rates of the conformational exchange of L99A T4L30, and suggest that future research should focus on potential systematic biases in kinetic properties.
In other cases (such as conformational exchange in proteins30), it may be difficult to estimate the free energy difference by these methods, and as we demonstrate here the calculation from kinetics is a viable alternative. Further, our finding of internal consistency between ligand affinity and rates in T4 lysozyme lends further support to the precision of the rates calculated by the enhanced sampling method used. This suggests also that these approaches can be used more generally to estimate the kinetics of ligand binding and escape. Further, we suggest that when calculating the kinetics of ligand binding and unbinding, it is useful to validate the results by comparing the kinetic free energy difference with that obtained e.g. from free energy perturbation.
α = τ/τM = 〈eV(s,t)/kT〉M |
As the basin-to-basin transition is a rare event, its characteristic time is expected to be a Poisson-distributed random variable. In principle, its mean μ, standard deviation σ and median tm/ln2 should be equal. In practice, however, they are somewhat sensitive to insufficient sampling, and so rather than simply calculating averages, we estimated the transition time τ from a fit of the empirical cumulative distribution function to the theoretical cumulative distribution function (TCDF):38
A bootstrap approach was used to estimate the errors. To examine whether the observed times indeed follow the expected Poisson distribution we used a KS test to obtain a p-value that quantifies the similarity between the empirical and theoretical distributions. The key parameters for infrequent metadynamics simulations, including the bias deposition pace and the initial bias height, are listed in Table 5.
In the unbiased simulations of Ace-Ala3-Nme we used a maximum likelihood method37 to estimate of τAD because not all simulations reached the D state:
For Ala2 and Ace-Ala3-Nme, the simulations were performed with Amber ff99SB force field57 and an implicit solvent model.58 The temperature was kept constant at 300 K using the v-rescale thermostat.59 Bonds involving hydrogens were constrained using the LINCS algorithm. A time-step of 2 fs was used.
For L99A T4L–benzene system, the simulations were performed with the CHARMM22* force field60 in the NPT ensemble using a periodic cubic box with a side length of 7 nm that includes ∼10000 water molecules. The temperature and pressure were kept constant at 298 K using the v-rescale thermostat59 with a 1 ps coupling constant and at 1.0 bar using the Parrinello–Rahman barostat61 with a 2 ps time coupling constant, respectively. We employed the virtual sites for hydrogen atoms with a single LINCS iteration (expansion order 6)62 and constrained the bonds involving hydrogen atoms using the LINCS algorithm, allowing simulations to be performed with an integration time step of 4 fs. The long-range electrostatic interactions were calculated by the means of the particle mesh Ewald decomposition algorithm with a 0.16 nm mesh spacing.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c7sc01627a |
This journal is © The Royal Society of Chemistry 2017 |