Joe G.
Greener
Medical Research Council Laboratory of Molecular Biology, Cambridge CB2 0QH, UK. E-mail: jgreener@mrc-lmb.cam.ac.uk
First published on 21st February 2024
Implicit solvent force fields are computationally efficient but can be unsuitable for running molecular dynamics on disordered proteins. Here I improve the a99SB-disp force field and the GBNeck2 implicit solvent model to better describe disordered proteins. Differentiable molecular simulations with 5 ns trajectories are used to jointly optimise 108 parameters to better match explicit solvent trajectories. Simulations with the improved force field better reproduce the radius of gyration and secondary structure content seen in experiments, whilst showing slightly degraded performance on folded proteins and protein complexes. The force field, called GB99dms, reproduces the results of a small molecule binding study and improves agreement with experiment for the aggregation of amyloid peptides. GB99dms, which can be used in OpenMM, is available at https://github.com/greener-group/GB99dms. This work is the first to show that gradients can be obtained directly from nanosecond-length differentiable simulations of biomolecules and highlights the effectiveness of this approach to training whole force fields to match desired properties.
Major inaccuracies of implicit solvent models include the tendency to overcompact disordered proteins into rigid, often α-helical structures, the tendency to cause any pair of proteins to bind strongly, and the poor secondary structure match to experiments for peptides.12–15 There has been considerable recent effort to alleviate similar, and less severe, problems for explicit solvent force fields,16–18 resulting in a better match to experimental data for both folded and disordered proteins.19–22 This forms part of a wider attempt to use data to improve force fields.23–27 However there has been less attention on improving implicit solvent models in this area,28 with notable exceptions being the ABSINTH model29 and coarse-grained approaches that combine multiple atoms into one site.30,31 This is a missed opportunity: despite their fundamental limitations, implicit solvent models need not perform as poorly as they do on disordered systems. In fact they are especially well-suited to studying these systems, as they are not slowed by the large solvent boxes required for explicit solvent simulations and can be used to probe slow events such as protein aggregation32,33 with the increased conformational sampling resulting from low viscosity. Whereas explicit solvent force fields have been improved in tandem with associated water models,34–37 this is rarely the case for implicit solvent models,12,38 suggesting that adjusting both at the same time could lead to improvements. Whilst it is possible and even desirable to train force fields using only quantum mechanical data,39–43 matching to structural properties as well can give useful models now, and in future can estimate parts of the Hamiltonian that are difficult to obtain from quantum mechanics such as dispersion in complex media.
In this study I modify the parameters of an existing force field and implicit solvent model to better describe disordered proteins and protein aggregation, whilst retaining acceptable performance on folded proteins. The technique of differentiable molecular simulation (DMS) is used to modify many force field parameters at the same time. This emerging technique allows structural and dynamic properties to be targeted to parameterise whole force fields using automatic differentiation (AD).44 This is part of the paradigm of differentiable programming45 in which AD, more commonly associated with training neural networks, is used to obtain gradients from arbitrary algorithms. DMS has previously been used to train a coarse-grained force field for proteins from scratch,46 to learn pairwise potentials,47,48 for enhanced sampling,49 to predict protein structure50 and to explore statistical physics models.51 Dedicated software packages such as Jax MD,52 TorchMD53 and DMFF54 have been developed specifically for differentiable simulations. The Molly.jl software developed as part of this work provides a flexible and fast option for DMS and for MD more broadly. The combined force field and implicit solvent model presented here, called GB99dms, is available to the community and is easy to run from OpenMM. Here it is shown for the first time that DMS can be used to train force fields over nanosecond-length simulations.
I choose to start from the a99SB-disp force field21 since it has been developed for both folded and disordered proteins, and the GBNeck2 implicit solvent model63 since it shows good performance on folded proteins.11 These have been developed over many years using both quantum mechanical and experimental data, and here we seek to improve the combination of the models for disordered proteins. GBNeck2 is a Generalized Born approach that approximates the exact Poisson–Boltzmann equation describing the electrostatic environment of a solute in a solvent. Generalized Born methods model the solute as a set of spheres with a different dielectric constant to the external solvent. The “neck” correction improves the prediction of the molecular surface, which is used when determining the Born radii.64 The a99SB-disp modified backbone O–H interaction term is not used, as it was later found to not impact the ability to fit quantum mechanical data.22 Since some atom types in the force field appear rarely in the training data, parameters are modified for 16 common atom types: CA, CT, C, C8, C9, N, N3, O, O2, OH, H, H1, HA, HC, HO and HP. This gives 108 parameters to change: partial charge scaling, Lennard-Jones (LJ) σ and LJ ε for the 16 atom types (46 non-zero parameters); torsion k values for 13 common proper torsions (33); LJ and Coulomb 1–4 interaction scalings (2); GBNeck2 atom radii (6); GBNeck2 atom parameters (12); GBNeck2 screening parameters (4); and the GBNeck2 parameters neck cutoff, neck scale, offset, probe radius and surface area factor (5). To maintain the overall charge of residues the partial charges are not changed directly, instead a charge scaling is learned – see the methods. Torsion phases, which in a99SB-disp are all symmetrical around 0° apart from backbone φ and ψ, are not modified as this has questionable physical validity.
8 small proteins ranging in size from 19 to 39 residues, 4 folded and 4 disordered, are used for training. These are shown in Fig. 1B. The folded proteins are Trp-cage (PDB ID 2JOF), BBA (1FME), GTT (2F21) and NTL9 (2HBA) and contain both α-helices and β-sheets. The disordered proteins are Htt-1-19, a slightly helical intrinsically disordered protein (IDP) derived from huntingtin's N-terminus; histatin-5; and the N-terminal and C-terminal halves of ACTR, split up to avoid excessive computational demands during training. At each epoch of training, each protein is simulated using a Langevin integrator with the current force field parameters for 5 ns (5 million steps). Residue–residue distances are periodically recorded and used at the end of the simulation to calculate the mean and standard deviation for each residue–residue distance over the simulation. Since explicit solvent simulations with the a99SB-disp force field and its corresponding water model seem to accurately describe the properties of both folded and disordered proteins,21 I train to match the residue–residue distances in 2 μs simulations of the training proteins with this force field. Previous approaches to developing implicit solvent force fields have also matched to explicit solvent data.65
Fig. 1 Differentiable molecular simulation to improve an implicit solvent force field. (A) For each protein a simulation is run with Langevin dynamics and the mean (μ) and standard deviation (σ) of the Cα residue–residue distances are calculated. These are compared using the KL divergence to corresponding values from reference explicit solvent simulations to obtain a loss value. AD is then used to obtain gradients of the loss with respect to the force field parameters, which can be used to improve the force field. (B) The 8 proteins used for training, consisting of 4 folded proteins (top) and 4 disordered proteins (bottom). (C) The 7 parameters that change by at least 4.5% in absolute value from the starting values in a99SB-disp + GBNeck2. (D) Summary of numerical results from the paper. Rg error is the sum of squared total residuals between the experimental and simulation radius of gyration for each protein as shown in Fig. 2A, lower is better. SS error is the α-helical fraction error as shown in Fig. 3B, lower is better. Binding corr is the correlation between simulation contact probabilities and NMR chemical shift perturbation data for α-synuclein and fasudil as described in the results, higher is better. |
The Kullback–Leibler (KL) divergence is calculated in both directions between the training and reference simulations, and averaged over residues. This gives a loss value; since the simulation is implemented in AD-compatible software, the gradients of the loss with respect to the force field parameters can be calculated. These are combined across the training proteins and used to change the force field parameters, before the next epoch is started. Simulations of the folded proteins start from the PDB coordinates and a single repeat is run. For the disordered proteins two repeats are run, starting from different snapshots of the reference explicit solvent simulations. Each 5 ns simulation takes 15–24 hours on a GeForce RTX 2080 Ti GPU depending on the size of the protein. The parameters after 5 epochs of training are used, meaning training takes 5 days on 12 GPUs (the repeats for disordered proteins give 12 simulations per epoch). This overall training process is shown in Fig. 1A.
The parameters of the trained force field, named GB99dms, are similar to the starting parameters, with only 19 out of 108 parameters changing by more than 3% in absolute value. Staying close to the starting values was a deliberate attempt to find a point in the high-dimensional parameter space that works across a variety of protein systems but remains similar to the well-studied parameters available currently.66 The direction of parameter changes are generally consistent across epochs of training, as shown in Figure S1. The parameters that change the most are shown in Fig. 1C and all parameters are listed in Table S1. The Born radius parameters for carbonyl O and backbone amide H decrease, indicating less screening from solvent, though the Born radius parameters for C and H increase. The LJ σ parameters for N and C increase, indicating a larger interaction distance for these atoms. The LJ ε parameters increase for N and C but decrease for HC and CT, indicating a shift in strength of LJ interactions between atom types. Small changes are made to torsion parameters; as expected due to the required changes in secondary structure preferences, backbone φ and ψ change the most. Partial charges do not change much, except C which becomes more positive (0.597 to 0.625 for backbone carbonyl C in alanine). The Coulomb and LJ 1–4 interaction weightings for atoms separated by 3 bonds both increase, indicating less shielding from non-bonded forces for nearby atoms. Whilst the bonded/non-bonded parameters and implicit solvent parameters of this force field could be used separately in future, this was not tested and it is recommended that they are used together as they were trained.
GB99dms was compared to existing force fields on a set of proteins not homologous to those used for training. I compare to the combination of Amber ff14SBonlysc11,68 and GBNeck2 (ref. 63) used successfully to fold proteins,11 the combination of a99SB-disp21 and GBNeck2 used at the start of training in this work, and available explicit solvent a99SB-disp data.21 First, performance is assessed on 8 IDPs ranging in size from 40 to 140 residues. This is the set used in Robustelli et al. 2018,21 excluding ACTR which is used here for training. For each protein 3 simulations of 2 μs were run for each force field, with an initial burn-in period of 0.5 μs starting from an extended conformation being discarded. All validation simulations were run with a Langevin collision frequency γ of 1 ps−1 to increase conformational sampling.11 As shown in Fig. 2 and Fig. 1D the radius of gyration (Rg) using GB99dms matches experimental data21 and observed scaling laws67 considerably better than the existing implicit solvent force fields across a range of protein sizes. Explicit solvent a99SB-disp still matches better, indicating that the accuracy of the reference data used during training is not the factor limiting improvement. Fig. 3 and 2C indicate that this is partly due to existing force fields giving structures that are too α-helical and inflexible, and Fig. 2C shows that the experimental Rg may be sampled with GB99dms even when the mean simulation Rg is different. The overall α-helical content of IDPs also matches experiment better with GB99dms than with existing implicit solvent force fields, as shown in Fig. 3B and C. The error is 0.022 for GB99dms compared to 0.050 for ff14SBonlysc and 0.155 for a99SB-disp. There is still some discrepancy between GB99dms and experiment, for example in the location of α-helices in Ntail and PaaA2, in line with the variety of secondary structure propensities shown previously by explicit solvent force fields for these IDPs.21
Fig. 2 Radius of gyration of IDPs with different force fields. Experimental values for Rg and simulation Rg values for explicit solvent a99SB-disp with its corresponding water model are taken from Robustelli et al. 2018.21. (A) Comparison of simulation and experimental Rg. Each point (excepting a99SB-disp explicit) is the mean Rg over 3 simulations of 2 μs, with an initial burn-in period of 0.5 μs starting from an extended conformation being discarded for each simulation. The error bars in the x direction represent uncertainty in the experimental value and error bars in the y direction represent 95% confidence intervals of the mean calculated from the standard error of the mean across the 3 simulations. The dotted line represents equal experimental and simulation values. Summary data is shown in Fig. 1D. (B) Comparison of simulation Rg to sequence length for the same data. The dotted grey line represents the experimentally observed power law relationship RG = R0Nv with v = 0.598 ± 0.028 and R0 = 1.927 Å.67 The shaded grey area represents the 95% confidence interval of v. This power law relationship is for chemically denatured proteins and was mainly fit on proteins longer than 50 residues. The best fit v values from the simulation data for GB99dms, ff14SBonlysc+GBNeck2, a99SB-disp+GBNeck2 and a99SB-disp explicit are 0.793, 0.380, 0.327 and 0.656 respectively. (C) Distributions of simulation Rg for the same data. |
Fig. 3 Secondary structure content of IDPs with different force fields. Experimental values for α-helical fraction and simulation values for explicit solvent a99SB-disp with its corresponding water model are taken from Robustelli et al. 2018.21. (A) The α-helical fraction for each residue of 4 IDPs with different force fields. The simulations are the same as in Fig. 2. (B) A table showing the mean α-helical fraction across residues for each protein and force field. An overall error value is also calculated per force field as the sum of squared total residuals between the experimental and simulation mean α-helical fractions for each protein. (C) A plot of the simulation mean α-helical fractions against experimental values. The dotted line represents equal experimental and simulation values. |
Having established improved performance on IDPs, it is important to test whether performance has degraded for folded proteins. 3 Simulations of 2 μs for each force field were run on the 4 folded proteins in Robustelli et al. 2018,21 ranging in size from 56 to 129 residues: GB3 (PDB ID 1P7E), ubiquitin (1D3Z), hen egg white lysozyme (HEWL, 6LYZ) and bovine pancreatic trypsin inhibitor (BPTI, 5PTI). Fig. 4A shows the root-mean-square deviation (RMSD) to the native structure across the trajectory for each simulation. GB3 and ubiquitin show similar stability over the simulations with all 3 force fields. HEWL and BPTI show less stability with GB99dms. For HEWL 3 of the α-helices and the β-sheet lose their secondary structure, though the overall tertiary structure remains the same. For BPTI the β-sheet remains intact but the α-helices lose their secondary structure and the loops show significant flexibility. The difficulty of balancing secondary structure preferences in implicit solvent force fields has been established previously.13
Performance is also assessed on 4 medium-sized protein dimers from Piana et al. 202022 ranging in size from 197 to 235 total residues: barnase/barstar (PDB ID 1X1X), colE7/Im7 (7CEI), SGPB/OMTKY3 (3SGB) and CD2/CD58 (1QA9). The dimers were simulated with no periodic boundaries since the intention was to explore initial unbinding events, which should not occur on the time scales used here. There has been less work on developing and assessing implicit solvent force fields for protein complexes, but here it is found that the 4 dimers are generally stable under the two existing force fields. Fig. 4B indicates that with GB99dms barnase/barstar and colE7/Im7 seem stable, though SGPB/OMTKY3 and CD2/CD58 dissociate quickly. The dissociating complexes have less favourable experimental association free energy than those that remain bound.22 With a collision frequency γ of 91 ps−1 more representative of the viscous drag of water,69 SGPB/OMTKY3 remains bound at 5 Å after 2 μs RMSD but CD2/CD58 still dissociates. The difficulty of improving performance on IDPs whilst not allowing protein complexes to dissociate has also been encountered with explicit solvent force fields.21,22 Overall GB99dms has slightly degraded performance on folded proteins, though could still be useful for studying systems containing a mix of folded and disordered proteins if the stability and structural properties of the folded proteins with GB99dms are verified initially.
MD simulations can be used to study the interactions of small molecules with IDPs, assisting in the difficult process of finding drugs to target the many IDPs implicated in disease. For example, a recent study70 used a99SB-disp to carry out 1.5 ms of explicit solvent simulation of α-synuclein with the small molecule fasudil. α-Synuclein is associated with Parkinson's disease and fasudil has been shown to delay α-synuclein aggregation.71 The contact probability over the simulation of each residue with fasudil was shown to correlate with NMR chemical shift perturbation (CSP) data, allowing an interpretation to be made of how the drug affects the protein. CSPs are sensitive to changes in the local environment of each backbone amide bond and a higher CSP indicates protein–ligand interaction near that residue. I carried out similar simulations, using the faster sampling of implicit solvent to compare 5 simulations of 2 μs with the NMR data and significantly longer explicit solvent simulations from Robustelli et al. 2022.70Fig. 5A shows that GB99dms reproduces a similar profile to the NMR data. There is elevated contact probability in the residue 121–140 C-terminal region, peaks around residues 5–9/39–44/59–72/121–127/136–138, and Y136 has the highest contact probability. In contrast, ff14SBonlysc shows blocks of interacting residues consistent with fasudil interacting with the surface of a compact, inflexible structure and a99SB-disp shows consistent and lower interaction throughout the protein. The Pearson correlation coefficients between the contact probabilities and the NMR chemical shift perturbation data are 0.44, 0.11 and 0.08 for GB99dms, ff14SBonlysc and a99SB-disp respectively, compared to 0.67 for the explicit solvent simulations.70 This indicates that GB99dms could be useful for assessing small molecule binding to IDPs during drug discovery. Though it is unusual to use periodic boundary conditions with implicit solvent – one advantage of implicit solvent is not having to deal with boundaries – it is possible since the solvent behaviour is unrelated to the boundary.
Fig. 5 Ligand binding and amyloid aggregation with implicit solvent force fields. Grey and black spheres on structures represent the N- and C-termini respectively. (A) The interaction of α-synuclein and the small molecule fasudil. The NMR CSP data from Robustelli et al. 202270 is shown. The fraction of the time each residue is in contact with fasudil over 5 × 2 μs simulations is shown for each force field, along with a snapshot where fasudil is in contact with α-synuclein. The error bars for each residue represent 95% confidence intervals of the mean calculated from the standard error of the mean across the 5 simulations. The Pearson correlation coefficient between the contact probabilities and the NMR CSP data is also shown. (B) Oligomerisation of 6 × Aβ16–22. Three capped peptides are studied: KLVFFAE (wild-type), F19L (faster aggregation) and F19V/F20V (no aggregation). The oligomer size over 3 × 2 μs simulations is shown for each sequence and force field. Repeats are shown separately, with the plots truncated at 1.2 μs as there is little change beyond this time. Snapshots are recorded every 0.5 ns and the mean oligomer size of a window extending 10 snapshots either side of the given snapshot is shown. The final oligomer from the first wild-type repeat is also shown. |
Finally, I investigate the behaviour of aggregating peptides with GB99dms. It has previously been shown that the oligomerisation behaviour under simulation of the 7-residue aggregating core of amyloid beta (Aβ), Aβ16–22, depends on the force field and does not always reproduce the effect of mutations.72,73 Here I study the oligomerisation of 6 capped Aβ16–22 peptides in a periodic box. I simulate 3 × 2 μs with each force field for 3 peptide sequences: the wild-type KLVFFAE, the F19L mutant which aggregates faster than wild-type, and the F19V/F20V double mutant which does not aggregate.73 All force fields show oligomerisation of all 6 peptides within 2 μs. As shown in Fig. 5B, GB99dms reproduces best the observed behaviour of the sequences: on average F19L forms oligomers fastest and F19V/F20V slowest. The wild-type forms oligomers faster than F19L for ff14SBonlysc and F19V/F20V forms oligomers at a similar speed to wild-type for ff14SBonlysc and a99SB-disp. The final oligomers adopt extended monomer conformations for GB99dms reminiscent of Aβ fibrils, whereas for ff14SBonlysc and particularly a99SB-disp the conformations are more α-helical. These results show that GB99dms could be used to study amyloid aggregation at scale. One advantage of implicit solvent is that the periodic box size, and hence the effective concentration of Aβ, can be changed without adding more atoms. Currently, high concentration is a limitation of many MD studies of aggregation.32
The simulations of 5 million steps carried out in this work are the longest differentiable molecular simulations to date and indicate that even longer simulations are possible. The next step is to train an all-atom explicit solvent force field using DMS to match experimental data such as NMR constraints alongside existing training approaches to match quantum mechanical data. This will require improvements in computational speed, as well as differentiable implementations of algorithms such as bond constraints and Ewald summation that could run into the limitations of AD.77 The flexibility of the approach allows it to be combined with other recent advances such as continuous atom typing with graph neural networks78 and exploration of different functional forms for non-bonded interactions.79 One promising approach, Time Machine (https://github.com/proteneer/timemachine), aims to use DMS for drug discovery.
A question surrounding DMS is whether accurate gradients can be obtained through long MD simulations. Gradients could explode or vanish, and there are also concerns about error propagation over long, chaotic simulations.49,50,80,81 Here, I find that the Langevin integrator is effective at propagating gradients using reverse-mode AD. By contrast, simulations in the NVE ensemble were not found to produce stable gradients. The friction and stochastic noise applied to every atom at every step in Langevin dynamics likely provides a regularisation effect that helps prevent gradient explosion but does not lead to vanishing gradients.82 This stochasticity, along with the random starting velocities, means that the gradients are a sample over a distribution. When repeating runs, around 80% of the paired parameter gradients have the same sign and the Pearson correlation coefficient of paired parameter gradients is over 0.85. This is shown in Table S2. Good correlation is also found when comparing simulations run with a 1 fs and a 0.5 fs time step and when adding noise to the starting parameters. Adjoint sensitivity methods provide another way to obtain gradients through simulations,48,83,84 but are often unstable and have had less development than reverse-mode AD. Here I find that the “simple” approach of using AD on the integrator works well. It may be possible to find speedups due to the iterative and reversible nature of molecular simulation.83
Another question is whether the computational overhead of calculating the gradients via AD makes it worthwhile compared to using a black box approach such as finite differencing. The Julia code used here is currently 100–1000× slower on the GPU when gradients are required than heavily optimised non-differentiable codes such as OpenMM85 and Gromacs.86 However the gradients for all parameters are calculated to numerical accuracy in one go, whereas a number of runs would be required per gradient when using finite differencing. It seems that the current case of optimising 108 parameters is around the crossover point, and optimising any fewer parameters would have been easier with finite differencing. As DMS code becomes faster and more parameters are included for training, for example more atom types or torsion CMAP potentials, the advantage of DMS will become clearer. One direction of future work is to improve the performance of the Julia code significantly using more advanced GPU kernels57 and further use of the Enzyme AD framework.60,61 It remains an open question how close in performance a differentiable implementation can get to a non-differentiable one. Molly.jl will be useful for exploring these questions, and for training the next generation of force fields that are transferable, reproducible and fast.
Every 5 ps (5000 steps), the Cα residue–residue distances X and square distances X2 are recorded. At the end of the simulation, the mean μs = E[X] and standard deviation of the Cα residue–residue distances are calculated. For each residue pair, the KL divergence DPQ to the reference explicit solvent distances (see below) is calculated as
Lij = ln(DPQ + 1) + ln(DQP + 1) |
In order to reduce the impact of large losses from residue pairs close in sequence with sharp residue–residue distance distributions, the loss of close residue pairs is downweighted. This meant multiplying Lij by a weighting factor which is 0 for residue separation |i − j| = 0, 1 for |i − j| ≥ 10 and linearly spaced between. The overall loss is then calculated as the mean of these weighted values over all residue pairs. AD is used to calculate the gradient of this loss with respect to each of the 108 parameters.
At each epoch of training, gradients are combined from simulations of the 8 training proteins to update the force field. For the IDPs, gradients are averaged over the two repeats. For each protein the gradients are then divided by the median of the absolute values of the gradients, meaning that all proteins contribute similarly to the parameter updates each epoch even if the gradient sizes differ. The median was chosen to avoid outliers having too much influence on the value. The gradients for LJ σ parameters and hydrogen parameters were found to be large compared to other parameters and large changes in these parameters can quickly lead to instabilities, so the gradients were weighted by a factor of 0.02. The absolute change for each parameter per protein per epoch was limited to 0.5% and the combined absolute change for each parameter per epoch was limited to 3%. Parameters were updated by gradient descent using a learning rate of 4 × 10−4. Training was repeated 3 times and the run with the best performance on the training set was used.
Instead of modifying a partial charge value for each atom type directly, which is complicated by the need to maintain overall charge and by the different partial charges of the same atom type in different residue types, a charge scaling value is learned instead. This starts at one for each atom type. After it is updated during training, partial charges are computed for each atom in a residue type by scaling the starting partial charge by the scaling value and subtracting an offset. This offset is the difference between the starting and scaled overall charge of the residue multiplied by the fraction of the sum of absolute charges present on the atom after scaling. In effect the change in a partial charge of an atom is compensated for by the partial charges of the other atoms in the residue. This allows one charge scaling value to be learned per atom type but the overall charge of each residue type to remain constant during training.
Explicit solvent trajectories used to get reference residue–residue distances for training were generated with Gromacs v2021.4.86 For folded proteins the starting structure was the PDB structure and the box size was chosen to give 1 nm padding between the protein and the edge. For disordered proteins the starting structure was the collapsed conformation at the end of a short implicit solvent simulation with a99SB-disp+GBNeck2 starting from an extended conformation. The box size was 6 nm for Htt-1-19 and histatin-5 and 7 nm for the two halves of ACTR. These simulations use the a99SB-disp force field and its corresponding water model,21 a Verlet leap frog integrator, a 2 fs time step, constrained bonds to hydrogen, a temperature of 300 K, 50 mM NaCl salt, a 1.2 nm cutoff for non-bonded interactions and particle mesh Ewald treatment of long range electrostatics. Energy minimisation, a 100 ps NVT equilibration and a 100 ps NPT equilibration with position restraints to protein heavy atoms preceded a 2 μs production run in the NPT ensemble, with snapshots saved every 50 ps. Mean and standard deviation residue–residue distances were calculated for the last 1 μs of simulation.
Once the force field parameters have been improved via training, simulations can be run with any MD package that supports the GBNeck2 implicit solvent model. Here OpenMM v8.0.0 (ref. 85) is used to run the validation simulations as it has high GPU performance and modifying force field parameters is easy. All validation simulations used a Langevin integrator with a γ of 1 ps−1, a 4 fs time step, constrained bonds to hydrogen, hydrogen mass repartitioning with a factor of 2, a temperature of 300 K, a 2 nm cutoff for non-bonded interactions and a κ value of 0.7 nm−1. Energy minimisation and a 500 ps temperature equilibration with position restraints to heavy atoms preceded production runs, with snapshots saved every 500 ps. Dimers were simulated with no periodic boundaries. Trajectory data was analysed with MDAnalysis89 and secondary structure was calculated with MDTraj.90 BioStructures.jl was also used for processing protein structural data.91
For the simulations of α-synuclein with fasudil, GAFF92 was used to obtain the force field parameters for fasudil. For each force field, 5 repeats were run starting from different snapshots from α-synuclein monomer simulations. Packmol93 was used to pack one molecule each of α-synuclein and fasudil in a periodic cubic box with 25 nm sides. A contact is assigned to MD frames where the minimum distance between any fasudil atom and any heavy atom of a residue side chain (CA for glycine) is less than 6 Å.70 For the Aβ16–22 simulations a periodic cubic box with 21.5 nm sides and 6 peptides were used, corresponding to a concentration of 1 mM. The N- and C-termini of the peptides were capped with acetyl (ACE) and N-methlyamide (NME) groups respectively to match experimental conditions. Starting conformations for each peptide were taken from snapshots of a short monomer simulation. For each force field and sequence, 3 repeats were run starting from different packings of 6 peptides generated with Packmol. Oligomer size was determined based on groups of contacting peptides, where any pair of atoms being within 4 Å indicates a contact between peptides.73
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3sc05230c |
This journal is © The Royal Society of Chemistry 2024 |