Sandi
Brudar
a,
Jure
Gujt
bc,
Eckhard
Spohr
b and
Barbara
Hribar-Lee
*a
aUniversity of Ljubljana, Faculty of Chemistry and Chemical Technology, Večna pot 113, SI-1000 Ljubljana, Slovenia. E-mail: barbara.hribar@fkkt.uni-lj.si; Fax: +386 1 479 8500
bUniversity Duisburg-Essen, Faculty of Chemistry, Campus Essen, Universitätsstr. 5, D-45141 Essen, Germany
cUniversity of Paderborn, Dynamics of Condensed Matter and Center for Sustainable Systems Design, Chair of Theoretical Chemistry, Warburger Str. 100, D-33098 Paderborn, Germany
First published on 27th November 2020
Proteins are the most abundant biomacromolecules in living cells, where they perform vital roles in virtually every biological process. To maintain their function, proteins need to remain in a stable (native) state. Inter- and intramolecular interactions in aqueous protein solutions govern the fate of proteins, as they can provoke their unfolding or association into aggregates. The initial steps of protein aggregation are difficult to capture experimentally, therefore we used molecular dynamics simulations in this study. We investigated the initial phase of aggregation of two different lysozymes, hen egg-white (HEWL) and T4 WT* lysozyme and also human lens γ-D crystallin by using atomistic simulations. We monitored the phase stability of their aqueous solutions by calculating time-dependent density fluctuations. We found that all proteins remained in their compact form despite aggregation. With an extensive analysis of intermolecular residue–residue interactions we discovered that arginine is of paramount importance in the initial stage of aggregation of HEWL and γ-D crystallin, meanwhile lysine was found to be the most involved amino acid in forming initial contacts between T4 WT* molecules.
Computer simulations represent an alternative way to study the mechanism of protein–protein aggregation in molecular systems. Atomistic molecular dynamics (MD) simulations are one of the most frequently used computational methods to study the behaviour of proteins in solution.9 Unfortunately, in spite of the rapidly increasing computer power, the computational cost of the simulations poses limits on the number of atoms in the system as well as on the overall simulation time of such studies. In practice, most atomistic MD simulations have focused on single protein molecules or specific complexes, without a consideration of the crowded environment10 characteristic for in vivo environments. And while coarse-grained models have been developed as a step towards studying the aggregation process in more complex environments,11,12 these models, or force fields, such as, e.g., the widely used MARTINI force field13 pose problems of their own in that they often produce erroneous results, while at the same time requiring all-atom simulations in order to be parametrized.14
During the last couple of decades, evidence has been accrued supporting a view that the so-called condensation diseases are a manifestation of LLPS that occurs in vivo in cells as well as in vitro protein solutions, as a consequence of protein–protein interactions.2,15 While detailed studies of the mechanism of protein aggregation process in the cellular environment are beyond the reach for today's computers,11,12 it is reasonable to believe that identifying the interactions responsible for protein condensation would provide valuable guidelines to control these processes.
In the present paper, we use atomistic MD simulations to study the aggregation for a few sufficiently small globular proteins, for which current force fields are optimized,12 and for which the conditions of LLPS are known experimentally.16–18 Through a detailed analysis of the time evolution of the solution structure we identified those amino–acid residues responsible for initial binding contacts between proteins, and we propose a mechanism of protein aggregation for these examples.
Our simulation protocol was as follows: first, the database structures were minimized with respect to their energy, solvated in SPC/E water and firstly equilibrated for 4 ns in the NVT ensemble and then also for 4 ns in the NPT ensemble, using the OPLS force field. Net protein charges of +8 on each HEWL and T4 WT* molecule were compensated by an adequate amount of Cl− counterions (see Table 2). No further background electrolyte was added in order to focus on primary residue–residue interactions. We plan to relax this approximation and simplification of the model systems in a future communication in which we focus on the nature and importance of the background electrolyte. Production runs were carried out in the NPT ensemble, using velocity rescaling temperature coupling with a stochastic term and a time constant of 0.1 ps. The Parrinello–Rahman barostat with a time constant of 2.0 ps was used to maintain a pressure of 1 bar. The production run for all simulations lasted for 300 ns. Even though similar in shape and size, the three proteins differ significantly in the characteristic distribution of surface amino acids, summarized in Table 1, as well as in the distribution of secondary structure elements, as indicated by different colours in Fig. 1. Nonetheless, all three investigated proteins have one similarity – a large proportion of arginine residues is located on their surface, which is evident by the fact that arginine is among the top three most frequent amino acids for all three studied proteins.
HEWL | T4 WT* | γ-D crystallin |
---|---|---|
ASN 14 | LYS 13 | ARG 22 |
ARG 11 | ARG 11 | ASP 12 |
GLY 10 | ASN 10 | SER 12 |
SER 7 | ASP 10 | GLY 11 |
LYS 6 | GLU 8 | GLU 10 |
THR 6 | THR 8 | GLN 8 |
ASP 6 | ALA 8 | TYR 8 |
ALA 4 | GLY 7 | ASN 7 |
VAL 3 | LEU 5 | HIS 5 |
CYS 3 | SER 5 | PRO 4 |
LEU 3 | GLN 5 | LEU 4 |
TYR 3 | TYR 4 | PHE 3 |
ILE 2 | VAL 4 | MET 3 |
GLU 2 | MET 3 | THR 2 |
TRP 2 | PHE 3 | ILE 2 |
GLN 2 | PRO 3 | CYS 1 |
PRO 2 | ILE 2 | LYS 1 |
HIS 1 | TRP 2 | |
HIS 1 | ||
Total: 87 | Total: 112 | Total: 115 |
Protein | γ (mg mL−1) | T (K) | N water | N proteins+ions | L (nm) | D/10−5 cm2 s−1 |
---|---|---|---|---|---|---|
HEWL | 93 | 267 | 124615 | 16 + 128 | 16 | 0.26 |
HEWL | 93 | 300 | 124615 | 16 + 128 | 16 | 0.47 |
HEWL | 42 | 267 | 142004 | 8 + 64 | 16.5 | 0.29 |
HEWL | 42 | 300 | 142004 | 8 + 64 | 16.5 | 0.65 |
T4 WT* | 90 | 267 | 124850 | 12 + 96 | 16 | 0.17 |
T4 WT* | 90 | 300 | 124850 | 12 + 96 | 16 | 0.38 |
T4 WT* | 45 | 267 | 129713 | 6 + 48 | 16 | 0.33 |
T4 WT* | 45 | 300 | 129713 | 6 + 48 | 16 | 0.68 |
γ-D crystallin | 100 | 300 | 123966 | 12 + 0 | 16 | 0.32 |
γ-D crystallin | 100 | 320 | 123966 | 12 + 0 | 16 | 0.67 |
Standard three-dimensional periodic boundary conditions were applied. No buffer molecules or ions were included in the simulation. An appropriate number of chloride ions was added to the simulation boxes of HEWL and T4 WT* to maintain electroneutrality, while the solutions of the electroneutral γ-D crystallin protein molecules do not contain additional ions. HEWL and T4 WT* solutions were simulated at two different concentrations and temperatures with the intent to study these proteins both inside and outside of the range of experimentally established LLPS conditions. Since the CPU resources available to us did not allow a systematic investigation over a broad range of protein concentrations within the experimentally determined two-phase region, we had to choose concentrations that, according to our experimental data, can be expected to exhibit the LLPS phenomenon. These concentrations are around 40 mg mL−1 and 90 mg mL−1, respectively16,17 for the lysozymes, and around 100 mg mL−1 for crystallin.26 The temperatures in simulations (around 270 K and 300 K) were selected according to the experimentally determined position of the LLPS for the concentrations so that one of them is above and the other one is below the critical temperature. Details of the simulations are summarized in Table 2. Prior to performing any extensive analysis we established that the running protein–protein coordination numbers were reasonably independent of time during the final stages of the 300 ns simulation interval, which is documented in Fig. S3–S5 of the ESI.†
We also examined the mean-square displacement (MSD) as an indicator of the distance diffused by a molecule in the system and thus a measure of the explored portion of te system's relevant phase space:
(1) |
Angle brackets denote averaging over all the protein molecules as well as over time origins, τ. In all cases studied, the MSD increased with simulation time as expected in liquid systems, and no “freezing” was observed (see Fig. S6, ESI†), unlike in some force fields like MARTINI, where such undesired features sometimes occur.
For HEWL at low temperature and high concentration, one can clearly discern relatively high density domains, the location of which changes over time. These high density domains are indicative of clustering of protein molecules, which may be the first stage of the LLPS process. At low concentration and high temperature the contrast becomes less pronounced, suggesting a more homogeneous solution, in which smaller clusters reversibly form and decompose. However, we cannot rule out that the simulation time (300 ns in all cases) is insufficient to observe the clustering processes, which can be expected to be slower at lower concentrations and higher temperature.
The density fluctuations of T4 WT* lysozyme behave similarly. At high concentration and at 267 K we again observe distinct high and low density protein regions, but less pronounced than in the HEWL case. Overall, the results suggest the formation of some protein clusters and a significant number of residual monomers. Increasing the temperature or decreasing concentration the protein molecules are even more homogeneously distributed across the solution and the high-density regions become less pronounced. However, even at 45 mg mL−1 and 300 K one can still observe some protein clustering. For γ-D crystallin, we investigated only the influence of temperature at two higher values of 300 and 320 K, since these were within (300 K) and outside (320 K) the experimentally determined conditions of LLPS.26 At the lower temperature once again one observes indications of the protein separating into two domains with different protein density and a distinct boundary forms between them (blue ‘aqueous’ region). At the higher temperature, two regions of different densities are still discernible, but the boundaries are less distinct.
Similar plots were constructed for ΔN(y,t), and ΔN(z,t) (see Fig. S1 and S2, ESI†), and they all show qualitatively same results.
The diffusion coefficients D in Table 2 also provide some indication for clustering. D was calculated by a least squares fit of the Einstein relation MSD(t) = 6D·t + n to the center-of-mass mean square displacement (MSD) of protein molecules in the time interval between 30 and 75 ns (10 to 25% of the total simulation time).28 Lynden-Bell29 found that the intercept n of the fitted straight line could be correlated with the average size of the solvent cage surrounding a certain species while investigating ionic liquids. However, we did not notice such features in our protein solutions, as intercept values of all simulations were very close to zero. For HEWL and T4 WT* the diffusion coefficient is significantly smaller at higher concentration than at lower ones, in particular at the higher temperatures. For γ-D crystallin, no conclusions are possible. For all systems, the diffusion coefficient increases with temperature, as expected from general considerations. The magnitude of D is also consistent with the protein size. HEWL is smaller than T4 WT* and γ-D crystallin, and at 300 K, it also has the largest self-diffusion coefficient. The calculated value for the SPC/E water self-diffusion coefficient of 2.47 cm2 s−1 is close to the experimentally determined value of water in salt free HEWL solutions, being 2.27 cm2 s−1 at a HEWL concentration 50 mg mL−15 and thus slightly lower than the simulation value for pure SPC/E water of 2.78 cm2 s−1 at T = 300 K. The slight reduction in water self-diffusion might be a consequence of the slowing-down of water mobility in the protein solvation shell.
Protein aggregation can be often accompanied by changes in protein structure,30 but previous simulations of aggregation processes suggest that these changes only occur during the later stages.31 To examine the simulated solutions in our study in this respect, we calculated the radii of gyration of the protein molecules as a function of time, results of which are shown in Fig. S7 (ESI†). It is quite evident that the radii of gyration do not change significantly over time. Even in those cases where the density maps of Fig. 2 suggest protein clustering, the structure of the protein molecules, as characterized by the radius of gyration, remains fairly stable; a minor increase over time is observed for T4 WT* at higher temperatures, a minor decrease for HEWL at higher temperature, and a slight decrease for γ-D crystallin at higher temperature, while the runs at the lower temperature do not show any significant trend over the 300 ns time interval, which might, however, also indicate insufficient sampling at the lower temperatures. The general behavior for HEWL is consistent with previous experimental observations for HEWL, where CD spectra also showed no structural changes in aqueous solutions below the critical temperature.5 The radii of gyration calculated here are about 1.38 nm, 1.63 nm, and 1.62 nm for HEWL, T4 WT* and γ-D crystallin, respectively.
To further examine the role of hydration properties, we have calculated the total solvent accessible surface area (SASA) of our proteins in solutions and also radial distribution functions between oxygen atoms of water molecules and the protein center of mass. Results of both calculations can be found in the Supporting information (see Fig. S8 and S9, ESI†). Fig. S8 (ESI†) shows a small plateau between 0 and 1 nm, which is indicative of a few water molecules being incorporated into the protein core. The effect is especially pronounced in the solutions of T4 WT* lysozyme, less in γ-D crystallin solutions, and even less in the HEWL solutions. Near the protein surface, the relative amount of water is slightly increased relative to the bulk water level due to hydration of polar protein groups by water, except for the simulation of the 45 mg mL−1 solution of T4 WT* at 300 K, which shows, in the region around 2.5 nm, a slight decrease of relative water densities, which is possibly due to clustering. No significant changes in protein hydration are observed in the remaining simulations, neither when varying temperature, nor for different concentrations. The extent of clustering in each protein solution is more evident when looking at the total SASA (sum of SASA of all proteins in solution; Fig. S9, ESI†). There, one can notice a very small decrease in SASA in solutions with low protein concentration, since no significant clustering of proteins is observed. At the higher protein concentrations this reduction is, on the other hand, slightly more pronounced, as larger protein assemblies are formed which diminish the protein surface available for hydration. This trend is particularly visible in γ-D crystallin solutions, where the extent of self-association appears to be higher than for HEWL or T4 WT*. Despite this, one can still conclude that clustering in solutions with high protein content does not lead to major dehydration of protein molecules. Nevertheless, a potential impact of hydration water is still taken into account when performing the analysis of residue–residue interactions, since the chosen interparticle cut-off distances are not limited to direct contacts only, but also allow for the presence of solvent separated interactions between residues of two proteins.
When comparing different simulations, the variation of features is rather erratic, which is mostly a consequence of the fact that the 300 ns simulation time is not long enough to completely average out time-dependent protein arrangements. However, the simulation time is sufficient to elucidate the most important residues that initiate contacts between proteins. Some observable trends in the RDFs are consistent with those of the density fluctuations in Fig. 2. The calculated RDFs demonstrate that the positions of major peaks denoting proteins in contact depend not solely on the conditions in the solutions, but also on the details of relative orientation between the slightly aspherical protein molecules.
Thus, a more detailed analysis of RDFs will be ineffective to reveal the most interesting features of lysozyme–lysozyme interactions. Instead, we have performed an in-depth analysis of the interactions between individual residues between two protein molecules. Towards this goal we have monitored the time-dependence between all residue–residue distances between all pairs of protein molecules for each simulation. The residue–residue distance is defined as the minimum distance between any pair of heavy atoms (thus, ignoring all hydrogen atoms) between two residues on different molecules. To reduce the raw data to a manageable level, we collected only ‘events’ as a function of time, protein molecule pair and residue pair, for which the minimum distance was at some time smaller than 1 nm. From these raw data we constructed various graphical representations that help us elucidate the details of protein interactions. Fig. 3 shows, as an example, for each amino acid residue of the 12th molecule in the HEWL simulation at 93 mg mL−1 HEWL and 300 K, the closest distance of this particular residue to any other protein residue as a function of time. The minimum distance (in Å) is indicated by the color bar. Minimum distances larger than 10 Å are not distinguished. The graph indicates rather stable interactions of this molecule, although some transient interactions occur on the time scale of 10 nanoseconds. The insets show for two examples the initial interaction dynamics which eventually stabilizes.
Fig. 4 shows a few examples of the time dependence of the interaction dynamics of individual residue pairs from HEWL simulations. Compared to the distance map of Fig. 3, the data shown are strongly reduced, in the following way: a color code is only assigned to distances less than 6 Å. Note that solvent mediated residue–residue contacts are also encompassed by this criteria. In addition, a particular residue pair will only become monitored when no other pair was monitored during the 2 ns period before this. Thus, all pairs that interact at some later time as a consequence of their proximity to the original pair are excluded. Finally, we also exclude all spurious interactions that last for less than 2 ns. With these settings, one observes a substantial variability of patterns. For example, for protein pair 7 and 11 in the 93 mg mL−1 run at 300 K a persistent interaction between Arg 128 of protein 7 and Asn 46 of protein 11, which starting after around 40 ns. For the same run, one sees between protein molecules 6 and 8 several intermittent (longer than 2 ns) interactions between Asn and Arg and, slightly later, between Arg and Pro, which all do not produce a long-lasting contact. In the low temperature run at 42 mg mL−1 the interaction between molecules 4 and 5 initially produces a short-lived Arg-Thr contact, followed by an even shorter-lived Arg-Pro contact, before, about 10 ns later, a very-long-lived but not permanent contact between Arg 45 of molecule 4 and Asp 48 of molecule 5 forms. Finally, in the 93 mg mL−1 run at 267 K a temporary contact lasting about 25 ns forms between Gly and Thr.
In summary, it is obvious that some pairs come close but do not interact for a long time, some interact for many nanoseconds and dissociate again. Other re-associate again after some time, and even other pairs come into contact at some point and stick to each other permanently. Please take note that at any given time there may be plenty of residue pairs in the vicinity, that are excluded from representation according to the rules given above. In addition to the four exemplary cases shown in Fig. 4, we summarize all interaction maps in Fig. S11–S33 in the ESI.†
Only some specific surface regions participate in the contact formation, which is consistent with previous experimental observations of so-called protein–protein interfaces, where several neighbouring amino-acid residues participate.4 On each protein, we identified four most frequently occurring regions that are involved in contacts. These are summarized in Table 3 together with their amino-acid composition. Examining these regional compositions, one can observe that arginine is the only amino-acid residue common to all proteins. Arginine also appears to be present in all regions that ever come into contact with other protein molecules during the simulations (see Fig. S11–S33, ESI†), which is in good agreement with experimental results.
The most frequently observed regions on each protein are graphically illustrated in Fig. 5. One notes that they are mostly located along the longer axes of the protein molecules, including both ends of molecules. This is consistent with our conclusions drawn from pair distribution functions: besides end-to-end contacts, the two protein molecules form also contacts aligning in a side-by-side fashion.
Fig. 5 Four regions of protein molecules that most frequently participate in protein–protein contacts for HEWL (left), T4 WT* (middle), and γ-D crystallin (right). |
By visually examining the clusters of protein molecules that appear during the simulation we noticed that the initial contacts between protein molecules were indeed formed through surface arginine residues on two different molecules in the anti-parallel orientation (see also above and Fig. 6).
While instructive from a mechanistic perspective, Fig. 3–6 lack statistical significance. Consequently we analysed all contact maps in a statistical way: First, we sampled all initiating residue pairs, i.e. those that are visible in Fig. 4 and in the corresponding figures in the Supporting Information, and ranked these pairs according to their relative abundance (Fig. 7). From these data we extracted the abundance of individual amino acid residues (Fig. 8). These data provide information about the initial contacts, which may or may not lead to a lasting protein pair. At the same time, we sampled the relative abundance of all close residue–residue pairs (Fig. 9), independent of whether the interactions initiated the particular protein dimerization, or whether it occurred as the consequence of some other initiating interaction. Again, we extracted the abundance of individual amino acids (Fig. 10) from these data.
Fig. 8 Occurrence of each distinct amino acid (irrespective of sequence number) in the initiating residue pairs (for definition see text). |
Fig. 9 Most frequent residue pairs whose distance at some time is smaller than 3 Å. The y-axis denotes the fraction of each contact within all contacts during the simulation. |
It has been found experimentally that the addition of arginine to solutions can suppress protein aggregation.33 One of the proposed mechanisms is that arginine residues bind to the protein surface and thereby reduce the stickiness of the protein molecules, making it less prone to aggregation.34 Further corroborating evidence shows that arginines in aqueous solution by themselves show a tendency for stacking their methylene groups to form clusters with head to tail hydrogen bonding.35,36 Our simulation results are consistent with these findings, as we also observe partial parallel and anti-parallel stacking of methylene groups in the simulations (see Fig. 6), which can be interpreted as partial burying of arginine residues inside the protein-rich phase. Such alignment of arginine residues points towards the presence of hydrophobic interactions that can provide stabilization of initial protein contacts (dimers). Pace et al. have already recognized that hydrophobic interactions can substantially contribute to the stability of protein clusters, since the average free energy contribution of burying a single methylene group was determined to be 1.1 ± 0.5 kcal mol−1,37 which is certainly a non-negligible interaction. We also observed that the arginine residue chains on the protein surface have a tendency to bind to other amino acid residues, commencing the contact between two protein molecules.
Fig. 8 shows, similarly to Fig. 7, the data sorted by relative abundance of individual amino acid residues. This figure clearly demonstrates that HEWL and γ-D crystallin interactions are strongly dominated by arginine residues. For T4 WT* the picture is less clear, as in most but not all cases (the simulation at the higher temperature and lower concentration being an exception), lysine is the most frequently occurring initiating amino acid residue. This is not unexpected, given the fact that lysine is the most frequent residue on the T4 WT* protein surface.
As trajectory inspection showed occasional rearrangement of the protein–protein configuration after the initial contact, we also collected the data in a slightly different way (Fig. 9 and 10). In Fig. 9 we collected the overall probability during the simulation to observe specific residue pairs in close contact, independent on when this occurred during the interaction interval. Similarly, Fig. 10 provides the overall probability of a single residue to be found in contact with another protein. For HEWL, the overall conclusion persists that arginine dominates protein–protein interactions. While for T4 WT* a certain prevalence for lysine and arginine still exists, the prevalence, measured over the entire simulation time, is less clear than for the initial contacts. For γ-D crystallin, the prevalence for arginine is clearly visible over the entire simulation time, but somewhat less pronounced than when focusing only on the initial stage of interaction.
Here, we have reported extensive atomistic molecular dynamics simulations of three different globular proteins, hen egg white lysozyme (HEWL), T4 WT*, and γ-D crystallin. Altogether ten systems with different temperatures and protein concentrations were studied. While even the 300 ns simulation runs performed here do not allow statistically unambiguous conclusions, we nevertheless could show that arginine residues located on the protein surfaces indeed form strong bonds with residues of neighbouring molecules. Arginine seems to be particularly instrumental in facilitating the initial contact, as is evident by its higher prevalence initially in comparison with the entire simulation interval. For T4 WT*, the lysine plays an important role in addition to arginine, but overall the results are less conclusive.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d0cp05160h |
This journal is © the Owner Societies 2021 |