Srdjan
Pusara
a,
Peyman
Yamin‡
a,
Wolfgang
Wenzel
*a,
Marjan
Krstić
ab and
Mariana
Kozlowska
*a
aInstitute of Nanotechnology, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany. E-mail: mariana.kozlowska@kit.edu
bInstitute of Theoretical Solid State Physics, Karlsruhe Institute of Technology (KIT), Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
First published on 24th May 2021
Colloidal protein–protein interactions (PPIs) of attractive and repulsive nature modulate the solubility of proteins, their aggregation, precipitation and crystallization. Such interactions are very important for many biotechnological processes, but are complex and hard to control, therefore, difficult to be understood in terms of measurements alone. In diluted protein solutions, PPIs can be estimated from the osmotic second virial coefficient, B22, which has been calculated using different methods and levels of theory. The most popular approach is based on the Derjaguin–Landau–Verwey–Overbeek (DLVO) theory and its extended versions, i.e. xDLVO. Despite much efforts, these models are not fully quantitative and must be fitted to experiments, which limits their predictive value. Here, we report an extended xDLVO-CG model, which extends existing models by a coarse-grained representation of proteins and the inclusion of an additional ion–protein dispersion interaction term. We demonstrate for four proteins, i.e. lysozyme (LYZ), subtilisin (Subs), bovine serum albumin (BSA) and immunoglobulin (IgG1), that semi-quantitative agreement with experimental values without the need to fit to experimental B22 values. While most likely not the final step in the nearly hundred years of research in PPIs, xDLVO-CG is a step towards predictive PPIs calculations that are transferable to different proteins.
The solubility of molecules is affected by their solid–liquid equilibrium, which depends on solute–solute and solute–solvent interactions.5 Macroscopic deviation of the osmotic pressure from that on an ideal solution can give information about solute–solute interactions. This deviation is characterized by virial coefficients. Here, the most relevant quantity is the second virial coefficient, known in diluted systems as B22, which gives information about average two-body interactions between protein molecules. B22 is a measure of the non-specific protein binding capacity6 and stickiness, and it has been investigated widely as a semi-quantitative approach to characterize protein solubility,7–12 phase behavior13–16 and crystallization.2,7,9,16–18 Positive values of the B22 coefficient indicate the dominance of repulsion interactions between proteins and correspond to good protein solubility. When attractive interactions between proteins increase, enabling protein assembly and aggregation, the B22 values become negative. A quantitative relation between the B22 and the ideal protein crystallization conditions was proposed by George and Wilson,19 who determined a narrow range of the B22, when crystallization occurs.7,8,20 Since then, several empirical models were proposed aiming to correlate B22 quantitatively with solubility and self-assembly with parameters fitted from experimental data.11,18,21 Moreover, thermodynamic relations between the protein solubility and B22 were developed,7,12 where the solubility, S, is calculated from the measured B22 using:
(1) |
Experimentally, the osmotic second virial coefficient is often measured using membrane osmometry, static light scattering, cloud-point measurements, fluorescence-anisotropy, self-interaction chromatography or by sedimentation equilibrium.14,22–24 As an example, the functional form to relate PPIs with Rayleigh scattering in static light scattering (SLS) is the following:25
(2) |
Unfortunately, the experimentally determined values of B22 cannot be decomposed into the specific PPI contributions, limiting the systematic understanding of PPI. Presently, the number of reported B22 coefficients is still limited to specific examples, therefore, considering the huge number of proteins of industrial interest and the various conditions occurring in industrial processes, there is a huge space for improved models and theories for the B22 calculation. Quantitative models to compute the B22 coefficients for specific proteins and environmental conditions can accelerate the optimization of the industrial processing towards the stability of the protein solutions or protein crystallization beyond the current “trial and error” experiments based on the several empirical rules.28,29
From the perspective of thermodynamics, the osmotic second virial coefficient is the solvent-averaged free energy of interaction between a pair of proteins, which is Boltzmann-averaged over the relative protein orientations and separations. It has been estimated by the integration over the configuration space, Monte-Carlo simulations30,31 with the Mayer sampling,13,32–36 from molecular simulations27,37,38 using radial distribution functions or potentials of mean force or by counting all configurations in which proteins interact.36 Among available theoretical tools aimed to calculate B22 coefficients, one of the most commonly used approach is the DLVO theory, that was originally developed by Derjaguin–Landau and Verwey–Overbeek for colloidal systems.39 It models PPI by representing proteins by spheres and accounts for van der Waals and electrostatic interactions in protein solutions of a specific ionic strength. The protein is represented by the charged sphere using a Debye–Hückel model for the screening by hydrated salts and a continuum representation of the solvent. When fitted to experimental B22 data, DLVO theory represents the overall trends well.1 Several improvements of DLVO were reported, including introduction of the multiple binding sites to other proteins40 or addition of a salt-induced osmotic attraction potential.41 The latter model was also extended to compute B22 of protein solutions with excipients42 in the presence of polymers.43 Details of the extended DLVO (xDLVO) method are given in Section 2.2.
Since proteins exist in a variety of different shapes with an inhomogeneous charge distribution, resulting in anisotropic PPIs,31,44 the spherical approximation in xDLVO may be one of the reasons, why such models fail to quantitatively describe B22 coefficients without fitting to an experimental baseline. Moreover, even for the globular proteins, the surface roughness can play significant role on the solvent accessible area, and therefore, on the total potential of mean force (PMF) and B22.45 Anisotropic shape is also important for the protein surface complementarity during the assembly and crystallization processes.46 Methods using coarse-grained (CG) representations of proteins improve the accuracy of B22 calculation for different types of proteins,27,47,48 starting from the small globular lysozyme to monoclonal antibodies. In most cases, the protein model was parameterized based on the experimental data.13,27,30–36 One might assume that using a fully atomistic, but computationally costly representation would fully resolve the difficulties stemming from the spherical approximation, but this is not the case: investigations with fine-grained protein models using atomistic and MARTINI force fields have shown significant deviations from experiment, in particular they result in a large overestimation of the PMF well depths.31,37,38
Here, we aim to develop a transferable model for reasonably efficient B22 calculations using optimized coarse-grained (CG) representation for the protein that is better able to account for the shape of non-spherical proteins, while not incurring the cost of fully atomistic models. We show that a shape based representation of the protein and the introduction of an additional term representing ion–protein dispersion interactions improve an agreement between experiment and model, both in values and limiting behavior. Going beyond spherical models we must sample the conformational space of the relative orientations of the proteins. We validate the model on lysozyme (LYZ), subtilisin (Subs), bovine serum albumin (BSA) and immunoglobulin type 1 (IgG1), where we find a good agreement with the available experimental data.
(3) |
McMillan and Mayer used a statistical mechanics approach to describe the osmotic properties of solutions, reducing the virial equation of state to the osmotic equation of state, where the solute particles interact with each other via effective potentials.49 Therefore, the relationship between the osmotic second virial coefficient and the potential of mean force (PMF), W22, between two proteins in solution can be derived as:49,50
(4) |
W22(r) = WHS(r) + Wdisp(r) + Wel(r) + Wosm(r) | (5) |
(6) |
The dispersion potential:
(7) |
The electrostatic potential was derived from Debye–Hückel theory,53 which models the repulsive forces of two identical protein molecules:
(8) |
(9) |
Eqn (8) cannot take into account the effect of electrolytes at ionic strengths higher than 10 mM, therefore, the salt effects were modeled via the osmotic potential,54Wosm(r):
(10) |
The traditional xDLVO model41 allowed to describe the B22 of proteins in aqueous solution as a function of pH, salt type, salt concentration and temperature by fitting to experimental B22 data. In order to compare it with the new model developed here, we re-implemented this approach, but do not fit the parameters. Instead, we use just one value of the Hamaker constant, taken from the literature (see Table 1 in Appendix), to represent dispersion interactions and we implemented new potential term, i.e. for ion–protein dispersion interactions, that improves the salt induced attractive PPIs in PMF. The detailed description of the new model is given in the next Section.
Protein | Conditions | A H [kBT] |
---|---|---|
LYZ | All pH and salts | 8.0 |
BSA | All pH and salts | 3.0 |
Subs | NaCl | 5.1 |
Subs | NaSCN | 6.8 |
IgG1 | All pH and salts | 3.0 |
All-atom structures of proteins were mapped to a coarse-grained (CG) representations using a shape-based model with a self-organizing neural network topology building algorithm,59 implemented in the VMD program (version 1.9.3).60 The neural network was initialized by two variables, ε and λ, used for the learning algorithm. The starting and the final values of ε and λ were 0.3 and 0.05 and 20 and 0.01, respectively. One CG bead represented approximately 500 atoms of a particular protein, resulting in 5, 10, 20, 40 CG beads for LYZ, Subs, BSA and IgG1 (see Fig. 1a and 5), respectively. CG beads were placed at the center of mass of atoms represented by each bead. Partial charges of atoms in each bead, as calculated by PROPKA, were used to calculate the charge of each CG bead (see Fig. S1, ESI†). Several mapping schemes were tested (see Fig. S2, ESI†). We find that there is little difference in B22 between 500-to-1 mapping compared to a calculation that used a 150-to-1 mapping, while the computational cost of B22 was reduced from 57.76 minutes to 7.7 minutes for 150-to-1 and 500-to-1 model of BSA (for one relative orientation). For this reason we opted for a 500-to-1 mapping for this investigation.
Fig. 1 Improvements of xDLVO approach implemented in xDLVO-CG: shape-based coarse-grained model of proteins, shown for lysozyme (PDB code 4nhi) represented by 5 CG beads per a protein unit (a) and continuum model of ion–protein dispersion interactions (b). Rp and D denote protein radius and maximum thickness of a shell, at which ion–protein interactions are considered. |
In order to go beyond the fitting of the Hamaker constant to protein specific B22 data in xDLVO, we aim here for a model, where the Hamaker constant is independently measured or computed and do not depend on pH and salt concentrations. Even in a CG representation we need to account for the “missing” interactions and additional terms must be considered to model the observed dependence on pH and salt concentration. To account for this effect we investigate, whether an additional ion–protein dispersion potential (Section 3.2.4), mimicking the impact of ion type on the protein salting out, as known from the Hofmeister series, can compensate for the missing effects. For the dispersion interactions, we still use a Hamaker-approach for most of the paper, but, at the end, we discuss replacing this term altogether by a Lennard-Jones potential. Therefore, the total PMF between two proteins, W22 (eqn (4)), was determined as the sum of electrostatic, dispersion (Hamaker or Lennard-Jones), osmotic and ion–protein interactions as in
(11) |
The PMF was calculated at salt concentrations from 0 to 1 M (every 0.01 M) on the center-of-mass distances between the proteins starting from the separation in the initial crystal structure (R0) up to R0 + 30 nm using 0.1 nm step (see Fig S3, ESI†). All calculations were performed at 298.15 K. The PMF and the corresponding B22 coefficients were calculated by an in-house developed code.
(12) |
The impact of the low dielectric core, which was shown to give a rise to the short-range image-charge-based repulsion of the charged polyelectrolyte chain, adsorbed on the spherical substrates,61 was not included. The image repulsion for the ion–protein electrostatics is low: Slight contributions are noticeable only at low salt concentrations (see Fig. S4, ESI†) without the influence on the B22 values. Image-charge-based repulsion is negligible at moderate ionic strengths. This is consistent with other studies in this field.62,63
(13) |
AH = π2λq1q2, | (14) |
Since the Hamaker equation needs experimentally measured values of AH, we attempted to generate a parameter-free substitute by implementing the Lennard-Jones potential:
(15) |
The Lennard-Jones parameters of each bead: εi and σi, were assigned according to Arkhipov et al.,66 as implemented in the SBCG builder in VMD. According to this model, the interaction strength of each bead was assigned based on the hydrophobic solvent accessible surface area (SASA) for the protein domain represented by a bead:
(16) |
The LJ radius for each bead, σi, was obtained as a radius of gyration of the groups of atoms represented by a CG bead, increased by a constant value, Δσ, of 0.1 nm, to mimic atoms at the edges of the bead. LJ parameters between specific beads pairs were obtained by standard combination rules:
(17) |
(18) |
Dij = dij + R3 + σ | (19) |
Salt | R 3 [nm] | Ion | B i–pr × 10−50 [J m3] |
---|---|---|---|
NaCl | 0.442 | Na+ | 0.454 |
KCl | 0.436 | K+ | 1.888 |
NaI | 0.464 | Cl− | 3.574 |
NaSCN | 0.460 | I− | 4.440 |
SCN− | 10.000 |
(20) |
The number of ions in an infinitely thin spherical shell around the protein is given by cbulkexp(−zϕ(r)/r)dV, where dV = 4πr2dr is the shell volume, z is the charge of an anion or a cation and ϕ(r) is the electrostatic potential felt by ion at distance r from the protein center (approximated by eqn (8)). As a result, the total magnitude of the ion–protein dispersion interaction in xDLVO-CG is obtained by the integration of contributions of all ions, placed in the spherical shell around the protein:
(21) |
(22) |
We employ two sampling schemes: the data labelled just with the name of the PDB code refers to a sampling of the PPI, where the protein is pulled along a linear trajectory outwards along the line connecting the respective center-of-mass of the proteins starting with the crystal structure up to a distance of R0 + 30 nm. The other data was generated as follows: To sample the protein pair interactions for the relevant relative arrangements we used a statistical sampling scheme over the configuration space. This approach results in a fast evaluation of the protein–protein configurations with on-the-fly calculation of the corresponding contributions to the PMF. For this, the initial crystal structure of the protein (see Section 3.1) was used, such that the first protein was kept at a fixed position (x1, y1, z1), while the second protein (with the center at (x2, y2, z2)) was uniformly moved around the first one on the fixed distance r0. The radial sampling of the second protein was performed by varying (θ, ϕ, r) coordinates in the spherical coordinate system with the center at (x1, y1, z1). Ten values of θ, ϕ angles were taken uniformly from θ = [0,2π] and v = [−1,1] → ϕ = −arccos(v) intervals, which resulted in 83 unique starting configurations (including the protein position in the crystal). In addition, the second protein was subsequently rotated by five different angles around (x − x2), (y − y2) or (z − z2) axis, respectively. These resulted in 16 differently rotated structures, therefore, in the total amount of different starting protein configurations of 83·16( = 1328). From these starting configurations, again the protein is pulled along a linear trajectory outwards, up to a distance of R0 + 30 nm, along the line connecting the respective center-of-mass. All configurations were checked for the possible sterical overlap that could have been generated after the radial movements and rotations. Two proteins were considered overlapping, if the distance between any bead pairs (belonging to different proteins) was smaller than the sum of radii of gyration of the corresponding beads. Overlapping structures were excluded from further calculations. The total PMF was obtained via averaging of all the angular orientations defined by the starting structures, resulting in the PMF as a function of the COM: W22(r,Ω1,Ω2) → W22(r). The second osmotic coefficient as a function of ionic strength was calculated by numerical integration of the averaged PMF at different salt conditions according to eqn (4).
Lysozyme (LYZ) is a small globular protein (Fig. 1a) that consists of 129 residues and exists as a monomer in solution for a wide range of conditions. Its main physiological function is hydrolysis of the glycosidic bonds in peptidoglycans that can be found in bacteria. LYZ was the first enzyme to be fully sequenced79 and it is the second protein for which the crystal structure was solved.80 Due to its size and simplicity of processing, LYZ is the most studied protein also in terms of its solubility via the osmotic second virial coefficients.10,14,22,81–83 As all proteins, LYZ is of zwitterionic nature and its PPIs, and therefore the solubility, are highly dependent on the pH of the solution. The main change induced by pH is the charge of the protein, resulting in a positive B22 for highly charged proteins and a negative B22 when the charge is screened enabling further attractive PPIs. To understand the aggregation propensity of LYZ as a function of pH, we perform calculations at five different pH values: pH 3, pH 4.5, pH 5, pH 7 and pH 8, for which exhaustive experimental data is available.
The B22 coefficients of LYZ at pH 4.5 and pH 7 are shown in Fig. 2. Values at low ionic strengths are positive and decrease towards negative with an increase of the salt concentration. Due to relatively slight decrease of the positive charge of a protein, i.e. from +11 to +9 at pH 4.5 and pH 7, as obtained using PROPKA protonation method (see Table S1, ESI†), the general trend for the B22 change with increasing ionic strength is similar in both cases. The data obtained correlates well with the measured B22, no fitting to experiment is required. For xDLVO-CG, we used one AH value for all pH and salt concentrations (Table 1 in Appendix), and the calculated B22 reproduces experimental observations. We have used the same AH value for xDLVO and wee see that it also works well without any additional fitting. We find that calculations using one starting protein–protein configuration from the crystal (PDB code: 4nhi, data marked with dashed blue curve) result in nearly identical results to those obtained via the sampling protocol (marked with solid black curve). This likely results from the globular and symmetrical shape of LYZ with a low level of charge anisotropy. From the same reason, the xDLVO model considered here works also well for LYZ (data marked with orange dashed curve in Fig. 2). There is a slight overestimation of B22 for concentrations lower than 200 mM, thereafter it decreases as in experiment and xDLVO-CG. This overestimation was improved in xDLVO theory reported by Herhut et al.,41 which fitted the Hamaker constant depending on pH value and salt concentration.
B 22 values, calculated with xDLVO-CG, correlate with FMAPB2 results (data marked with red dashed curve in Fig. 2), especially at low ionic strengths. Larger deviations are observed starting from 500 mM NaCl concentrations, where FMAPB2 data flattens out. It is hard to judge, which model performs better since experimental results vary widely. The computational cost of B22 calculations at one pH value using xDLVO-CG and FMAPB2 is ca. 1.85 h (on 1 desktop CPU i5-7500, 3.40 GHz) for hundred salt concentrations (in the range of 0.0–1.0 M) and 4 h on the FMAPB2 server (single node with 16 Intel Xeon E5-2650 cores) for one salt concentration, respectively.
xDLVO-CG has good semi-quantitative agreement with experimental data also at other pH values (see Fig. 3). We observe larger deviations from either experimental data from size exclusion chromatography (SIC) or static light scattering (SLS), to both xDLVO and xDLVO-CG at 300 mM NaCl at pH 3 (histogram on the left in Fig. 3). The same observation was reported by Kalyuzhnyi et al.,84 who accounted all interacting species: proteins, ions and water molecules, explicitly in the model. They also reported overestimation of calculated B22 at pH larger than 7, which agrees with our observations for lysozyme at pH 8, where probably ion-specific effects are more pronounced than we can presently account for in the model.
To understand the impact of different colloidal PPIs modulating the observed changes of the B22, interaction potentials contributing in the total PMF (Section 3.2) at specific salt conditions can be analyzed (see Fig. 10 and 11 in Appendix). From this data, we see the strong impact of the electrostatic repulsion interactions at low ionic strength and their decrease with stronger charge screening induced by an increase of the salt ions. For LYZ at pH 4.5 and 10 mM NaCl, the electrostatic potential (ca. 4.4 × 10−20 J) is approximately 5 times stronger than the dispersion potential (ca. −9 × 10−21 J). This is the reason why the total PMF (see Fig. 4) at 10 mM NaCl is dominated by repulsive interactions, resulting in positive B22 (Fig. 2).
The situation changes with the increase of the salt concentration, which results in more pronounced attractive dispersion interactions (PMF depicted in green and blue in Fig. 4), inducing negative values of B22 and protein aggregation. The impact of osmotic and ion–protein interactions is smaller, however, it increases at higher ionic strengths (Fig. 10 and 11 in Appendix), where protein salting out occurs. Inclusion of ion–protein dispersion interactions in xDLVO-CG improves the calculated B22 values (see Section 4.3).
In the B22 data for Subs at pH 5.5 (upper panel in Fig. 6), we see significant differences between calculations made with xDLVO and FMAPB2 in comparison to xDLVO-CG. The B22 crossing point, indicating a change between solubilized proteins and induction of aggregation, is shifted to the lower ionic strengths in xDLVO and shows stronger increase of attractive PPIs towards aggregation. This is caused by the representation of the charge by a sphere in xDLVO, while the protein has a significant anisotropy that is better represented by xDLVO-CG (see Fig. S1, ESI†). B22 from xDLVO, with the experimental Hamaker constant of 5.1kBT, matches the experimental B22 at 500 mM NaCl (−1.83 × 10−4 and −1.78 × 10−4 mol mL g−2, respectively), but this is the only data point that matches. Another xDLVO methodology to calculate B22 was used by Pan et al.12 They used the MacroDox program to calculate the charge of Subs and fitted the Hamaker constant to achieve nearly quantitative agreement between the experimental measurements and their xDLVO. The protein charge they obtained is higher by 2.7–3.5e (i.e. is +8.7 to +9.42e) than we obtained using PROPKA (+6e). Since there is no information about the Subs charge from experimental measurements, we cannot validate the protonation schemes used in either approach. Moreover, fits of the Hamaker constant for each protein charge and radius12 complicates the decomposition of PPIs originating from protein–protein dispersion interactions and other interactions (see Section 4.3).
B 22 calculated using FMAPB2, with an all atom protein representation, are shifted far to the negative B22 range, this model deviates far from the B22 range from experiment. This was observed only in the case of Subs. Double data check via repeated FMAPB2 calculations did not result in improved results.
Fig. 7 Comparison of B22 coefficients for BSA at pH 6.2 and IgG1 at pH 5, pH 5.75 and pH 6.5 calculated using xDLVO-CG, xDLVO and FMAPB2 with respect to the reported experimental measurements. |
xDLVO, fails to reproduce the B22 coefficients at salt concentrations higher than 100 mM NaCl (data marked in orange in Fig. 6), even if the total charge of the sphere in xDLVO (−20.06e, see Table S1, ESI†) better reproduces the experimentally reported value (−20.30e88). The xDLVO data depends strongly on the protein radius used, R2, which is known for BSA to be in the range of 3.5–4.1 nm89 as a function of pH and ionic strength. In Fig. 6, xDLVO calculations of B22 for BSA with radius of 3.64 nm90 and 4.01 nm (from 4f5s) are shown. xDLVO calculations with either radius cannot capture the experimental B22 trends.
Human immunoglobulin IgG1, used in the present studies, consists of 644 residues and has a characteristic T-shaped structure (see Fig. 5c). Its shape cannot be represented well by a spherical approach in xDLVO without fitting to the experimental data. Coarse-graining IgG1 by 40 CG beads nicely mimics the shape of this protein. We calculated B22 of IgG1 at pH 5, 5.75 and 6.5 (Fig. 7) using the Hamaker constant of 3kBT, which is in the range of a typical AH values in protein–water–protein systems.92,93
The B22 data obtained are shown in Fig. 7. Both xDLVO-CG and FMAPB2 give comparable results and similar trends for salt concentrations higher than 25 mM, despite the fact that the methodologies are very different. B22 values, calculated by xDLVO-CG, agree well with the experimental data reported by Roberts et al.,94 while B22 coefficients reported by Le Bruin et al.95 are higher and show low degree of salt concentration dependence. Calculated B22 indicate preferred attractive interactions (lower B22) of IgG1 at higher pH values (see panel on the right in Fig. 7). This is caused by the higher protein charge at lower pH, promoting solubilization of monomers in solution.
B 22 coefficients of IgG1, calculated using xDLVO without fitting fail completely to reproduce the experimental data and are totally out of the range of the values obtained by xDLVO-CG and other methods (see Fig. S5, ESI†). This again demonstrates that xDLVO methods, with protein represented as a sphere, are highly dependent on experimental data and loose their ability to accurately model PPIs and the corresponding osmotic second virial coefficients for aspherical proteins without fitting procedures.
Still, the calculated second virial coefficients for IgG1 and BSA in xDLVO-CG deviate more from experiment than for LYZ and Subs. This may result from the higher conformational flexibility of these large proteins and larger deviations in the total charge of the protein, calculated using PROPKA method, in comparison to the experimentally reported data (see Table S1, ESI†). H++ protonation improved the partial charges of BSA, so we used it to parametrize the CG beads, but still these deviations may introduce discrepancies in the electrostatic potential, therefore, the calculated B22. However, it seems that the overall trends and the behaviour at physiological ionic strengths agree better with the available experimental data. Further developments of xDLVO-CG and the inclusion of the solvent accessibility corrections, e.g. as available in the TKSA-MC server96 or GBSA,97 and other solvent effects, modulated by the change of the ionic strength,98 are required to improve the quality of predictions and enable better correlations with the experimental observations.
These discrepancies are presently overcome by fitting the Hamaker constant to the experimental data measured at each desired pH and salt conditions, limiting the detailed understanding of PPIs and to recover the trends according to the Hofmeister series.99,100 According to its original definition, the Hamaker constant describes attractive London-var der Waals interactions that should arise from the interactions between charge-neutral proteins64,65 and should not depend on the ionic strength. The impact of salt ions in promoting attractive PPIs results from the ion–protein dispersion interaction, which, in the most basic form, are insensitive towards the protein type and depend on the properties of salt ions.75
We included these interactions in xDLVO-CG using the model described in Section 3.2.4. In Fig. 8, the B22 coefficients calculated for LYZ (at pH 4.5) and Subs (at pH 5.5), using xDLVO-CG with (solid curves) and without (dotted curves) ion–protein interactions, are depicted. It is clearly seen that the attractive PPIs in LYZ, arising only from the ion type (different hydrated radii accounted in the osmotic attraction potential), are similar (see upper panel in Fig. 8) and cannot explain more pronounced differences to the experimental B22 data. Adding the ion–protein dispersion term in the PMF differentiates the salting out efficiency of ions, induces stronger separation between the data obtained using different salts and better correlation to the experiments. The same effect is seen also for Subs (bottom panel in Fig. 8), showing stronger tendency of SCN− to promote aggregation of proteins, which agrees with the trends reported by the Hofmeister series, where thiocyanate anion is known as a strong salting out agent causing protein precipitation. This is connected to the ion–protein dispersion coefficients, Bi–pr (eqn (20)), of SCN−, which is 2.8 times larger than in the case of Cl−,70 resulting in stronger ion–protein dispersion interactions.75 This interaction also relates to the lower ion hydration propensity of SCN−40 that affects PPIs stronger, resulting in a decrease of the B22 coefficients and more efficient salting out. Both ion hydration and ion–protein dispersion coefficients are modulated by the ion polarizability.73
B 22 coefficients of LYZ at pH 4.5 using NaCl and KCl (data in blue and green in the upper panel in Fig. 8) are similar and differ mostly at higher salt concentrations. The same observation was reported by Kalyuzhnyi et al.,84 which relate it to the small differences in the water binding by Na+ and K+ (dependent also on the polarizability) and generally smaller salting out introduced by cations. At the same time, I− ions show stronger salting out efficiency of LYZ than Cl−, resulting in faster decrease of B22, indicating that our model is consistent with the Hofmeister series trend, i.e. Cl− < I− < SCN−.
With the use of fitted AH, the obtained B22 coefficients are in good agreement with the experiments, however, such adjustments of AH include error compensations arising from the electrostatic, osmotic and ion–protein dispersion terms, as explained in Section 4.3. Moreover, experimental measurements for different proteins at specific solution conditions are required to parameterize the model.
In the present study, we validated xDLVO-CG using Hamaker constants reported previously (Table 1 in Appendix) and we have not changed their values at different salt and pH conditions (two AH used only for Subs). We show good correlations between the calculated and the experimental B22 using this approach. However, we aim to introduce the protein solubility model using standard potential energy functions without designed fitting to the experimental data points. Therefore, we attempted to replace Hamaker dispersion interactions in the total PMF by the Lennard-Jones interactions (eqn (11) and (15)) parameterized according to Arkhipov et al.,59 as described in Section 3.2.2.
Both dispersion potentials for LYZ at pH 7 and Subs at pH 5.5 are depicted in Fig. 9. We see that the strength of the dispersion interactions, introduced by the Hamaker constant, is decreasing faster than in the case of the Lennard-Jones interactions. This results in stronger binding of the proteins in the LJ potential model and a small shift of the calculated second virial coefficients towards negative values (see panel on the right in Fig. 9). Overall, the potential depth and the corresponding COM distance of the most attractive interactions between the proteins are similar in both cases, resulting in B22 coefficients in the range of the reported experimental data. Analysis of the B22 changes as a function of the sampled protein–protein structures indicates higher sensitivity of the PMF with LJ (Fig. S6, ESI†), that arises from the fact that all bead-bead dispersion interactions were parametrized based on the chemical composition of the residues included in a CG bead, and not by uniform values of the Hamaker constant.
Fig. 9 Dispersion potential in PPIs of LYZ (upper panel) and Subs (lower panel) represented by the Hamaker constant (in blue), taken from experimental measurements (see Table 1 in Appendix) and implemented according to eqn (13), and by the Lennard-Jones potential (in red) from the shape-based coarse-grained model (xDLVO-CG(LJ)). The B22 coefficients for LYZ at pH 7 and Subs at pH 5.5 using both potentials, eqn (11), are shown correspondingly on the right panel. |
Larger deviations between the B22 values calculated using the Hamaker constant in xDLVO-CG and xDLVO-CG(LJ) are observed for BSA and IgG1 (Fig. S7, ESI†). It appears that the standard parameterization of the LJ potentials for these large proteins is inadequate to capture PPIs that are measured experimentally. The LJ potential is significantly wider (Fig. S8, ESI†), resulting in overbinding of the proteins and more negative B22, therefore, faster protein precipitation. Such observations are also typical for the PMF calculated by all-atom force fields.31,37,38 However, this is the first attempt, to our knowledge, to replace the Hamaker dispersion interactions in xDLVO models by the LJ potential. More rigorously, the dispersion interactions and the Hamaker constant can be calculated by the continuum Lifshitz theory102 using, for example, McLachlans formulation. We hope that, with better parameterization and further improvements, the predictive power of xDLVO-CG can be further increased.
The developed model was derived from the extended DLVO theory and includes a new term for ion–protein dispersion interactions, which is shown to be relevant when the Hamaker constant is not fitted for every environmental condition. xDLVO-CG was implemented with a shape-based coarse-grained representation to better account for anisotropic protein–protein interactions. Combined, this approach reduces and, in some examples, eliminates the need to fit the experimental B22 data. Such an approach may accelerate the investigations of the protein processing conditions, starting from pharmaceutical up to food industry applications.
The unified CG protein mapping scheme in xDLVO-CG introduces a generally applicable parametrization procedure, which improves transferability of the model among proteins of different shape and physicochemical characteristics. CG bead mapping is free from experimental B22 parameters and builds directly on the all atom structure of the protein. This allows the use of the xDLVO-CG to various proteins without the need of the special model adjustments. In order to account for the anisotropic nature of the interactions, calculations of the osmotic second virial coefficients require the integration of the potential of mean force over many configurations in a model that includes electrostatic, dispersion, osmotic and ion–protein dispersion interactions.
The xDLVO-CG calculations for lysozyme, subtilisin, bovine serum albumin and immunoglobulin type 1 at different solution conditions, show that the model agrees with the experimental measurements of the B22 coefficients of lysozyme in the wide range of pH. The model is transferable to larger, irregular and non-spherical proteins and results in semi-quantitative correlations of the B22 for subtilisin, BSA and IgG1. The observed differences for the latter may result from difficulties to correctly compute the charge of the protein. However, using experimental values for the protein charge, results in a worse fit to the data. Overall, xDLVO-CG demonstrates significant improvements of the B22 modeling in comparison with the regular xDLVO models and it does not require fitting to experimental B22 values.
We believe that with further implementations in the PMF and the introduction of better dispersion terms, more accurate models for electrostatic and solvent effects, xDLVO-CG has the potential to better predict B22 coefficients of different proteins without time-consuming experiments. The developed computational scheme can be adjusted also for the calculation of the aqueous stability of other colloidal particles. Indeed, special CG parameterization scheme should be proposed as the first.
Fig. 10 The change of the interaction potentials, accounted in xDLVO-CG, with the increase of the salt concentration of the LYZ at pH 4.5. |
Fig. 11 The change of the interaction potentials, accounted in xDLVO-CG, with the increase of the salt concentration of the LYZ at pH 7.0. |
Literature reported values of the parameters used to calculate the dispersion, osmotic and ion–protein dispersion potentials are listed in Tables 1 and 2.
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1cp01573g |
‡ The author passed away before the paper was completed. |
This journal is © the Owner Societies 2021 |