Josh V.
Vermaas
a,
Loukas
Petridis
b,
John
Ralph
c,
Michael F.
Crowley
*a and
Gregg T.
Beckham
*cd
aBiosciences Center, National Renewable Energy Laboratory, Golden, CO 80401, USA. E-mail: michael.crowley@nrel.gov
bUT/ORNL Center for Molecular Biophysics, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA
cDepartment of Biochemistry and DOE Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI 53726, USA
dNational Bioenergy Center, National Renewable Energy Laboratory, Golden, CO 80401, USA. E-mail: gregg.beckham@nrel.gov
First published on 27th November 2018
Lignin is an abundant aromatic biopolymer within plant cell walls formed through radical coupling chemistry, whose composition and topology can vary greatly depending on the biomass source. Computational modeling provides a complementary approach to traditional experimental techniques to probe lignin interactions, lignin structure, and lignin material properties. However, current modeling approaches are limited based on the subset of lignin chemistries covered by existing lignin force fields. To fill the gap, we developed a comprehensive lignin force field that accounts for more lignin–lignin and lignin–carbohydrate interlinkages than existing lignin force fields, and also greatly expands the lignin monomer chemistries that can be modeled beyond simple alcohols and into the rich mixture of natural lignin varieties. The development of this force field utilizes recent developments in parameterization methodology, and synthesizes them into a workflow that combines target data from multiple molecules simultaneously into a single consistent and comprehensive parameter set. The parameter set represents a significant improvement to alternatives for atomic modeling of diverse lignin topologies, more accurately reproducing experimental observables while also significantly reducing the error relative to quantum calculations. The improved energetics, as well as the rigid adherence to CHARMM parameterization philosophy, enables simulation of lignin within its biological context with greater accuracy than was previously possible. The lignin force field presented here is therefore a crucial first step towards modeling lignin structure across a broad range of environments, including within plant cell walls where lignin is complexed with carbohydrates and deconstructed by bacterial or fungal enzymes, or as it exists within industrial solvent mixtures. Future simulations enabled by this updated lignin force field will thus lead to better chemical and structural understanding of lignin, providing new insight into its role in biomass recalcitrance or probing the potential for lignin to be used within industrial processes.
The potential exists for lignin to be used more extensively in creating industrially useful fuels and chemicals.4,12,13 Furthermore, as lignin is an aromatic heteropolymer formed through radical chemistry,14–16 some industrial products, such as muconate or commercial adhesives, have more direct biosynthetic routes with lignin rather than carbohydrates as a precursor, increasing their yield.17,18 However, the radical synthesis of lignin has significant structural implications; unlike DNA or proteins, whose structure is uniquely determined by sequence, lignin composition and topology is the result of stochastic synthetic processes that differ between plant species,19,20 environmental conditions,21,22 and tissue type.23,24 This means that natural lignin sample sources are highly heterogeneous,25,26 making experimental characterization of specific structure–function relationships difficult. Indeed, much of what we know about native lignin structure comes from destructive methods that cannot easily detect or quantify non-covalent interactions in intact polymers.27,28
Molecular simulation is a natural tool to address these questions, as it has both the spatial and temporal resolution to identify the molecular origins of specific interactions within biomolecules,29,30 including lignin.31,32 Molecular models of lignin have aided our understanding of lignin interaction within industrial lignocellulosic systems,31 its solvation in pretreatment solvents,33–35 and its behavior under pyrolysis.36 These models have been useful to frame the discussion around lignin's role in binding hemicellulose and cellulose together within biomass.31 Detailed modeling can thus drive mechanistic insight into how lignin affects the observed recalcitrance of biomass.37
Creating models at that level of detail depends on an accurate description of atomic-scale interactions. This can be achieved by employing a classical approximation of the underlying quantum mechanical potential energy surface, otherwise known as a force field.39 Previously, such an approximation was created for the three common lignin monomers and the most common lignin units (described by their characteristic interunit linkages).38 Since that time, the force field was expanded by using a general force field (CGenFF)40 to incorporate new linkages as the models demanded. However, these parameters taken by analogy from other similar biomolecules are known to be suboptimal, and reparameterization was warranted based on the internal quality metrics CGenFF produces.41,42 Here, we systematically reparameterize the force field to put all lignin linkages and lignin modifications within a self-consistent framework, including linkages to carbohydrates, using parameters derived from CGenFF as a starting point (Fig. 1). Through combination of these constituent elements, true native-like lignin structures and lignin degradation products can be modeled, as demonstrated in Fig. S1.†
Fig. 1 Chemical structures of all lignin monomers and linkages considered in this study, which expands on the three monomers and common linkages explicitly parameterized in the previous force field.38 We emphasize that some of the chemistries displayed here are not observed in native lignin, but are used to populate appropriate stereochemistries in combined structures, such as in dibenzodioxocin structures demonstrated in Fig. S1,† or are common degradation products. Compounds were included based on a broader suite of known lignin linkages. To aid in subsequent discussion, the ring carbons of the guaiacyl monomer have additionally been numbered as they are used consistently throughout the forcefield. Greek-letter based nomenclature for lignin linkages is used throughout. |
The reparameterization follows the standard CHARMM parameterization methodology, with water interactions used to determine charges and bonded terms optimized against relaxed potential energy scans of internal molecular degrees of freedom.40–44 Since this optimization incorporates target data from all target molecules simultaneously, the charge optimization uses a grouping scheme to create transferable charge groups that are consistent across all lignins. In addition, a new GPU-accelerated evaluation of the bonded objective function was implemented to make the bonded optimization tractable.
The result of this effort is a force field that reproduces the geometries and energies from quantum mechanical calculations more accurately than the generalized CGenFF parameter set. This includes an average 0.2 kcal mol−1 reduction in the mean squared error of the water interaction energies, improved vaporization enthalpies, as well as significant reductions in the energy residuals along a potential energy surface. These improvements in energy do not increase geometrical deviation from quantum mechanical minima, and in fact improve specific features within aqueous and crystalline lignin simulation that were not well reproduced in previous general force fields. These improvements represent a significant advance overall in lignin simulations, enabling direct simulation of most lignin topologies under a unified framework.
(1) |
This energy function includes well known physical constants, geometrical measurements, and harmonic or sinusoidal approximations built-in to classical molecular mechanics models. However, the terms within eqn (1) highlighted by circles or squares are free parameters that must be determined to describe the energetics of molecular poses. The creation of a classical molecular mechanics force field requires a collection of target data, and adjustment of the free parameters such that the parameters chosen reproduce the target data. As the different force fields have different philosophies on what target data to optimize against, and specific adjustments were made to account for the branched structure of lignin, the ESI† provides additional details about the overall parameterization procedure in CHARMM,40,44 and how it compares with force fields for other biopolymers.52–56 The extended ESI† discussion also details the features required for our lignin parameterization workflow that are missing in existing parameterization tools such as the force field toolkit (ffTK),44 ForceBalance,57 the Visual Force Field Derivation Toolkit (VFFDT),58 ForceFit,59 or the general automated atomic model parameterization (GAAMP).60
Dimers in which lignin is linked to a carbohydrate are also new with this force field (Fig. 1, Carbohydrate Linkages). Although these sugar linkages are only explicitly parameterized for five membered rings such as those found in arabinose, the modular nature of the carbohydrate force field70,71 makes creating the appropriate patches for a six membered ring a straightforward exercise in renaming the appropriate atom types, and are included in the topology files provided in the ESI.† Strong experimental evidence shows that lignin links to hemicellulose via ferulate esters, which may themselves be linked together in ways not seen in general lignin linkages.68,69,72 These 8,8-diferulate linkages are highly charged at neutral pH, which can cause problems during parameterization in the convergence of compound–water interaction calculations, so these are treated as being protonated. Other evidence shows alternative linkage topologies may also be possible, although the nomenclature is less well established.68
Natural lignin is a racemic mixture,73,74 rather than demonstrating uniform chirality as in other biological polymers such as proteins or DNA.75 To account for this, molecular geometries were optimized at a MP2/6-31G* level of theory using Gaussian 0976,77 for every possible stereochemical combination of lignin within every compound shown in Fig. 1. This resulted in 199 total optimized geometries at quantum mechanical minima, which are used as the starting points for subsequent calculations within the charge optimization and bonded optimization steps.
For each compound, initial atom typing and parameter determination was carried out through the ParamChem web interface to CGenFF.40 Attempts to split atom types based on the local geometry around each atom were not found to significantly improve the overall quality of the parameters, and so the CGenFF atom types are largely retained. However, aromatic ring carbon atoms with the original CGenFF type CG2R61 were split based on the ring substituents, where aromatic carbons bonded to methoxy groups or alcohols are given their own type (Fig. 2). To distinguish the new atom types from those found in CGenFF, new parameters found in the ESI† insert an “L” into the second position of the atom type. Thus CG2R61 becomes the CLG2R61 type, which was split further into CLG2R6A if bonded to a hydroxyl group, or CLG2R6B if bonded to another oxygen, such as in a methoxy group or ether. These new atom types inherit the Lennard-Jones parameters from the progenitor CGenFF atom type.
Fig. 2 Example of how compounds, in this case coniferyl alcohol and caffeyl alcohol (left), are translated into the graphs used for both neighborhood-based and group-based charge equivalence determinations. In these graphs, the nodes are labeled according to atomic index, and the edges show the bonding topology. The atom typing graphs (middle) are colored according to atom type, where atoms with equivalent atom types are represented by circles of the same color. Thus for coniferyl alcohol, atoms 16 and 18, the carbons of the –ene group, are colored the same, and also share a color with the equivalent carbons in caffeyl alcohol, as well as with any other similar functional groups throughout the test compound set. This also demonstrates the split of the CG2R61 type (orange), which was assigned by CGenFF to be the atom type within most aromatic rings. We create new atom types, colored here in darker and lighter purple, to split from CG2R61 when it is bonded to oxygen-containing compounds. The group assignment (right) coloration is different, in that unique colors only denote individual “groups” (atoms whose charge should sum to an integer) within each molecule. If the subgraphs formed by each group are identical (e.g. atoms 4, 5, and 6 and atoms 7, 8, and 9 within caffeyl alcohol), the group-based charge optimization assigns the same charges on equivalent atoms. More examples of conversion between chemical structure and near-integer sum groups are presented in Fig. S2.† |
1. Two alternative charge group definitions, neighbor-based and group-based (Fig. 2 & S2†). Both use subgraph isomorphism79 to determine equivalent atoms.
2. Water interactions computed at the HF/6-31G* level of theory.80
3. Optimization of bonded and nonbonded objective functions (eqn (S1) and (S2)†) with the L-BFGS-B algorithm.81
4. Structural perturbations to compute bond, angle, and dihedral scans carried out at the MP2/6-31G* level of theory.77
5. Four different approaches to restricting allowed values in the dihedral term Fourier series (Table S1†), reducible to a linear least-squares problem.82
6. Thrust83 and CUDA84 libraries were used to accelerate evaluation of the bonded objective function.
7. Incorporation of force magnitudes at quantum minima into the minimized objective function (eqn (S2)†) through a weighting parameter v.
In addition to target data comparisons, the bonded terms were also evaluated in terms of how far, in geometric space, minimized structures in the newly optimized force field diverged from previously calculated quantum mechanical optimized structures. The classical minimization was carried out in NAMD 2.1285 using the 16 different parameter combinations (4 dihedral sets, 4 different values for v in eqn (S2)†) starting from each of the 199 minimum energy configurations determined quantum mechanically. This is a surrogate metric for overall force field accuracy, as our optimization scheme does not guarantee that the minimum energy configurations determined from quantum level calculations are minimum energy points on the potential energy landscape created by our classical force field (Fig. S3†). On the multidimensional potential energy surface, if there is a net force along these degrees of freedom orthogonal to the quantum mechanical target data scans, the resulting geometry will distort. Minimizing the structures with this new classical potential energy surface informs us as to how influential these orthogonal degrees of freedom would be in typical simulation systems; we used the root mean square deviation (RMSD) between the classical and quantum minimum energy structures as a proxy for overfitting in the optimization.
Beyond minimization, a set of simulations were carried out to determine the enthalpy of vaporization both from the initial CGenFF parameter set as well as the newly optimized set. Due to the paucity of available reference data, these calculations were only carried out for phenol, catechol, guaiacol, and syringol. The enthalpy of vaporization (ΔHvap) can be estimated from the average potential energies from molecular dynamics trajectories in gas and liquid phases ΔHvap = Ug − Ul + kT, where Ug and Ul are the average molecular potential energies in gas and liquid phases, respectively, observed during simulation.86 Thus, each of the compounds were simulated four times, once with CGenFF parameters in the gas phase, once with CGenFF parameters in the liquid phase, and in gas and liquid phases with our newly optimized parameters instead. These 2 ns simulations were carried out in NAMD 2.12 (ref. 85) with 2 fs timesteps and maintained at 298 K through the use of a Langevin thermostat.87 To achieve a liquid rather than a crystalline phase, 500 copies of each compound were put into a box with 60 Å sides using the insert-molecules program from the GROMACS simulation suite,88 and pressure was maintained at 1 atm via the Langevin piston method.89 After 0.4 ns for simulation box equilibration, the last 1.6 ns of simulation were used in the calculation of ΔHvap.
Another comparison to experimental observables comes in the form of crystal simulations. Existing lignin-related compound crystal structures are present in the Cambridge Structural Database.90 We took a set of 20 of these structures (8 monomeric crystals,91–96 11 dimeric crystals,97–106,108 and a trimeric crystal107), and simulate them for 20 ns with both the general CGenFF and the newly developed lignin force field. Chimera109 was used to create a complete unit cell model of each molecule, which was then replicated along each axis using the VMD110 plugin TopoTools such that the minimum dimension of the crystal was at least 50 Å. The simulation was performed using GROMACS 2016.4,88,111 using TopoGromacs112 to facilitate the conversion between input formats. The simulation thermostat was set to the temperature at which the crystal structure was obtained (Tables 1 and 2) using a Nose–Hoover thermostat.113 Other simulation parameters were identical across crystals. Long-range electrostatics was handled using particle mesh Ewald114 with a 1.2 Å grid spacing past the typical 12 Å short-range cutoff and 10 Å switching distance. Bonds to hydrogen were constrained using P-LINCS,115 enabling a 2 fs integration timestep. To allow the triclinic unit cell vectors to change independently, an anisotropic Berendsen barostat116 was used to maintain the pressure at 1 atm. The last 10 ns of simulation was consistently used when measuring changes in crystal structure and density, as well as molecular-level changes in structure.
Structure | Shorthand | CSD code | T (K) | Structure | Shorthand | CSD code | T (K) |
---|---|---|---|---|---|---|---|
G-βO4-G | RABWUM97 | 153 | G-βO4-G | SIPPEM98 | 295 | ||
S-βO4-G | VADDOT99 | 295 | S-βO4-S | SAZHEG100 | 295 | ||
S-βO4-S | FOCGUA101 | 173 | S-βO4-S | IDIKIP102 | 183 | ||
G-ββ-G | INELIW103 | 153 | G-ββ -G | INELIW01104 | 153 | ||
G-ββ-G | FAFXUF105 | 295 | G-β5-G | FUMVUE106 | 295 | ||
Dibenzodioxocin | TUGWAT107 | 193 | G-55-G | UJOGIK108 | 153 |
Like much of the parameterization framework, the analysis leveraged several python libraries, including NumPy,117 SciPy, matplotlib,118 NetworkX,119 as well as VMD for visualization.110 To generate 2-D representations of each molecule, we extensively used Marvin and its molconvert tool, developed by ChemAxon (https://www.chemaxon.com).
These improvements in interaction energy generally required only small changes from the initial point charges provided by CGenFF. The charge changes were bounded by the imposed constraints during the optimization process, in which a maximum change of 0.25 charge units was allowed. This upper bound is almost never reached, with most atomic charge changes being restricted to less than 0.05 charge units (Fig. 4 and Table S2†). The charge changes observed depend on the identity of the atom. Most hydrogen charges were unchanged, with much of the remainder changing by less than 0.02 charge units to reflect the modifications required to make charge groups integer charges. Oxygen atoms, by contrast, tend to show the largest charge changes, with many charges becoming more positive through optimization, counterbalanced by small decreases in carbon charges. In effect, we have lowered the polarization of individual functional groups relative to the starting point. The CHARMM fixed-charge force field is intentionally over polarized to better reproduce structure and energetics in the condensed phase,70,120 which we accounted for by biasing the molecular dipole to be between 20–50% larger in our parameter set than the quantum dipoles in a vacuum, consistent with CHARMM parameterization methodology.40,44
Fig. 4 Cumulative distribution of the difference in charges between the initial charges assigned by CGenFF (qCGenFF) and the new charges assigned through the near-integer sum grouping method implemented here (qopt). This includes breaking down the difference in charge depending on the element of each atom, since most hydrogen charges were explicitly held fixed to values found elsewhere in the CHARMM force field. Most other charges change only modestly, and very few drift as much as they were allowed by the imposed bounds in the optimization. An alternative quantification including the neighbor-based charges is presented in Table S2.† |
A quantitative point of comparison to assess the impact of polarization is to compute an enthalpy of vaporization,70,120 a quantity dependent on the quality of the non-bonded parameters. As evidenced in Table 3, our newly optimized charges yielded enthalpies of vaporization that were uniformly closer to the available experimental reference values,121–123 as is particularly noticeable when a methoxy group is present (as in guaiacol or syringol). One possible explanation is that the adjacent alcohol to the methoxy group in lignin withdraws electrons more strongly relative to the methoxy groups, reducing the polarization of the methoxy groups in native lignin. However, the parameterized molecule from CGenFF that is used for methoxy charge assignment, anisole, has an isolated methoxy. The isolation of this electron withdrawing oxygen increases the magnitude of the partial negative charge, overpolarizing guaiacol and syringol in CGenFF and leading to less accurate vaporization enthalpies (Table 3). This suggests that the reduced polarization of individual functional groups while retaining the overall polarization of the whole molecule is an improvement on the CGenFF defaults, although the conclusion is limited by the availability of comparable reference data. However, given the evidence showing the improvement of the new charge set relative to the initial charges provided by CGenFF in recapitulating lignin–water interactions (Fig. 3 and Table S3†), we think that the improvement is not isolated to just the small organic molecules tested in Table 3, but that the new model more accurately tracks experimental observables more broadly.
Compound | Literature ΔHvap | CGenFF ΔHvap | Opt. ΔHvap |
---|---|---|---|
a For syringol, a heat of vaporization is not available, and a heat of sublimation is used as the reference state instead. Based on the difference between heats of sublimation and vaporization for catechol,121 the heat of vaporization for syringol is likely 20–30 lower than this literature value for sublimation. | |||
Phenol | 59.1 (ref. 122) | 62.1 ± 1.2 | 60.1 ± 0.9 |
Catechol | 70.0 ± 0.7 (ref. 121) | 64.3 ± 1.1 | 68.9 ± 1.1 |
Guaiacol | 62.6 ± 0.5 (ref. 123) | 46.0 ± 1.3 | 59.5 ± 1.3 |
Syringol | 98.4 ± 1.1a (ref. 123) | 53.1 ± 1.3 | 67.3 ± 1.4 |
Fig. 5 Examples of the quantum mechanical and classical potential energy surface for a limited subset of the 2574 bond, angle or dihedral scans used as target data. Each subpanel shows the energy trace for a series of molecular poses corresponding to a relaxed quantum mechanical energy scan along a specific degree of freedom. The quantum mechanical energy trace is drawn in black, the CGenFF energy trace is gray, and the energy trace after optimization is shown in orange, matching the color scheme used for the multi-set optimization shown in Fig. S4.† A molecular image of the compound being scanned in its central pose can be found within each panel, with a black arrow indicating which degree of freedom is being probed by the scan. (A) Shows a typical methoxy rotation, (B) demonstrates an angular change between a lignin and sugar monomer, (C) shows an α-hydroxyl rotation, and (D) shows a rotation around an ester-adjacent bond. A similar figure showing the scans for the alternative dihedral sets is presented as Fig. S4.† |
Subpanels within Fig. 5 highlight general trends within the larger population of potential energy scans. Sometimes, as in Fig. 5A, CGenFF did not have the right weighting between multiplicities to fully recapitulate the underlying quantum mechanical potential energy scan. Another related example is presented in Fig. 5B, where the optimum angle at the bridging oxygen was not originally correct in CGenFF due to poor analogy, but is improved in our optimization. In other cases, the improvements relative to CGenFF were modest, such as in Fig. 5C and D, where the quantum potential energy surface is not perfectly fit by the optimized parameter set. The residuals relative to quantum, although generally smaller than in CGenFF, were on the order of 1 kcal mol−1. Forcing the residuals to zero appears to be impossible given the structure of eqn (1), as even when all dihedral terms in the Fourier sum were nonzero, the overall energy trends were not always preserved (Fig. S4E & S4F†).
Broader analysis showed significant reductions in the quantum mechanical energy residuals from CGenFF to the newly optimized parameters (Table S4†), as the Cauchy distribution of residuals (Fig. S5†) showed significantly less spread away from the mean of zero after optimization. Ideally the distribution would have zero width, with all molecular mechanics energies coincident with quantum energies, but the harmonic and sinusoidal approximations made in the potential energy function (eqn (1)) limit the eventual accuracy of the force field. Typical errors in local potential energy minima were around 0.2 kcal mol−1, with larger errors sometimes exceeding 1 kcal mol−1 for barrier heights (Fig. 5 & S32†). The improved energies also improve local molecular structure, such as when comparing RMSDs relative to a quantum minimized structure (Fig. S6 & S7†).
Name/shorthand | CSD code | ρ Opt ratio | ρ CGenFF ratio | RMSDCOpt (Å) | RMSDCCGenFF (Å) | RMSDMOpt (Å) | RMSDMCGenFF (Å) |
---|---|---|---|---|---|---|---|
Catechol | CATCOL13 | 0.9687(2) | 0.9708(2) | 0.977(8) | 1.336(8) | 0.05(1) | 0.07(2) |
p-Hydroxybenzaldehyde | PHBALD11 | 0.9755(6) | 0.9607(6) | 1.09(2) | 1.26(2) | 0.11(3) | 0.11(3) |
Vanillin | YUHTEA01 | 0.9749(2) | 0.9571(2) | 1.331(8) | 1.426(9) | 0.10(2) | 0.09(2) |
Vanillin | YUHTEA03 | 1.061(1) | 1.0451(4) | 1.52(3) | 1.61(1) | 0.14(4) | 0.13(3) |
Syringaldehyde | IZALAW | 0.9491(4) | 0.9214(3) | 8.92(1) | 8.40(1) | 0.19(4) | 0.18(4) |
Coniferaldehyde | SIPKEH | 0.9913(6) | 0.9867(5) | 3.16(2) | 3.14(2) | 0.5(1) | 0.3(1) |
Vanillic acid | CEHGUS | 0.9360(4) | 0.9348(5) | 1.01(1) | 1.06(1) | 0.15(4) | 0.15(4) |
Ferulic acid | GASVOL01 | 0.9557(2) | 0.9468(2) | 0.837(6) | 1.352(7) | 0.11(3) | 0.13(4) |
G-βO4-G | RABWUM | 0.9630(2) | 0.9584(2) | 0.592(7) | 0.631(6) | 0.20(4) | 0.16(4) |
G-βO4-G | SIPPEM | 0.9538(3) | 0.9543(3) | 0.93(1) | 1.100(9) | 0.36(9) | 0.38(5) |
S-βO4-G | VADDOT | 0.9546(3) | 0.9451(4) | 1.15(1) | 1.72(2) | 0.29(7) | 0.30(7) |
S-βO4-S | SAZHEG | 0.9474(4) | 0.9475(4) | 0.95(1) | 0.84(1) | 0.25(6) | 0.27(9) |
S-βO4-S | FOCGUA | 0.9559(1) | 0.9393(2) | 0.917(7) | 0.84(1) | 0.19(4) | 0.17(4) |
S-βO4-S | IDIKIP | 0.9480(2) | 0.9289(3) | 0.84(1) | 1.83(1) | 0.19(4) | 0.23(6) |
G-ββ-G | INELIW | 0.9626(2) | 0.9460(3) | 0.786(9) | 1.42(1) | 0.11(7) | 0.11(7) |
G-ββ-G | INELIW01 | 0.9656(2) | 0.9543(2) | 0.82(2) | 2.322(8) | 0.12(9) | 0.2(2) |
G-ββ-G | FAFXUF | 0.9368(4) | 0.8415(8) | 0.99(1) | 3.97(2) | 0.19(4) | 1.5(3) |
G-β5-G | FUMVUE | 0.9357(3) | 0.9367(3) | 1.01(1) | 1.07(1) | 0.25(7) | 0.23(6) |
Dibenzodioxocin | TUGWAT | 0.9518(4) | 0.9603(3) | 0.88(2) | 1.23(2) | 0.29(4) | 0.26(9) |
G-55-G | UJOGIK | 0.9692(1) | 0.9482(4) | 2.521(4) | 1.77(4) | 0.78(4) | 0.6(2) |
The improvements in crystalline behavior were not the result of intramolecular interactions, as the average structural change observed within individual molecules (RMSDM) was typically small, the expected result for a thermalized crystal (Fig. S8–S27†). Within molecules, the mean differences in RMSDM were typically less than 0.04 Å between CGenFF and optimized force fields (Table 4). Of the remaining four molecules with significant differences in RMSDM, the two force fields are evenly split in performance, with two exhibiting a lower RMSD in CGenFF and two instead showing smaller deviations from the crystal under the optimized force field, although again CGenFF occasionally exhibited much higher RMSDs than is seen in the reverse direction. The high RMSDs for specific molecules are emblematic of the trends shown in Fig. 5. Usually, CGenFF parameters adequately described the underlying potential energy surface of the molecule, resulting in comparable RMSDM values with the newly optimized force field. However, there are internal degrees of freedom that are poorly described by a generic force field, such as in Fig. 5C & D, which can dramatically increase the RMSDM. In particular, the description of the bonds and angles within a β–β linkage improved significantly in the new force field, reducing the molecular RMSDM for these lignin linkages.
Given the good structural agreement between experiment and simulation, we need to consider the possibility of our parameter set being overfit given the target data provided during optimization. The minimal structural deviations observed for our parameterized compounds against quantum-derived optimum structures (Fig. S6 and Table S6†) suggested that the optimum geometries coincide. Additional tests of our force field compared against small molecule crystal structures also indicated that the molecular geometries were in line with a typical all-atom force field (Table 4). This analysis suggests that the overfitting scenario sketched out in Fig. S3† was avoided in force field development. We conclude that the developed force field for lignin should be applicable to general lignin polymers.
The extensive parameterization carried out here offers a number of improvements over a general force field. The new parameters better recapitulated experimental enthalpy of vaporizations (Table 3), improved the fit against the scanned potential energy surfaces (Fig. 5), and better mimicked structures seen in small crystals of lignin analogs (Table 4). These improvements came with only minimally increasing the number of free parameters relative to the original general force field, specifically splitting up aromatic carbon parameters depending on the bonded functional groups to reproduce the different angles seen in minimum energy structures (Fig. S28†), and adding selected dihedral terms to fit specific potential energy scans (Fig. S4A and S4B†). The improved energetics did not come at the cost of structure, with minimum energy configurations that differed from quantum calculations just as was observed for the general force field (Table S6†), suggesting that the parameters are not overfitted to the target data, and should be broadly applicable to native lignins.
There are innovations in the parameterization process that can be applied to other systems. The GPU-accelerated bonded optimization procedure is generically useful to any parameterization effort of classical force fields, allowing a quick assessment of how each term contributes to the overall quality of the optimized parameters, and how individual parameters should change to improve the global fit. Our attempts to include force information into the bonded optimization process ultimately did not improve the energetics or structure of the generated parameter set. However, with the machinery now in place to include that as part of the objective function and within the optimization workflow, we are confident that in the future others can incorporate forces into their own workflows and possibly eliminate the time-consuming potential energy scans.
It is also eminently possible that emerging generic force fields based on machine learning124–126 will obviate the need for force fields tailored to specific biopolymers in the future. However, for current ongoing work in modeling lignin within biological or industrial processes, the force field as it stands now significantly expands the set of currently tractable problems, including those featuring complex lignin topologies and interactions between lignin and hemicelluloses that were not explicitly parameterized previously. We envision tools that work in conjunction with this force field to facilitate lignin atomic model construction and enable researchers to visualize their molecules of interest.
Footnote |
† Electronic supplementary information (ESI) available: Ancillary tables and figures related to discussion within the main text, extended discussion of the different optimization strategies tried. We also provide the final topology and parameter files that result from our optimization process suitable for starting simulation, our crystal simulations, and selected scripts and procedures (including the GPU-accelerated objective function evaluator) we think would be useful to the wider research community. These are provided as four separate supporting archives, hosted at https://cscdata.nrel.gov/#/datasets/61b10985-8ed4-412c-9353-1889f200778f, and described further in the ESI. See DOI: 10.1039/C8GC03209B |
This journal is © The Royal Society of Chemistry 2019 |