Tom A. Young,a Tristan Johnston-Wood,a Volker L. Deringer*b and Fernanda Duarte*a
aChemistry Research Laboratory, University of Oxford, Mansfield Road, Oxford OX1 3TA, UK. E-mail: fernanda.duartegonzalez@chem.ox.ac.uk
bDepartment of Chemistry, Inorganic Chemistry Laboratory, University of Oxford, Oxford OX1 3QR, UK. E-mail: volker.deringer@chem.ox.ac.uk
First published on 5th July 2021
Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but doing so typically requires considerable human intervention and data volume. Here we show that, by leveraging hierarchical and active learning, accurate Gaussian Approximation Potential (GAP) models can be developed for diverse chemical systems in an autonomous manner, requiring only hundreds to a few thousand energy and gradient evaluations on a reference potential-energy surface. The approach uses separate intra- and inter-molecular fits and employs a prospective error metric to assess the accuracy of the potentials. We demonstrate applications to a range of molecular systems with relevance to computational organic chemistry: ranging from bulk solvents, a solvated metal ion and a metallocage onwards to chemical reactivity, including a bifurcating Diels–Alder reaction in the gas phase and non-equilibrium dynamics (a model SN2 reaction) in explicit solvent. The method provides a route to routinely generating machine-learned force fields for reactive molecular systems.
Empirical interatomic potentials (force fields), in combination with molecular dynamics (MD) or Monte Carlo (MC) simulations, have been widely used to sample the potential-energy surface (PES). However, they are limited in accuracy and transferability.2 Moreover, most of these potentials are parameterised for isolated entities with fixed connectivity and thus unable to describe bond breaking/forming processes. In contrast, ab initio methods provide an accurate description of the PES, which is particularly critical for reactions in solution. However, because of their high computational cost and unfavourable scaling behaviour, they are limited to a few hundred atoms and simulation times of picoseconds in ab initio molecular dynamics (AIMD) at the DFT level, and practically impossible at the computational ‘gold-standard’ [CCSD(T)].3
Machine learning (ML) approaches have the potential to revolutionise force-field based simulations, aiming to provide the best of both worlds,4–6 and have indeed begun to provide new insights into a range of challenging research problems.7–16 The development of an ML potential applicable to the whole periodic table, mapping nuclear coordinates to total energies and forces, is, however, precluded by the curse of dimensionality. Within small chemical subspaces, accurate models can be constructed using neural networks (NNs),6,17–21 kernel-based methods such as the Gaussian Approximation Potential (GAP) framework22,23 or gradient-domain machine learning (GDML),24 and linear fitting with properly chosen basis functions,25,26 each with different data requirements and transferability.27 GAPs have been used to study a range of elemental,28–30 multicomponent inorganic,31,32 gas-phase organic molecular,13,33 and more recently condensed-phase systems, such as methane34 and phosphorus.35 These potentials, while accurate, have required considerable computational effort and human oversight. Indeed, condensed-phase NN36,37 and GAP fitting approaches typically require several thousand reference (“ground truth”) evaluations.
Active learning (AL), where new training data is added based on the current state of the potential, has been used for generating databases and accelerating the fitting process.31,38–42 Notable examples in materials modelling include an early demonstration of a “query-by-committee” approach in fitting a high-dimensional NN potential for elemental copper,39 the fitting of Moment Tensor Potential26 models43 to predict elemental crystal structures38 and multicomponent alloys,40 and the deep potential generator (DP-GEN)44,45 that provides an interface to deep NN potential models for materials.46 AL schemes have also been combined with Gaussian process (GP)-based force fields, including GAP,47 and incorporated into first-principles MD implementations that allow “on-the-fly” fitting of force fields for a specific simulation system.48,49
Efficient approaches to generate reactive ML potentials become even more important when exploring chemical reactions in molecular systems, which often require a description at a computational level beyond DFT, and therefore require reference data at the same level. Very recently, AL approaches have started to be adopted for fitting reactive potentials for organic molecules based on single point evaluations at quantum-chemical levels of theory. Notable examples include the modelling of gas-phase pericyclic reactions,12 the exploration of reactivity during methane combustion,50 and the decomposition of urea in water.41
In the present work – with a view to developing potentials to simulate solution phase reactions – we consider bulk water as a test case and develop a strategy which requires just hundreds of total ground truth evaluations and no a priori knowledge of the system, apart from the molecular composition. We show how this methodology is directly transferable to different chemical systems in the gas phase as well as in implicit and explicit solvent, focusing on the applicability to a range of scenarios that are relevant in computational chemistry.
Using a train/test set split with high structural similarity between the two sets can lead to highly misleading accuracy estimates whenever the potential is to be taken outside the training region in computational practice. For example, splitting an AIMD trajectory of water into a training and test set with an odd/even frame split (50:50) and training a simple GAP model yields an energy error on the order of 1 kcal mol−1 (Fig. S1a†). However, simulations with this potential in the same configuration space sample unphysical configurations within 10 fs (Fig. S1b†), making an RMSE over a priori test data an insufficient metric for quantifying the quality of a potential.
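For illustration, such an interleaved split can be set up in a few lines with ASE (the trajectory filename is a placeholder, not one used in this work):

```python
from ase.io import read

frames = read('aimd_water.traj', index=':')   # all frames of the AIMD trajectory (placeholder file)
train, test = frames[0::2], frames[1::2]      # odd/even frame split (50:50)
```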
Considering that single-point reference energy evaluations are reasonably cheap, a ‘prospective’ validation scheme is possible, where the error metric operates in the configuration space sampled in a simulation. With this in mind, we propose a temporal cumulative error metric (τacc, eqn (1)), defined as the time required for the cumulative error (absolute difference between true (E0) and predicted (EGAP)) to exceed a given threshold (ET); the larger τacc, the more robust the potential. Note that the time for which a potential is stable in MD can far exceed τacc, as shown in the following. Here only errors above a lower-bound threshold value (El) contribute to the cumulative error. The lower threshold is required to account for the residual error that is due to the finite radial cut-off of the model. In the following we take ET to be 10 times El, but it may be adjusted depending on the simulation context.
$$\tau_{\mathrm{acc}} = \min\Big\{\tau \,:\, \sum_{t\le\tau} \varepsilon(t) > E_T\Big\}, \qquad \varepsilon(t) = \begin{cases} \left|E_0(t) - E_{\mathrm{GAP}}(t)\right| & \text{if } \left|E_0(t) - E_{\mathrm{GAP}}(t)\right| > E_l \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
This metric has several advantages in that (a) it ensures that a potential with high accuracy will result in stable dynamics; (b) it allows the user to specify the level of accepted error according to the quality of the training method, thus not penalising where the error is within the difference between the ground truth and the true PES (i.e. a larger threshold may be suitable for a less accurate reference method); (c) it penalises large errors, even if they only occur for single configurations, which is important as such errors may lead to instabilities in the ML-driven MD trajectory and (d) it enables a quoted accuracy to include regions that may not be accessible to direct evaluation at the ground-truth level (e.g. long-time behaviour). Overall, this metric depends on the lower bound and total error, interval between evaluations, and the simulation on which it is evaluated; so while not unique, it is – crucially – prospective. We found this metric to be essential in developing an efficient training strategy and accurate potentials for bulk water (Fig. 1).
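As a minimal sketch (the function and variable names are ours, not part of the released code), the metric in eqn (1) can be evaluated from energies collected at fixed intervals along a GAP-MD trajectory:

```python
def tau_acc(times_fs, e_true, e_gap, e_l=0.1, e_t=1.0):
    """Temporal cumulative error metric, eqn (1).

    times_fs : evaluation times along the trajectory (fs)
    e_true   : ground-truth energies E0 at those configurations (eV)
    e_gap    : GAP-predicted energies at the same configurations (eV)
    e_l      : lower bound below which errors do not accumulate (eV)
    e_t      : threshold on the cumulative error (eV), here 10 * e_l
    """
    cumulative = 0.0
    for t, e0, eg in zip(times_fs, e_true, e_gap):
        error = abs(e0 - eg)
        if error > e_l:              # only errors above the lower bound contribute
            cumulative += error
        if cumulative > e_t:
            return t                 # tau_acc: first time the threshold is exceeded
    return times_fs[-1]              # threshold never exceeded within the run


# e.g. energies evaluated every 10 fs along a short trajectory
print(tau_acc([10, 20, 30, 40], [1.00, 1.05, 1.60, 2.40], [1.02, 1.04, 1.05, 1.10]))
```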
Fig. 1 Active learning of machine-learning potentials for liquid water. (a) Schematic of the active learning loop implemented for fitting GAP models, where the GAP-MD exploration is run for n³ + 2 femtoseconds, with n (the number of evaluations) incremented each time the error is evaluated. (b) Schematic illustrating the separation into inter- and intra-molecular terms (I + I) for a bulk water system; these are described by separate GAP models (here, using the same method to obtain the reference data), and then added to give the combined prediction for energies, E, and forces, F. (c) Learning curves for a bulk water GAP model using different training strategies. τacc with El = 0.1 eV, ET = 1 eV, 10 fs interval, 300 K, from the same random minimised configuration of 10 waters in a 7 Å cubic box. Error bars are quoted as the standard error of the mean from 5 independent repeats. The horizontal axis denotes the number of evaluations in training data generation. See Tables S1 and S2† for detailed methods. DFTB(3ob) ground truth. Minimum τacc is shown as 0.1 fs to enable plotting on a log scale. (d) Water monomer model training performance as characterised by τacc and RMSE over the full 3D PES; see Fig. S2† for details.
We initially employed training strategies found to work well for elemental materials, for example, fitting a combined potential with two- and three-body GAPs. However, this approach was found to be detrimental to the potential's stability. This can be understood by considering a water dimer (HO–Hc⋯OH2); here, a two-body description that treats the two O–Hc interactions on the same footing is a poor approximation, given the order-of-magnitude difference between the interactions at their respective minima (Fig. S3†). Therefore, we proceeded with a smooth overlap of atomic positions58 (SOAP) descriptor for an exclusively many-body description of atomic environments (with the exception of AL-I + I, which uses a 2 + 3 body description for the intramolecular component, as discussed later).
We also explored different approaches to generate the database and their influence on the resulting potential. An emerging approach to generating training data for elemental GAPs is to initialise the database with randomised configurations (with reasonable constraints, as in ab initio random structure searching59), and to gradually explore configuration space with evolving versions of the potential (see, e.g., ref. 60). However, randomly placing water molecules does not in itself afford a stable potential. A similar result is observed when the most diverse configurations are selected using the CUR algorithm60,61 (Fig. S4†) or when intramolecular displacements are applied following minimisation (Fig. S4†). Selecting frames from classical MD simulations at temperatures of 100–1000 K was also found to be an ineffective strategy (Fig. S5†), reaching τacc of only a few fs (“MM-MD”, Fig. 1c). This is in line with the results reported in ref. 13. Note that this is not because the GAP cannot fit reference energies and forces from MM configurations (Fig. S6†), but because of a poor configuration-space overlap with the ground-truth PES (Fig. S7†). Selecting configurations from an AIMD simulation at 300 K (AIMD, Fig. 1c) was an improvement over training on random and MM-generated configurations, with τacc ∼ 10 fs. However, when additional AIMD configurations are added, the increase in accuracy quickly saturates, even if those configurations are obtained at higher temperatures (Fig. S8†). Using AIMD configurations can also involve a significant cost (requiring thousands of evaluations). Finally, active learning from only a few randomly generated configurations provides a modest uplift in accuracy (AL, Fig. 1c), on par with a GAP trained on AIMD configurations while requiring only a third of the reference data.
Only when the relevant length and energy scales of the system are decomposed by treating intra- and inter-molecular components separately (Fig. 1b) is a potential obtained that is stable for picoseconds (AL-I + I, Fig. 1c). We note that this approach is related to the hierarchical fitting of GAPs34,63 and related ML models33,64–66 using different levels of computational approaches, and to the decomposition strategy by Wengert and co-workers.67 In the present work, we employ the same ground-truth method throughout rather than combining different levels of theory for the input data, but as in prior studies we describe the stronger (e.g., covalent) and weaker intermolecular terms with separate fits that are afterwards combined to give the final model. The intramolecular GAP for water contains only 2- and 3-body terms and the training data are chosen using an evenly spaced grid over the full 3-atom PES (8 × 8 × 8 grid points in rOH and rHH, ∼0.1 Å spacing, Fig. S9†). Energy and force evaluations of this potential are a simple sum of intra- and inter-molecular terms, but require the former to be evaluated in an expanded simulation box to ensure no non-bonded hydrogen atoms are present within the cut-off radius of the 2- and 3-body descriptors on oxygen (Fig. S10†). Here the intramolecular PES is fairly low-dimensional, so a full and reasonably dense grid is available, which in turn allows us to define an error measure over the whole PES; we find this error to be inversely correlated with τacc (Fig. 1d). Using an acceptable error of 0.2 kcal mol−1 per H2O molecule for a description of bulk water, which is similar to that achieved in a recent NN fit of water,36 we find that this potential affords τacc > 10 ps with just a few hundred ground-truth evaluations (AL-I + I, Fig. 1c). To put this value in context, we measured τacc for the fully reactive water NN of Cheng et al.,36 which was trained on ∼7000 reference configurations (DFTB energies/forces) and has been shown to provide a highly accurate water model over multiple states. For this state-of-the-art ML potential, a τacc value of 7.6 ± 0.7 ps for the liquid at 300 K is obtained, comparable to that obtained for the new AL-based potentials of the present work (>10 ps). Of course, direct comparison requires caution because the two potentials differ in scope: the NN potential employs a large reference database to develop a general water model, whereas the present study targets robust potentials for liquid water with minimal computational effort, in turn allowing the user to apply similar approaches to other chemical systems (as will be shown below).
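As a sketch of the combination step, two separately fitted GAPs can be summed with ASE's calculator-mixing interface. The model and structure filenames are placeholders, and the expanded-box evaluation of the intramolecular term, which gap-train handles internally, is omitted for brevity:

```python
from ase.io import read
from ase.calculators.mixing import SumCalculator
from quippy.potential import Potential   # quippy-ase interface to QUIP/GAP

# separately fitted intra- and inter-molecular GAP models (placeholder filenames)
intra_gap = Potential(param_filename='water_intra.xml')
inter_gap = Potential(param_filename='water_inter.xml')

water_box = read('water_64.xyz')          # placeholder periodic water configuration

# total energies and forces are the simple sum of the two components (Fig. 1b);
# in the actual workflow the intra term is evaluated in an expanded copy of the
# box so that no non-bonded atoms fall within its 2-/3-body cut-offs
water_box.calc = SumCalculator([intra_gap, inter_gap])

energy = water_box.get_potential_energy()
forces = water_box.get_forces()
```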
The model fitted using our approach (AL-I + I) yields radial distribution functions (RDFs) in good agreement with the ground-truth method, initially chosen to be DFTB, both in the location and the intensities of the peaks corresponding to the first and second coordination shells (Fig. 2a–c). This is despite the relatively short-range atomic cut-offs (3 Å, O only) used. Only in the O–O pair RDF is there a slight deviation from the DFTB ground truth, precisely where the potential is zero outside the 3 Å cut-off radius of the SOAP descriptor. Interestingly, for a DFT-quality GAP, simply re-evaluating energies and forces on DFTB-derived actively learnt configurations is insufficient, with the DFTB configurations being high in energy at the DFT level (∼5 eV, Fig. S11†). However, applying an active learning strategy with a PBE reference method and a slightly larger 3.5 Å cut-off generates excellent agreement with the AIMD simulation from ref. 62, in only a few hours of total training time (Fig. 2d–f). At this level of theory, the local structure of liquid water is predicted largely correctly, with two distinct peaks in the O–H RDF corresponding to the first and second solvation shells, and with the largest deviation from the ground truth again in the O–O pair around the descriptor cut-off. The real significance, of course, lies in moving to more accurate ground-truth methods, for which a full MD simulation would not be straightforward: indeed, using the same method, a hybrid DFT-quality water model can be generated within a few days, which would be inaccessible with other methods (the generation of the GAP model required ∼5 days on 20 CPU cores, Fig. 2g–i). These results suggest that the training strategy (and hyperparameter selection) presented here is suitable independently of the reference method.
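For reference, a pair RDF such as g(O–O) can be accumulated from the GAP-MD frames with a short NumPy/ASE script. The trajectory filename is a placeholder and a periodic box is assumed for the minimum-image distances:

```python
import numpy as np
from ase.io import read

frames = read('gapmd_water.traj', index='::10')      # placeholder GAP-MD trajectory
rmax, nbins = 6.2, 124                                # up to ~half the 12.42 A box
hist = np.zeros(nbins)

oxygens = [a.index for a in frames[0] if a.symbol == 'O']
n_o = len(oxygens)

for atoms in frames:
    # all O-O minimum-image distances, upper triangle only (each pair counted once)
    d = atoms.get_all_distances(mic=True)[np.ix_(oxygens, oxygens)]
    d = d[np.triu_indices(n_o, k=1)]
    hist += np.histogram(d, bins=nbins, range=(0.0, rmax))[0]

dr = rmax / nbins
r = (np.arange(nbins) + 0.5) * dr                     # bin centres
rho = n_o / frames[0].get_volume()                    # O number density
ideal = 0.5 * n_o * rho * 4.0 * np.pi * r**2 * dr     # ideal-gas pair count per frame
g_oo = hist / (len(frames) * ideal)
```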
Fig. 2 Liquid water simulations. Active learning of bulk water models at various levels of theory. Shown here are O–O, O–H, and H–H RDFs from NVT MD simulations of 64 water molecules in a 12.42 Å cubic box, with ground truth (black) and GAP (purple/red) simulations. (a–c) DFTB(3ob params) ground truth, 100 ps, 300 K, rSOAPc(O) = 3 Å. (d–f) DFT(PBE) reference RDF data extracted from ref. 62, 30 ps, 330 K, rSOAPc(O) = 3.5 Å. (g–i) DFT(revPBE0-D3) GAP, 30 ps, 330 K, rSOAPc(O) = 4.0 Å. |
Table 1 Bulk organic solvent GAP models: atoms on which the SOAP descriptors are centred and the number of intra- (Nintra) and inter-molecular (Ninter) training configurations

Solvent | SOAP descriptors centred on | Nintra | Ninter
---|---|---|---
Acetonitrile | C, N | 269 ± 12 | 120 ± 60 |
Methanol | C, O | 221 ± 13 | 292 ± 49 |
Acetone | C, O | 566 ± 80 | 359 ± 29 |
Pyridine | C, N | 249 ± 36 | 243 ± 11 |
Ammonia | N | 38 ± 40 | 109 ± 24 |
Fig. 3 Zn(aq) simulation. Zn–O radial distribution function averaged from 1 ns of cumulative (10 × 100 ps) NVT MD simulations of Zn(II) in aqueous solution at 300 K, with the experimental modal Zn–O distance shown in black. Experimental (X-ray diffraction) Zn–O distances from ref. 69, octahedral first hydration shell. The shaded area denotes the range of experimental second-hydration-shell distances (ref. 69 and references cited therein). The GAP was trained as those in Table 1 using a PBE/400 eV ground truth; the intra-Zn(H2O)6 component used an O-centred SOAP with rc = 3.0 Å and the inter component rc = 4.0 Å.
The accuracy of the local structure compared to experiment in the first and second solvation shell indicates that this partitioning is effective at capturing both strong dative M–OH2 interactions and weaker hydrogen bonding effects. From random points in the configuration space of [Zn(H2O)6]2+ and 20 water molecules (intermolecular distances >1.7 Å, 10 Å cubic box), τacc reached 0.5 ps (El = 0.8 kcal mol−1 per H2O, 20 fs interval). Note this value is far short of the 100 ps simulations performed to generate the RDF and illustrates that a potential may be ‘stable’ and not sample any high energy regions for t ≫ τacc. Here, the potential for the Zn–water cluster was trained on almost 1000 configurations, suggesting that tens of atoms per component may be the upper limit in dimensionality for which a model can be trained within a day.
Taking advantage of the symmetry of the system, a representative fragment containing one full ligand and three pyridine molecules coordinated to a Pd2+ metal ion (68 atoms) was used to fit a GAP for the entire cage (138 atoms) in the gas phase. This potential was trained in a few days (∼1400 CPUh). We used the resulting GAP to perform nanosecond MD simulations on the whole metallocage at 300 K in the gas phase. This simulation took one day and ∼100 CPUh to complete. For comparison, an equivalent AIMD simulation would take around 50 years at the reference level of theory employed here. The flexibility of the system was monitored and compared to that obtained using classical MD simulations in dichloromethane solvent.70 Using helicity as a measure of flexibility, our potential describes the cage as more rigid than the classical simulations do, suggesting that the classical potential overestimates the dynamic flexibility (Fig. 4). This difference is expected, as the classical potential has no C–C≡C–C dihedral barrier, which is presumably correctly captured by the GAP. This example illustrates the general applicability of the approach to increasingly complex systems, where training on a simpler but representative fragment is sufficient to capture the relevant features of the full system.
Interestingly, and unlike our previous attempt to generate a DFT-quality GAP by re-evaluating energies and forces on DFTB actively learnt configurations, here uplifting a DFT-level GAP to an accurate wavefunction-level GAP is possible. This approach allows a coupled-cluster-quality energy profile (Fig. 5a) to be obtained and dynamics to be propagated from the TS with just 55 energy and (numerical) force evaluations at the CCSD(T) level (Fig. 5b and S16†). The resultant GAP is considerably more accurate than the underlying DFT energy profile (dashed, Fig. 5a). Active learning can also be initialised from an association complex and the IRC learned without prior knowledge of the TS. For the exothermic reaction between cyanide and methyl chloride, training from the reactants and initialising velocities such that 11 kcal mol−1 (0.5 eV) of kinetic energy was present in the breaking bond ensures that the reaction is sampled during training (Fig. S16c†). Relaxing a nudged elastic band (NEB) using the trained GAP over an interpolated path between reactants and products affords an IRC within chemical accuracy of the true profile (RMSE = 0.9 kcal mol−1, Fig. 5c), as sketched below. In this case, adding a NEB refinement step to the training is essential to adequately sample the product region and reach chemical accuracy in the energy of the product, and therefore in the predicted reaction energy (orange vs. blue lines, Fig. 5c). Here, uplifting the GAP to CCSD(T) affords chemical accuracy in the minima and TS regions, although with more limited accuracy in the region in between (Fig. S16d†); this is likely due to the large differences between the PBE and CCSD(T) surfaces in that region. Despite this, the uplifted profile is again considerably more accurate than the underlying DFT one (Fig. S16d†).
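The sketch below shows how such an NEB could be relaxed on a trained GAP with ASE; the number of images, the optimiser and the filenames are illustrative choices, not necessarily those used in this work:

```python
from ase.io import read
from ase.mep import NEB                  # `from ase.neb import NEB` in older ASE versions
from ase.optimize import FIRE
from quippy.potential import Potential

# end points of the CN- + CH3Cl -> Cl- + CH3CN reaction (placeholder files)
reactant, product = read('reactant.xyz'), read('product.xyz')

images = [reactant] + [reactant.copy() for _ in range(8)] + [product]
for image in images:
    # one GAP calculator instance per image; the trained model file is a placeholder
    image.calc = Potential(param_filename='sn2_gap.xml')

neb = NEB(images, climb=True)
neb.interpolate()                        # initial linear path between the end points
FIRE(neb).run(fmax=0.05)                 # relax the band on the GAP surface
```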
Fig. 5 Gas-phase SN2 reactive dynamics. Energetics of a model Cl− + CH3Cl → Cl− + CH3Cl SN2 reaction in the gas phase. (a) Predicted (GAP) and ground-truth (*CCSD(T) ≡ DLPNO-CCSD(T)/ma-def2-TZVPP) energy values on the MP2/ma-def2-TZVPP intrinsic reaction coordinate (IRC). The shaded region bounds the ‘chemically accurate’ (±1 kcal mol−1) region. (b) Parity plots between GAP predictions and true energies from ten 100 fs GAP-MD trajectories initialised from the TS (300 K). Dark and light grey areas bound the ±1 kcal mol−1 and ±2 kcal mol−1 error regions, respectively. (c) IRC for CN− + CH3Cl → Cl− + CH3CN, trained using uphill active learning with nudged elastic band refinement; see Fig. S16† for additional details.
Fig. 6 GAP dynamics on a bifurcating surface. 2D PES (B3LYP/def2-SVP) along the forming bond distances (r1, r2) in the dimerisation of cyclopentadiene. An example of GAP-propagated reactive dynamics (300 K) is shown from TS1 (7N in ref. 77), which leads to reactants (representative trajectory in orange), and from TS1′, which leads to products (representative trajectory in purple). The 3D projection is truncated at 2.5 eV above the minimum for plotting. The interpolated surface used a cubic spline (scipy.interp2d); the raw surfaces are shown in Fig. S21.† All trajectories are shown in Fig. S18 and S20.†
Further investigation and generation of the relaxed 2D potential-energy surface over the two possible forming C–C bonds (r1, r2) leading to products provided a rather different surface from the one suggested in ref. 77, with a flat portion followed by an incline as r1 and r2 shorten below 2.9 Å, and a steeply exergonic reverse reaction (intrinsic reaction coordinate, IRC, shown in Fig. S19†). As noted by Caramella, following the IRC forwards from TS1, the reaction proceeds to another transition state, TS1′, which is similar in energy (ΔE = 2 kcal mol−1). By training a GAP at 500 K, propagating GAP-MD from TS1′ and sampling the area of the PES around a valley-ridge inflection (VRI) point, trajectories lead to the two expected products (e.g., purple line, Fig. 6 and S20†). This example demonstrates that, with no a priori knowledge apart from the structure of TS1, the topology of the bifurcating surface can be revealed efficiently using GAP dynamics. This strategy is completely automated, requiring training times of a few hours to days, thus providing a promising approach to routinely examine reaction dynamics in organic molecules.
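A minimal sketch of the cubic-spline interpolation used for plotting the 2D surface follows; the grid file and grid dimensions are placeholders, and note that scipy.interpolate.interp2d is deprecated (and removed) in recent SciPy releases:

```python
import numpy as np
from scipy.interpolate import interp2d   # deprecated in newer SciPy; RectBivariateSpline is an alternative

# forming C-C bond distances (Angstrom) and relaxed energies (eV) on a coarse grid
r1 = np.linspace(1.5, 3.5, 21)
r2 = np.linspace(1.5, 3.5, 21)
energies = np.loadtxt('pes_grid.txt')        # placeholder file, shape (21, 21)

spline = interp2d(r1, r2, energies, kind='cubic')

# evaluate on a finer grid, truncating at 2.5 eV above the minimum for plotting
r_fine = np.linspace(1.5, 3.5, 200)
e_fine = np.minimum(spline(r_fine, r_fine), energies.min() + 2.5)
```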
Fig. 7 Solution-phase SN2 reactive dynamics. (a) True (CPCM(Water)-PBE/def2-SVP, purple) and GAP-predicted (orange) relaxed 2D PESs in implicit solvent, zeroed to the transition-state energy. (b) Reactive GAP-MD trajectories (lines) propagated from the TS, trained on implicitly solvated configurations (CPCM(Water)-PBE/def2-SVP, red) and explicitly solvated configurations (PBE/400 eV, blue). Ground-truth implicitly solvated surface shown with cubic-spline interpolation.
Adopting a strategy identical to the one employed for condensed-phase systems, with separate intra- and inter-molecular components, dynamics can be propagated from the TS and the effect of explicit solvation interrogated (Fig. 7b). The only a priori knowledge required for the training to be completed in explicit water is the gas-phase TS. Interestingly, the behaviour in explicit water (blue, Fig. 7b) differs from its implicit counterpart (red, Fig. 7b). This can be understood by considering that in implicit solvent the solvent reorganisation is instantaneous, which results in oscillations in the C–Cl bond characteristic of a gas-phase reaction. In contrast, the dynamics are more complex in explicit solvent, with a slower transition towards the product channel. Additionally, one of the 10 trajectories re-crosses the barrier after 170 fs of simulation (Fig. S22†), where the solvent has not yet reorganised to accommodate the anionic chloride, making the path to products shallower in energy.
The component-wise separation of the system also opens the possibility of training against a more accurate ab initio surface for the gas-phase reaction, in a similar way to QM/MM, but here an ML(A)/ML(B) partition is available, where A and B are two different ground-truth methods.33 Application of this kind of hierarchical ML potential fitting will be the subject of further work.
Fig. 8 Example Python input script required to train a bulk water model from scratch at the DFTB level using four CPU cores. |
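The original figure is not reproduced here; a script along the following lines is consistent with its description, but the specific gap-train calls shown (gt.System, gt.Molecule, gt.active.train and the 'dftb' method name) are assumptions based on the package documentation and should be checked against the released gap-train code:

```python
import gaptrain as gt   # gap-train package (assumed API)

gt.GTConfig.n_cores = 4                      # run the training on four CPU cores

# bulk water: 20 molecules in a 10 Angstrom cubic, periodic box
water_box = gt.System(box_size=[10.0, 10.0, 10.0])
water_box.add_molecules(gt.Molecule('h2o.xyz'), n=20)

# active learning of intra + inter GAPs against a DFTB ground truth
data, gap = gt.active.train(water_box, method_name='dftb')
```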
GAP-MD simulations were performed with ASE78 interfaced to QUIP with the quippy wrapper using the Langevin integrator with 0.5 fs timesteps at 300 K unless otherwise specified. Condensed phase MD simulations were performed with three-dimensional periodic boundary conditions following minimisation and equilibration for at least 20 ps. Initial configurations, CUR61 selection and all learning curves were generated with the gap-train module, which was used to run the automated fitting.79 All active learning was performed using a ‘diff’ strategy, where a configuration is added to the training set if |E0 − EGAP| is larger than a threshold. With system-dependent hyperparameter optimisation, using a threshold on the maximum atomic variance predicted by the Gaussian process (‘gp_var’) can result in accelerated learning (Fig. S25†). CUR selection used SOAPs averaged over atoms in a configuration using the Dscribe80 package (‘inner’ averaging, over entries of the expansion coefficient vector). Intra + inter (I + I) energy and force evaluations used an expansion factor of 10 to ensure no intermolecular atoms were within the intra GAP cut-off. The NumPy81 based implementation introduces a negligible computational overhead for expanding the box (∼0.1 ms per step real time) but requires two GAP calculations on the inter and intra components, currently carried out in serial. All generated potentials, with the exception of the revPBE0-D3 water potential and metallocage, were trained in less than a day on 10 CPU cores. The revPBE0-D3 water potential was constructed without any prior data in 5 days (1 intra + 4 inter) and used 20 CPU cores, while the metallocage fragment was trained for 3 days also on 20 CPU cores. Explicit SN2 reaction dynamics simulations were performed using intra components for H2O and [Cl⋯CH3Cl]−, where the latter, due to the finite atomic cut-off employed, has the correct dissociation behaviour when Cl− and CH3Cl are distant.
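A sketch of how a GAP-MD run and the ‘diff’ criterion fit together is shown below; the filenames, friction coefficient and the ∼1 kcal mol−1 threshold are illustrative assumptions, and the ground-truth single points are only indicated in comments:

```python
from ase import units
from ase.io import read, write
from ase.md.langevin import Langevin
from quippy.potential import Potential

atoms = read('water_box.xyz')                            # placeholder starting frame
atoms.calc = Potential(param_filename='water_gap.xml')   # placeholder GAP model

# Langevin dynamics with a 0.5 fs timestep at 300 K, as in the GAP-MD runs here
dyn = Langevin(atoms, timestep=0.5 * units.fs, temperature_K=300, friction=0.02)

frames = []
dyn.attach(lambda: frames.append(atoms.copy()), interval=20)   # store every 10 fs
dyn.run(2000)                                                   # 1 ps of exploration
write('gapmd_frames.xyz', frames)

# 'diff' selection: a stored frame is added to the training set if the GAP energy
# differs from a ground-truth single point (DFTB+, GPAW, ORCA, ...) by more than
# a threshold
def needs_training(e_true, e_gap, threshold=0.043):   # 0.043 eV ~ 1 kcal mol-1
    return abs(e_true - e_gap) > threshold
```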
Periodic DFTB calculations were performed with DFTB+82 using 3ob83 parameters, and their molecular equivalents using GFN2-xTB84 in xtb v. 6.2.3. Periodic pure DFT calculations were performed with GPAW85,86 v. 19.8.1 with the PBE87 functional and a 400 eV plane-wave cut-off, starting from a dzp LCAO initial guess, at the gamma point. Hybrid periodic DFT calculations with the revPBE088,89 functional combined with the D390 dispersion correction were performed with CP2K.91
Molecular DFT, MP2 and coupled cluster [DLPNO-CCSD(T)] calculations used for training were performed with ORCA92,93 v. 4.2.1 wrapped with autodE94 using PBE87 and PBE089 functionals, (ma)-def2-SVP, def2-TZVP and ma-def2-TZVPP basis sets.95 AIMD calculations at the DFTB level were performed with DFTB+ with 3ob parameters83 and MM simulations were carried out with GROMACS96,97 2019.2 with TIP3P parameters.98
To evaluate τacc for the fully reactive water NN of Cheng et al.,36 the NN was retrained, using n2p2,100 on 7258 configurations from ref. 99 that were re-evaluated at the DFTB(3ob) level, with the same parameters and symmetry functions.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/d1sc01825f |