Vaibhav Bihani‡a, Sajid Mannan‡a, Utkarsh Pratiush‡a, Tao Dub, Zhimin Chenb, Santiago Miretc, Matthieu Micoulautd, Morten M. Smedskjaerb, Sayan Ranu*a and N. M. Anoop Krishnan*a
aIndian Institute of Technology Delhi, Hauz Khas, 110016, New Delhi, India. E-mail: vaibhav.bihani525@gmail.com; cez218288@civil.iitd.ac.in; utkarshp1161@gmail.com; sayanranu@cse.iitd.ac.in; krishnan@iitd.ac.in
bAalborg University, 9220 Aalborg, Denmark. E-mail: dutao220@gmail.com; zhiminc@bio.aau.dk; mos@bio.aau.dk
cIntel Labs, Santa Clara, CA, USA. E-mail: santiago.miret@intel.com
dSorbonne University, Paris, France. E-mail: matthieu.micoulaut@gmail.com
First published on 4th March 2024
Equivariant graph neural network force fields (EGraFFs) have shown great promise in modelling complex interactions in atomic systems by exploiting the graphs' inherent symmetries. Recent works have led to a surge in the development of novel architectures that incorporate equivariance-based inductive biases alongside architectural innovations like graph transformers and message passing to model atomic interactions. However, a thorough evaluation of these EGraFFs when deployed for the downstream task of real-world atomistic simulations is lacking. To this end, here we perform a systematic benchmarking of 6 EGraFF algorithms (NequIP, Allegro, BOTNet, MACE, Equiformer, TorchMDNet), with the aim of understanding their capabilities and limitations for realistic atomistic simulations. In addition to our thorough evaluation and analysis of eight existing datasets based on the benchmarking literature, we release two new benchmark datasets and propose four new metrics and three challenging tasks. The new datasets and tasks evaluate the performance of EGraFF on out-of-distribution data, in terms of different crystal structures, temperatures, and new molecules. Interestingly, evaluation of the EGraFF models based on dynamic simulations reveals that having a lower error on energy or force does not guarantee stable or reliable simulation or faithful replication of the atomic structures. Moreover, we find that no model clearly outperforms the others on all datasets and tasks. Importantly, we show that the performance of all the models on out-of-distribution datasets is unreliable, pointing to the need for the development of a foundation model for force fields that can be used in real-world simulations. In summary, this work establishes a rigorous framework for evaluating machine learning force fields in the context of atomic simulations and points to open research challenges within this domain.
Recent work has shown that GNN force fields can be further enhanced and made data-efficient by enforcing additional inductive biases, in terms of equivariance, leveraging the underlying symmetry of the atomic structures. This family of GNNs, hereafter referred to as equivariant graph neural network force fields (EGraFFs), has demonstrated the capability to model symmetries inherent in atomic systems, resulting in superior performance in comparison to other machine-learned force fields. This is achieved by explicitly accounting for symmetry operations, such as rotations and translations, and ensuring that the learned representations in EGraFFs are consistent under these transformations.
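The equivariance property can be verified numerically for any force field: rotating the input coordinates must rotate the predicted forces by the same rotation. The sketch below checks this with a toy Lennard-Jones-like pairwise model (an illustrative stand-in, not one of the benchmarked EGraFFs):

```python
import numpy as np

def pairwise_forces(positions):
    """Toy pairwise force field used only to illustrate equivariance."""
    n = len(positions)
    forces = np.zeros_like(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            rij = positions[j] - positions[i]
            r = np.linalg.norm(rij)
            # Lennard-Jones-style radial derivative, directed along rij
            f = 24.0 * (2.0 * r**-13 - r**-7)
            forces[i] -= f * rij / r
    return forces

def random_rotation(rng):
    # QR decomposition of a random matrix gives an orthogonal matrix
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:  # ensure a proper rotation (det = +1)
        q[:, 0] *= -1
    return q

rng = np.random.default_rng(0)
x = 3.0 * rng.normal(size=(5, 3))   # 5 atoms in 3D
R = random_rotation(rng)

# Equivariance: F(R x) == R F(x) for every atom
assert np.allclose(pairwise_forces(x @ R.T), pairwise_forces(x) @ R.T)
```

An EGraFF builds this behaviour into the architecture itself, rather than learning it approximately from rotated training examples.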
Traditionally, EGraFFs are trained on the forces and energies from first-principles simulation data, such as density functional theory. Recent work has shown that low training or test error does not guarantee the performance of EGraFFs on the downstream task involving atomistic or molecular dynamics (MD) simulations.12 Specifically, EGraFFs can suffer from several major issues such as (i) unstable trajectory (the simulation suddenly explodes/becomes unstable due to high local forces), (ii) poor structure (the structure of the atomic system, including the coordination, bond angles, and bond lengths, is not captured properly), and (iii) poor generalization to out-of-distribution datasets, including simulations at different temperatures or pressures of the same system, simulations of different structures having the same chemical composition (for example, crystalline (ordered) and glassy (disordered) states of the same system), or simulations of different compositions having the same chemical components (for example, Li4P2S6 and Li7P3S11). Note that these are realistic tasks in which a force field well-trained on one system is expected to generalize to similar systems. As such, an extensive evaluation and comparison of EGraFFs is needed, which requires standardized datasets, well-defined metrics, and comprehensive benchmarking that capture the diversity and complexity of atomic systems.
An initial effort to benchmark the performance of machine-learned force fields was made in ref. 12. In this work, the authors focused on existing datasets and some metrics, such as radial distribution functions and diffusion constants of atomic systems. However, the work did not cover the wide range of EGraFFs that have been newly proposed, many of which have shown superior performance in common tasks. Moreover, the metrics in ref. 12 were limited to stability, mean absolute error of forces, radial distribution function, and diffusivity. While useful, these metrics either do not capture the variations during the dynamic simulation (e.g., how the force or energy error evolves during simulation) or require long simulations (such as diffusion constants, which require many steps to reach the diffusive regime). Further, the work does not propose any novel tasks that can serve as a benchmark for the community developing new force fields.
With the increasing interest in EGraFFs for atomic simulations, we aim to address the gap in benchmarking by performing a rigorous evaluation of the quality of simulations obtained using modern EGraFF force fields. To this end, we evaluate 6 EGraFFs on 10 datasets, including two new challenging datasets that we contribute, and propose new metrics based on real-world simulations. By employing a diverse set of atomic systems and benchmarking metrics, we aim to objectively and rigorously assess the capabilities and limitations of EGraFFs. The main contributions of this research paper are as follows:
• EGraFFs: We present a benchmarking package to evaluate 6 EGraFFs for atomistic simulations. As a byproduct of this benchmarking study, we release a well-curated codebase of the prominent equivariant GNN force fields in the literature, enabling easier and streamlined access to relevant modeling pipelines: https://github.com/M3RG-IITD/MDBENCHGNN.
• Challenging benchmark datasets: We present 10 datasets, including two new datasets, namely GeTe and LiPS20. The datasets cover a wide range of atomic systems, from small molecules to bulk systems. The datasets capture several scenarios, such as compounds with the same elements but different chemical compositions, the same composition with different crystal structures, and the same structure at different temperatures. This includes complex scenarios such as melting trajectories of crystals.
• Challenging downstream tasks: We propose several challenging downstream tasks that evaluate the ability of EGraFFs to model the out-of-distribution datasets described earlier.
• Improved metrics: We propose additional metrics that evaluate the quality of the atomistic simulations regarding the structure and dynamics with respect to the ground truth.
The behavior of an atomic system is governed by its potential energy, expressed as a function of all atomic positions,

U = U(r1, r2, …, rN). (1)
Since the exact computation of this potential energy is challenging, it is approximated using empirical force fields that learn the effective potential energy surface as a function of two-, three-, or four-body interactions. In atomistic simulations, these force fields are used to obtain the system's energy. The force on each particle is then obtained as Fi = −∂U/∂ri, and the acceleration of atom i as Fi/mi, where mi is its mass. Accordingly, the updated positions are computed by numerically integrating the equations of motion using a symplectic integrator. These steps are repeated to study the dynamics of atomic systems.
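The integration loop described above can be sketched as follows, using velocity Verlet, a common symplectic integrator (a minimal illustration; `force_fn` stands in for any force field, learned or empirical):

```python
import numpy as np

def velocity_verlet(x0, v0, masses, force_fn, dt, n_steps):
    """Velocity-Verlet loop: forces -> accelerations -> updated positions.
    force_fn(x) returns the forces F_i = -dU/dr_i for positions x."""
    x, v = x0.copy(), v0.copy()
    f = force_fn(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / masses[:, None]  # half-step velocity update
        x += dt * v                          # full-step position update
        f = force_fn(x)                      # forces at the new positions
        v += 0.5 * dt * f / masses[:, None]  # second half-step velocity update
    return x, v

# Example: one particle in a harmonic well, F = -x (unit mass and stiffness).
x, v = velocity_verlet(np.array([[1.0, 0.0, 0.0]]), np.zeros((1, 3)),
                       np.ones(1), lambda x: -x, dt=0.01, n_steps=1000)
energy = 0.5 * np.sum(v**2) + 0.5 * np.sum(x**2)  # stays near the initial 0.5
```

The symplectic property is what keeps the total energy bounded over long trajectories; a naive Euler update would drift instead.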
NequIP,13 based on tensor field networks, employs a series of self-interaction, convolution, and concatenation operations with the neighboring atoms. The convolution filter, represented as a product of a radial basis function R and spherical harmonics Ylm, ensures equivariance. This was the first EGraFF proposed for atomistic simulations based on spherical harmonics.
Allegro14 merges the precision of recent equivariant GNNs with stringent locality, without message passing. Its inherent local characteristic enhances its scalability for potentially more extensive systems. In contrast to other models, Allegro predicts the energy as a function of the final edge embedding rather than the node embeddings. All the pairwise energies are summed to obtain the total energy of the system. Allegro features remarkable adaptability to data outside the training distribution, consistently surpassing other force fields in this aspect, especially those employing body-ordered strategies.
BOTNet15 is a refined body-ordered adaptation of NequIP. While maintaining the two-body interactions of NequIP in each layer, it increments the body order by one with every iteration of message passing. Unlike NequIP, BOTNet uses non-linearities in the update step.
MACE16 offers efficient equivariant messages with high body order computation. Due to the augmented body order of the messages, merely two message-passing iterations suffice to attain notable accuracy. This contrasts with the usual five or six iterations observed in other GNNs, rendering MACE both scalable and amenable to parallelization.
TorchMDNet17 introduces a transformer-based GNN architecture, utilizing a modified multi-head attention mechanism. This modification expands the traditional dot-product attention to integrate edge data, which can enhance the learning of interatomic interactions.
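The idea of folding edge information into attention can be shown schematically: a pairwise edge term biases the dot-product attention logits between atoms. The numpy sketch below is a loose, single-head illustration of the concept with random weights, not TorchMDNet's actual architecture:

```python
import numpy as np

def edge_biased_attention(h, edge_bias, seed=0):
    """Schematic single-head self-attention over atoms in which pairwise
    edge features bias the attention logits. Illustration only."""
    n, d = h.shape
    rng = np.random.default_rng(seed)
    Wq, Wk, Wv = (rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    logits = q @ k.T / np.sqrt(d) + edge_bias         # edge-augmented scores
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                 # row-wise softmax
    return w @ v, w

h = np.random.default_rng(1).normal(size=(4, 8))      # 4 atoms, 8 features
out, attn = edge_biased_attention(h, np.zeros((4, 4)))
```

In a force field, the edge bias would be derived from interatomic distances, so that attention weights become distance-aware rather than purely feature-driven.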
Equiformer18 is a transformer-based GNN architecture, introducing a new attention mechanism named ‘equivariant graph attention’. This mechanism equips conventional attention used in the transformers with equivariance.
PaiNN10 is a polarizable atom interaction neural network consisting of equivariant message passing architecture that takes into account the varying polarizability of atoms in different chemical environments, allowing for a more realistic representation of molecular behavior.
DimeNet++19 is a directional message-passing neural network in which each rotationally equivariant message is associated with a direction in coordinate space.
MD17 is a widely used12,13,15–18 dataset for benchmarking ML force fields. It was proposed by ref. 20 and constitutes a set of small organic molecules, including benzene, toluene, naphthalene, ethanol, uracil, and aspirin, with energy and forces generated by ab initio MD (AIMD) simulations. Here, we select four molecules, namely aspirin, ethanol, naphthalene, and salicylic acid, to cover a range of chemical structures and topology. Further, zero-shot evaluation is performed on benzene. We train the models on 950 configurations and test them on 50.
3BPA contains a large flexible drug-like organic molecule, 3-(benzyloxy)pyridin-2-amine (3BPA), sampled from MD trajectories at different temperatures.21 It has three consecutive rotatable bonds leading to a complex dihedral potential energy surface with many local minima, making it challenging to approximate using classical or ML force fields. The models can be trained either on 300 K snapshots or on mixed-temperature snapshots sampled from 300 K, 600 K, and 1200 K. In the following experiments, we train models on 500 configurations sampled at 300 K and test on 1669 configurations sampled at 600 K.
LiPS consists of lithium, phosphorus, and sulfur (Li6.75P3S11), and has been used in similar benchmarking analyses12 as a representative system for MD simulations studying kinetic properties in materials. Note that LiPS is a crystalline (ordered structure) material that can potentially be used in battery development. We have adopted this dataset from ref. 13 and benchmarked all models for their force and energy errors. The training and testing datasets have 19000 and 1000 configurations, respectively.
Acetylacetone (AcAc): The dataset was generated by conducting MD simulations at both 300 K and 600 K using the Langevin thermostat.15 The uniqueness of this dataset stems from the varying simulation temperatures and the range of sampled dihedral angles. While the training set restricts sampling to dihedral angles below 30°, our models are tested on angles extending up to 180°. The model must therefore generalize effectively over the potential energy surface (PES) to remain accurate at these higher angles. This challenge presents an excellent opportunity for benchmarking GNNs. The dataset consists of 500 training configurations and 650 testing configurations.
GeTe is a new dataset generated by Car–Parrinello MD (CPMD) simulations22 of Ge and Te atoms, which build on a density functional theory (DFT) based calculation of the interatomic forces prior to a classical integration of the equations of motion. It consists of 200 atoms, of which 40 are Ge and 160 are Te, i.e., corresponding to the composition GeTe4, whose structural properties have been investigated in detail and reproduce a number of experimental data in the liquid and amorphous phases from neutron/X-ray scattering23,24 and Mössbauer spectroscopy.25 GeTe belongs to the promising class of phase-change materials26 and is challenging to simulate with classical force fields, while first-principles methods limit the accessible simulation times and system sizes. Thus, an accurate force field is essential to understand the structural changes in GeTe during the crystalline-to-disordered phase transitions. Here, our dataset consists of 1500 structures in training, 300 in test, and 300 in validation.
LiPS20 is a new dataset generated from AIMD simulations of a series of systems containing Li, P, and S elements, including both the crystalline and disordered structures of elementary substances and compounds, such as Li, P, S, Li2P2S6, β-Li3PS4, γ-Li3PS4, and xLi2S-(100 − x)P2S5 (x = 67, 70, 75, and 80) glasses using the CP2K package.27 Details of dataset generation, structures, and compositions in this dataset are given in ESI Section A.2 and Table S2.†
g(r) = (1/(4πr²ρN)) Σi Σj≠i ⟨δ(r − rij)⟩, (2)

where ρ is the number density, N the number of atoms, and rij the distance between atoms i and j.
RDF essentially represents the local time-averaged density of atoms at a distance r from a central atom (see ESI Section A.4 and Fig. S1†). Hence, it captures the structure simulated by a force field as a concise one-dimensional fingerprint. A force field is considered acceptable if it yields a WF of less than 9% for bulk systems.30
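As a sketch of how these structural metrics can be computed, the code below estimates g(r) by histogramming pairwise distances in a periodic cubic box and scores it against a reference RDF with a normalized RMS mismatch (the assumed form of Wright's factor; function names and the box setup are illustrative):

```python
import numpy as np

def rdf(positions, box_length, n_bins=100, r_max=None):
    """Histogram estimate of the radial distribution function g(r)
    for a cubic periodic box."""
    n = len(positions)
    r_max = r_max or box_length / 2
    edges = np.linspace(0.0, r_max, n_bins + 1)
    counts = np.zeros(n_bins)
    for i in range(n):
        d = positions - positions[i]
        d -= box_length * np.round(d / box_length)  # minimum-image convention
        dist = np.linalg.norm(d, axis=1)
        dist = dist[(dist > 0) & (dist < r_max)]    # drop the self-distance
        counts += np.histogram(dist, bins=edges)[0]
    rho = n / box_length**3
    shell = 4.0 / 3.0 * np.pi * (edges[1:]**3 - edges[:-1]**3)
    g = counts / (n * rho * shell)                  # ratio to the ideal gas
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, g

def wright_factor(g_sim, g_ref):
    """Normalized RMS mismatch between simulated and reference RDFs, in %
    (assumed form of Wright's factor based on the text)."""
    return 100.0 * np.sqrt(np.sum((g_sim - g_ref)**2) / np.sum(g_ref**2))

rng = np.random.default_rng(1)
positions = rng.uniform(0.0, 10.0, size=(400, 3))   # uniform "ideal gas" box
r, g = rdf(positions, box_length=10.0, n_bins=50)   # g(r) flattens toward 1
```

For an ideal-gas configuration the histogram should hover around 1 at large r; structured liquids and crystals instead show the peaks that the WF and JSD metrics compare against the ground truth.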
Wright's factor (WF) quantifies the mismatch between the simulated and reference RDFs,

WF = [Σr (gsim(r) − gref(r))² / Σr gref(r)²]^(1/2) × 100%. (3)
The Jensen–Shannon divergence (JSD) compares the distributions P and Q of structural features,

JSD(P‖Q) = ½DKL(P‖M) + ½DKL(Q‖M), where M = ½(P + Q), (4)

and DKL is the Kullback–Leibler divergence.
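The JSD between two discretized distributions (e.g., normalized histograms of interatomic distances from ground-truth and predicted trajectories) can be computed as below; with base-2 logarithms it is bounded in [0, 1]. This is a generic sketch, not the paper's exact binning scheme:

```python
import numpy as np

def jsd(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discretized distributions,
    base-2 logs so the result lies in [0, 1]."""
    p = np.asarray(p, dtype=float); p = p / p.sum()   # normalize inputs
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)                                  # mixture distribution
    kl = lambda a, b: np.sum(a * np.log2((a + eps) / (b + eps)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Unlike the KL divergence, the JSD is symmetric and finite even when one histogram has empty bins, which makes it convenient for comparing simulated and reference structural distributions.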
| Dataset | NequIP E | NequIP F | Allegro E | Allegro F | BOTNet E | BOTNet F | MACE E | MACE F | Equiformer E | Equiformer F | TorchMDNet E | TorchMDNet F | PaiNN E | PaiNN F | DimeNet++ E | DimeNet++ F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Acetylacetone | 1.38 | 4.59 | 0.92 | 4.4 | 2.0 | 10.0 | 2.0 | 8.0 | 4.0 | 4.0 | 1.0 | 5.0 | 4.92 | 7.73 | 102.39 | 15.28 |
3BPA | 3.15 | 7.86 | 4.13 | 10.0 | 5.0 | 14.0 | 4.0 | 12.0 | 6.0 | 7.0 | 3.0 | 11.0 | 36.67 | 40.41 | 796.74 | 46.72 |
Aspirin | 6.84 | 13.89 | 5.00 | 9.17 | 7.99 | 14.06 | 8.53 | 14.01 | 6.15 | 15.29 | 5.33 | 8.97 | 41.49 | 12.41 | 133.24 | 22.07 |
Ethanol | 2.67 | 7.49 | 2.34 | 5.01 | 2.60 | 6.80 | 2.36 | 3.19 | 2.66 | 9.73 | 2.67 | 5.93 | 7.77 | 11.81 | 149.55 | 17.19 |
Naphthalene | 5.70 | 6.20 | 5.14 | 2.64 | 6.67 | 6.07 | 6.26 | 1.98 | 3.88 | 7.01 | 2.55 | 4.03 | 10.56 | 4.07 | 175.04 | 19.65 |
Salicylic acid | 5.78 | 8.42 | 5.76 | 6.30 | 5.56 | 10.21 | 5.34 | 4.24 | 5.22 | 12.39 | 6.85 | 7.19 | 24.15 | 11.12 | 169.18 | 25.48 |
LiPS | 165.43 | 5.04 | 31.75 | 2.46 | 28.0 | 13.0 | 30.0 | 15.0 | 83.20 | 51.10 | 67.0 | 61.0 | 128.80 | 112.43 | 55.22 | 42.23 |
LiPS20 | 26.80 | 3.04 | 33.17 | 3.31 | 24.59 | 5.51 | 14.05 | 4.64 | 3274.93 | 57.63 | 20.47 | 57.19 | — | — | — | — |
GeTe | 1780.951 | 244.40 | 1009.4 | 253.45 | 3034.0 | 258.0 | 2670.0 | 247.0 | 666.34 | 363.17 | 2613.0 | 371.0 | 884.28 | 330.05 | 51704.65 | 222.39 |
| Dataset | NequIP JSD | NequIP WF | Allegro JSD | Allegro WF | BOTNet JSD | BOTNet WF | MACE JSD | MACE WF | Equiformer JSD | Equiformer WF | TorchMDNet JSD | TorchMDNet WF | PaiNN JSD | PaiNN WF | DimeNet++ JSD | DimeNet++ WF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Acetylacetone | 28.24 | 24.55 | 29.63 | 22.17 | 30.61 | 26.04 | 31.07 | 22.90 | 29.86 | 21.78 | 29.34 | 22.49 | 26.97 | 22.33 | 66.0 | 143.36 |
3BPA | 0.82 | 6.02 | 1.13 | 7.98 | 1.07 | 7.13 | 0.98 | 8.36 | 0.94 | 7.44 | 0.87 | 7.31 | 1.39 | 6.65 | 6.87 | 89.97 |
Aspirin | 0.133 | 30.66 | 0.108 | 23.29 | 0.122 | 27.36 | 0.111 | 18.92 | 0.120 | 23.58 | 0.131 | 23.99 | 0.03 | 4.0 | 0.31 | 167.48 |
Ethanol | 0.526 | 18.34 | 0.450 | 15.89 | 0.360 | 15.57 | 0.494 | 17.93 | 0.549 | 23.48 | 0.464 | 17.70 | 0.78 | 9.42 | 3.75 | 205.75 |
Naphthalene | 0.089 | 20.96 | 0.082 | 19.44 | 0.093 | 24.65 | 0.095 | 22.89 | 0.090 | 26.72 | 0.081 | 19.25 | 0.02 | 2.52 | 0.23 | 130.40 |
Salicylic acid | 0.077 | 16.95 | 0.124 | 27.58 | 0.076 | 14.65 | 0.097 | 19.35 | 0.072 | 14.17 | 0.077 | 16.12 | 0.08 | 7.63 | 0.50 | 208.95 |
LiPS | 0.0 | 3.89 | 0.0 | 3.57 | 0.0 | 3.93 | 0.0 | 3.66 | 0.0 | 1.97 | 0.0 | 1.49 | 0.0 | 0.51 | 0.0 | 28.55 |
LiPS20 | 0.001 | 14.92 | 0.001 | 18.32 | 0.001 | 17.08 | 0.001 | 17.70 | — | — | 0.006 | 41.70 | — | — | — | — |
GeTe | 0.0 | 2.78 | 0.0 | 2.06 | 0.0 | 2.03 | 0.0 | 2.02 | — | — | 0.0 | 2.80 | 0.0 | 2.29 | 0.0 | 16.77 |
| Dataset | NequIP E | NequIP F | Allegro E | Allegro F | BOTNet E | BOTNet F | MACE E | MACE F | Equiformer E | Equiformer F | TorchMDNet E | TorchMDNet F | PaiNN E | PaiNN F | DimeNet++ E | DimeNet++ F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Acetylacetone | 0.960 | 0.709 | 0.820 | 0.710 | 0.923 | 0.713 | 0.813 | 0.710 | 0.810 | 0.711 | 0.836 | 0.713 | 2.129 | 0.705 | 1.043 | 0.705 |
| | (0.361) | (0.042) | (0.275) | (0.041) | (0.331) | (0.041) | (0.275) | (0.041) | (0.276) | (0.043) | (0.282) | (0.042) | (0.457) | (0.042) | (0.353) | (0.041) |
| 3BPA | 0.810 | 0.711 | 0.729 | 0.710 | 0.680 | 0.711 | 0.760 | 0.710 | 0.803 | 0.709 | 0.814 | 0.710 | 0.893 | 0.716 | 1.92 | 0.707 |
| | (0.394) | (0.032) | (0.292) | (0.033) | (0.248) | (0.032) | (0.281) | (0.032) | (0.310) | (0.032) | (0.30) | (0.032) | (0.446) | (0.031) | (0.367) | (0.032) |
| Aspirin | 1.068 | 0.626 | 1.009 | 0.625 | 1.083 | 0.627 | 1.004 | 0.628 | 1.023 | 0.637 | 1.096 | 0.626 | 2.908 | 0.662 | 1.188 | 0.680 |
| | (0.351) | (0.081) | (0.358) | (0.085) | (0.337) | (0.078) | (0.338) | (0.075) | (0.36) | (0.083) | (0.352) | (0.077) | (0.598) | (0.061) | (0.265) | (0.055) |
| Ethanol | 3.287 | 0.684 | 3.497 | 0.686 | 3.239 | 0.698 | 3.579 | 0.690 | 3.252 | 0.698 | 3.420 | 0.686 | 4.828 | 0.687 | 3.071 | 0.708 |
| | (1.275) | (0.071) | (1.209) | (0.071) | (1.206) | (0.078) | (1.255) | (0.076) | (1.245) | (0.072) | (1.327) | (0.074) | (1.133) | (0.070) | (0.719) | (0.073) |
| Naphthalene | 2.45 | 0.624 | 2.305 | 0.603 | 2.524 | 0.599 | 2.59 | 0.604 | 2.593 | 0.616 | 2.700 | 0.604 | 4.071 | 0.661 | 1.778 | 0.693 |
| | (0.685) | (0.073) | (0.688) | (0.062) | (0.644) | (0.063) | (0.663) | (0.072) | (0.675) | (0.075) | (0.688) | (0.070) | (0.839) | (0.061) | (0.520) | (0.059) |
| Salicylic acid | 2.135 | 0.625 | 1.955 | 0.604 | 2.042 | 0.621 | 2.14 | 0.610 | 1.996 | 0.616 | 2.146 | 0.594 | 4.107 | 0.687 | 2.15 | 0.694 |
| | (0.468) | (0.068) | (0.465) | (0.064) | (0.45) | (0.072) | (0.444) | (0.063) | (0.477) | (0.065) | (0.529) | (0.062) | (0.696) | (0.056) | (36.01) | (0.058) |
| LiPS | 87.52 | 0.711 | 97.64 | 0.710 | 100.07 | 0.712 | 100.30 | 0.765 | 78.93 | 0.718 | 160.60 | 0.712 | 662.431 | 0.705 | 222.94 | 0.699 |
| | (36.342) | (0.054) | (39.990) | (0.053) | (36.839) | (0.053) | (39.041) | (0.053) | (47.28) | (0.050) | (76.441) | (0.049) | (89.605) | (0.042) | (42.777) | (0.052) |
| LiPS20 | 45.10 | 0.720 | 32.79 | 0.721 | 27.99 | 0.726 | 41.47 | 0.722 | — | — | 15108.75 | 0.834 | — | — | — | — |
| | (14.206) | (0.043) | (8.09) | (0.040) | (8.201) | (0.039) | (8.613) | (0.039) | — | — | (27106.23) | (0.065) | — | — | — | — |
| GeTe | 495.30 | 0.800 | 294.39 | 0.756 | 351.86 | 0.764 | 352.46 | 0.765 | — | — | 346.44 | 0.779 | 175.928 | 0.77 | 3914.07 | 0.807 |
| | (36.945) | (0.064) | (23.563) | (0.063) | (27.139) | (0.072) | (27.055) | (0.073) | — | — | (25.362) | (0.060) | (80.01) | (0.052) | (181.98) | (0.081) |
| Dataset | NequIP T | NequIP I | Allegro T | Allegro I | BOTNet T | BOTNet I | MACE T | MACE I | Equiformer T | Equiformer I | TorchMDNet T | TorchMDNet I |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
Acetylacetone | 0.66 | 3.18 | 0.17 | 1.94 | 0.11 | 1.90 | 0.04 | 2.66 | 0.52 | 9.98 | 0.11 | 1.79 |
3BPA | 1.07 | 7.07 | 1.80 | 4.92 | 0.12 | 4.46 | 0.06 | 4.18 | 0.68 | 19.25 | 0.13 | 4.83 |
Aspirin | 5.23 | 2.93 | 1.61 | 1.68 | 0.21 | 1.76 | 0.14 | 2.45 | 0.85 | 13.04 | 0.15 | 1.41 |
Ethanol | 5.49 | 2.05 | 1.62 | 0.68 | 5.03 | 1.07 | 1.15 | 1.28 | 0.81 | 5.70 | 0.14 | 0.80 |
Naphthalene | 5.26 | 3.75 | 2.11 | 1.07 | 13.47 | 1.27 | 4.728 | 2.28 | 0.85 | 14.67 | 0.14 | 1.37 |
Salicylic acid | 5.24 | 3.30 | 1.61 | 0.87 | 11.68 | 1.26 | 3.858 | 2.29 | 0.82 | 9.79 | 0.14 | 1.17 |
LiPS | 89.91 | 35.83 | 20.89 | 13.91 | 4.82 | 10.29 | 3.61 | 6.52 | 18.51 | 46.34 | 3.18 | 6.95 |
LiPS20 | 2.78 | 25.51 | 0.76 | 11.42 | 0.36 | 15.187 | 0.18 | 6.75 | 1.86 | 56.59 | 0.21 | 5.12 |
GeTe | 7.22 | 105.62 | 4.49 | 220.43 | 2.07 | 78.2 | 0.58 | 26.75 | 9.33 | 143.91 | 1.55 | 21.67 |
| Dataset | Temp. | NequIP E | NequIP F | Allegro E | Allegro F | BOTNet E | BOTNet F | MACE E | MACE F | Equiformer E | Equiformer F | TorchMDNet E | TorchMDNet F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Acetylacetone | 300 K | 0.959 | 0.7092 | 0.817 | 0.7110 | 0.924 | 0.7131 | 0.813 | 0.7096 | 0.810 | 0.7113 | 0.836 | 0.7128 |
| | 600 K | 1.806 | 0.7145 | 1.912 | 0.7137 | 1.893 | 0.7140 | 2.215 | 0.7127 | 2.169 | 0.7137 | 1.996 | 0.7120 |
| 3BPA | 300 K | 0.809 | 0.7106 | 0.708 | 0.7102 | 0.677 | 0.7109 | 0.759 | 0.7097 | 0.803 | 0.7089 | 0.814 | 0.7097 |
| | 600 K | 1.180 | 0.7095 | 1.603 | 0.7092 | 1.607 | 0.7102 | 1.214 | 0.7087 | 1.319 | 0.7104 | 1.160 | 0.7121 |
| Dataset | Temp. | NequIP JSD | NequIP WF | Allegro JSD | Allegro WF | BOTNet JSD | BOTNet WF | MACE JSD | MACE WF | Equiformer JSD | Equiformer WF | TorchMDNet JSD | TorchMDNet WF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Acetylacetone | 300 K | 28.244 | 24.552 | 29.628 | 22.166 | 30.612 | 26.038 | 31.072 | 22.904 | 29.863 | 21.783 | 29.335 | 22.485 |
| | 600 K | 18.868 | 31.480 | 21.068 | 26.178 | 18.332 | 26.620 | 19.295 | 28.708 | 17.938 | 27.414 | 19.054 | 29.626 |
| 3BPA | 300 K | 0.821 | 6.024 | 1.130 | 7.986 | 1.069 | 7.129 | 0.976 | 8.358 | 0.923 | 6.991 | 0.874 | 7.309 |
| | 600 K | 0.758 | 6.202 | 0.596 | 5.137 | 0.778 | 5.861 | 0.683 | 5.120 | 1.053 | 6.648 | 0.859 | 6.985 |
| Structures | | NequIP | Allegro | BOTNet | MACE | TorchMDNet |
|---|---|---|---|---|---|---|
| Train structures | E | 45.100 | 32.786 | 27.997 | 41.475 | 15108.747 |
| | F | 0.719 | 0.721 | 0.726 | 0.722 | 0.834 |
| Crystal structures | E | 108.842 | 197.276 | 27.159 | 50.380 | 40075.532 |
| | F | 0.717 | 0.720 | 0.726 | 0.722 | 0.886 |
| Test structures | E | 15439.338 | 16803.125 | 117.531 | 99.390 | 59906.813 |
| | F | 0.763 | 0.766 | 0.729 | 0.723 | 0.902 |
| Structures | | NequIP | Allegro | BOTNet | MACE | TorchMDNet |
|---|---|---|---|---|---|---|
| Train structures | JSD | 0.001 | 0.001 | 0.001 | 0.001 | 0.006 |
| | WF | 14.920 | 18.318 | 17.076 | 17.697 | 41.703 |
| Crystal structures | JSD | 0.0 | 0.0 | 0.0 | 0.0 | 0.006 |
| | WF | 7.909 | 8.7305 | 10.525 | 12.661 | 61.201 |
| Test structures | JSD | 0.009 | 0.01 | 0.002 | 0.001 | 0.0159 |
| | WF | 37.974 | 35.747 | 14.234 | 14.936 | 70.133 |
(1) Dataset matters: There was no single model that performed the best on all the datasets and all the metrics. Thus, the selection of the model depends highly on the nature of the atomic system, whether it is a small molecule or a bulk system, for instance.
(2) Structure is important: Low force or energy error during model development does not guarantee faithful reproduction of the atomic structure. Conversely, models with higher energy or force error may provide reasonable structures. Accordingly, downstream evaluation of atomic structures using structural metrics is important in choosing the appropriate model.
(3) Stability during dynamics: Models exhibiting low energy or force errors during the model development on static configurations do not guarantee low errors during forward simulation. Thus, the energy and force violations during molecular dynamics should be evaluated separately to understand the stability of the simulation.
(4) Out-of-distribution is still challenging: Discovery of novel materials relies on identifying hitherto unknown configurations with low energy. We observe that the models still do not perform reliably on out-of-distribution datasets, leaving an open challenge in materials modeling.
(5) Fast to train vs. fast at inference: We observe that some models are fast to train, while others are fast at inference. For instance, TorchMDNet is slow to train but fast at inference. While MACE is fast both in training and inference, it does not give the best results in terms of structure or dynamics. Thus, where larger simulations are required, an appropriate model that balances training/inference time and accuracy may be chosen.
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00027g |
‡ Equal contribution. |
This journal is © The Royal Society of Chemistry 2024 |