Robin Ruff,a Patrick Reiser,ad Jan Stühmer bc and Pascal Friederich *ad
aInstitute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. E-mail: pascal.friederich@kit.edu
bInstitute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany
cHeidelberg Institute for Theoretical Studies, Schloß-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany
dInstitute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany
First published on 20th February 2024
Graph neural networks (GNNs) have been applied to a wide variety of problems in materials science and chemistry. Here, we systematically investigate graph construction for crystalline (periodic) materials and its impact on GNN model performance. We propose the asymmetric unit cell as a representation that reduces the number of nodes needed to describe periodic graphs by exploiting all symmetries of the system. Without any loss in accuracy, this substantially reduces the computational cost, and thus the time needed to train large graph neural networks. For architecture exploration, we extend the original graph network (GN) framework of Battaglia et al. by introducing nested line graphs (nested line graph network, NLGN) to cover more recent architectures. With a systematically built GNN architecture based on NLGN blocks, we improve the state-of-the-art results across all tasks within the MatBench benchmark. Further analysis shows that optimized connectivity and deeper message functions are responsible for the improvement. Asymmetric unit cells and connectivity optimization can be applied to (crystal) graph networks in general, while the suggested NLGN framework can serve as a template to compare and build further GNN architectures.
Graph convolutional neural networks operate on the (spatial) graph structure to transform node embeddings and have been suggested for semi-supervised node classification.12,13 With a focus on molecular graphs, the message passing framework (MPNN) was proposed by Gilmer et al.14 in order to group and generalize many GNN architectures that update node representations by iteratively exchanging information, or messages, across edges in the graph.15,16 Neural networks designed to predict the potential energy surface of molecules for molecular dynamics simulations,17 such as the continuous-filter convolutional network SchNet,18 can also be interpreted as MPNN graph networks. Significant progress was achieved by incorporating further geometric information such as bond angles39 and dihedral angles19 into the MPNN scheme. To compute explicit angle information, the graph network has to operate on higher-order pathways or connect edge information in a so-called line graph, which sets up a graph L(G) on the edges of the underlying graph G.20 A state-of-the-art model that makes use of this principle and reaches good performance for materials is ALIGNN.21 Recently, equivariant graph networks have been introduced,22 which build on irreducible representations of the Euclidean symmetry groups to achieve equivariance under e.g. rotation and translation.23
In most cases, newly introduced GNNs make specific improvements over previous architectures and propose multiple reasonable new design decisions inspired by chemical domain knowledge. While this approach has so far led to a consistent improvement of model accuracy, we choose a more systematic approach inspired by the work of You et al.:31 first, we stake out a new design space based on a general extension of the original graph network (GN) framework,1 which we call nested line graph networks (NLGNs). Then we navigate through that design space to find suitable NLGN architectures.
In this work, we re-evaluate edge selection methods to build a multi-graph as suggested by Xie and Grossman24 and compare their performance on a wide set of parameters. In this context, we introduce the asymmetric unit cell as a representation to further exploit crystal symmetries and effectively reduce the number of edges. Next, we develop a connectivity-optimized crystal graph network (coGN/coNGN) from message passing and line-graph templates which are intentionally kept as general as possible. By optimizing within the generalized family of GNNs, we improve the state-of-the-art results on 6 out of 9 of the MatBench benchmark datasets32 and achieve parity results with the best models on the remaining 3 datasets, making our model the best general model on the entire set of diverse tasks in the benchmark.
Fig. 1 schematically shows a two-dimensional crystal pattern and three different methods for selecting edges between atoms. The k-nearest-neighbors approach (Fig. 1a) depends on the number of neighbors k, but can lead to widely varying edge distances when the crystal density varies. The radius-based approach (Fig. 1b) limits the distance between two nodes by a hyperparameter r, but the number of neighbors is unbounded and the method can lead to either disconnected or overly dense graphs if r is chosen inappropriately. The parameter-free Voronoi-based approach (Fig. 1c) leads to an intuitive edge selection in which an edge is drawn between two atoms if their Voronoi cells share a ridge. However, at least in theory, the number of edges and their distances are also unbounded for this approach.
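The first two selection schemes can be sketched in a few lines. The following is a minimal brute-force sketch on a finite set of atom positions; it ignores periodic images for brevity (a full crystal implementation would include translated lattice copies), and the function names are illustrative, not from the paper's code base:

```python
import numpy as np

def knn_edges(coords, k):
    """Select directed edges to the k nearest neighbors of each atom."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-loops
    nbrs = np.argsort(d, axis=1)[:, :k]
    return [(i, j) for i in range(len(coords)) for j in nbrs[i]]

def radius_edges(coords, r):
    """Select directed edges between all atom pairs closer than radius r."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    i, j = np.nonzero((d < r) & (d > 0))
    return list(zip(i.tolist(), j.tolist()))
```

Note the trade-off discussed above: `knn_edges` always yields exactly k·N edges regardless of density, while the edge count of `radius_edges` depends entirely on how r relates to the typical interatomic distance.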
All three edge selection methods have been used previously in the context of crystals and GNNs.24–26,33 However, to our knowledge, there is no detailed comparison of these methods and their hyperparameters.
To solve this issue, Xie and Grossman24 proposed a multigraph crystal representation (Fig. 2b) introducing periodic/cyclic boundary conditions, which we will refer to as a unit cell graph. In this representation, one node represents all shift-equivalent atoms of the crystal, which usually results in multiple edges carrying distance information that matches the translated lattice positions. As a consequence, GNNs will always produce equivalent node embeddings for periodic atoms, consistent with Bloch's theorem.
The periodicity of the unit cell graph represents translation symmetry. Since crystals often exhibit more symmetries, we propose the asymmetric unit graph representation for crystals, which considers all symmetries of a crystal structure.‡ For more information about the asymmetric unit (ASU) and space groups please see for example ref. 35–38. In this representation, all symmetry-equivalent atoms are represented by a single node in the graph. The example crystal in Fig. 2 exhibits a horizontal reflection symmetry. The two yellow atoms in the crystal unit cell are symmetry-equivalent and therefore merged into one node (with multiplicity two) for the asymmetric unit cell (Fig. 2c). Just like in unit cell graphs, self-loops and multi-edges can occur in asymmetric unit cell graphs.
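The merging step described above can be sketched directly from a list of space group operations. The following is a minimal numpy sketch under the assumption that the symmetry operations are given as (rotation, translation) pairs acting on fractional coordinates and include the identity; the function name and signature are illustrative (in practice, tools such as spglib determine the operations automatically, cf. footnote ‡):

```python
import numpy as np

def asymmetric_unit(frac_coords, symmetry_ops, tol=1e-5):
    """Merge symmetry-equivalent sites into asymmetric-unit nodes.

    frac_coords: (N, 3) fractional coordinates in the unit cell.
    symmetry_ops: list of (R, t) pairs acting as x -> (R @ x + t) mod 1,
                  assumed to contain the identity operation.
    Returns representative site indices and their multiplicities.
    """
    n = len(frac_coords)
    assigned = [-1] * n
    reps, mults = [], []
    for i in range(n):
        if assigned[i] != -1:
            continue
        rep = len(reps)
        reps.append(i)
        count = 0
        # mark every site reachable from site i by a symmetry operation
        for R, t in symmetry_ops:
            img = (R @ frac_coords[i] + t) % 1.0
            diff = np.abs(frac_coords - img)
            diff = np.minimum(diff, 1.0 - diff)  # periodic distance
            for j in np.nonzero(np.all(diff < tol, axis=1))[0]:
                if assigned[j] == -1:
                    assigned[j] = rep
                    count += 1
        mults.append(count)
    return reps, mults
```

For the reflection-symmetric example of Fig. 2, the two mirror-equivalent atoms collapse into one representative with multiplicity two, while atoms on the mirror plane keep multiplicity one.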
Since physical target properties in ML tasks are often invariant under E(3) symmetry operations, many GNNs are designed to be E(3)-invariant, but as a consequence yield equal node embeddings for symmetric atoms in the unit cell graph, leading to redundant computations in the message passing step. The asymmetric unit graph representation removes these redundancies while maintaining the same node embeddings xv.§ However, global readout operations have to be adapted to handle asymmetric unit cells, since atoms can have different symmetry-related multiplicities (the number of symmetry-equivalent atoms in each equivalence class). A simple adaptation of the readout function, weighting each node's contribution by its multiplicity, fixes the issue and restores equal results for unit cell graphs and asymmetric unit graphs.
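For a mean readout, the adapted aggregation can be sketched as a multiplicity-weighted average. The weighting by multiplicities follows directly from the text; the choice of a mean (rather than sum) aggregation in this sketch is an assumption for illustration:

```python
import numpy as np

def asu_mean_readout(node_embeddings, multiplicities):
    """Mean readout over the full unit cell, computed on ASU nodes only.

    Weighting each ASU node embedding by its multiplicity reproduces the
    plain mean over all symmetry-equivalent atoms of the unit cell graph.
    """
    w = np.asarray(multiplicities, dtype=float)
    x = np.asarray(node_embeddings, dtype=float)
    return (w[:, None] * x).sum(axis=0) / w.sum()
```

Since E(3)-invariant GNNs assign identical embeddings to symmetry-equivalent atoms, this weighted readout over the (smaller) ASU graph yields exactly the same result as the unweighted readout over the unit cell graph.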
It should be noted that for the ASU representation the structure needs to remain in the same space group and its atoms in their original Wyckoff sites. This means that for inference during e.g. molecular dynamics simulations with structure changes, the ASU has to be re-evaluated, which is cumbersome and expensive. Nonetheless, GNNs can be trained on ASU structures, which requires only a single ASU evaluation of the training set, and then be used on the normal unit cell representation for inference.
To find the best GNN architecture for a certain task, it is instructive to systematically explore (see e.g. ref. 31) the design space of GNN modules and building blocks. Moreover, a framework has to be chosen on how to define and process GNN modules. The Message Passing Framework by Gilmer et al.,14 for example, has shown that a framework can unify and accelerate the efforts of the scientific community.
For a combinatorial generalization of GNNs, we build upon the graph network (GN) framework of Battaglia et al.1 in this work. In this framework, one GNN layer is described by a GN block, which transforms a generic attributed graph with edge, node, and global graph features via three update functions ϕ and three aggregation functions ρ.
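A single GN block can be sketched as follows. This is a minimal numpy sketch of the scheme of Battaglia et al.; the update functions ϕ are passed in as arbitrary callables (learned MLPs in practice), and the aggregations ρ are fixed to sums here for brevity, which is a simplifying assumption:

```python
import numpy as np

def gn_block(nodes, edges, senders, receivers, globals_,
             phi_e, phi_v, phi_u):
    """One graph network (GN) block: three updates, sum aggregations."""
    # 1. edge update: phi_e(edge, sender node, receiver node, globals)
    new_edges = np.stack([
        phi_e(edges[k], nodes[senders[k]], nodes[receivers[k]], globals_)
        for k in range(len(edges))
    ])
    # 2. aggregate incoming edge features per node (rho^{e->v} = sum)
    agg = np.zeros((len(nodes), new_edges.shape[1]))
    np.add.at(agg, receivers, new_edges)
    # 3. node update: phi_v(aggregated edges, node, globals)
    new_nodes = np.stack([phi_v(agg[i], nodes[i], globals_)
                          for i in range(len(nodes))])
    # 4. global update from sum-aggregated edges and nodes
    new_globals = phi_u(new_edges.sum(0), new_nodes.sum(0), globals_)
    return new_nodes, new_edges, new_globals
```

Stacking such blocks sequentially recovers standard message passing; the NLGN extension below additionally allows the edge update ϕE itself to be a GN on the line graph.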
However, many state-of-the-art models such as DimeNet,39 GemNet,19 ALIGNN21 and M3GNet,30 which incorporate many-body interactions between atoms, are only indirectly captured by GNs. Therefore we propose an extension to the framework, which we term nested line graph networks (NLGNs). In the following, we introduce NLGNs by first discussing how angle information is incorporated via the line graph concept21 and then explaining the flow of NLGN calculations, before we show concrete implementations of NLGNs in Section 4.1. Note that the term nested here refers to stacking GNs on the line graph; it is not to be confused with the nesting introduced by Zhang and Li,40 which describes nesting and pooling GNNs on sub-graphs.
To achieve rotation invariance, popular models such as SchNet18 or MEGNet26 only include scalar distances between two atoms as edge features. However, to incorporate more geometric information, models such as DimeNet39 or ALIGNN21 additionally use E(3)-invariant angle information, represented by combinations of edges (see Fig. 6). In the line graph L(G), which can be constructed uniquely based on G (see Fig. 3 and Harary and Norman41), there is an edge eeij,ejkL(G) for every two incident edges eij, ejk in G. This enables the assignment of angular information between three atoms to the edges of the line graph. The same applies to (generalized) dihedral angles (4-node or 3-edge objects) in the second-order line graph L(L(G)).
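The line graph construction itself reduces to a short routine on the directed edge list of G. The following is a minimal sketch; skipping backtracking pairs (i, j)→(j, i) is an assumption made here because the reversed edge defines no meaningful angle:

```python
def line_graph(edges):
    """Construct the edge list of the line graph L(G) from the directed
    edge list of G.

    Each edge (i, j) of G becomes a node of L(G); two L(G)-nodes are
    connected whenever the corresponding edges are incident, i.e. (i, j)
    is followed by (j, k). Backtracking pairs (i, j)-(j, i) are skipped.
    """
    lg_edges = []
    for a, (i, j) in enumerate(edges):
        for b, (p, q) in enumerate(edges):
            if a != b and j == p and q != i:
                lg_edges.append((a, b))
    return lg_edges
```

Each resulting L(G)-edge corresponds to a pair of incident bonds in the crystal, i.e. exactly the three-atom objects to which an angle feature can be attached; applying the routine again to its own output yields L(L(G)) for dihedral-like four-atom objects.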
Fig. 3 On the left: a graph G and the construction of its line graph L(G). On the right: nested line graph network architecture with GN blocks working on G and containing other GN blocks working on the line graph L(G) as an edge update function ϕE. The line graph is able to process multi-node geometric features, such as angles (see Table 1).
NLGNs operate on the graph G as well as the line graph L(G) (potentially also L(L(G)) etc.), exploiting the one-to-one mapping of edges in G and nodes in L(G) (see Table 1). Each edge update function ϕE in GN blocks that operate on G can be instantiated as a nested GN (see Fig. 3). A more detailed description of the algorithm can be found in the ESI.† The NLGN framework extends the usual sequential compositionality of GN blocks with a hierarchical/nested compositionality, thereby increasing the expressiveness.42 The composition of simple and well-known building blocks facilitates the implementation and the ease of understanding of the framework. Furthermore, NLGNs generalize existing models such as SchNet,18 DimeNet39 and ALIGNN,21 making them more comparable and extensible (see Fig. 4).
| Entity in crystal | G | L(G) | L(L(G)) |
|---|---|---|---|
| Atoms | Nodes | | |
| Bonds | Edges | Nodes | |
| Angles | | Edges | Nodes |
| Dihedrals | | | Edges |
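The nested edge update ϕE can be illustrated as one message passing step on L(G), where the edge features of G act as node features of L(G) and angle features sit on L(G)-edges. This is a minimal numpy sketch; using feature addition as a stand-in for the learned message MLP is an assumption for illustration:

```python
import numpy as np

def nested_edge_update(edge_feats, angle_feats, lg_senders, lg_receivers):
    """Nested edge update phi_E realized as one message passing step on L(G).

    edge_feats:  (E, F) features of G's edges, i.e. L(G)'s nodes.
    angle_feats: (A, F) features on L(G)'s edges (angles between bonds).
    """
    # message along each L(G)-edge: sender-edge feature plus angle feature
    msgs = edge_feats[lg_senders] + angle_feats
    out = edge_feats.copy()
    np.add.at(out, lg_receivers, msgs)  # sum-aggregate per L(G) node
    return out
```

Replacing the plain edge update of a GN block with such a nested step is what turns a GN into an NLGN; applying the same idea one level deeper on L(L(G)) incorporates dihedral information.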
Fig. 5 shows the effect of different preprocessing methods for a non-nested GN. Again, results were obtained before the final ordinal hyperparameter optimization, which explains the discrepancy with Table 2. The MAE is plotted as a function of the average node degree of the resulting graphs of the log_gvrh dataset for different edge selection methods. The prediction error of GNN models tends to fall with increasing graph connectivity up to an average degree of approximately 30; for higher average degrees, the error either increases again or quickly converges, depending on the dataset. Interestingly, for kNN edge selection the minimum corresponds to k = 24, whereas other works use a value of 12.21,24 The Voronoi-based edge selection results in graphs with an average degree of approximately 12. Adding the area of the Voronoi cell ridge to each edge as an additional edge feature can improve the predictive power of the GNN. We found, however, that this effect is much less pronounced after optimization of ordinal hyperparameters, making the k = 24 nearest-neighbor method the preferred choice for edge selection in our experiments.
| Dataset | coGN (ours) | coNGN (ours) | ALIGNN | MODNet | CGCNN | M3GNet | Matformer |
|---|---|---|---|---|---|---|---|
| e_form ↓ | 17.0 ± 0.3 | 17.8 ± 0.4 | 21.5 ± 0.5 | 44.8 ± 3.9 | 33.7 ± 0.6 | 19.5 ± 0.2 | 21.232 ± 0.302 |
| is_metal^a ↑ | 0.9124 ± 0.0023 | 0.9089 ± 0.0019 | 0.9128 ± 0.0015 | 0.9038 ± 0.0106 | 0.9520 ± 0.0074 | 0.958 ± 0.001 | 0.812 ± 0.05 |
| Gap ↓ | 155.9 ± 1.7 | 169.7 ± 3.5 | 186.1 ± 3.0 | 219.9 ± 5.9 | 297.2 ± 3.5 | 183 ± 5 | 187.825 ± 3.817 |
| Perovskites ↓ | 26.9 ± 0.8 | 29.0 ± 1.1 | 28.8 ± 0.9 | 90.8 ± 2.8 | 45.2 ± 0.7 | 33 ± 1.0 | 31.514 ± 0.71 |
| log_kvrh ↓ | 0.0535 ± 0.0028 | 0.0491 ± 0.0026 | 0.0568 ± 0.0028 | 0.0548 ± 0.0025 | 0.0712 ± 0.0028 | 0.058 ± 0.003 | 0.063 ± 0.0027 |
| log_gvrh ↓ | 0.0689 ± 0.0009 | 0.0670 ± 0.0006 | 0.0715 ± 0.0006 | 0.0731 ± 0.0007 | 0.0895 ± 0.0016 | 0.086 ± 0.002 | 0.077 ± 0.0016 |
| Dielectric ↓ | 0.3449 ± 0.0871 | 0.2711 ± 0.0714 | 0.5988 ± 0.0833 | 0.312 ± 0.063 | 0.634 ± 0.131 | | |
| Phonons ↓ | 28.887 ± 3.284 | 34.2751 ± 2.0781 | 57.7635 ± 12.311 | 34.1 ± 4.5 | 42.526 ± 11.886 | | |
| jdft2d ↓ | 43.424 ± 8.949 | 33.192 ± 7.343 | 49.244 ± 11.587 | 50.1 ± 11.9 | 42.827 ± 12.281 | | |

^a For is_metal the classification metric is likely to change in future versions.
A comparison of their behavior in Fig. 5 shows that the observations for non-nested GNs do not carry over to NLGNs. For NLGNs, the minimum test error occurs at a lower average degree of approximately 12 edges per node and is followed by a steeper increase for higher connectivity. Higher-order nesting could potentially lead to further improvements at yet lower connectivity of the base graph, which should be systematically explored in the future.
At the same time, our observations raise the question of a trade-off between nesting and graph connectivity. This trade-off can also be explained from an analytical point of view. The reason for including angle information with NLGNs is shown in the example in Fig. 6. The two geometric graphs are not distinguishable for GNNs from relative distance information alone. Incorporating angles between edges into the GNN architecture increases expressiveness and allows for the distinction of the graphs. Yet, similar enhancements of expressiveness can also be achieved by increasing graph connectivity. In the example, adding the single dashed edge between two red nodes makes the graphs distinguishable for GNNs, without any angle information.
From our experiments, we cannot conclude that NLGNs offer a systematic advantage in accuracy over simple GNs when used on graphs with high connectivity. Inspired by the results of Fig. 5, we optimized an NLGN for less connected Voronoi (+ ridge area) preprocessed graphs and achieved the results displayed in Table 2. Although NLGNs showed the best performance on the specific dataset they have been (hyperparameter) optimized on, they cannot maintain their advantage consistently on all other datasets without re-optimizing. Unfortunately, NLGNs require the construction of line graphs and tend to have significantly more trainable parameters. Consequently, training is more than three times as computationally expensive as for non-nested GNs.
Finally, we were able to demonstrate the effectiveness of densely connected crystal graphs with the coGN model by surpassing current state-of-the-art models on most of the MatBench datasets, as shown in Table 2. The results also show that the hyperparameters, which originate from the optimization on the log_gvrh dataset, generalize to other datasets and tasks.
Results of coGN/coNGN on other comparable materials benchmarks such as JARVIS51 or OC22,9 which feature structure-to-property tasks, can be found in the ESI† and show similar top-ranking performance.
We found that for the MatBench datasets, the empirical average reduction factor for the number of nodes (n), edges (m = nL(G)) and line graph edges (mL(G)) is approximately 2.1:||
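Written out, this states that all three graph sizes shrink by roughly the same empirical factor when moving from the unit cell graph (uc) to the asymmetric unit graph (asu); the following is a reconstruction of the elided relation in this notation:

```latex
\frac{n_{\mathrm{uc}}}{n_{\mathrm{asu}}}
\;\approx\;
\frac{m_{\mathrm{uc}}}{m_{\mathrm{asu}}}
\;\approx\;
\frac{m_{L(G),\mathrm{uc}}}{m_{L(G),\mathrm{asu}}}
\;\approx\; 2.1
```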
Approximately the same factor can be observed for the GPU memory footprint during training. Due to parallelization and batching, the speedup of the training runtimes is somewhat smaller and depends on the selected batch size. For batch sizes of 32, 64, and 128, for example, we observed speedups of 1.2, 1.3, and 1.8, respectively.
Our contribution to the first aspect includes the proposal of the asymmetric unit graph representation, which decreases training time and memory consumption without affecting the predictive performance of E(3)-invariant GNNs. To use asymmetric unit graphs with equivariant GNNs, some adaptations to the message passing and readout operations are required, which will be addressed in future work. We furthermore compared different edge selection methods and found that graphs with higher connectivity (compared to previous works) can yield better performance. From this insight, we constructed the coGN, which achieves state-of-the-art results on the MatBench benchmark.
To explore the space of GNN architectures, we introduced the nested line graph network (NLGN) framework, which subsumes state-of-the-art GNN models that incorporate angle information in line graph form. Although for the given dataset sizes in the MatBench benchmark we could not find an architecture for nested line graph networks that substantially outperforms graph networks without nesting, we still encourage further research in this direction, specifically into the scaling laws of accuracy as a function of data set size and complexity as well as the depth of line-graph nesting. Future work might overcome the mentioned trade-off between high connectivity and nesting and potentially explore specific nested line graph network architectures, which we did not consider due to hyperparameter space restrictions. Moreover, a practical but systematic comparison between recent equivariant GNs, which are more expressive than invariant GNs, and NLGNs could potentially offer further insight and gradual improvements.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00018h
‡ The set of symmetries for crystals and thus the asymmetric unit cell representation can be determined automatically based on the normal unit cell.34
§ It is in principle also possible to adapt E(3)-equivariant GNNs to asymmetric unit graph representations, by specifying equivariant convolutional layers on the asymmetric unit graph and adapting message passing accordingly.
¶ https://github.com/aimat-lab/gcnn_keras/tree/v3.0.1/kgcnn/literature/coGN
|| The perovskite dataset is a special outlier, where asymmetric graphs are on average not significantly smaller than unit cell graphs.
This journal is © The Royal Society of Chemistry 2024