Àlex Solé,ab Albert Mosella-Montoro,a Joan Cardona,b Silvia Gómez-Coca,*b Daniel Aravena,*c Eliseo Ruiz*b and Javier Ruiz-Hidalgo*a
aImage Processing Group – Signal Theory and Communications Department, Universitat Politècnica de Catalunya, Barcelona, Spain. E-mail: j.ruiz@upc.edu
bInorganic and Organic Chemistry Department, Institute of Theoretical and Computational Chemistry, Universitat de Barcelona, Barcelona, Spain. E-mail: silvia.gomez@qi.ub.es; eliseo.ruiz@qi.ub.edu
cMaterials Chemistry Department, Faculty of Chemistry and Biology, Universidad de Santiago de Chile, Santiago, Chile. E-mail: daniel.aravena.p@usach.cl
First published on 24th January 2025
In the diffraction resolution of crystal structures, thermal ellipsoids are a critical parameter that is usually more difficult to determine than atomic positions. These ellipsoids are quantified through Anisotropic Displacement Parameters (ADPs), which provide critical insights into atomic vibrations within crystalline structures. ADPs reflect the thermal behaviour and structural properties of crystal structures. However, traditional methods to compute ADPs are computationally intensive. This paper presents CartNet, a novel graph neural network (GNN) architecture designed to predict properties of crystal structures efficiently by encoding the atomic structural geometry relative to the Cartesian axes together with the temperature of the crystal structure. Additionally, CartNet employs a neighbour equalization technique for message passing to help emphasise the covalent and contact interactions, and a novel Cholesky-based head to ensure valid ADP predictions. Furthermore, a rotational SO(3) data augmentation technique has been proposed during the training phase to generalize to unseen rotations. To corroborate this procedure, an ADP dataset with over 200,000 experimental crystal structures from the Cambridge Structural Database (CSD) has been curated. The model significantly reduces computational costs and outperforms previously reported methods for ADP prediction by 10.87%, while demonstrating a 34.77% improvement over the tested theoretical computation methods. Moreover, we have employed CartNet on other established datasets covering different material properties, such as formation energy, band gap, total energy, energy above the convex hull, bulk moduli, and shear moduli. The proposed architecture outperformed previously reported methods by 7.71% on the JARVIS dataset and 13.16% on the Materials Project dataset, proving CartNet's capability to achieve state-of-the-art results in several tasks.
The project website with an online demo is available at: https://www.ee.ub.edu/cartnet.
ADPs are particularly valuable in crystallography, as they assist in identifying determination issues such as disorder or twinning. Several studies have shown that ADPs can also be utilized to predict thermal motion and translational and vibrational frequencies.1,2 Furthermore, as demonstrated in previous research, thermal properties such as heat capacity (Cv)3,4 and vibrational entropy5,6 are directly linked to ADPs. Additionally, ADPs have been related to the thermal expansion of crystal structures,7,8 making them especially useful for identifying materials with negative thermal expansion. From an experimental point of view, inconsistencies in atomic positions are often easily spotted using chemical intuition when solving a crystal structure, allowing for straightforward visual identification of errors. However, such intuitive assessments are not as straightforward when it comes to thermal ellipsoids, making visual detection of discrepancies more challenging. As a result, some database structures exhibit ellipsoids with seemingly anomalous sizes and orientations. Developing theoretical methods that enable quick evaluation of these ellipsoids can thus serve as valuable tools for facilitating accurate structural determination from diffraction data.
From a theoretical perspective, ADPs can be calculated using periodic electronic structure calculations to obtain vibrational frequencies based on the harmonic approximation.9,10 This method requires the numerical calculation of forces at displaced geometries for all atomic positions in solid-state systems. Thus, these calculations are computationally expensive and time-consuming, often creating a bottleneck in the process. Our contribution is particularly significant in this context, as deep learning methods based on graph neural networks can dramatically reduce the computation time required, providing a more efficient alternative to traditional approaches.
Fig. 1 shows the full pipeline proposed for this work, named CartNet. Our presented network uses a graph representation of the crystal structure and, through a set of learnable encodings and geometrical operations, predicts atomic or material properties.
Moreover, predicting ADPs presents a unique challenge and opportunity in the field of machine learning for crystal structure property prediction. Unlike most tasks typically explored in the literature, such as formation energy, band gap, or total energy, which are unaffected by rotation, predicting ADPs requires models to be sensitive to rotational orientation.
In this paper, we make the following contributions:
(1) ADP dataset: we present a meticulously curated dataset of ADPs with over 200,000 experimental crystal structures from the Cambridge Structural Database (CSD).11
The ADPs are both temperature-dependent and rotation-equivariant, offering a robust foundation for exploring the necessity and impact of rotation-equivariant architectures in crystal modelling.
This necessitates the use of architectures that can handle properties dependent on direction, opening the door to designing networks capable of processing rotationally dependent features.
See Section 3 for a detailed description of the dataset.
(2) CartNet architecture: we introduce an architecture, CartNet, capable of accurately predicting material properties based on the geometry of a crystal structure.
The key contributions of this model are as follows:
(a) Geometry and temperature encoding: we propose a feature descriptor that efficiently encodes the complete 3D geometry referenced to the Cartesian axes and adaptively fuses other input attributes such as temperature.
Since the geometry is anchored to the Cartesian reference axes, there is no need to encode the unit cell.
Cell-less encoding enables the accurate prediction of the crystal's ADP orientation regardless of cell size.
(b) Neighbour equalization: we present a neighbour equalization technique designed to address the exponential increase in neighbours over distance, thereby enhancing the model's ability to detect various types of bonds and force interactions between atoms.
This technique improves the model's sensitivity to different interaction ranges, ensuring a more precise representation of atomic environments.
(c) Cholesky head: we introduce an output layer based on Cholesky decomposition, which guarantees that the model produces positive definite matrices, an essential mathematical property requirement for valid ADP predictions.
(3) Rotation SO(3) augmentation: we propose a data augmentation technique for both input features and output ADPs to facilitate the creation of SO(3) rotation-equivariant representations.
This technique enables the model to learn rotational equivariance without requiring specific layers to enforce it, thereby simplifying the overall architecture and reducing the number of trainable parameters.
These contributions allowed our model to outperform previously reported methods by 8.85% in the JARVIS dataset12 and 15.5% in the Materials Project dataset,13 vide infra.
Furthermore, in the ADP dataset examined, CartNet demonstrated a 10.87% improvement over other previously reported methods and a 34.77% improvement over the tested low-level Generalized Gradient Approximation (GGA) Density Functional Theory (DFT) calculations with simple dispersion corrections.
Mathematically, ADPs are represented as a three-dimensional tensor (3 × 3 matrix), which functions as a covariance matrix for a three-dimensional Gaussian distribution. The covariance matrix comprises variance elements along its diagonal for each X, Y and Z axis and covariance elements off the diagonal, illustrating the relationships between the axes. The covariance matrix U for a three-dimensional Gaussian distribution can be expressed using eqn (1).
U = [U11 U12 U13; U12 U22 U23; U13 U23 U33] | (1)
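As a minimal illustration of eqn (1), the snippet below assembles a symmetric 3 × 3 covariance matrix from hypothetical variance and covariance values (the numbers are invented for illustration) and checks the positive-definiteness a valid ADP must satisfy:

```python
import numpy as np

# Hypothetical ADP values (in Å²) for a single atom: variances on the
# diagonal (U11, U22, U33) and covariances off the diagonal.
U11, U22, U33 = 0.020, 0.015, 0.030
U12, U13, U23 = 0.004, -0.002, 0.001

# Assemble the symmetric 3x3 covariance matrix of eqn (1).
U = np.array([
    [U11, U12, U13],
    [U12, U22, U23],
    [U13, U23, U33],
])

# A physically valid ADP tensor must be symmetric and positive definite.
assert np.allclose(U, U.T)
assert np.linalg.eigvalsh(U).min() > 0
```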
Fig. 2 shows an example with the graphical representation of the ADPs of a 5,5′-dimethyl-2,2′-bipyrazine unit cell. In the figure, each atom is depicted using the thermal ellipsoid representations of the ADPs obtained experimentally by X-ray diffraction.
ADPs can be calculated theoretically by computing the dynamical matrix of the crystal structure.15,16 The dynamical matrix is a fundamental concept in solid-state physics used to describe the vibrations of atoms in a crystal lattice, known as phonons. It is a mathematical construct that captures how atoms interact with each other when they are slightly displaced from their equilibrium positions. The dynamical matrix D(q) is computed at each point q in the Brillouin zone. The Brillouin zone represents the fundamental region in reciprocal space that contains all the unique wave vectors necessary to describe a crystal structure's physical properties. The eigenvalues of the dynamical matrix yield the phonon frequencies ων(q), also known as phonon modes.
ADPs can be calculated by integrating all the phonon modes for all the q points using eqn (2).
U(κ) = (ħ/2Nmκ) Σq,ν [eν(κ,q) eν†(κ,q)/ων(q)] (2nν(q) + 1) | (2)

nν(q) = 1/(exp(ħων(q)/kBT) − 1) | (3)
This method has the limitation that it is extremely time-consuming since it needs a different Density Functional Theory (DFT) calculation for each of the q points. High-performance computing (HPC) clusters are often needed since DFT calculations can be computationally demanding, even for a single crystal structure.
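The mode summation behind this procedure can be sketched numerically. The helper below (an assumed name, fed with synthetic frequencies and polarisation vectors rather than DFT output) accumulates (e ⊗ e)/ω weighted by the thermal occupation factor coth(ħω/2kBT), i.e. 2n + 1 for a Bose–Einstein occupation n:

```python
import numpy as np

# Toy sketch of the harmonic-approximation phonon summation for ADPs.
# Frequencies and eigenvectors below are synthetic stand-ins, not DFT output.
HBAR = 1.054571817e-34  # J s
KB = 1.380649e-23       # J / K

def adp_from_modes(freqs, eigvecs, mass, n_q, temperature):
    """Accumulate (e ⊗ e) / ω weighted by the thermal occupation factor.

    freqs: mode angular frequencies (rad/s); eigvecs: (num_modes, 3)
    polarisation vectors; mass: atomic mass (kg); n_q: number of q points.
    """
    U = np.zeros((3, 3))
    for omega, e in zip(freqs, eigvecs):
        # coth(ħω / 2kT) equals 2n + 1 for Bose-Einstein occupation n.
        occupation = 1.0 / np.tanh(HBAR * omega / (2 * KB * temperature))
        U += np.outer(e, e.conj()).real / omega * occupation
    return HBAR / (2 * n_q * mass) * U
```

In a real calculation the loop would run over every mode at every q point of a converged Brillouin-zone grid, which is exactly why a DFT-based evaluation is expensive.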
In the case of molecules, this ability to use graphs to model the system is especially useful, since a molecule can be naturally modelled as a graph of atoms. However, how atoms are interconnected by edges in the graph, denoted as neighbourhood, is not trivial and becomes a fundamental step to achieve good representations using GNNs.
While GNNs and message-passing mechanisms had been introduced previously, Gilmer et al.18 popularized a unified message-passing framework specifically for quantum chemistry predictions, leveraging iterative information exchange between nodes (atoms) and edges (bonds) to capture local molecular interactions. This approach laid the groundwork for subsequent models that further enhance predictive accuracy for molecular properties. DimeNet24 and its successor, DimeNet++,25 build upon this concept by incorporating directional message passing between pairs of interactions, allowing these models to capture angular dependencies and long-range interactions within molecules more effectively than traditional message-passing methods based on chemical bonds. This advancement significantly improves the accuracy of predicting molecular properties, such as quantum mechanical characteristics. Notably, both DimeNet and DimeNet++ are rotation-invariant, ensuring that their predictions remain consistent regardless of the molecule's orientation, which is crucial for accurate modelling in diverse molecular environments. MACE26 proposed an equivariant message-passing approach that incorporates advanced three-dimensional geometric information; by enforcing rotational and translational symmetries, it provides robust generalization across diverse molecular configurations. Similarly, TensorNet27 utilizes tensor-factorization techniques to incorporate higher-order interactions efficiently, capturing subtle quantum effects and long-range correlations. These approaches represent a significant evolution in molecular modelling, surpassing earlier methods in both accuracy and computational efficiency.
On the other hand, crystal structures consist of an assembly of atoms, ions or molecules that are ordered in a symmetric way. The formed symmetric pattern is repeated in the three spatial dimensions. We represent these structures using their smallest repeating unit, the unit cell, defined by a 3 × 3 lattice matrix encoding the three vectors that describe its geometry. Similar GNN models have been extensively applied to crystal property prediction, showcasing their adaptability to the unique challenges of materials science. One critical adaptation in this domain involves modifying the neighbourhood definition to account for periodic boundary conditions (PBCs). PBCs are essential for modelling the infinite nature of crystalline materials. They treat the border of the simulation cell as if they were connected to the opposite border, thus creating a continuous, repeating structure. This modification ensures that the model accurately captures interactions across the boundaries of the material.
Matformer21 introduces a transformer-like GNN architecture designed explicitly for material property prediction. It proposes adding self-loops for each atom to encode the cell dimensions and the PBC radius neighbourhood, thereby enhancing the model's ability to represent the material's structure accurately. PotNet22 advances this approach by presenting a GNN with a dual-neighbourhood strategy. In addition to the PBC radius neighbourhood, PotNet approximates the infinite summation of interactions for every pair of atoms in the crystal, providing a more comprehensive representation of the material's properties. Yan et al.23 presented two GNN architectures tailored for this task, the iComFormer and eComFormer. The iComFormer refines the cell to create a unique representation and uses the angle between the cell's axes and the Cartesian vector between atom pairs in the PBC radius neighbourhood to produce an invariant representation. In contrast, the eComFormer introduces a rotation-equivariant model using tensor products,28 relying solely on Cartesian information to maintain consistency across different orientations.
Regarding the specific problem of ADPs, current state-of-the-art methods exhibit several limitations. One of them is that most approaches rely solely on the distance between atoms to create a rotationally invariant representation. Since ADPs are expressed relative to the system's Cartesian axes (XYZ), using only distance does not provide the necessary references. Models that depend exclusively on distance are unable to differentiate between the variances along the axes (Var(X), Var(Y), or Var(Z)), often resulting in spherical ellipsoids that fail to capture the true anisotropy of the ADPs.
Another limitation of current state-of-the-art methods arises from the requirement for the lattice matrix. This is particularly problematic because multiple lattice matrices can represent the same crystal structure, as discussed in iComFormer.23 Although iComFormer proposes a solution by creating a unique representation for each cell, several challenges remain when using this approach. The unique representation proposed by iComFormer faces a border case when all three axes of the lattice matrix have the same length (a = b = c). Since iComFormer's approach is based on the assumption that a < b < c, when the lattice matrix is in this degenerate case, the model is unable to differentiate between Var(X), Var(Y), or Var(Z), once again resulting in spherical ellipsoids.
Our proposed solution is a cell-less architecture that directly encodes the complete 3D geometry referenced to the Cartesian axes instead of only Euclidean distance-based approaches. This approach avoids the issues of border case scenarios and cell-based overfitting, as it relies entirely on the Cartesian space. Furthermore, it allows the inclusion of PBCs, providing a more robust and accurate representation of anisotropy in ADPs without the need for a lattice matrix.
The ADP dataset was meticulously curated from the CSD's built-in ADP subset, applying several filtering criteria to ensure its quality and reliability. First, only structures possessing 3D coordinates for all atoms and anisotropic thermal displacements for all non-hydrogen atoms were selected. To avoid dispersion and errors due to solvent molecules and counterions, only non-polymeric crystal structures with a single type of molecule in the unit cell were considered. Structures with an R-factor below 5% (R < 5%), free from errors and disorder, and with a site occupancy of 1 for all atoms were included to maintain the dataset's integrity. Structures reported at non-standard pressures were also discarded.
Additionally, temperature/pressure data in the CSD can sometimes be incomplete or missing. For pressure, it was observed that in many cases the pressure was recorded only in the remarks field rather than in the dedicated pressure field, so structures containing any kind of remarks were discarded to avoid extensive text parsing. For temperature, if the CSD Python API did not provide a value, the corresponding CIF files and experimental notes were cross-checked to verify consistency. Structures lacking reliable temperature information were excluded from the dataset.
Several additional filtering criteria were applied to ensure the quality and reliability of the ADPs. While the 3D positions of the hydrogen atoms are kept, their thermal ellipsoids are not included, as these are often isotropic or undefined in the CSD. Structures containing any non-hydrogen atom with negative or zero eigenvalues were excluded. Structures with any ellipsoid whose eigenvalue (λ) ratio λmax/λmin > 8 were excluded to avoid poorly defined or flat ellipsoids. The ratio between the ellipsoid volume and the volume of a sphere with the covalent radius,29 Vol/Volcov, was also used as a filtering criterion. Structures were discarded if any ellipsoid had a volume greater than 1.25 Å³ or if Vol/Volcov > 0.35. Furthermore, structures with any ellipsoid exhibiting a volume ratio Vol/Volcov < 10⁻⁴ at temperatures above 150 K were also excluded to remove ellipsoids that were either too small or insufficiently defined.
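The per-ellipsoid criteria above can be collected into a single filter. The function and covalent-radius table below are illustrative (the ellipsoid-volume formula is a simple determinant-based proxy, not necessarily the authors' exact definition):

```python
import numpy as np

# Covalent radii in Å (small illustrative subset, not the full table of ref. 29).
COV_RADII = {"C": 0.76, "N": 0.71, "O": 0.66, "H": 0.31}

def ellipsoid_volume(U):
    # Simple proxy: volume of the ellipsoid with semi-axes sqrt(eigvals),
    # i.e. (4/3) * pi * sqrt(det U). The exact probability level used in
    # the paper may differ.
    return 4.0 / 3.0 * np.pi * np.sqrt(np.linalg.det(U))

def keep_adp(U, element, temperature):
    """Apply the per-ellipsoid quality checks described in the text."""
    lam = np.linalg.eigvalsh(U)
    if lam.min() <= 0:                      # non-positive-definite ellipsoid
        return False
    if lam.max() / lam.min() > 8:           # too flat / poorly defined
        return False
    vol = ellipsoid_volume(U)
    vol_cov = 4.0 / 3.0 * np.pi * COV_RADII[element] ** 3
    if vol > 1.25 or vol / vol_cov > 0.35:  # too large
        return False
    if temperature > 150 and vol / vol_cov < 1e-4:  # too small at high T
        return False
    return True
```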
In the end, 208,042 crystal structures from the CSD met the criteria outlined, with an average of 194.2 atoms and 105.95 ADPs per crystal structure. Since the ADPs of hydrogen atoms are not considered, the dataset has more atoms than ADPs per crystal.
Fig. 3 illustrates the temperature distribution across the curated dataset. Even though the dataset spans a wide range of temperatures, from 2 K to 573 K, most structures fall within the range from 100 K to 300 K, since this range is the most commonly studied by diffractometry. Fig. 4 displays the distribution of the ADP volumes in this dataset. The same behaviour applies to the volumes, since the extremes of the chart are under-represented in our data. Section 5.2.3 discusses the impact of this data imbalance on the prediction performance of the proposed model. Fig. 5 presents a heatmap of the atomic numbers included in this dataset. The dataset encompasses a wide range of atomic numbers, reflecting a diverse set of elements. It excludes some noble gases and most of the radioactive elements, except for some radioactive actinoids. This diversity is crucial for ensuring that the dataset can support the development of generalised models capable of predicting ADPs across a broad spectrum of chemical compositions.
Fig. 3 Histogram showing the number of crystal structures within each temperature range in the ADP dataset, displayed on a logarithmic scale on the y axis.
Fig. 4 Histogram illustrating the number of atoms within each ADP volume range in the ADP dataset, presented on a logarithmic scale on both axes.
Fig. 5 Heatmap illustrating the number of atoms per element in the ADP dataset. Lighter colours represent higher counts, while darker colours indicate lower counts. The colour scale is logarithmic.
In this study, the dataset was randomly split into 162,270 training, 22,219 validation, and 23,553 test crystal structures. To ensure the integrity of these splits, we verified that all atom types, temperature ranges, and volume ranges present in the validation and test sets are also represented in the training set. This precaution was taken to prevent the model from encountering unseen data during validation and testing, which could otherwise lead to an inaccurate assessment of its performance.
Additionally, we ensured that repeated crystal structures with different temperatures or distinct CSD entries were kept together within the same split. This restriction was made to avoid any situation in which the model might be exposed to test or validation samples during the training phase, which could compromise the evaluation by introducing data leakage. By maintaining these strict controls on the dataset splits, we ensured that the training, validation, and test sets were independent. The splits used for this work are also publicly available to facilitate reproducibility.
To construct the graph representation, we employ a radius-based neighbourhood approach using PBCs for all the atoms in the unit cell. Fig. 6 shows an example of how the graph is created for a single atom. Specifically, a cutoff radius (rc = 5 Å) is defined around each central atom, and all atoms within this radius are considered part of the local neighbourhood. The choice of 5 Å is based on the analysis of the intermolecular interaction distances described in previous studies.30 The graph obtained by this neighbourhood captures both short-range, usually covalent bonds, and relevant weaker intermolecular interactions, such as hydrogen bonds, π–π stacking, halogen bonds, cation–π and anion–π, or van der Waals interactions. By including all neighbour atoms within this defined radius, the graph effectively represents each atom's local environment, which is crucial for accurately modelling the system's behaviour.
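A minimal sketch of this radius-based PBC neighbourhood is shown below, assuming the cutoff is smaller than the cell lengths so that only adjacent periodic images need to be checked (the function name and brute-force double loop are illustrative, not the paper's implementation):

```python
import numpy as np
from itertools import product

def build_edges(positions, lattice, r_c=5.0):
    """Radius-based neighbourhood with periodic boundary conditions.

    positions: (N, 3) Cartesian coordinates of the atoms in the unit cell.
    lattice:   (3, 3) matrix whose rows are the cell vectors.
    Returns (i, j, distance) tuples: atom j (possibly in a neighbouring
    periodic image) is a neighbour of atom i when 0 < d <= r_c.
    """
    edges = []
    # Only the 27 adjacent images are checked; valid while r_c < cell lengths.
    shifts = [np.array(s) @ lattice for s in product((-1, 0, 1), repeat=3)]
    for i, pi in enumerate(positions):
        for j, pj in enumerate(positions):
            for shift in shifts:
                d = np.linalg.norm(pj + shift - pi)
                if 0.0 < d <= r_c:  # d > 0 excludes the atom itself
                    edges.append((i, j, d))
    return edges
```

For a single atom in a 4 Å cubic cell with rc = 5 Å, this yields six edges, one to each face-adjacent periodic image of the atom.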
The atom type is encoded using an embedding31 layer, which generates a feature vector corresponding to each distinct atom type. The temperature is standardised using the training temperature statistics to achieve zero mean and unitary standard deviation. The standardised temperature is passed through a linear layer to ensure dimensional compatibility with the atom-type feature vector. The resulting temperature and atom-type feature vectors are combined, passed through a SiLU32 activation function, followed by another linear layer and another SiLU. Eqn (4) describes the encoding of the atom.
xi = SiLU(Linear(SiLU(Embedding(zi) + Linear(T)))) | (4)
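A sketch of this atom encoder in PyTorch, assuming the atom-type and temperature features are combined by addition (the text only says "combined") and using an illustrative feature dimension:

```python
import torch
import torch.nn as nn

class AtomEncoder(nn.Module):
    """Embed the atom type and fuse it with the standardised temperature."""

    def __init__(self, num_atom_types=119, dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_atom_types, dim)
        self.temp_proj = nn.Linear(1, dim)  # temperature -> feature dimension
        # SiLU, linear, SiLU as described in the text.
        self.mlp = nn.Sequential(nn.SiLU(), nn.Linear(dim, dim), nn.SiLU())

    def forward(self, atom_type, temperature):
        # temperature is assumed already standardised (zero mean, unit std).
        h = self.embed(atom_type) + self.temp_proj(temperature.unsqueeze(-1))
        return self.mlp(h)
```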
Fig. 8 Schematic of the edge encoder used in CartNet.
The edge is defined as the connection between the receiving atom i and the sender atom j. From each edge, the Euclidean distance (dij) and the direction vector (vij) are calculated based on their positions p. Eqn (5) and (6) define the distance and direction vector, respectively.
dij = ‖pj − pi‖ | (5) |
vij = (pj − pi)/dij | (6)
rk(dij) = exp(−β(exp(−dij) − μk)²) | (7)
Eqn (8) formalizes the concatenation of all rk values into vector rij to encode distances dij in a higher dimensional space.
rij = [r1(dij), r2(dij), …, rK(dij)] | (8)
The direction vector, vij, and the RBF-transformed distances, rij, are concatenated and passed through a Multi-Layer Perceptron (MLP) to produce the edge feature vectors. The MLPedge consists of a first linear layer that doubles the existing dimension, a SiLU, another linear layer that returns to the original dimension, and a final SiLU. This combination of distance and direction information ensures that the edge encoding captures the geometric relationships necessary for accurate modelling of the geometric structure. Eqn (9) describes the edge encoding process mathematically.
eij = MLPedge(concat(vij, rij)) | (9)
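A sketch of the edge encoder in PyTorch; the RBF centres μk, the value of β, and the exact MLP widths are illustrative choices:

```python
import torch
import torch.nn as nn

class EdgeEncoder(nn.Module):
    """Expand d_ij with the exponential RBF of eqn (7), concatenate with
    the unit direction vector v_ij, and map through MLP_edge."""

    def __init__(self, num_rbf=64, dim=128, beta=5.0):
        super().__init__()
        # Centres mu_k span (0, 1] because exp(-d) lies in that interval.
        self.register_buffer("mu", torch.linspace(0, 1, num_rbf))
        self.beta = beta
        self.mlp = nn.Sequential(
            nn.Linear(num_rbf + 3, 2 * dim), nn.SiLU(),
            nn.Linear(2 * dim, dim), nn.SiLU(),
        )

    def forward(self, pos_i, pos_j):
        delta = pos_j - pos_i
        d = delta.norm(dim=-1, keepdim=True)                       # eqn (5)
        v = delta / d                                              # eqn (6)
        r = torch.exp(-self.beta * (torch.exp(-d) - self.mu) ** 2) # eqn (7)
        return self.mlp(torch.cat([v, r], dim=-1))                 # eqn (9)
```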
Our gating mechanism considers the sender and receiver atoms and the edge attributes connecting them and processes them through an MLPgate. The MLPgate consists of a linear layer that reduces the dimensions from 3dim to the dim, followed by a SiLU, and another linear layer that does not modify the dimensions. The gating mechanism has a dual purpose: determining the weight of the message and updating the edge attributes. Additionally, our gating mechanism incorporates an envelope function inspired by previous work in molecular systems.27 Eqn (10) describes the gating mechanism.
gateij = envelope(dij) ⊙ MLPgate(concat(hi, hj, eij)) | (10)

envelope(dij) = ½(cos(πdij/rc) + 1) if dij < rc, and 0 otherwise | (11)
The envelope function equalizes the influence of edges based on distance. This equalization helps the model to detect the peaks from the distribution, making it easier to identify the different interatomic interactions from our dataset.
Moreover, the envelope function also softens the influence of neighbours near the cutoff distance, which is particularly valuable in noisy situations. It ensures that atoms near the cutoff radius gradually lose influence, preventing them from being considered neighbours based strictly on a hard cutoff radius. Since our dataset is derived from experimental data, noise is expected, and the envelope function provides a robust mechanism to handle such noise effectively. ESI Section S1† provides further information about the envelope.
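A cosine-shaped cutoff is one common realisation of such an envelope (shown here as an assumption; the paper's exact functional form may differ). It decays smoothly from 1 at d = 0 to exactly 0 at the cutoff rc:

```python
import numpy as np

def envelope(d, r_c=5.0):
    """Smooth cutoff: 1 at d = 0, 0 at d >= r_c, monotonically decreasing.

    Neighbours near the cutoff radius therefore lose influence gradually
    instead of being dropped by a hard threshold.
    """
    d = np.asarray(d, dtype=float)
    out = 0.5 * (np.cos(np.pi * np.clip(d, 0, r_c) / r_c) + 1.0)
    return np.where(d < r_c, out, 0.0)
```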
Our message-passing mechanism constructs messages by processing the concatenated sender and receiver atom information and edge attributes through an MLPmsg. The MLPmsg has the same configuration as that of the MLPgate. Once the message is created, it is weighted using the gate function, which determines the relative importance of each feature within the message vector. The weighted messages are aggregated at the receiving node and passed through a batch normalization layer, followed by a SiLU non-linearity. The gate mechanism also updates the edge features, ensuring that the edge attributes remain consistent with the evolving node representations. To mitigate the issue of gradient vanishing, a skip connection is incorporated into both the node and edge updates, preserving the flow of information through the network layers. Eqn (12)–(14) mathematically describe the message used for message passing and how the atoms and the edges are updated.
mij = gateij ⊙ MLPmsg(concat(hi, hj, eij)) | (12)

hi′ = hi + SiLU(BatchNorm(Σj∈N(i) mij)) | (13)

eij′ = eij + gateij | (14)
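One message-passing layer of this kind can be sketched as follows; the sigmoid on the gate and the layer widths are illustrative choices, not confirmed details of the paper:

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """Gated message passing with node/edge skip connections."""

    def __init__(self, dim=128):
        super().__init__()
        self.mlp_msg = nn.Sequential(nn.Linear(3 * dim, dim), nn.SiLU(),
                                     nn.Linear(dim, dim))
        self.mlp_gate = nn.Sequential(nn.Linear(3 * dim, dim), nn.SiLU(),
                                      nn.Linear(dim, dim))
        self.norm = nn.BatchNorm1d(dim)
        self.act = nn.SiLU()

    def forward(self, h, edge_index, e):
        src, dst = edge_index                      # edge j -> i
        z = torch.cat([h[dst], h[src], e], dim=-1)
        gate = torch.sigmoid(self.mlp_gate(z))     # per-feature weight in [0, 1]
        msg = gate * self.mlp_msg(z)               # weighted message
        agg = torch.zeros_like(h).index_add_(0, dst, msg)  # sum at receiver
        h_new = h + self.act(self.norm(agg))       # node update, skip connection
        e_new = e + gate                           # edge update, skip connection
        return h_new, e_new
```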
The Cholesky decomposition states that any symmetric positive-definite matrix, such as an ADP tensor, can be uniquely decomposed into the product of a lower triangular matrix and its transpose. This decomposition can be expressed with eqn (15).
U = LLᵀ | (15)

L = [l11 0 0; l21 l22 0; l31 l32 l33] | (16)
Based on this mathematical foundation, the feature vector of each node from the final aggregation layer is processed through an MLPhead to produce a six-dimensional feature vector o = (o1, …, o6). The MLPhead consists of a linear layer that reduces the dimensions from dim to dim/2, followed by a SiLU, and another linear layer that reduces the dimensions from dim/2 to 6. The first three elements are activated using the softplus function.33 In this context, the softplus activation ensures that the diagonal elements of the matrix L are strictly positive, which is essential for maintaining the positive-definite property of the output matrix. The remaining three elements are used as the lower off-diagonal elements of the matrix L. The matrix L is thus constructed by placing the first three (softplus-activated) elements on the diagonal and the remaining elements in the lower triangular part, as can be seen in eqn (17).
L = [softplus(o1) 0 0; o4 softplus(o2) 0; o5 o6 softplus(o3)] | (17)
Finally, the ellipsoid matrix Upred is obtained by performing a matrix multiplication between L and its transpose, as can be seen in eqn (18).
Upred = LLᵀ | (18)
This construction ensures that the predicted ellipsoid matrix Upred is always symmetric and positive-definite, which is essential for accurate modelling of ellipsoid matrices.
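The Cholesky head reduces to a few lines of code. The sketch below follows the construction described above (softplus on the first three outputs for the diagonal, the remaining three as the lower triangle, then L Lᵀ):

```python
import torch
import torch.nn.functional as F

def cholesky_head(o):
    """Map the (N, 6) raw outputs of MLP_head to (N, 3, 3) SPD matrices.

    Softplus keeps the diagonal of L strictly positive, so L is
    non-singular and U = L Lᵀ is symmetric positive-definite.
    """
    diag = F.softplus(o[:, :3])                  # L11, L22, L33 > 0
    L = torch.zeros(o.shape[0], 3, 3)
    L[:, 0, 0], L[:, 1, 1], L[:, 2, 2] = diag[:, 0], diag[:, 1], diag[:, 2]
    L[:, 1, 0], L[:, 2, 0], L[:, 2, 1] = o[:, 3], o[:, 4], o[:, 5]
    return L @ L.transpose(1, 2)                 # U_pred = L Lᵀ
```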
pi′ = R·pi | (19)
The situation is particularly nuanced for the ADP dataset because ellipsoids are inherently rotationally equivariant. Therefore, the ellipsoids must rotate consistently with the input data during augmentation. Eqn (20) describes how the rotation is applied to the original Ui to rotate it. The proof of eqn (20) can be found in the ESI Section S2.†
Ui′ = R·Ui·Rᵀ | (20)
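The augmentation can be sketched as follows; the QR-based sampler is one standard way to draw a random rotation (an implementation choice, not the paper's stated method). Rotating U as R U Rᵀ preserves its eigenvalues, so the ellipsoid's shape is unchanged while its orientation follows the atoms:

```python
import numpy as np

def random_rotation(rng):
    """Draw a random 3x3 rotation matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))          # fix the sign convention of each column
    if np.linalg.det(q) < 0:          # ensure det = +1 (rotation, not reflection)
        q[:, 0] *= -1
    return q

def augment(positions, adps, rng):
    """Rotate positions and ADPs consistently: p' = R p, U' = R U Rᵀ."""
    R = random_rotation(rng)
    return positions @ R.T, np.einsum("ab,nbc,dc->nad", R, adps, R)
```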
In this study, we compared the results of our method against those of other previously reported methods, including Matformer,21 PotNet,22 eComFormer,23 and iComFormer.23 To ensure a consistent and fair evaluation, we followed the methodology proposed by Matformer21 and used their proposed data splits. We use the mean absolute error (MAE) as our evaluation metric and report its mean and standard deviation across four random initialization seeds to confirm that our model's performance is robust rather than dependent on initialization. Section S4 in the ESI† provides the detailed CartNet modifications and training configurations used for predicting each property.
As illustrated in Table 1, our model consistently performs best across all evaluated properties. The improvements are most notable in the total energy and band gap (OPT) predictions, where our model improves on the next-best models by 7.71% and 5.48%, respectively. In the low-data band gap (MBJ) scenario, our model outperforms the next-best model by approximately 2.68%.
Method | Form energy (meV per atom) ↓ | Band gap (OPT) (meV) ↓ | Total energy (meV per atom) ↓ | Band gap (MBJ) (meV) ↓ | Ehull (meV) ↓
---|---|---|---|---|---
Matformer21 | 32.5 | 137 | 35 | 300 | 64
PotNet22 | 29.4 | 127 | 32 | 270 | 55
eComFormer23 | 28.4 | 124 | 32 | 280 | ![]()
iComFormer23 | ![]() | 122 | 28.8 | 260 | 47
CartNet | 27.05 ± 0.07 | 115.31 ± 3.36 | 26.58 ± 0.28 | 253.03 ± 5.20 | 43.90 ± 0.36
Our model's ability to excel across various properties, including those with fewer data samples, such as the band gap (MBJ), highlights its adaptability and robustness. These results suggest that our approach performs well in traditional tasks and thrives in challenging low-data environments, providing a comprehensive solution for crystal structure property prediction.
As shown in Table 2, our method achieves the best performance across all evaluated properties. The improvements are particularly notable for form. energy, yielding an approximately 4.33% improvement. Similarly, for bulk moduli's low-data scenario, our model improves the MAE by approximately 13.16% over the next-best model. In the shear moduli task, another low-data scenario, our method achieves the same metric as the best-known reported.
These results underscore the robustness of our model, especially in low-data environments such as bulk and shear moduli tasks, where limited training data pose significant challenges. The consistent improvements across all properties confirm that our approach is well suited for predicting a broad range of crystal structure properties and offers substantial advantages over existing methods.
As shown in Fig. 4, small ADPs differ by several orders of magnitude compared to larger ones. This disparity could lead to biased conclusions when analysing the results using MAE, although the S12 does not have this problem. Nevertheless, the issue with the S12 is that it is not highly discriminative. Most of the errors are less than 1% using this metric, which might give the impression that the predictions are accurate. To address these issues, we also employed an Intersection over Union (IoU) metric over the ADP graphical ellipsoid representations. The IoU metric provides a measurement independent of the ADP's size, enabling us to assess whether smaller or larger ellipsoids are well-predicted. Furthermore, if an ADP is not well-predicted, it will have a higher impact on the IoU metric, making it more restrictive. The IoU is computed by voxelizing the 3D space. More details about the IoU implementation and the training configurations can be found in Sections S3 and S4 in the ESI,† respectively. For all models, we ran experiments using four initialization seeds to ensure robustness against initialization effects, reporting the mean and standard deviation.
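A minimal voxel-based IoU between two ellipsoids at a common origin might look as follows (the grid extent, resolution, and the xᵀU⁻¹x ≤ 1 level surface are illustrative choices, not the exact implementation of ESI Section S3):

```python
import numpy as np

def ellipsoid_iou(U_a, U_b, extent=1.0, n=64):
    """Voxelise both ellipsoids on a common grid and compare occupancy.

    A grid point x is counted as inside an ellipsoid when x^T U^-1 x <= 1;
    the IoU is the ratio of shared to combined occupied voxels.
    """
    axis = np.linspace(-extent, extent, n)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([X, Y, Z], axis=-1)          # shape (n, n, n, 3)

    def inside(U):
        Uinv = np.linalg.inv(U)
        q = np.einsum("...i,ij,...j->...", pts, Uinv, pts)
        return q <= 1.0

    a, b = inside(U_a), inside(U_b)
    return (a & b).sum() / (a | b).sum()
```

Because the metric only compares shapes on a normalised grid, a badly oriented or wrongly shaped prediction is penalised regardless of how small the ellipsoid is, which is what makes the IoU more restrictive than MAE or S12.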
Table 3 shows that our method achieves the best performance in all evaluation metrics, MAE, S12 and IoU, compared with the other methods. In our experiments, CartNet outperforms the second-best model by 10.87% in MAE, 17.58% in S12, and 2% in IoU. Furthermore, CartNet requires 49% fewer trainable parameters than the second-best model. This result suggests that our approach can achieve state-of-the-art results without specific layers to enforce rotational equivariance. The error introduced by using our data augmentation instead of equivariant layers is discussed and evaluated in Section 6.5.
Fig. 11 shows a visual comparison between the compared methods. Even though all tested models can encode the 3D geometry, only our approach and iComformer encode it in a way that allows the ellipsoids to be oriented; eComformer, in contrast, produces only spherical ellipsoids. A closer look at the eComformer architecture shows that it creates an invariant descriptor based on distance, which is then updated by an equivariant layer based on spherical harmonics and subsequently by a few invariant layers based on distance. Our intuition is that, although this equivariant layer enriches the invariant information and can improve invariant predictions, it cannot successfully encode the information needed to predict the ADP orientation correctly. A comparison between methods over entire crystal structures can be found in Section S8 in the ESI.†
Fig. 12 presents a scatter plot of the predicted ADP volumes versus the experimental values. The model achieves a coefficient of determination (R²) of 0.91, demonstrating a strong linear correlation between the predictions and the actual volumes and, hence, high accuracy in predicting ADP volumes.
In Section 3, we discussed the data imbalance concerning temperature, volume, and elemental composition in the ADP dataset. We analysed the error distribution for each of these factors to determine whether the model's errors align with the imbalances. Fig. 13 shows CartNet's predicted IoU as a function of temperature for the test split of the ADP dataset. The IoU decreases noticeably when the temperature drops below 80 K or rises above 400 K. This pattern corresponds with the data distribution depicted in Fig. 3, indicating higher errors in temperature regions that are underrepresented in the dataset. Nevertheless, the IoU is stable for the remaining temperatures. Fig. 14 illustrates the IoU metric for CartNet's predictions across different volume ranges in the test split. Comparing this with Fig. 4, we observe that higher errors occur in volume ranges with fewer data points, especially at the extremes of the chart. Fig. 15 presents the IoU per element for CartNet's predictions on the ADP dataset. Lower metrics correspond to lower representation in the dataset for some elements, such as beryllium, technetium, caesium, and plutonium. Conversely, elements such as xenon and neptunium exhibit near-perfect IoU scores despite being underrepresented. Additionally, most of the other elements show similar IoU values, suggesting that the network learns from atomic numbers that appear more frequently in the dataset and generalizes that information to the remaining atomic numbers. Further discussion on the performance of different atom types in specific chemical interactions such as hydrogen bonding, π–π stacking, and tert-butyl groups, as well as additional results on different polymorphs, can be found in ESI Section S5.†
Fig. 13 Plot of the IoU error with standard deviation as a function of temperature for CartNet's predictions on the ADP test dataset.
Fig. 14 Plot of the IoU error with standard deviation as a function of the ADP volume for CartNet's predictions on the ADP test dataset.
Moreover, we aimed to investigate whether our model could predict ADPs at various temperatures while using the same crystal geometry. To this end, we selected a crystal not included in our dataset that had been synthesised at different temperatures. The ESI† from previous studies48 indicated that the series of crystal structures of guanidinium pyridinium naphthalene-1,5-disulfonate, with the CSD refcode DOWVOC,49 met these criteria. The CSD contains 14 entries for this crystal, covering a temperature range between 155 K and 283 K, all with well-defined ellipsoids. The list of CSD refcodes and the respective temperatures can be found in Section S7 in the ESI.† We conducted two experiments to assess the ability of our model to predict ADPs at any temperature while fixing the geometry. In the first experiment, we examined whether our model could accurately predict the ADPs for all data points of this crystal using the experimental geometries and temperatures. In the second experiment, we evaluated whether our system could predict ADPs using a fixed geometry while varying the input temperature: we used the geometry determined at 213 K and systematically adjusted the input temperature values provided to CartNet. For each temperature point, we computed the mean ellipsoid volume from the experimental data, from the predictions with the experimental geometries, and from the predictions with the fixed geometry.
Fig. 16 presents the results of these experiments. The results demonstrate that our model can predict ADPs across the entire temperature range. However, the predicted volume diverges when using a fixed geometry and varying the input temperature. These results suggest that cell expansion due to temperature significantly affects ADPs, as the error increases with the difference between the temperature at which we fixed the geometry and the temperature at which ADPs should be predicted. Nonetheless, when predicting ADPs around the temperature from the fixed geometry, they are estimated with high accuracy. Further discussion about using the fixed geometry at 150 K and 283 K can be found in Section S7 in the ESI.†
Finally, the last experiment was a comparison with traditional theoretical methods: we compared the ADPs from our method with DFT calculations. Due to the high computational cost of computing ADPs with DFT, a single crystal structure (5,5′-dimethyl-2,2′-bipyrazine, CSD refcode: ETIDEQ) was computed. The electronic structure calculation was performed using the Vienna Ab initio Simulation Package (VASP v6.4.3)36–38 with the structure optimized at the PBE45-D3(BJ)46 level of theory. More details about the configuration used in the electronic structure calculations can be found in Section S6 in the ESI.†
In the case of DFT calculations, the choice of the central geometry for the atomic displacements can have a strong impact on the accuracy of the calculated ADPs. Here, three geometries were tested: (i) a full optimization, considering atomic positions and lattice parameters; (ii) a geometry relaxation including the atoms but fixing the lattice to its crystallographic dimensions; and (iii) an atomic relaxation at a fixed volume, obtained by solving the Vinet equation of state. The latter calculation is the most sophisticated, since it involves calculating the change in free energy with respect to the compression and expansion of the cell to derive the cell volume at a given temperature. However, the best results were obtained with the full optimization, which is a simpler method in comparison. Regarding the computational cost, it is interesting to observe that the fixed-lattice calculation required half the geometrical displacements of the Vinet and full-optimization approaches, as this geometry retained inversion symmetry after structural optimization; the displacements over symmetry-equivalent atoms were therefore redundant and omitted.
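For reference, the Vinet equation of state used in approach (iii) relates pressure and volume in its standard textbook form (the general expression, not parameters specific to this work):

```latex
P(V) = 3B_0 \, \frac{1 - x}{x^{2}} \, \exp\!\left[ \frac{3}{2}\left(B_0' - 1\right)\left(1 - x\right) \right],
\qquad x = \left( \frac{V}{V_0} \right)^{1/3}
```

where V₀ is the equilibrium volume, B₀ the bulk modulus and B₀′ its pressure derivative; the temperature-dependent cell volume then follows from solving P(V) = −∂F/∂V for the free energy F(V, T) at the target temperature.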
Table 4 shows the numerical results for this comparison. As can be seen, our model still improves the MAE by 34.77% and the IoU by 6.04% over the best DFT result. Additionally, CartNet reduces the computation time by several orders of magnitude. Fig. 17 compares the ADPs from CartNet and the best DFT results (full optimization). In both cases, the ellipsoids closely match the experimental reference, in line with their low MAE values.
| Method | MAE (Å²) ↓ | S12 (%) ↓ | IoU (%) ↑ | Time (s) ↓ |
|---|---|---|---|---|
| DFT (Vinet) | 1.32 × 10⁻² | 3.09 | 57.33 | ∼2.88 × 10⁶ |
| DFT (fix latt.) | 1.43 × 10⁻² | 4.12 | 70.75 | ∼1.44 × 10⁶ |
| DFT (full opt.) | 3.25 × 10⁻³ | 0.49 | 86.27 | ∼2.88 × 10⁶ |
| CartNet | 2.12 × 10⁻³ | 0.17 | 92.31 | ∼10⁻² |
Table 5 presents the results of removing each component of our proposed method. The following subsections explain the detailed experiments done and discuss their implications in the design of our model.
As seen in experiment number 2 in Table 5, excluding the hydrogen atoms decreases the IoU from 83.53% to 81.74%, resulting in a 1.32% decrease in performance. For the other metrics, including the hydrogens yields an improvement of 14.89% in MAE and 0.19% in S12.
The results suggest that hydrogen atoms, while not directly involved in inferring the ellipsoids, contribute valuable contextual information to the graph. This additional information benefits the model, allowing it to infer the ADPs of non-hydrogen atoms more accurately by incorporating the effects of the covalently bonded hydrogens and including the relevant hydrogen-bond interactions and other intermolecular forces.
Experiment number 3 in Table 5 shows the results when training CartNet without the envelope function. Compared to the full CartNet (experiment no. 1), the ADP prediction experiments show a drop of 0.32% in IoU, 0.02% in S12, and 5.26% in MAE. This suggests that the neighbour equalization provided by the envelope function is an effective component of the architecture.
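To make the ablated component concrete, a smooth polynomial envelope of the kind commonly used for neighbour cutoffs in GNN message passing can be sketched as below (a DimeNet-style form, shown for illustration only; the exact envelope used in CartNet may differ):

```python
import numpy as np

def envelope(d, cutoff=5.0, p=6):
    """Smooth polynomial cutoff: equals 1 at d = 0 and decays to 0
    (with vanishing derivatives) at d = cutoff, so distant neighbours
    are down-weighted instead of being cut off abruptly."""
    x = np.clip(np.asarray(d, dtype=float) / cutoff, 0.0, 1.0)
    u = (1.0
         - (p + 1) * (p + 2) / 2.0 * x**p
         + p * (p + 2) * x**(p + 1)
         - p * (p + 1) / 2.0 * x**(p + 2))
    return np.where(x < 1.0, u, 0.0)
```

In such schemes, each message is multiplied by `envelope(d)` for the corresponding interatomic distance `d`, which equalizes the influence of close (covalent) versus distant (contact) neighbours.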
The results presented in experiment number 4 in Table 5 clearly show the significant impact of incorporating the Cartesian direction unit vector in the edge encoder. Including it yields a 9.36% gain in IoU, a 1.71% improvement in S12, and a 53.77% reduction in MAE, highlighting the model's ability to capture the spatial relationships between atoms when the direction vector is used. This suggests that the direction unit vector is a crucial input feature for effectively encoding the geometric information necessary for accurate ADP prediction.
Experiment 5 in Table 5 shows the results when the temperature information is not included in the input of the CartNet model. Including the temperature improves the IoU by 1.31%, the S12 by 0.08%, and the MAE by 4.63%, showing that the model's ability to capture the spatial extent of atomic displacements is enhanced when the temperature is provided. These results suggest that temperature is essential for improving performance in ADP prediction, particularly when considering the thermal motion of atoms.
For these experiments, we trained our model on the ADP dataset with and without rotation SO(3) augmentation. The configuration without augmentation exposed the model to the data in its original orientation. In contrast, the configuration utilizing rotation SO(3) augmentation involved randomly applying three-dimensional rotations to the direction vectors of the atoms during training, thereby encouraging the model to learn features equivariant to spatial orientation.
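The augmentation step described above can be sketched as follows (a minimal version with hypothetical `edge_dirs` and `adps` arrays standing in for the actual training-pipeline tensors):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotate_sample(edge_dirs, adps, seed=None):
    """Apply one random SO(3) rotation to a training sample.

    edge_dirs: (E, 3) Cartesian direction vectors between atoms.
    adps:      (N, 3, 3) target ADP matrices; a rotated structure has
               rotated targets R U R^T, so both must be transformed.
    """
    R = Rotation.random(random_state=seed).as_matrix()  # uniform on SO(3)
    rotated_dirs = edge_dirs @ R.T
    rotated_adps = np.einsum("ij,njk,lk->nil", R, adps, R)  # R U R^T per atom
    return rotated_dirs, rotated_adps
```

Drawing a fresh rotation per sample at every epoch exposes the model to arbitrarily oriented copies of each crystal, which is how the orientation equivariance is learned rather than enforced architecturally.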
Experiment number 6, compared with experiment number 1 in Table 5, demonstrates the positive impact of the rotation SO(3) augmentation on model performance. The augmentation improves the IoU by 0.75%, the S12 by 0.06%, and the MAE by 4.63%, indicating that the model is more adept at generalizing to unseen orientations of the input data, making it better suited for real-world applications where molecules and crystal structures can appear in various spatial configurations.
Since CartNet does not explicitly enforce SO(3) rotation equivariance but learns it through data augmentation, we wanted to evaluate the error this method introduces into the final predictions. To quantify this, we defined two variables: Uorig represents the ADP predictions from the original, unrotated crystal structures and Urot represents the ADP predictions from the crystal structures after they have been rotated by a rotation matrix R. We then compared Urot with the rotated versions of Uorig, calculated as RUorigRT. This methodology allows us to isolate rotation-induced errors, ensuring that any observed discrepancies are attributable solely to rotation effects rather than a combination of prediction and rotation errors. To perform this comparison, we conducted a Monte Carlo experiment, applying 100 different random rotation matrices to the test set.
The results of this experiment yielded a Mean Absolute Error (MAE) of 1.01 × 10⁻³ Å² ± 1.68 × 10⁻³ Å², an S12 score of 0.65% ± 0.17%, and an Intersection over Union (IoU) of 94.96% ± 2.46%. Fig. 18 illustrates the results of this experiment.
In all cases, the ADP predictions for the rotated crystal structures (Urot) closely matched the rotated predictions of the original structures (RUorigRT). These results confirm that CartNet effectively generalizes to unseen rotations despite not explicitly enforcing rotation equivariance.
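The consistency check described above can be sketched as follows, using a toy equivariant predictor in place of CartNet (`predict` and `coords` are placeholders, not the actual model interface):

```python
import numpy as np
from scipy.spatial.transform import Rotation

def rotation_consistency_error(predict, coords, n_trials=100, seed=0):
    """Monte Carlo estimate of the equivariance error.

    Compares U_rot = predict(R x) against R U_orig R^T for random
    rotations R; a perfectly equivariant predictor gives zero error.
    """
    U_orig = predict(coords)
    errs = []
    for R in Rotation.random(n_trials, random_state=seed).as_matrix():
        U_rot = predict(coords @ R.T)  # predictions on the rotated structure
        target = np.einsum("ij,njk,lk->nil", R, U_orig, R)  # R U_orig R^T
        errs.append(np.abs(U_rot - target).mean())
    return float(np.mean(errs))
```

For an analytically equivariant predictor, e.g. U(x) = x xᵀ + I per atom, this error vanishes to machine precision; for CartNet, the residual reflects only the imperfection of the learned equivariance.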
CartNet utilizes a novel Cartesian encoding approach that avoids reliance on the unit cell, thereby overcoming limitations faced by previous models. The neighbour equalization helps the model differentiate between the various types of bonds and interaction forces between atoms. The Cholesky-based output layer ensures that the model generates valid ADP predictions that satisfy the physical requirements. Additionally, by introducing rotational generalization through data augmentation, CartNet effectively learns the directional nature of atomic vibrations without relying on specific equivariant layers. The evaluation of CartNet demonstrated its robustness and accuracy. It also outperformed previously reported methods on other benchmarks, JARVIS and the Materials Project, which focus on bulk materials instead of molecular systems and contain structures and properties computed with DFT rather than experimental structures and ADPs.
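As an illustration of the Cholesky-based output layer mentioned above, one plausible parameterization maps six unconstrained network outputs per atom to a symmetric positive-definite matrix U = LLᵀ (the exact activation and layout used in CartNet may differ):

```python
import numpy as np

def cholesky_head(theta):
    """theta: (N, 6) raw outputs -> (N, 3, 3) SPD ADP matrices.

    The three diagonal entries of the lower-triangular factor L are made
    strictly positive with a softplus, so U = L L^T is always a valid
    (symmetric, positive-definite) ADP by construction.
    """
    theta = np.asarray(theta, dtype=float)
    L = np.zeros((theta.shape[0], 3, 3))
    L[:, [0, 1, 2], [0, 1, 2]] = np.logaddexp(0.0, theta[:, :3])  # softplus
    L[:, [1, 2, 2], [0, 0, 1]] = theta[:, 3:]  # unconstrained off-diagonals
    return L @ np.transpose(L, (0, 2, 1))
```

Because positive definiteness is built into the parameterization, no post hoc projection or constraint loss is needed to keep the predicted ellipsoids physical.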
In addition to the model, we curated and presented a comprehensive ADP dataset containing over 200k crystal structures of molecular systems from the Cambridge Structural Database (CSD). This dataset spans a wide range of temperatures and atomic environments, providing a valuable resource for further research on predicting anisotropic displacements and thermal behaviours in crystalline structures.
This work provides a more efficient and accurate method for predicting properties of crystal structures, opening new possibilities for studying different material properties and designing new materials. In particular, when applied to ADPs, CartNet can be used to evaluate the experimental results of new systems in cases where their experimental determination using diffraction techniques presents difficulties.
Future work could focus on predicting cell expansion to estimate ellipsoids at other temperatures based on a geometry fixed at a specific temperature. Regarding the specific case of ADPs, future work could explore equivariant methods that generate valid ADP matrices. Moreover, given the large number of molecular crystal structures in the ADP dataset, future work could study its use for pre-training on other crystal structure tasks. Finally, the efficiency and accuracy of CartNet also highlight its potential for future crystal structure prediction challenges, such as the 7th Blind Test51,52 organized by the Cambridge Crystallographic Data Centre (CCDC),53 where handling highly flexible or disordered systems remains a critical obstacle.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4dd00352g |
This journal is © The Royal Society of Chemistry 2025 |