Cameron J. Nickersona and
Erin R. Johnson*abc
aDepartment of Physics and Atmospheric Science, Dalhousie University, 6310 Coburg Road, Halifax, Nova Scotia B3H 4R2, Canada
bYusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, UK CB2 1EW
cDepartment of Chemistry, Dalhousie University, 6243 Alumni Crescent, Halifax, Nova Scotia B3H 4R2, Canada. E-mail: erin.johnson@dal.ca
First published on 19th May 2025
First-principles crystal structure prediction (CSP) of isolable polymorphs of organic compounds is a grand challenge in computational chemistry. The adoption of dispersion-correction density-functional theory (DFT) has allowed great strides to be made in the accuracy of the final energy ranking of candidate crystal structures. Consequently, CSP methods are seeing increasing use in development of new pharmaceuticals, organic electronics, energetic materials, and pigments, among other applications. However, lower-cost methods, such as classical force-field potentials, are still necessary for the early stages of CSP, where hundreds of thousands of candidates are commonly generated. Recently developed foundational machine-learned potentials represent a seductive alternative to force fields for this purpose due to their promise of near-DFT accuracy at a vastly reduced computational cost. In this work, the performance of the MACE-OFF23(M) machine-learned potential is assessed for geometry optimisation and energy ranking of candidate crystal structures of 28 compounds from the first seven CSP blind tests, as well as 12 helicene compounds. The performance of MACE-OFF23(M) is found to be highly dependent on the particular compound, providing good accuracy for compounds similar to those in its training set, but failing dramatically for compounds containing unusual functional groups (such as diazo) and organic salts. Physically motivated inclusion of long-range electrostatic interactions remains an open problem for development of foundational machine-learned potentials.
Crystal structure prediction (CSP)3–7 uses the tools of computational chemistry to suggest the polymorphs that are most likely to be observed experimentally, given only a 2D diagram of the molecule. The typical process for a CSP study begins by generating an enormous landscape of pseudo-random candidate crystal structures, possibly in the range of 106–107 structures. The landscape is then narrowed down to the few hundred structures with the lowest potential energies according to a classical force field, or other low-cost method. Use of a low-cost method at this stage is necessary due to the vast numbers of structures involved, but they are not typically accurate enough to predict the correct stability rankings. Thus, the remaining structures then have their potential energies re-evaluated using dispersion-corrected density-functional theory (DFT-D), which is more accurate but comes with a significantly increased computational cost. Finally, all candidates within a certain threshold (perhaps ∼5 kJ mol−1) of the minimum-energy structure are deemed as likely polymorphs.
The adoption of density-functional theory for the final energy ranking stage has led to much of the current success in CSP.6–13 For example, our group has obtained highly accurate lattice energies for molecular crystals using hybrid DFT in conjunction with the XDM14 dispersion correction.15,16 These studies employed the FHI-aims software package,17–19 which uses numeric atom-centered orbitals (NAOs) for its basis functions.
While DFT-D has been established as the state of the art for obtaining accurate lattice energies for molecular crystals, there is still something to be desired in the way of low-cost methods suitable for use in the initial phases of CSP. Unfortunately, the degree of error when using empirical force fields (or tight-binding DFT) is often much larger than the energy differences between real polymorphs, which can lead to the correct structure being discarded before making it to the DFT phase of the study.20 One potential avenue for improved low-cost energy-ranking methods is the emergence of machine-learned potentials as a possible alternative to empirical force fields.
There exist a number of machine-learned potentials that have been developed for organic chemistry, the most noteworthy of which are the series of ANI21–24 and AIMNet potentials.25–27 ANI makes use of neural networks based on local symmetry functions28 to create transferable potentials. In recent years, the ANI-2X model has become the most widely adopted machine-learned force field. The AIMNet models make use of a message passing architecture29 and they extend their applicability to a larger selection of chemical elements, as well as to charged species. Both ANI-2X and AIMNet were utilized for energy ranking by two groups in the seventh CSP blind test;30,31 however, both of these groups began by enhancing the model with further training. Results showed that the system-specific AIMNet machine-learned potentials ranked consistently well, whereas the other machine-learned methods performed inconsistently.
MACE-OFF2332 is a recently developed set of machine-learned potentials for organic molecules that demonstrates improved accuracy compared to the ANI-2X model. More specifically, MACE-OFF23 encompasses a series of three distinct machine-learned potentials denoted (S)mall, (M)edium, and (L)arge, which are designed to provide three different levels of accuracy and cost. Like the ANI models, the MACE method makes use of only the local atomic environment of each atom in order to calculate the energy. MACE-OFF23 was trained primarily to a subset of the SPICE dataset,33 consisting of only neutrally charged species composed of up to ten elements: H, C, N, O, F, P, S, Cl, Br, and I. SPICE contains slightly over 1.1 million conformers of selected small molecules, molecular dimers, dipeptides, and solvated amino acids, and uses reference energies and forces computed at the ωB97M-D3(BJ)34–36/def2-TZVPPD level of theory. Additionally, MACE-OFF23 was also trained on some larger systems (composed of 50–90 atoms) from the Qmugs37 dataset, as there were no systems of this size present in SPICE. Qmugs consists of minimum-energy conformations of 665000 neutral molecules comprised of the same 10 elements noted above. In the development of MACE-OFF23, the energies and forces of these molecules were reevaluated using the same level of theory as used for the SPICE reference data.
The MACE-OFF23 potential is of particular interest to us because of its proposed applicability for molecular crystals. In their work, the creators of MACE-OFF23 tested its performance on several distinct tasks, which included predicting lattice parameters and sublimation enthalpies for the X23 set38,39 of molecular crystals. For these systems, MACE-OFF23(M) resulted in a mean absolute error (MAE) of only 7.5 kJ mol−1 when comparing the computed sublimation enthalpies to experimental data;32 this represents a massive improvement compared to ANI-2X, which gave a MAE of 20.5 kJ mol−1. For comparison, the best dispersion-corrected DFT methods typically give MAEs of 2–5 kJ mol−1 for the X23 benchmark.16,38,39 This suggests that MACE-OFF23 may be directly applicable to molecular polymorph ranking without the need for additional system-specific training.
Ideally, a universal ML potential such as MACE-OFF23(M) could be a direct replacement for the FIT40 or W9941 classical force fields, which are routinely used in CSP without parameterization. We wish to avoid any system-specific training of the ML potential (akin to construction of tailor-made force fields42) as this would requires extensive DFT reference data to be generated at the outset of a CSP study. In the usual hierarchical CSP workflow, DFT calculations are performed only in the final stages, on a small set of low-energy candidates already identified by the force field. Any initial system-specific training of the ML potential to DFT would drastically increase both the complexity and computational cost and, thus, is not desirable for practical first-principles CSP on new compounds. For CSP applications, this makes bespoke ML potentials impractical and less appealing compared to foundational models that are broadly applicable to organic chemistry.
The goal of the present work is to further test the applicability of the MACE-OFF23(M) potential for molecular crystals in order to assess whether it will be a good choice as the initial energy ranking method in CSP studies going forward. Specifically, we considered the ability of the MACE-OFF23 potential to reproduce DFT geometries and relative energies for two sets of molecular crystal polymorphs: 28 organic compounds from the seven CSP blind tests9,20,30,31,43–46 and 12 helicene compounds.47,48 The results presented here will help researchers decide whether or not to use MACE-OFF23(M) as an alternative to empirical force fields in the early stages of a CSP protocol.
![]() | ||
Fig. 1 Structures of all blind test compounds considered in this work. The boxes encapsulate the various components for cocrystals and salts. |
Sets of candidate crystal structures were obtained from the ESI† of ref. 15 for blind-test compounds I–XXVI, ref. 49 for blind-test compounds XXXI–XXXIII, ref. 47 for the [n] helicenes, and ref. 48 for 1-aza[6]helicene. All of these structures were already fully geometry optimized with the B86bPBE density functional,50,51 the XDM dispersion correction,14,16 and the ‘lightdenser’ basis set and integration grid settings, using a modified copy of FHI-aims17 version 210513. The one exception was 1-aza[6]helicene, where the geometries had previously been optimized with the same functional, but with a planewave basis set, using Quantum ESPRESSO.52 The 1-aza[6]helicene structures were consequently re-optimized here using the same methodology as above for consistency.
Single-point energy evaluations were then performed on (i) the MACE-OFF23(M) optimized structures and (ii) the B86bPBE-XDM optimized structures using version 240206 of FHI-aims.17,18 The B86bPBE0-XDM hybrid functional16 (25% exact exchange) was used for the blind test compounds, while the B86bPBE-XDM functional was used for the helicenes, as in our previous works.15,47,49 The lightdense basis set and integration grids were used for both data sets. Due to some modifications to FHI-aims between versions, and the change from the lightdenser (recommended for geometry optimisations) to lightdense basis (recommended for single-point energies), the results for the DFT single-point energies at the DFT geometries are slightly different to those previously reported15,47,49 in some cases.
When tabulating results, we use the standard energy//geometry notation, where the method used for single-point energies is given first, followed by //, and then the method used for geometry optimization. Thus, MACE//MACE means MACE-OFF23(M) energies evaluated at the MACE-OFF23(M) geometries; DFT//DFT means B86bPBE-XDM or B86bPBE0-XDM energies evaluated at the B86bPBE-XDM geometries, and DFT//MACE means B86bPBE-XDM or B86bPBE0-XDM energies evaluated at the MACE-OFF23(M) geometries. This latter combination is an example of a composite method, which should ideally yield results of similar quality to the high-level method (DFT in this case) used for the single-point energies, with a greatly reduced computational cost compared to performing full geometry optimizations with that high-level method. Composite methods combining DFT with classical force fields have previously shown promise in molecular CSP.54 In theory, the DFT//MACE approach should give similar or better results than MACE//MACE in nearly all cases, with the only exceptions being due to accidental error cancellation between the geometry and the energy.
Finally, the similarity of the MACE-OFF23(M) structures, compared to the B86bPBE-XDM reference structures, was evaluated using the variable-cell powder difference (VC-PWDF) method,55 as implemented in critic2.56 These calculations compare simulated powder X-ray diffractograms of various unit-cell descriptions of a candidate crystal structure to a reference crystal structure using the de Gelder cross-correlation function.57 In this case, the candidate is the MACE-OFF23(M) optimized crystal structure and the reference is the B86bPBE-XDM optimized structure. By its construction, VC-PWDF accounts for changes in lattice constants (due to different temperatures, pressures, or computational methods used) when comparing pairs of crystal structures to determine if they are the same form or different polymorphs.
![]() | ||
Fig. 3 Distributions of computed VC-PWDF scores obtained for comparison of the MACE-OFF23(M) optimized crystal structures to the reference B86bPBE-XDM optimized crystal structures for each compound. The boxes show the interquartile ranges, the whiskers encompass 90% of the data, and remaining outliers are shown as individual points. The grey box shows the range of VC-PWDF scores where two structures can confidently be deemed the same polymorph (i.e. <0.03).55 For visual clarity, the plots are truncated at a VC-PWDF score of 0.2, despite a few outliers having scores exceeding this value. In the bottom row of plots, the [n]helicene structures are denoted by their n value, while “nhelic” refers to 1-aza[6]helicene. |
From the results in Fig. 3, it can be seen that the distributions of VC-PWDF scores tend to be tightly clustered near zero for many of the compounds considered. In these cases, there were only minimal geometry changes upon re-optimization with the MACE-OFF23(M) potential, such that the same polymorph was recovered as opposed to a migration to some other polymorph on the potential-energy surface. However, despite generally high packing similarity between the two sets of optimized structures, some of the blind test compounds were notable outliers. These included compounds II, IX, X, XII, XIII, XVII, XIX, XXI, XXII, and XXXIII, although at least half of the structures were still identified as the same polymorph before and after MACE-OFF23(M) optimization based on VC-PWDF scores of <0.03.55
The broadest distributions of VC-PWDF scores were observed for compounds XVI and XXIV, where more than half of the structures transitioned to different crystal forms during MACEOFF23(M) optimization. Compound XVI contains a diazo (N
N) functional group, which is not well represented in the training data, so it is unsurprising that the MACE-OFF23(M) energy landscape differs from the DFT one to the point where many local minima are no longer stable. We note that MACEOFF23(M) shows significantly better performance for compound XVIII, which also contains a diazo group; this is likely due to the other functional groups having a proportionally greater influence on the crystal packing.
The other large outlier, compound XXIV, is a 3-component organic salt. Thus, the poorer performance of MACE-OFF23(M) is again expected as this potential was only trained for neutral molecules and included no ions, let alone ionic solids. Accurate modeling of electrostatic interactions with machine-learned potentials is problematic due to the incompatibility in their length scales—electrostatic interactions decay as −1/r and are inherently long range, while only short-range atomic environment information is typically used as input. This is indeed the case for the MACE-OFF23(M) potential, which uses a radial cutoff of 5.0 A and contains no charge or spin information. Owing to its message-passing architecture, there is a receptive field that allows atoms to exchange information up to 10.0 Å; however, this still remains too short to properly capture electrostatic interactions.32
The VC-PWDF method is designed to assess whether the two input structures have the same packing and are the same polymorph, while ignoring volume changes.55 Thus, low VC-PWDF scores are necessary, but not sufficient, to guarantee that the MACE-OFF23(M) geometries are in good agreement with those obtained with DFT, as significant changes in unit-cell volumes are still possible. Fig. 4 plots the volume changes, expressed per molecule in the unit cell, between the MACE-OFF23(M) optimized structures and the B86bPBE-XDM optimized structures. This plot reveals that the MACE-OFF23(M) unit cells are typically more compact. This error appears quite systematic for the helicenes, where the extent of volume compaction seems to increase with ring size. However, there are some notable exceptions for some of the blind-test compounds, with the cell volumes being consistently overestimated for compound IX, and some structures having expanded volumes for compounds XVII and XXIV. Large spreads in computed volumes between MACE-OFF23(M) and B86bPBE-XDM will likely translate to poorer performance of the composite DFT//MACE approach, which relies on the MACE-OFF23(M) geometries being a good proxy for their DFT counterparts.
Polymorph | MACE//MACE | DFT//MACE | DFT//DFT | |||
---|---|---|---|---|---|---|
Rank | ΔE | Rank | ΔE | Rank | ΔE | |
Rigid molecules | ||||||
I-1 | 9 | 3.8 | 1 | 0.0 | 2 | 0.1 |
I-2 | 4 | 1.2 | 3 | 1.3 | 1 | 0.0 |
II | 1 | 0.0 | 2 | 1.0 | 2 | 0.7 |
IV | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
V | 1 | 0.0 | 3 | 1.6 | 4 | 2.0 |
VII | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
VIII | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
IX | 9 | 16.1 | 1 | 0.0 | 1 | 0.0 |
XI | 7 | 2.8 | 2 | 0.4 | 1 | 0.0 |
XII | 4 | 1.9 | 1 | 0.0 | 1 | 0.0 |
XIII | 1 | 0.0 | 2 | 0.0 | 1 | 0.0 |
XVI | 2 | 0.3 | 16 | 9.0 | 1 | 0.0 |
XVII | 10 | 4.4 | 2 | 0.3 | 1 | 0.0 |
XXII | 18 | 3.9 | 25 | 7.0 | 3 | 0.4 |
Multi-component crystals | ||||||
XV | 2 | 0.3 | 1 | 0.0 | 2 | 0.6 |
XXI | 1 | 0.0 | 2 | 0.8 | 1 | 0.0 |
XXV | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
Organic salts | ||||||
XIX | 1 | 0.0 | 14 | 28.5 | 4 | 2.3 |
XXIV | 86 | 60.5 | 50 | 31.4 | 3 | 0.8 |
XXXIII-A | 93 | 24.6 | 7 | 8.2 | 4 | 4.5 |
XXXIII-B | 192 | 33.5 | 1 | 0.0 | 1 | 0.0 |
Flexible molecules | ||||||
VI | 2 | 1.8 | 1 | 0.0 | 1 | 0.0 |
X | 12 | 26.5 | 1 | 0.0 | 2 | 0.6 |
XIV | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
XVIII | 2 | 0.1 | 1 | 0.0 | 1 | 0.0 |
XX | 3 | 1.3 | 6 | 5.8 | 4 | 6.4 |
XXIII-A | 37 | 8.6 | 55 | 6.2 | 10 | 2.1 |
XXIII-B | 4 | 2.2 | 3 | 0.3 | 3 | 0.5 |
XXIII-C | 1 | 0.0 | 1 | 0.0 | 2 | 0.3 |
XXIII-D | 34 | 8.3 | 36 | 4.9 | 13 | 2.9 |
XXIII-E | 51 | 10.7 | 7 | 1.6 | 9 | 1.9 |
XXVI | 1 | 0.0 | 3 | 0.2 | 1 | 0.0 |
XXXI-AMaj | 7 | 6.6 | 3 | 1.8 | 4 | 2.5 |
XXXI-AMin | 42 | 12.5 | 8 | 3.0 | 10 | 4.2 |
XXXI-B | 6 | 5.9 | 18 | 6.5 | 12 | 5.0 |
XXXI-C | 85 | 19.8 | 53 | 11.6 | 70 | 11.5 |
XXXII-AMaj | 31 | 14.9 | 17 | 9.2 | 22 | 8.8 |
XXXII-B | 135 | 22.0 | 19 | 10.5 | 26 | 9.1 |
The reported data in Table 1 are the ranks of all experimentally observed polymorphs (i.e. 1 indicates this is the most stable structure, 2 indicates the second-most stable, and so forth), along with their relative energies above the global minimum obtained with that particular level of theory. As the sets of candidate structures contain duplicates in some cases (i.e. the same crystal structure was generated by multiple groups participating in the blind tests), duplicate structures were eliminated from the energy ranking in Table 1 if they had a powder difference score of less than 0.01 when compared with critic2.56 Typically, the experimentally isolated polymorphs would be expected to be among the lowest-ranked structures, with energies within <2 kJ mol−1 of the global minima (with this range combining both expected uncertainties in the DFT relative energies and the magnitude of thermal free-energy corrections,58 which are neglected here). However, there are some exceptions where experimental screening can result in formation of metastable polymorphs, for example due to solvent loss from a solvate, as in the case of polymorph C of compound XXXI.30
To aid assessment of the performance of the MACE-OFF23(M) and DFT//MACE approaches, the results in Table 1 are divided into four groups based on the types of compounds included in the blind tests. These groups are rigid molecules, multi-component crystals composed of neutrally charged molecules (i.e. cocrystals and solvates), organic salts, and flexible molecules with multiple rotatable bonds. Molecules are considered to be rigid if they have no rotatable bonds, other than those that result in changing only H-atom positions (as in compound II), or when steric factors prevent rotation (as in XVII).
The results in Table 1 show that MACE-OFF23(M) performs somewhat well for the majority of the rigid molecules, ranking the experimental structure within 2 kJ mol−1 of the minimum in 9/14 cases, and within 5 kJ mol−1 of the minimum in another 4 cases. The only large outlier is compound IX, where the experimental structure is predicted to lie 16.1 kJ mol−1 above the MACE-OFF23(M) energy minimum. Notably, IX is the only compound in our data set than contains iodine, so perhaps this element was insufficiently well represented in the original SPICE and Qmugs data used to train the MACE-OFF23(M) potential. Coincidentally, it is also the only compound for which MACE-OFF23(M) consistently overestimates the unit-cell volumes (Fig. 4).
The DFT//MACE composite approach also appears promising for rigid molecules, with the experimental structure being ranked first, second, or third, and lying within 2 kJ mol−1 of the minimum-energy structure in 12/14 cases. The only two exceptions are compounds XVI and XXII, where the experimental polymorphs are 9 and 7 kJ mol−1, respectively, above the corresponding minimum. For compound XXII, Fig. 3 shows that the distribution in VC-PWDF scores is clustered near zero, indicating generally high similarity between the MACE-OFF23(M) and B86bPBE-XDM structures, and this includes the experimental structure. However, this compound does contain three cyano (CN) groups, a fused ring system with three sulfur atoms, and no hydrogens, so it is quite far removed from the compounds in the SPICE training set.
Compound XVI includes the diazo (N
N) functional group and, as noted above, is one of the two cases showing large deviations between the MACE-OFF23(M) and B86bPBE-XDM geometries. It is, therefore, expected that this would be a problem case for the DFT//MACE composite method. The VC-PWDF score obtained from comparing the MACE-OFF23(M) and B86bPBE-XDM optimized structures for the experimental polymorph is 0.184, and COMPACK comparison59,60 yields only a 2/20 molecule match. This indicates a change in form upon MACE-OFF23(M) optimization, meaning that the experimental polymorph is not a stable minimum with the machine-learned potential. Thus, using MACE-OFF23(M) in the initial stages of CSP would cause the experimental structure of compound XVI to be missed. Reoptimization of the MACE-OFF23(M) “experimental” structure with B86bPBE-XDM gives a yet another structure, with a 16/20 molecule match with COMPACK, and a VC-PWDF score of 0.061, compared to the MACE-OFF23(M) result.
For the three neutral co-crystals, the results in Table 1 show that both the MACE-OFF23(M) and DFT//MACE methods perform very well. The experimental polymorphs are ranked either first or second in energy, within 1 kJ mol−1 of the minimum, in each case. Conversely, the results for the three organic salts are extremely poor. While MACE-OFF23(M) happens to identify the experimental polymorph as lowest in energy for compound XIX, isolated forms are ranked 86th, 93rd, and 192nd for the other two salts, with relative energies ranging from roughly 25–60 kJ mol−1 above the corresponding minimum. As such, these structures would not be carried forward to further study in most CSP protocols. Again, catastrophic failure of the MACE-OFF23(M) potential for salts is completely expected given that it was not trained on systems with net charges (or on ionic crystals). Also as expected, based on the high VC-PWDF scores in Fig. 3, the DFT//MACE approach predicts the experimental polymorphs to be 28 and 31 kJ mol−1 above the minima for compounds XIX and XXIV, respectively. The performance of DFT//MACE for compound XXXIII is quite good, however, which is consistent with the much lower VC-PWDF scores seen in Fig. 3. Likely the quality of the MACE-OFF23(M) geometries for salts is, in part, related to the extent of charge localization and resemblance to the zwitterionic solvated amino acids in the SPICE training set, with the chloride salt (XXIV) proving particular problematic.
Turning to the flexible molecules, Table 1 shows that these can be challenging CSP cases even for DFT methods. It has been shown that inclusion of thermal free-energy corrections is necessary to identify the experimental structure as the most stable for compounds XX, XXIII, and XXXI.11,49,61,62 In the most recent blind test, none of the methods considered ranked the two isolated polymorphs of compound XXXII as particularly low in energy,31 implying either consistent errors for several dispersion-corrected density functionals, importance of kinetic effects on crystal growth, and/or the possibility of a more-stable, late-appearing polymorph not yet characterised experimentally.63 Overall, the DFT calculations predict the experimental forms to be within 10 kJ mol−1 of the minimum for all cases except form C of compound XXXI; however, this polymorph was the result of solvent loss from a solvate, leaving large crystal voids, and is expected to be highly unstable. As shown in Table 1, the MACE-OFF23(M) potential provides good agreement with DFT results for compounds VI, XIV, XVIII, XX, and XXVI, but its performance for the various polymorphs of the remaining flexible molecules is erratic. In general, the DFT//MACE results are in much better agreement with the DFT reference data, illustrating the promise of this composite approach using geometries optimized with MACE-OFF23(M) for the early stages of CSP.
Compound | Polymorph | MACE//MACE | DFT//MACE | DFT//DFT | |||
---|---|---|---|---|---|---|---|
Rank | ΔE | Rank | ΔE | Rank | ΔE | ||
Naphthalene | NAPTHA18 | 1 | 0.0 | 2 | 0.1 | 1 | 0.0 |
Phenanthrene | Ref. 65 | 22 | 2.7 | 1 | 0.0 | 1 | 0.0 |
[4] helicene | BZPHAN | 1 | 0.0 | 30 | 4.4 | 4 | 1.3 |
[5] helicene | DBPHEN05 | 73 | 9.1 | 5 | 1.4 | 2 | 0.0 |
DBPHEN04 | 25 | 5.1 | 9 | 1.8 | 3 | 0.4 | |
DBPHEN02 | 2 | 0.1 | 22 | 4.0 | 4 | 0.6 | |
DBPHEN03 | 28 | 5.3 | 14 | 2.8 | 5 | 0.8 | |
[6] helicene | Intergrowth | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
HEXHEL | 2 | 0.6 | 3 | 1.0 | 2 | 0.2 | |
1-aza[6] helicene | COBNUD | 12 | 3.0 | 1 | 0.0 | 1 | 0.0 |
KAWRUY | 38 | 13.3 | 36 | 11.6 | 31 | 10.0 | |
[7] helicene | IMEJIW | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
HPTHEL01 | 3 | 4.7 | 5 | 4.6 | 3 | 3.4 | |
[9] helicene | QUJNEQ | 1 | 0.0 | 3 | 3.2 | 1 | 0.0 |
[10] helicene | THELIC | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
[11] helicene | UHELIC | 1 | 0.0 | 1 | 0.0 | 1 | 0.0 |
The results in Table 2 show that the composite DFT//MACE approach offers improvement over MACE-OFF23(M) alone for phenanthrene, 1-aza[6]helicene, and most polymorphs of [5]helicene. However, agreement with DFT reference data worsens when applying the composite approach to [9]helicene and, particularly, [4]helicene, where the experimental structure is ranked 30th. Visualisation of the MACE-OFF23(M) structure for the experimental form of [4]helicene reveals a large volume contraction that resulted in very short intermolecular H⋯H contacts of <2.2 Å, which explains its lesser stability with DFT. A similarly short contact is also found in the MACE-OFF23(M) structure of only one of the four experimental forms of [5]helicene, DBPHEN02, which was ranked spuriously high in energy with DFT//MACE. Nevertheless, except for the enantiopure form of 1-aza[6]helicene discussed above, all experimental forms are predicted to lie within 5 kJ mol−1 of the corresponding energy minimum with the composite method. As such, they would still be carried forward to full DFT calculation in a practical CSP protocol.
The MACE-OFF23(M) geometries were compared to their DFT counterparts using the VC-PWDF method, which provides a packing similarity score between two crystals while allowing for distortions of the unit cell. The VC-PWDF scores were frequently clustered near zero, indicating that the MACE-OFF23(M) geometry optimizations converged to the same crystal polymorph as the B86bPBE-XDM optmizations in the majority of cases. Even in cases with broader distributions of VC-PWDF scores, more than 50% of the candidate crystal structures were deemed as matching (VC-PWDF <0.03) the DFT reference structure after optimization with the MACE-OFF23(M) potential. The two noted exceptions were the blind-test compounds XVI and XXIV. Poor performance in these cases is unsurprising since compound XVI contains a diazo (N
N) group, which is not well represented in MACE-OFF23(M)'s training data, while compound XXIV is an organic salt. Ions are inherently problematic for local models such as MACE-OFF23(M) because of the long range of the electrostatic interaction. In addition to VC-PWDF scores, we also considered changes in unit-cell volumes between the B86bPBE-XDM and MACE-OFF23(M) geometries. The MACE-OFF23(M) geometries were typically more compact, with the notable exception of the blind-test molecule IX, which was the only iodine-containing compound present in our study and, again, was not well represented in MACE-OFF23(M)'s training data. Machine-learned potentials are naturally limited by their training data, so it is expected to obtain good performance for compounds or intermolecular interaction motifs similar to those in the training set, and uncertain or poor performance for new or little-sampled regions of chemical space.
The energy ranking of each experimentally isolated polymorph was evaluated relative to the most-stable candidate structure provided by each method (MACE//MACE, DFT//MACE, DFT//DFT). MACE//MACE performed reasonably well for the rigid blind-test molecules, where 9/14 experimental polymorphs were ranked within 2 kJ mol−1 of the minimum-energy structure. The rankings were further improved by DFT//MACE, which predicted 12/14 experimental polymorphs to lie within this energy range. For the helicene compounds, the MACE//MACE method ranked the experimental forms within 6 kJ mol−1 of the minimum-energy structures in 14/16 cases, with the DFT//MACE method generally providing improved agreement with the DFT//DFT reference data. The flexible blind-test molecules, which are challenging even for DFT, were also unsurprisingly challenging for the MACE-OFF23(M) potential. Here, the performance of the MACE//MACE method was somewhat erratic, with large fluctuations in rankings of experimental structures, although there was again improvement when using the composite DFT//MACE approach. Finally, we saw a complete failure of the MACE-OFF23(M) potential for the experimental form of molecule XVI, where geometry optimization resulted in a different crystal structure, meaning that it would never be found in a CSP study using this methodology. A catastrophic failure of the MACE-OFF23(M) potential was also seen for organic salts, which is entirely expected due to the neglect of long-range electrostatics and lack of any ionic compounds in the training data. Developing machine-learned potentials that include a physically reasonable description of long-range electrostatic interactions remains an outstanding challenge.
We conclude that the MACE-OFF23(M) potential provides a promising step forward for the use of machine-learned potentials in CSP. For crystals composed of rigid molecules, containing common functional groups, the MACE method gives remarkably good results for both geometries and energy ranking, and the latter is improved by the DFT//MACE composite approach. Furthermore, our results suggest that this composite approach may even be accurate enough for application to flexible molecules, for selection of candidate structures that would progress to the final DFT energy ranking stage of a CSP study. Unfortunately, it does not seem possible to know a priori how well the MACE structure will approximate the DFT structure for specific compounds, which detracts significantly from the reliability of the DFT//MACE approach. Users of MACE-OFF23(M) should be aware of its pitfalls: it should never be used for ionic systems, and should also be avoided when considering compounds with any uncommon functionals groups, or elements, which may be poorly represented by its training data.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5cp00593k |
This journal is © the Owner Societies 2025 |