Xiao
Liu
a,
Kevin A.
Spiekermann
b,
Angiras
Menon
b,
William H.
Green
b and
Martin
Head-Gordon
*ac
aKenneth S. Pitzer Center for Theoretical Chemistry, Department of Chemistry, University of California, Berkeley, California 94720, USA. E-mail: mhg@cchem.berkeley.edu
bDepartment of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
cChemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
First published on 26th May 2025
Accurate prediction of barrier heights and reaction energies is of paramount importance for reaction kinetics. For computational efficiency, such calculations are typically performed with density functional theory (DFT) methods, with accuracy that depends critically on the choice of functional. The RDB7 dataset (K. A. Spiekermann, L. Pattanaik and W. H. Green, High Accuracy Barrier Heights, Enthalpies, and Rate Coefficients for Chemical Reactions, Sci. Data, 2022, 9, 417) is a diverse chemical kinetics data set that covers 11926 reactions and their barriers to assess present-day functionals. Strikingly, the RDB7 barrier heights reported using a reputable rung 4 hybrid functional (ωB97X-D3) exhibited significantly larger errors than seen in other benchmarks. Here, we identify the sources of error, and to the extent possible, address those sources. We categorize the barrier heights and reaction energies into three subsets based on orbital stability analysis. The “easy” subset has orbitals that are stable at the mean-field Hartree–Fock (HF) level, which implies weak correlation effects. An “intermediate” subset exhibits spin symmetry breaking at the HF level, but the restricted orbitals are stable at the dynamically correlated κ orbital optimized second order Møller–Plesset (κ-OOMP2) level with κ = 1.45. While more challenging than the easy category, this implies that correlation effects are still not strong. The remaining “difficult” subset is expected to be significantly affected by strong electron correlations, which potentially affects the accuracy of standard DFT. With this data classification, we performed new benchmarks with unrestricted ωB97X-D3 as well as two other hybrid functionals, ωB97M-V, and MN15, and the double hybrid ωB97M(2) functional. The RMSD values on the easy subset are comparable to prior high-quality benchmark studies, while the performance of all functionals on the intermediate subset is consistently less good. By far the largest errors lie in the difficult subset involving strongly correlated species. We refined some of the previous reference values to further assess the two key error sources: the density functional and its associated orbitals, and the reduced reliability of the previous RHF:RCCSD(T)-F12 reference. We propose our orbital stability classification as a best-practice approach for DFT calculations in chemical kinetics involving even numbers of electrons, as it provides useful information about the expected accuracy. We strongly recommend the routine use of orbital stability analysis in DFT calculations, as the spin-polarized solutions significantly reduce the strong correlation errors seen with spin-restricted orbitals.
![]() | (1) |
![]() | ||
Fig. 1 The geometric means (GM) of the RMSD values (in kcal mol−1) of all subsets in the BH tasks for the representative functionals from all 5 rungs in the MGCDB84 database. |
How transferable are such conclusions? It must be noted that there are relatively few data points for barrier heights in MGCDB84: only 206 values are included. It is therefore very likely that the results of benchmarking on those 206 barriers are not entirely representative of even the chemistry of the light main group elements that MGCDB84 focuses on. A similar critique can be applied to the other leading dataset used to assess density functionals: the widely used GMTKN55 dataset contains only 194 barrier heights.6 In fact many of their BH test cases are identical. Accordingly there is a need to assess density functionals across much larger and more diverse BH datasets. To partially fill this need, a number of high quality barrier height benchmarks have appeared in recent years. While we cannot document them all, we can list some that we are aware of. For example, TMB50 contains 50 transition metal BHs,7 and MOBH35 contains 35 transition metal BHs.8,9 BH28 contains 28 diverse BHs for pericyclic, bipolar cycloaddition, cycloreversion, and proton transfer reactions,10 Criegee22 contains 22 BHs for ring-closing reactions involving atmospherically important Criegee intermediates,11 CopeBH-22 contains 22 diverse BHs for Cope rearrangements in substituted shape-shifting bullvalenes.12 BH2O-36 contains 36 BHs for hydrolysis reactions,13 and concerted proton transfer (CPT) BHs in 9 cyclic hydrogen bonded clusters have been reported.14 MME55 contains 55 metalloenzyme model BHs,15 and BH9 is a set of 449 diverse BHs.16,17
While very valuable, those recent results provide only about a factor of 5 enhancement over the number of reference barrier heights available in 2017. To generate much higher volumes of data is also possible, although inevitably it is hard to do so at the same level of quality (i.e. targeting at least coupled cluster theory with perturbative triples18 (CCSD(T)) close to the complete basis set (CBS) limit).19 One notable effort that comes close to maintaining benchmark level accuracy is the RDB7 dataset.20 RDB7 contains nearly 12000 reactions and their corresponding reaction energies and barrier heights (forward and reverse), or roughly 120 times more kinetics data than either GMTN55 or MGCDB84. Briefly, the transition structures contain up to 7 heavy atoms, built from the light main group elements H, C, N, and O. The geometries were optimized at ωB97X-D3/def2-TZVP from prior work21,22 that used the single-ended growing string method23 to automatically identify thousands of transition structures (TSs) and products from a set of reactant species chosen from GDB-7.24 RDB7 then refined the energetics at the CCSD(T)-F12a/cc-pVDZ-F12 level of theory, which is close to the CBS limit,25–30 thereby producing near-benchmark quality results. 15 representative reactions exhibited root mean square deviations within 0.25 kcal mol−1 between CCSD(T)-F12a/cc-pVDZ-F12 and CCSD(T)-F12a/cc-pVTZ-F12 for both barrier heights and reaction energies. The RDB7 dataset has been used as ground truth data for AI-enhanced methods,31,32 as well as a direct benchmark for evaluating the performance of various AI models.33
The RDB7 reference results were used to assess two density functionals, a rung 2 GGA, B97-D3,34 and a rung 4 range-separated hybrid GGA, ωB97X-D3 (as was used to obtain the geometries).35–37 The results are very interesting for reasons that prompted this present study. First, the RMSD reported for the B97-D3 functional in the relatively small def2-mSVP basis set was 8.5 kcal mol−1 for RDB7, which is remarkably consistent with the value of 8.3 kcal mol−1 obtained from the 206 data points in MGCDB84 at the CBS limit.4 This would appear to suggest that conclusions drawn from the small dataset transfer quite well to the vastly larger, more diverse, RDB7 set. However, one must then confront the following apparent paradox. In RDB7, an RMSD of 5.3 kcal mol−1 is obtained with the more advanced ωB97X-D3 hybrid using the larger def2-TZVP basis set, which is more than a factor of two larger than the value of 2.3 kcal mol−1 obtained with the same functional at the CBS limit with the MGCDB84 data set!
Why are the range-separated hybrid results for RDB7 so much poorer than for MGCDB84? Does it reflect the greater diversity of the data? If so, then there are important limitations on how transferable the results obtained in MGCDB84 and GMTKN55 are to other systems containing the same light elements. Or are there critical differences in how the DFT reaction barrier heights are evaluated in the RDB7 study versus the MGCDB84 benchmarking? Certainly there are no obvious issues, such as numerical convergence problems. All systems were formally closed shell which meant that spin-restricted DFT calculations could be used. Yet if there are differences in benchmarking protocols, then there could be important implications for best practices in the evaluation of barrier heights by standard DFT methods that deserve to be more widely known. This work presents a systematic exploration of these two questions. While all results are revealed at the appropriate moment, we can foreshadow the conclusions a bit by observing that in fact both factors are at play in accounting for the much poorer hybrid GGA performance reported for RDB7 than in other smaller benchmarks such as GMTKN55 and MGCDB84.
To address the first question, let us remember that DFT has two main modes of breakdown. The first is delocalization or self-interaction (SIE) error38,39 which is often associated with localized electrons or holes, and a defect of the exchange functional. SIE causes the sign of the barrier height for H + H2 → H2 + H to be inverted with semi-local functionals40 (the Perdew–Zunger self-interaction-correction41 can usefully improve such results42). SIE is certainly responsible for the much poorer performance of semi-local versus hybrid density functionals discussed above, as the latter cancel some SIE by use of a fraction of exact exchange. The second source of functional failure is strong correlation error38,43,44 associated with the use of only a single Kohn–Sham determinant of molecular orbitals (MOs). Barrier heights associated with transition structures that exhibit strong correlations could indeed be significantly more difficult than those that are not. We will therefore categorize the reactions into three subsets, based on strength of correlations: easy, intermediate, and difficult. The easy category will be defined as corresponding to a sufficient absence of strong correlations that the mean-field Hartree–Fock orbitals are stable against artificial symmetry-breaking.45 That is well known to be a signature of the absence of strong correlation effects.46–48
The second, intermediate, category will be defined as being when a correlated method that excludes strong correlations remains stable against spin symmetry-breaking while HF is not. For this purpose we use the κ-regularized orbital-optimized second order Møller–Plesset (κ-OOMP2) method.49,50 This method damps the matrix elements of MP2 theory in an energy-dependent way,51 based on the orbital energy differences between pairs of virtuals (a,b) and occupied (i,j), Δabij and a parameter κ:
〈ij‖ab〉′ = 〈ij‖ab〉[1 − exp(−κΔabij)] | (2) |
If the κ-OOMP2 orbitals do not break spin-symmetry while HF orbitals do break spin-symmetry, we must conclude, on the one hand, that correlation effects are more challenging than for the easy category. On the other hand, we must also conclude that species in this category do not exhibit truly strong correlations. This definition is therefore a category of intermediate difficulty. The third, truly difficult category, will be the cases where κ-OOMP2 orbitals also break spin-symmetry, suggesting that there are genuine multi-reference effects at play.48,52
By the same token, if there are large numbers of systems that fall into the intermediate and difficult categories, we must then confront the question of how to properly perform the DFT calculations. Our view is that the answer is to permit the DFT orbitals to break spin-symmetry whenever the spin-restricted orbitals are not stable. One can intuitively appreciate the reason for preferring this choice by recalling that in the separation of a closed shell molecule such as H2 into two radical fragments, the correct answer will be obtained at dissociation by using spin-polarized orbitals because the fragments are becoming more and more weakly coupled. By contrast, the spin-restricted orbitals perform very poorly with almost all commonly used density functionals, as a manifestation of strong correlation error associated with entangling the two fragments.
Another aspect of the same view is that breaking spin-symmetry whenever possible provides the lowest possible energy, which is very suitable definition for the ground electronic state. While this may appear obvious as written, it is worth recalling that it is not typical to test spin-restricted DFT calculations for instability to spin-polarization, so this is certainly not the common view of practitioners of DFT. By contrast, every calculation performed in MGCDB84 assessments was tested for orbital instabilities.4 Most quantum chemistry codes provide this capability, at a cost that is not much greater than a self-consistent field (SCF) calculation itself. So it is computationally viable to do this for RDB7, and indeed it is viable to do this in general if needed. We note that in studies which use multiple functionals, it is typically sufficient to perform the stability analysis using the method most likely to break spin-symmetry (in our case, the mean-field Hartree–Fock method).
In the remainder of this work, we present a careful reassessment of the RDB7 dataset focusing on the nature and extent of spin symmetry breaking as motivated above. To broaden the scope of our assessment and test the transferability of our conclusions, together with ωB97X-D3, we also conducted new benchmarks with three additional functionals: ωB97M-V,53 MN15,54 and ωB97M(2).5 These functionals were selected both for their reported accuracy in other high-quality benchmarks4,6,55,56 and for their slightly different ingredients and positions on the Jacob's ladder of functionals, allowing a more comprehensive evaluation.
DFT calculations of all species were done with Q-Chem 5.4.58,59 For the 4 functionals ωB97X-D3, ωB97M-V, MN15, ωB97M(2), unrestricted DFT calculations are performed with a spin polarized, unrestricted initial guess (10% β-LUMO/α-HOMO mixing from the restricted solution) and the geometric direct minimization (GDM) algorithm.60 The def2-TZVP basis set61 was used for consistency with the original work. This is amongst the smallest basis sets that are reasonable for evaluation of barrier heights with hybrid density functionals.4 Internal stability analysis at the Hartree–Fock SCF level,45 and unrestricted κ-OOMP2 〈S2〉 evaluations were also performed using Q-Chem. All κ-OOMP2 calculations used κ = 1.45 as previously recommended.49
For the “easy” and “intermediate” subsets, the original RHF:RCCSD(T)-F12a/cc-pVDZ-F12 energetic reference values20 are used without change. For the “difficult” subset, reference values for 150 selected outliers (selected as described in the Results section) were refined by applying Yamaguchi approximate spin projection,46 which projects out the triplet contamination in the broken symmetry solution:
![]() | (3) |
![]() | (4) |
For these calculations, the internal stability analysis of the UHF/cc-pVDZ-F12 solutions are done with ORCA 5.0.4,62 together with UCCSD(T)-F12/cc-pVDZ-F12 calculations based on the stable UHF orbitals. The resolution of the identity (RI) approximation63 is applied only for the CCSD(T)-F12 part with cc-pVDZ-F12-CABS near-complete auxiliary basis set64 and cc-pVTZ auxiliary basis set for correlation.65 All of the correlated wavefunction calculations apply the frozen core approximation. Finally, the unrestricted Hartree–Fock based unrestricted CCSD 〈S2〉 calculations required in eqn (4) were performed with the cc-pVDZ basis set66 using Q-Chem. The Q-Chem and ORCA input templates can be found in Section 7 of ESI.†
(1) RHF stability analysis with the cc-pVDZ basis. The purpose of this step is two-fold: to check the fidelity of the spin unpolarized DFT test data and the RHF:RCCSD(T)-F12a/cc-pVDZ-F12 reference. As established in previous studies,67 Kohn–Sham orbitals are usually less sensitive to spin polarization than Hartree–Fock orbitals, due to the inclusion of correlation effects. For instance, the onset of spin symmetry breaking with DFT is usually at longer bond-stretches than for HF. Using the relatively small cc-pVDZ basis set keeps the compute cost of stability analysis relatively low. For barrier height and reaction energy data points involving only RHF stable species, the corresponding data point is categorized into the “easy” subset. The unstable species enter step 2.
(2) The 〈S2〉 expectation value for the “not easy” species is evaluated at the unrestricted, κ-OOMP2/cc-pVDZ level of theory. If 〈S2〉 is less than 0.005 (i.e. spin symmetry is restored), we consider that the corresponding species exhibited “artificial” symmetry breaking at the HF level which is removed by use of κ-UOOMP2. The corresponding data point is categorized into the “intermediate” difficulty subset, because κ-UOOMP2 cannot describe strong correlation without spin symmetry-breaking. Species with spin-polarized κ-UOOMP2 solutions are thus considered to exhibit “essential” symmetry breaking indicating the presence of strong correlation characteristics, thus defining the “difficult” category.
(3) Species in the “difficult” category are not only difficult for DFT calculations: they may also not be reliably treated by the benchmark level of theory used previously.20 Instead, a higher level of theory may be required, and ideally all reference data in the difficult category would be re-evaluated. However, this is very computationally costly, and so was not attempted. As an intermediate step, we selected 3 collections of 50 data points each as described later for closer assessment of the origin of some of the large discrepancies between ωB97X-D3 and the existing reference values. These data points were refined using spin-projection as described in the Methods section.
After the data classification (as shown in Fig. 2), spin-polarized DFT calculations with the 4 selected functionals ωB97X-D3, ωB97M-V, MN15, ωB97M(2) and def2-TZVP basis set are then performed. By contrast, the published RDB7 results are spin-restricted due to the use of a restricted initial guess. They are therefore reported as “R-ωB97X-D3”, in contrast to the true unrestricted DFT solutions in this new benchmark that carry U prefixes in all of the following data.
![]() | ||
Fig. 3 Comparison of the easy subset BH statistics with the unclassified full set FBH statistics from the original RDB7 paper.20 (a) Easy set BH error distribution (with forward and reverse barriers combined). (b) Full set FBH error distribution as reported in the original paper.20 |
A comparison of error statistics across different functionals further underscores the importance of the stability of orbitals. In the right panel, ωB97M-V, MN15, and ωB97M(2) show approximately the same RMSD for forward barrier height (∼2.70 kcal mol−1) on the full RDB7 set. However, their performance diverges significantly for tasks of different levels of difficulty. When evaluating the easier tasks where all species are RHF stable, the ωB97M(2) functional performs notably better than all other functionals, with an RMSD of 1.14 kcal mol−1, which is near chemical accuracy. ωB97M-V also performs well, with an RMSD of 1.55 kcal mol−1 and MAD of 1.20 kcal mol−1 consistent with the findings from other benchmarks.4,16,17,68 On the easy subset, the performance of ωB97X-D3 slightly outperforms MN15, which directly contradicts the conclusions drawn from the full set in the right panel.
For reaction energies (REs), the general trends of the difference in the performance of different functionals are very similar. However, much smaller error reduction from excluding the unstable SCF solutions is achieved compared to the barrier height tasks. Overall, ωB97M(2) and ωB97M-V also perform well on reaction energy tasks with RMSDs of 1.36 and 1.47 kcal mol−1, although their performance here (panel (a) of Fig. 4) is slightly worse than on barrier heights Fig. 3a). Surprisingly, the performance of MN15 is a little worse on the supposedly easy subset than on the full set!
![]() | ||
Fig. 4 Comparison of the easy subset RE statistics with the unclassified full set RE statistics from the original RDB7 paper.20 (a) Easy set RE error distribution. (b) Full set RE error distribution. |
In Fig. 5, we present the error distribution of the intermediate subset. The minor difference between R-ωB97X-D3 and U-ωB97X-D3 RMSD shows that (by contrast with HF) there is relatively little spin-polarization occurring in this subset for DFT. This is consistent with previous work, which demonstrates that Kohn–Sham orbitals are less sensitive to spin symmetry breaking compared to Hartree–Fock orbitals. Therefore, the restricted DFT description is often sufficient even when HF orbitals exhibit spin symmetry breaking. Similarly, the RHF:RCCSD(T)-F12 reference can still be reliable. For our purposes, this distinction was built into the classification process by using κ-OOMP2 〈S2〉 metric. It is reassuring to see that a very similar subset would have been defined by choosing based on stability of R-ωB97X-D3 rather than κ-OOMP2.
![]() | ||
Fig. 5 Intermediate set error distribution. (a) Intermediate set BH error distribution. (b) Intermediate set RE error distribution. |
Turning to the statistical performance, as expected, almost all DFT methods perform worse on the intermediate subset than on the easy subset. For barrier heights, the double hybrid functional ωB97M(2) continues to outperform the other 3 functionals, despite its RMSD almost doubling to 2.11 kcal mol−1 relative to its RMSD on the easy set (1.14 kcal mol−1). Smaller fractional increases (though of similar magnitude) are found for ωB97X-D3 (2.50 kcal mol−1 to 3.81 kcal mol−1) and ωB97M-V (1.47 kcal mol−1 to 2.65 kcal mol−1). In contrast, the MN15 functional shows only a minor increase in error, bringing its performance closer to ωB97M-V in this subset.
Reaction energy tasks generally follow the same trend when comparing intermediate set errors to easy set errors. A consistent error increase of 0.4–0.5 kcal mol−1 is found for ωB97X-D3, ωB97M-V, and ωB97M(2). MN15 shows a minimal decrease in error on the intermediate subset relative to the easy subset, yet its overall performance remains the worst of the 4 functionals in both subsets.
There are three potential contributors to the large errors seen for R-ωB97X-D3 on the difficult subset. (1) The likely instability of restricted Kohn–Sham orbitals, which fail to describe strong correlation well, (2) strong correlation errors that remain when the restricted DFT orbitals are allowed to spin polarize, and (3) the questionable validity of the RHF:RCCSD(T)-F12 reference values in this strongly correlated regime. The first issue can be mitigated by ensuring stable, unrestricted Kohn–Sham orbitals. Breaking spin-symmetry provides the lowest possible energy and hence improves the single-determinantal description of strongly correlated system.
The second issue can be assessed properly if reliable benchmark values are available, but is not readily correctable. Spin polarization often introduces considerable spin contamination. Although this cannot strictly be measured in Kohn–Sham theory, it is likely to raise the energy relative to the properly correlated low spin solution. One may attempt to use spin projection methods to correct for the error associated with spin contamination,46 but the results are ambiguous because the Kohn–Sham orbitals correspond to a fictitious, non-interacting system, and yet that wavefunction is used to evaluate 〈S2〉. We consider this to be out of scope for our assessment. By contrast, the third issue is addressable, as discussed in detail in the Methods section, although due to high compute costs, we only improve the benchmark values for a fraction of the difficult cases.
![]() | ||
Fig. 6 FBH error distribution for the 50 reactions with the largest R-ωB97X-D3 vs. RHF:RCCSD(T)-F12 discrepancy. The indices of the reactions included in this selection are provided in the Excel sheet in the ESI.† |
The next factor to address is the quality of the benchmarks themselves. Spin projected UHF:UCCSD(T)-F12 energies further improve the agreement with unrestricted DFT results as can be seen in the right and bottom panels of Fig. 6. While the refined, spin-projected CCSD(T)-F12 reference values may still carry errors of a few kcal mol−1, it is impractical to further improve their quality with excessive computational cost, and such errors are still (typically) much smaller than DFT errors. In principle, the remaining errors may now be attributed to problems in a density functional when treating energy differences that involve differential strong correlation effects. An assessment of those errors for the 4 tested functionals is shown in Table 1. All functionals have consistently larger errors on this difficult FBH collection compared to easy and intermediate BH results. The mean signed deviation is significant. Since the transition structure is typically more strongly correlated than the reactants, the positive MSD is a signature of spin contamination raising the barrier height for each of the functionals. More specifically, the performance of ωB97M(2) is particularly poor: for this difficult collection it shows an RMSD of ∼6 kcal mol−1 which is larger than the other 3 functionals. We infer that the double hybrid is somewhat more sensitive to strong correlation effects, perhaps consistent with its use of un-regularized PT2.5 A similar trend also holds for RBH, and we refer the readers to the detailed statistics in Section 4 of the ESI.†
Functional | Min | Max | MSD | MAD | RMSD |
---|---|---|---|---|---|
U-ωB97X-D3 | −3.24 | 8.78 | 3.38 | 3.67 | 4.49 |
U-ωB97M-V | −0.95 | 8.93 | 4.34 | 4.38 | 4.99 |
U-MN15 | −3.68 | 10.79 | 3.21 | 3.68 | 4.57 |
U-ωB97M(2) | 1.53 | 9.42 | 5.48 | 5.48 | 5.93 |
For reaction energies, we do similar reaction selection and refinement as summarized in (Fig. 7). With spin-polarized Kohn–Sham orbitals, the biggest outliers with deviations of ∼15 kcal mol−1 or above in the R-ωB97X-D3 vs. RHF:RCCSD(T)-F12 collection are all fixed. Replacing the reference by spin projected UHF:UCCSD(T)-F12 allows us to report the detailed error statistics in Table 2. Again, the performance of all functionals is consistently worse than on the easy and intermediate subsets (Fig. 3a and 5a). Relatively, MN15 performs less badly than the other 3 functionals, but its RMSD is still greater than 5 kcal mol−1.
Functional | Min | Max | MSD | MAD | RMSD |
---|---|---|---|---|---|
U-ωB97X-D3 | −0.43 | 13.06 | 6.26 | 6.28 | 6.81 |
U-ωB97M-V | 2.86 | 11.30 | 6.90 | 6.90 | 7.09 |
U-MN15 | −0.85 | 9.46 | 4.65 | 4.69 | 5.17 |
U-ωB97M(2) | 4.33 | 12.24 | 7.43 | 7.43 | 7.56 |
![]() | ||
Fig. 7 RE error distribution for the 50 reactions with the largest R-ωB97X-D3 vs. RHF:RCCSD(T)-F12 discrepancy. The indices of the reactions included in this selection are provided in the Excel sheet of ESI.† |
Fig. 8 illustrates the forward barrier height error distribution for these 50 data points. Notably, for several of the biggest outliers, spin projection cleans up the errors in the reference level. However, in many other cases, the refined reference does not bring the difference closer, as DFT in general does not perform well in the strong correlation regime. The chief exception is the limit of complete separation of radical fragments, where good performance is recovered; however there are no such cases in this dataset, of course! Detailed error statistics in Table 3 show much poorer DFT performance even compared to the previous collection of 50 reactions. All functionals exhibit very large RMSD values, with ωB97X-D3 even exceeding 11 kcal mol−1!
![]() | ||
Fig. 8 FBH error distribution for the 50 reactions with the largest U-ωB97X-D3 vs. RHF:RCCSD(T)-F12 discrepancy. The indices of the reactions included in this selection are provided in the Excel sheet in the supplementary files.† To simplify the visualization and better highlight the difference purely from refining the CCSD(T)-F12 reference, R-ωB97X-D3 values are excluded. |
Functional | Min | Max | MSD | MAD | RMSD |
---|---|---|---|---|---|
U-ωB97X-D3 | −3.24 | 17.26 | 10.53 | 10.66 | 11.18 |
U-ωB97M-V | 0.20 | 14.61 | 8.95 | 8.95 | 9.40 |
U-MN15 | −2.13 | 12.90 | 7.92 | 8.09 | 8.76 |
U-ωB97M(2) | 1.58 | 12.74 | 8.43 | 8.43 | 8.75 |
When we similarly refine 50 reaction energy data points, we observe no significant difference in the RE error statistics for the new collection (as shown in Table 4) with the largest U-ωB97X-D3 vs. RHF:RCCSD(T)-F12 reaction energy discrepancy (Fig. 9), in contrast to the RE error statistics of the 1st collection (as shown in Table 2). This may be due to the generally lower degree of spin contamination involved in the products than in the transition state structures, where some bonds are quite stretched. The detailed error statistics shown in Table 4 only change marginally (less than 0.2 kcal mol−1) for ωB97M-V, MN15, and ωB97M-V relative to the REs used in the 1st collection. However, the performance of ωB97X-D3 is slightly worse compared to the 1st collection.
Functional | Min | Max | MSD | MAD | RMSD |
---|---|---|---|---|---|
U-ωB97X-D3 | 2.42 | 13.06 | 7.36 | 7.36 | 7.60 |
U-ωB97M-V | 4.25 | 11.30 | 7.14 | 7.14 | 7.25 |
U-MN15 | 0.24 | 9.46 | 4.80 | 4.80 | 5.22 |
U-ωB97M(2) | 4.71 | 12.24 | 7.40 | 7.40 | 7.54 |
![]() | ||
Fig. 9 RE error distribution for the 50 reactions with the largest U-ωB97X-D3 vs. RHF:RCCSD(T)-F12 discrepancy. The indices of the reactions included in this selection are provided in the Excel sheet in the supplementary files.† To simplify the visualization and better highlight the difference purely from refining the CCSD(T)-F12 reference, R-ωB97X-D3 values are excluded. |
For the reverse barrier heights in this collection, similar quantitative and qualitative conclusions can be drawn as for the forward barrier heights, based on the corresponding tables and plots in Section 5 of the ESI.† Additionally, we identify a 3rd collection based on the largest U-ωB97M-V deviations from RCCSD(T)-F12 values (see the Section 6 of ESI† for details), where the observed trends in barrier height and reaction energy remain consistent with the 2nd collection.
To explore the origin of the large reported errors, we used a 3-step workflow to categorize the barrier height and reaction energy data points into easy, intermediate, and difficult subsets, based on the likely strength of electron correlation effects. Our main observations are as follows:
(1) In the easy subset (defined as all species involved yielding restricted Hartree–Fock (RHF) orbitals that are stable to spin polarization), our new benchmarks align well with other prior studies. The good performance on these cases where correlation effects are not strong clearly indicates that the origin of the reported discrepancies lies elsewhere.
(2) In the intermediate subset (where some species are not RHF stable but remain stable against spin polarization with κ-OOMP2) the correlation effects become more challenging to correctly capture. Hence all 4 functionals tested perform less well than in the easy subset, although the outcomes are not out of line with other independent benchmark tests.
(3) For the 84.6% barrier heights and 91.4% reaction energies in the easy and intermediate subsets, the descending order of DFT performance for BHs is ωB97M(2) > ωB97M-V > MN15 > ωB97X-D3, while for REs it is ωB97M(2) > ωB97M-V > ωB97X-D3 > MN15.
(4) The difficult subset (where κ-OOMP2 orbitals also exhibit significant spin-symmetry breaking) exhibits strong correlation effects. Statistical analysis shows clearly that these cases are the primary source of the large reported ωB97X-D3 BH errors.
(5) Benchmark results were refined for a subset of the difficult cases (particularly problematical results) using spin-projection. The improved benchmarks demonstrate that the largest restricted ωB97X-D3 errors can be significantly reduced by allowing the DFT orbitals to spin polarize. Nevertheless, the remaining errors in all 4 functionals remain quite large for strongly correlated BHs and REs.
Given these observations, we can make the following conclusions and recommendations for barrier height and reaction energy calculations using hybrid density functional theory.
(1) We strongly recommend that orbital stability analysis be performed for hybrid DFT calculations on even-electron systems in particular, but also radical systems. The use of the lowest energy (spin-polarized) solutions typically leads to significantly smaller errors versus benchmarks than use of spin-restricted orbitals, when there are orbital instabilities.
(2) To categorize the strength of electron correlations, we recommend using the workflow we describe here, to distinguish easy cases, intermediate cases and potentially difficult cases, ranging from weak to strong correlation effects.
(3) In some cases (e.g. complete separation into fragments), systems in the strong correlation category can be accurately treated by DFT calculations after allowing for spin-polarization. However, in light of the poor performance of all 4 functionals tested on the most difficult subset, we recommend caution about the accuracy of DFT with even the best functionals for systems in the difficult category.
(4) Other best practices4,70 for benchmarking functionals (adequate choices of atomic orbital basis set, numerical quadrature grid, and suitably tight cutoffs and convergence criteria) should be followed to the extent feasible.
(5) To obtain the best quality DFT-optimized structures (i.e. corresponding to the lowest energy for a given functional), especially for the transition state, the same orbital stability recommendations given above should also be followed.
There are almost certainly other existing datasets that could be improved by following these recommendations: one example that is derived from the same original source21 as RDB720 is the very large Transition1x dataset,71 used for machine learning.
Footnote |
† Electronic supplementary information (ESI) available: SI.pdf: the ESI provides detailed statistical analyses of BH and RE errors across different subsets of the RDB7 dataset. Error statistics are presented in tables embedded within violin plots, with explicit comparisons between forward and reverse barrier heights for the easy and intermediate subsets and the 3 collections from the difficult subset. Q-Chem and ORCA input templates for the data categorization are also included. RDB7_refined.xlsx: this file contains raw data for the full RDB7 set and its subsets, organized into corresponding subsheets for easy access. See DOI: https://doi.org/10.1039/d5cp01181g |
This journal is © the Owner Societies 2025 |