Julia Westermayr,a Shayantan Chaudhuri,ab Andreas Jeindl,c Oliver T. Hofmannc and Reinhard J. Maurer*a
aDepartment of Chemistry, University of Warwick, Coventry, CV4 7AL, UK. E-mail: r.maurer@warwick.ac.uk
bCentre for Doctoral Training in Diamond Science and Technology, University of Warwick, Coventry, CV4 7AL, UK
cInstitute of Solid State Physics, Graz University of Technology, 8010 Graz, Austria
First published on 6th June 2022
The computational prediction of the structure and stability of hybrid organic–inorganic interfaces provides important insights into the measurable properties of electronic thin film devices, coatings, and catalyst surfaces and plays an important role in their rational design. However, the rich diversity of molecular configurations and the important role of long-range interactions in such systems make it difficult to use machine learning (ML) potentials to facilitate structure exploration that otherwise requires computationally expensive electronic structure calculations. We present an ML approach that enables fast, yet accurate, structure optimizations by combining two different types of deep neural networks trained on high-level electronic structure data. The first model is a short-ranged interatomic ML potential trained on local energies and forces, while the second is an ML model of effective atomic volumes derived from atoms-in-molecules partitioning. The latter can be used to connect short-range potentials to well-established density-dependent long-range dispersion correction methods. For two systems, specifically gold nanoclusters on diamond (110) surfaces and organic π-conjugated molecules on silver (111) surfaces, we train models on sparse structure relaxation data from density functional theory and show the ability of the models to deliver highly efficient structure optimizations and semi-quantitative energy predictions of adsorption structures.
Electronic structure theory plays a vital role in the characterization and exploration of organic–inorganic interfaces and materials, but is limited by intrinsic errors such as the lack of long-range dispersion interactions in common density functionals16–18 and the high computational effort associated with the intrinsic length scale of surface structures. The former issue has been addressed in recent years with the emergence of efficient and accurate long-range dispersion correction methods such as the Grimme16 and Tkatchenko–Scheffler (TS) families of methods.19 In the case of metal–organic interfaces, the surface-screened van der Waals (vdWsurf)20 and many-body dispersion (MBD)21,22 methods, in combination with generalized gradient approximations (GGAs) or range-separated hybrid functionals, have been shown to provide highly accurate predictions of adsorption structures and stabilities.10,11,18,23–28 Reliable identification and optimization of structures at metal–organic interfaces is a particular challenge due to the structural complexity and the large number of degrees of freedom (molecular orientation, adsorption site, coverage),15 which creates a particular need for efficient structure exploration methods. Simulation methods that can alleviate the computational effort of DFT include semi-empirical electronic structure methods such as density functional tight-binding (DFTB),29 which usually provides a good compromise between accuracy and computational efficiency. Recently, DFTB has been coupled with the vdW and MBD methods29,30 to incorporate long-range dispersion, but unfortunately few reliable DFTB parametrizations for metal–organic interfaces exist to date.31
Machine learning-based interatomic potentials (MLIPs) offer high computational efficiency whilst retaining the accuracy of the underlying training data based on electronic structure theory. Atomistic MLIP methods include Gaussian approximation potentials32–34 or neural network (NN) potentials (e.g. SchNet,35–37 PhysNet38 or Behler–Parrinello type NNs39–41), which describe atoms in their chemical and structural environment within a cutoff region. MLIPs have the potential to advance structure searches,42–44 geometry optimizations,45,46 and molecular dynamics (MD) simulations40,47–49 of highly complex and large-scale systems comprising many thousands of atoms.50 However, most established MLIP approaches learn short-range interactions between atoms by introducing a radial cutoff within which the atomic interactions are captured. This can lead to challenges when attempting to capture long-range electrostatic or dispersion interactions.38
Recent attempts to account for long-range interactions in MLIPs have treated them explicitly as separate additive contributions to the potential,38,51–54 such as the third- and higher-generation NN potentials of Behler and co-workers,55,56 where a charge-equilibration scheme was introduced. Earlier work by Behler and co-workers51,52 has also shown that the simulation of liquid water can be facilitated with neural networks trained on energies and atomic charges, where the latter were used to correct for electrostatic interactions. This scheme was later complemented with long-range dispersion interactions based on the Grimme D3 correction method.52 Atomic charges were further used in TensorMol-0.1 (ref. 53) to augment the total energy with Coulomb and vdW corrections. A similar approach was applied by Unke and Meuwly38 in PhysNet, where the total energy was corrected with additive terms that include electrostatic corrections obtained from partial atomic charges and a D3 dispersion correction term. Recently, this description was extended in SpookyNet, where the total energy is corrected with empirical terms for nuclear repulsion based on an analytical short-range term, a term for electrostatics, and a term for dispersion interactions.54 The aforementioned approaches have been demonstrated to accurately describe MD or spectroscopic signatures,53 small clusters on surfaces,56 water dimers57 and clusters,52 crystals,57 and phase diagrams.58
An accurate DFT-based description of hybrid organic–inorganic interfaces is challenging. The Tkatchenko–Scheffler family of methods has proven to provide robust and reliable descriptions of structure and stability. It was recently shown that these electron-density-dependent methods can also be combined with semi-empirical methods such as DFTB30 and force fields,59 which suggests that a similar approach could be viable for structure prediction at hybrid organic–inorganic interfaces.
In this work, we present a deep learning approach that combines an NN-based MLIP with an established long-range dispersion method from the TS family of methods to efficiently predict structures and stabilities at metal–organic interfaces for the purpose of high-throughput structural (pre)screening and global energy landscape exploration. As shown in Fig. 1, the short-range description is provided by a local MLIP, whereas the long-range interaction is provided by one of the TS methods such as vdWsurf. We couple the two approaches by constructing an ML representation of Hirshfeld atoms-in-molecules volumes.19,61 The atomic volumes are used to rescale the atomic polarizabilities that enter the long-range description based on the local chemical environment of the atoms provided by the DFT description of short-range interactions. The method is applicable to additive a posteriori dispersion correction schemes, such as the vdW(TS), vdWsurf, and MBD methods that are implemented in the Libmbd package62 or the Grimme dispersion methods D3 and D4.63,64 However, the method in its current form cannot be used for self-consistent vdW corrections such as the vdW-DF family of methods.65,66 If training on data obtained from vdW-DF methods is desired, adaptations are needed to accurately model long-range effects, e.g., as done in PhysNet38 or SpookyNet.54
Fig. 1 Overview of the method developed in this work. Different machine learning interatomic potentials (MLIPs) that allow for the computation of Hirshfeld volume ratios can be combined with different flavors of van der Waals (vdW) corrections, e.g. screened vdW pairwise interactions19 and many-body dispersion (MBD).21 The so-obtained MLIPs are interfaced with the Atomic Simulation Environment (ASE)60 and can be used for global structure searches, optimizations, energy predictions or other types of simulations implemented within ASE.
We deliver an open-access implementation of our approach by coupling the Atomic Simulation Environment (ASE)60 with the Libmbd package62 and the DFT-D3 code.63 The trained interatomic potentials are independent of the long-range vdW correction; therefore, a trained ML model can be combined with different vdW corrections after model training. To further increase the robustness of our approach, we implement query-by-committee,39,67,68 which establishes the model variance in energy and force predictions. This allows us to define a dynamic stopping criterion for when the prediction of the MLIP becomes unreliable and structure optimizations have to be continued with electronic structure theory. This is particularly useful in the context of efficient pre-relaxation of structures to reduce the computational cost associated with structure search. We show the utility of our approach on two systems, namely a global structure search for gold (Au) nanoclusters (NCs) adsorbed onto a diamond (110) surface, and the structural relaxation of large conjugated organic molecules, namely 9,10-anthraquinone (A2O), 1,4-benzoquinone (B2O), and 6,13-pentacenequinone (P2O), collectively denoted X2O, adsorbed onto a silver (Ag) (111) surface, where they self-assemble into a variety of surface phases.9 The model for X2O on Ag(111) is trained on sparse data extracted from open data repositories, which shows the utility of the model to facilitate structure pre-relaxations. We further demonstrate that the ML models trained on these data are transferable to different aromatic organic molecules on the same surface that were not contained in the training data set.
In the TS-type pairwise scheme, the long-range dispersion energy is given by a damped London term summed over all atom pairs A and B:

$$E_{\text{vdW}} = -\frac{1}{2} \sum_{A \neq B} f_{\text{damp}}(R_{AB}) \, \frac{C_{6,AB}}{R_{AB}^{6}} \quad (1)$$

The effective polarizabilities and C6 coefficients entering this expression are obtained by rescaling free-atom reference values with the Hirshfeld atoms-in-molecules volume ratio of each atom A:

$$H_A = \frac{V_A^{\text{eff}}}{V_A^{\text{free}}}, \qquad \alpha_A = H_A \, \alpha_A^{\text{free}}, \qquad C_{6,AA} = H_A^{2} \, C_{6,AA}^{\text{free}} \quad (2)$$
The many-body dispersion (MBD) scheme is an extension of the pairwise vdW method that accounts for long-range correlation interactions beyond the pairwise limit. This is achieved by constructing a Hamiltonian of dipole-coupled quantum harmonic oscillators located at the atomic positions and parametrized to reflect the effective polarizability of each atom in the molecule.
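As an illustration of the pairwise baseline that MBD extends, the following sketch evaluates a Fermi-damped pairwise dispersion sum. All parameter values (damping constants, C6 coefficients, vdW radii) are illustrative placeholders, not tabulated TS data.

```python
# Sketch of a Fermi-damped pairwise dispersion sum of the vdW(TS) type.
# Parameter values are illustrative placeholders, not tabulated TS data.
import math

def fermi_damping(r, r0_ab, sr=0.94, d=20.0):
    """Fermi-type damping that switches the -C6/r^6 term off at short range."""
    return 1.0 / (1.0 + math.exp(-d * (r / (sr * r0_ab) - 1.0)))

def pairwise_vdw_energy(pairs):
    """pairs: list of (distance, C6_AB, R0_AB) tuples for all atom pairs."""
    return -sum(fermi_damping(r, r0) * c6 / r ** 6 for r, c6, r0 in pairs)

# two illustrative atom pairs (distances in a.u.-like units)
e = pairwise_vdw_energy([(4.0, 40.0, 3.5), (6.0, 40.0, 3.5)])
```

The damping function approaches one at large separations, so the familiar $-C_6/r^6$ tail is recovered far from the surface.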
In this work, we couple both the vdW and MBD long-range dispersion schemes to an MLIP by creating an ML model of the Hirshfeld-based scaling ratios (HA) for all atoms A in the system. We note that the range-separation parameter in MBD and damping coefficient used in vdW are the only parameters specific to the employed exchange–correlation functional approximation to which the dispersion correction is coupled. As we train MLIPs to reproduce training data created with a specific exchange–correlation functional, we can retain the same parameters as used for the respective functional for vdW corrections to the generated MLIP.
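In code, the TS-style rescaling of free-atom reference parameters with an ML-predicted Hirshfeld volume ratio can be sketched as follows; the free-atom values below are illustrative placeholders, not tabulated reference data.

```python
# Sketch: rescaling free-atom vdW parameters with an ML-predicted Hirshfeld
# volume ratio, following the Tkatchenko-Scheffler prescription. Free-atom
# reference values are illustrative placeholders.

def rescale_vdw_params(h_ratio, alpha_free, c6_free, r0_free):
    """Return effective (alpha, C6, R0) for one atom.

    h_ratio: Hirshfeld volume ratio V_eff / V_free predicted by the ML model.
    """
    alpha_eff = h_ratio * alpha_free           # polarizability scales linearly
    c6_eff = h_ratio ** 2 * c6_free            # C6 scales quadratically
    r0_eff = h_ratio ** (1.0 / 3.0) * r0_free  # vdW radius scales with V^(1/3)
    return alpha_eff, c6_eff, r0_eff

alpha, c6, r0 = rescale_vdw_params(0.85, alpha_free=12.0, c6_free=46.6, r0_free=3.59)
```

Because only the ratios are learned, the same trained model can feed any of the density-dependent correction schemes mentioned above.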
Throughout this work, we employ the ASE code which offers calculator interfaces to various electronic structure packages.60 The ML models in this work are based on the continuous-filter convolutional NN SchNet,35–37 which is a message-passing NN that learns the representation of the atomic environments in addition to its relation to the targeted output. ASE also provides an interface to the deep learning toolbox SchNetPack to employ NN-based MLIPs within ASE.37 We have implemented an ASE calculator interface for the Libmbd code62 and further implemented an ASE calculator instance that combines a short-range calculator (e.g. electronic structure package or MLIP based on SchNetPack) with a Libmbd calculator instance. This interface calculator passes Hirshfeld scaling ratios predicted by an ML model into the Libmbd calculator to perform vdW- or MBD-corrected SchNet calculations (denoted ‘ML+vdW’ and ‘ML+MBD’, respectively). All developed code is freely available on GitHub.70
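The coupling pattern of the interface calculator can be illustrated without ASE as follows. All class and method names here are illustrative stand-ins, not the actual ASE/SchNetPack/Libmbd API from the GitHub repository.

```python
# Minimal sketch of the calculator-coupling pattern: a short-range MLIP
# calculator and a long-range dispersion calculator are combined by a
# wrapper that forwards ML-predicted Hirshfeld ratios. All classes are
# illustrative stand-ins, not the actual ASE/SchNetPack/Libmbd API.

class ShortRangeMLIP:
    def energy(self, structure):
        return -10.0  # placeholder SchNet-like prediction
    def hirshfeld_ratios(self, structure):
        return [0.9, 0.8]  # placeholder per-atom volume ratios

class DispersionCalculator:
    def energy(self, structure, hirshfeld_ratios):
        # a real implementation would rescale vdW parameters with the
        # ratios and evaluate e.g. vdW(TS) or MBD; here a dummy value
        return -0.05 * sum(hirshfeld_ratios)

class CombinedCalculator:
    def __init__(self, short_range, long_range):
        self.short_range = short_range
        self.long_range = long_range
    def energy(self, structure):
        ratios = self.short_range.hirshfeld_ratios(structure)
        return (self.short_range.energy(structure)
                + self.long_range.energy(structure, ratios))

calc = CombinedCalculator(ShortRangeMLIP(), DispersionCalculator())
total = calc.energy(structure=None)
```

The key design choice is that the dispersion calculator is a separate object, so the same trained MLIP can be paired with vdWsurf, MBD, or no correction at all after training.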
As a starting point for the training data set for the Au@C models, we used 62 geometry optimizations of Au NCs on diamond (5, 4, 8, 8, 9, 10, and 18 geometry relaxations were conducted on Au clusters of size n = 15, 20, 30, 35, 40, 45 and 50 atoms, respectively, on the aforementioned diamond (110) surface model). The training data points were collated using every relaxation step of the optimization runs, and therefore included both optimized and not fully-optimized structures. These computations led to an initial training data set comprising 5368 data points, which we used to train four MLIPs (trained on energies and forces). All MLIPs were trained using the same data set, which was split randomly into training, validation, and test sets. All ML models trained on the initial training data set are denoted "MLinit.". MLIPs were used to predict 'local' energies and forces as well as Hirshfeld volume ratios to correct for long-range interactions at the MBD level. For energies and forces, we trained a set of models to use the query-by-committee approach discussed in subsection 2.4, which makes energy predictions more robust by a factor of √q, where q is the number of trained ML models. The training process for energies and forces is explained in detail in section S1.1 in the ESI.† The models differed slightly in the weights of energies and forces used in the combined loss function (see eqn (3) and discussion in the next subsection). The model architecture and hyperparameter optimizations for the Hirshfeld model can be found in the ESI in section S1.2.†
To extend the training data set, adaptive sampling39 was carried out, a technique originally developed for molecular dynamics simulations. In adaptive sampling, the predictions of a set of ML models are compared at every time step. Whenever the variance of the models exceeds a predefined threshold (often set slightly higher than the root-mean-squared error of the models on a test set73), the data point is deemed untrustworthy and recomputed with the reference method. This data point is then added to the training set and the models are retrained. In this work, we applied this concept to a global structure search using the basin-hopping algorithm74,75 as implemented in ASE60 rather than to MD simulations. After each geometry optimization during the basin-hopping run, the variance of the model predictions was computed, and the geometries with the largest model variances were selected for further DFT optimizations, which were then added to the training set. Stopping criteria for ML optimizations are discussed in section 2.4.
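The selection step of this adaptive sampling loop can be sketched as follows; the committee values and the threshold are illustrative, not data from the actual runs.

```python
# Sketch of the adaptive sampling criterion: candidate structures whose
# committee variance exceeds a threshold are flagged for recomputation
# with the reference (DFT) method. Values and threshold are illustrative.

def select_for_resampling(energies_per_model, threshold):
    """energies_per_model: one list of per-structure energies per committee member."""
    flagged = []
    q = len(energies_per_model)
    n_structures = len(energies_per_model[0])
    for i in range(n_structures):
        preds = [energies_per_model[m][i] for m in range(q)]
        mean = sum(preds) / q
        var = sum((p - mean) ** 2 for p in preds) / q
        if var > threshold:
            flagged.append(i)  # untrustworthy: recompute with DFT, retrain
    return flagged

# committee of 3 models, 3 structures; structure 1 shows large disagreement
committee = [[-1.00, -3.0, -2.00],
             [-1.01, -1.0, -2.02],
             [-0.99, -5.0, -1.98]]
print(select_for_resampling(committee, threshold=0.1))  # → [1]
```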
In total, three adaptive sampling runs were carried out. The first adaptive sampling run was carried out with the initial ML models, MLinit.. After data points were sampled and the data set was extended, the ML models were retrained. The MLIPs after the first adaptive sampling run (denoted MLadapt.1) were trained on 7700 data points for training and 800 data points for validation. Before adaptive sampling, the ML models deviated by several tens of eV for cluster sizes that were not included in the training set, leading to unphysical structure relaxations. After adding additional data points, the average model variance decreased to around 0.1 eV, with maximum errors in the range of 1 eV when leaving the training regime. To further increase the accuracy of the ML models, a second adaptive sampling run was executed using the MLadapt.1 models. A total of 9757 data points were collected after the second adaptive sampling run; the resulting MLadapt.2 models were trained on 8500 data points for training and 800 data points for validation. After the final adaptive sampling run (MLadapt.3), there were a total of 15293 data points, of which 12500 were used for training and 1500 for validation. More details on the adaptive sampling runs can be found in section S1.1.†
For X2O@Ag, four ML models were trained on energies and forces (see section S1.1† for details) and one model on Hirshfeld volume ratios, which was used in all geometry optimizations. As mentioned earlier, adaptive sampling was not carried out for this data set, as we wanted to base our models purely on sparse existing data derived from a small set of geometry optimizations to showcase the ability of our model to speed up structure relaxations.
In addition, both DFT and ML structure relaxations of 16 B2O@Ag systems with the molecule far away from the surface were conducted and served as a test set. These structures are especially challenging to relax, as common optimization algorithms often fail for systems that are far from the optimized structure, even with DFT and long-range interactions. One problem is that vdW forces decrease quickly with the distance of an adsorbate from the surface, so quasi-Newton optimizers with simple Hessian guesses can converge to a geometry that has hardly changed from the initial structure. This problem can be overcome by using an improved Hessian approximation to initialize the optimization. In this work, we used the Lindh Hessian71,80 to initialize structure relaxations for DFT+vdWsurf and ML+vdWsurf calculations. The same optimization criteria were used as in the reference calculations, but we used the ASE calculator with our vdW implementation rather than FHI-aims for consistency.
The MLIPs for energies and forces were trained on a combined loss function that weights the energy and force errors with a trade-off parameter ρ:

$$L_{E,F} = \rho \left( E^{\text{ref}} - E^{\text{ML}} \right)^{2} + \frac{1-\rho}{3N} \sum_{A=1}^{N} \left\lVert \mathbf{F}_{A}^{\text{ref}} + \frac{\partial E^{\text{ML}}}{\partial \mathbf{R}_{A}} \right\rVert^{2} \quad (3)$$
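A minimal sketch of such a combined energy/force loss with a trade-off weight ρ is shown below; the shapes and numerical values are illustrative, and a real implementation would operate on batched tensors.

```python
# Sketch of a combined energy/force loss with trade-off weight rho, as used
# when training MLIPs on energies and forces. Values are illustrative; a
# real implementation would operate on batched tensors.

def combined_loss(e_ref, e_pred, f_ref, f_pred, rho=0.01):
    """e_*: scalar energies; f_ref/f_pred: lists of per-atom 3-vectors."""
    energy_term = (e_ref - e_pred) ** 2
    n_atoms = len(f_ref)
    force_term = sum(
        (fr - fp) ** 2
        for atom_ref, atom_pred in zip(f_ref, f_pred)
        for fr, fp in zip(atom_ref, atom_pred)
    ) / (3 * n_atoms)
    return rho * energy_term + (1 - rho) * force_term

# one atom with a small force error and a 0.1 eV energy error
loss = combined_loss(-10.0, -10.1, [[0.0, 0.0, 0.0]], [[0.1, 0.0, 0.0]])
```

Varying ρ between committee members is one simple way to decorrelate their predictions, as described for the MLinit. models above.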
In contrast, the Hirshfeld volume ratios were fitted per atom using another SchNet model that was adapted for this purpose. The corresponding loss function, LH, is an L2 loss over atoms:
$$L_{H} = \frac{1}{N} \sum_{A=1}^{N} \left( H_{A}^{\text{ref}} - H_{A}^{\text{ML}} \right)^{2} \quad (4)$$
As mentioned in subsection 2.2.2, the X2O@Ag data were generated using two basis sets for Ag atoms, depending on their position. Different basis sets result in different energies and forces. Therefore, the data set was pre-processed prior to training by assigning a different atom label to all Ag atoms that were described with the 'very light' basis set. This allowed the MLIPs to be trained on data with mixed basis sets.
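This pre-processing step can be sketched as follows. The label name and the position-based criterion (a z-coordinate cutoff separating deeper layers) are illustrative assumptions, not the actual convention used in the data set.

```python
# Sketch of the mixed-basis-set pre-processing: Ag atoms treated with the
# 'very light' basis are given a distinct label so the MLIP learns a
# separate representation for them. Label and z-cutoff are illustrative.

def relabel_mixed_basis(symbols, z_positions, z_cutoff):
    """Tag Ag atoms below z_cutoff (assumed 'very light' basis) with a
    placeholder label."""
    relabeled = []
    for symbol, z in zip(symbols, z_positions):
        if symbol == "Ag" and z < z_cutoff:
            relabeled.append("Ag_vlight")  # placeholder label
        else:
            relabeled.append(symbol)
    return relabeled

print(relabel_mixed_basis(["Ag", "Ag", "C", "O"], [0.0, 5.0, 8.0, 8.5], 2.0))
# → ['Ag_vlight', 'Ag', 'C', 'O']
```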
The query-by-committee approach39,67,68 takes the mean of the predictions of q ML models for a given property P: $\bar{P} = \frac{1}{q}\sum_{i=1}^{q} P_i$. In all subsequent calculations, we followed the mean of the potential energy surface and the corresponding forces. While the accuracy and robustness of the energy and force predictions can be improved by a factor of $\sqrt{q}$,81 no improvement in predictive accuracy could be achieved for other properties, such as dipole moments. We also found that the prediction of Hirshfeld volume ratios was not improved by the query-by-committee approach, so only one ML model was used for learning Hirshfeld volume ratios in the following. This is most likely because the accuracy of the Hirshfeld volume ratio models is already very high compared to that of the energy models, leaving little room for query-by-committee to improve it.
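The committee mean and the variance used as a reliability measure can be sketched in a few lines; the committee values below are illustrative.

```python
# Sketch of the query-by-committee prediction: the committee mean serves
# as the prediction and the variance as a reliability measure.

def committee_mean_and_variance(predictions):
    """predictions: one value per committee member for the same structure."""
    q = len(predictions)
    mean = sum(predictions) / q
    variance = sum((p - mean) ** 2 for p in predictions) / q
    return mean, variance

# four committee members predicting the energy of one structure (eV)
mean, var = committee_mean_and_variance([-3.10, -3.05, -3.15, -3.10])
```

A small variance indicates the structure lies inside the training regime; a large variance signals that the prediction should not be trusted.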
A further consequence of having more than one ML model for energies is that this approach allows us to assess the reliability of the ML predictions by computing the model variances,
$$E^{\text{ML}}_{\text{var}} = \frac{1}{q} \sum_{i=1}^{q} \left( E_{i}^{\text{ML}} - \bar{E}^{\text{ML}} \right)^{2} \quad (5)$$
To find optimal stopping criteria for optimizations with ML models, we explored a random grid of 1000 different stopping criterion combinations for structure relaxations of the Au@C test set using MLinit. and of the X2O@Ag test set (see Fig. S2a and b,† respectively). The ability to perform 1000 geometry optimizations as a test further showcases the computational efficiency of the approach. Test runs showed that introducing an additional initial threshold, fmaxinit., after which the ML model variance for energies, EMLvar (eqn (5)), is monitored, improves the agreement between the final ML-optimized and DFT-optimized structures. The fmaxinit. value was found to be relatively robust and was set to 0.15 eV Å−1 for the test studies shown in this work, but it can be set to a different value by the user to account for the requirements of other ML models. We tested thresholds between 0.1 and 0.2 eV Å−1 for the initial models and found that the resulting structures were very similar, differing by less than 0.05 Å root-mean-squared deviation.
As soon as the fmaxinit. value was reached during an optimization, the number of consecutive steps with rising energy variance was monitored. This number was varied in a grid search, and we found three consecutive steps of increasing energy variance to be a good criterion for stopping the optimization algorithm with final structures closest to the DFT reference minimum (Fig. S1†), for a range of different fmaxinit. values. The energy variance between different ML models always fluctuates around a small value, even during reliable geometry relaxations; hence, the energy variance can grow over consecutive steps without necessarily indicating that the structure relaxation has become unreliable. At the same time, three consecutive steps of rising energy variance is small enough to ensure that the structure has not moved too far from the last reliable one. The results obtained with slightly different parameters (Fig. S1†) for structure optimizations of nanoclusters on surfaces and of molecules on surfaces show that these parameters are robust and relatively generally applicable to ML models trained on other types of systems. To further ensure that the optimization did not leave the training regime, we also terminated the algorithm, once fmaxinit. had been reached, whenever the model energy variance exceeded 1 eV or fmax jumped above 2 eV Å−1. Both events were observed when model predictions ran into regions not supported by training data. These additional safeguards are only relevant for models trained on small training sets and ensure that the optimization is stopped before the training regime is left; at that point, the remaining optimization steps can be carried out with the reference method.
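The dynamic stopping criterion described above can be sketched as a small decision function. The thresholds are those quoted in the text; the step histories are illustrative.

```python
# Sketch of the dynamic stopping criterion: once fmax drops below
# fmax_init, the committee energy variance is monitored; the optimization
# stops after three consecutive rises, or immediately on the hard caps
# (variance > 1 eV or fmax > 2 eV/A after monitoring has begun).

def should_stop(history, fmax_init=0.15, max_rises=3, var_cap=1.0, fmax_cap=2.0):
    """history: list of (fmax, energy_variance) tuples, one per step."""
    monitoring = False
    rises = 0
    prev_var = None
    for fmax, var in history:
        if not monitoring:
            if fmax < fmax_init:
                monitoring = True  # start watching the variance
            else:
                continue
        # hard caps: predictions left the training regime
        if var > var_cap or fmax > fmax_cap:
            return True
        if prev_var is not None and var > prev_var:
            rises += 1
            if rises >= max_rises:
                return True  # variance rising persistently
        else:
            rises = 0
        prev_var = var
    return False

smooth = [(0.5, 0.01), (0.2, 0.01), (0.1, 0.01), (0.05, 0.01)]
rising = [(0.1, 0.01), (0.09, 0.02), (0.08, 0.03), (0.07, 0.04)]
print(should_stop(smooth), should_stop(rising))  # → False True
```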
For the MLadapt.3 models, an fmax value of 0.05 eV Å−1 could be reached; hence, the additional stopping criteria were not required for these refined models.
The MAE of the four models ranges from 0.017–0.021 eV for energies and 0.021–0.025 eV Å−1 for forces for X2O@Ag. The ML models trained on Au@C have MAEs of 0.013–0.18 eV for energies and 0.014–0.26 eV Å−1 for forces. As can be seen, there are some outliers in the Au@C data set, with the errors on these data points shown in the insets of top panels a and b. These data points are geometries with unfavorable structures and energies far outside the region in which most data points lie. They were included to ensure that the model was able to rank structures correctly and predict energetically unfavorable structures with high energies. For training on these data points, the L2 loss was adapted to a smooth version of the L1 loss, which is explained and defined in section S1.2.†
Besides data points representing unfavorable Au@C NCs with large vdW-free energies and vdW-free forces that were explicitly introduced into the training set, the ML models predict vdW-free energies, vdW-free forces, and Hirshfeld volume ratios accurately. The MAE for the Hirshfeld volume ratios, a quantity that ranges between about 0.6 and 1.05, is 3.9 × 10−4 and 1.1 × 10−4 for X2O@Ag and Au@C, respectively.
In the following, we will assess the performance of the proposed method by performing structure relaxations of geometries of two additional hold-out test sets for X2O@Ag and Au@C. These hold-out test sets comprise full structure optimizations and none of the geometry optimization steps during the relaxations were included for training.
The benefit of using ML+MBD instead of DFT+MBD lies in the reduced computational effort of structure relaxations. Fig. 3c and d show the computational costs of structure relaxations with ML+MBD, DFT+MBD, and an ML+MBD pre-optimization followed by a DFT+MBD optimization (denoted 'ML+MBD//DFT+MBD'). Panel c shows the cost of a single structure relaxation in kilo-central-processing-unit hours (kCPUh), recorded on dual AMD EPYC™ Zen2 7742 64-core processors at 2.25 GHz. As can be seen, the computational cost of an ML+MBD optimization (black) is about 0.01% of the cost of DFT+MBD. However, structure relaxations conducted solely with ML+MBD might not be accurate enough for a specific purpose or sufficiently close to DFT+MBD. To address this, we performed DFT+MBD optimizations starting from the optimized structures obtained from the MLinit. (yellow), MLadapt.1 (pink), MLadapt.2 (red), and MLadapt.3 (dark red) models and summed the computational expenses of the respective ML+MBD and additional DFT+MBD calculations. In this approach, ML+MBD acts as a pre-optimization method. As expected, the computational cost increases when combining ML+MBD with DFT+MBD. However, the better the optimized structure resulting from the ML model, the fewer DFT+MBD optimization steps are required, which is why the combination of the refined adaptive models with DFT incurs less computational cost for the same task than the initial model in combination with DFT.
Fig. 3d plots the computational cost of performing one to 10000 structure optimizations with the different models, including the cost of generating the training data set for ML model construction. The costs are extrapolated and shown relative to DFT+MBD (100%, dark blue). As the dotted black lines show, using the final ML model, MLadapt.3+MBD, can greatly reduce the computational cost whilst still achieving good accuracy (see panels a and b). Note that the ML+MBD values include the cost of training data generation and model training. For large-scale screening studies, where many geometry optimizations are required, it is clearly beneficial to use refined and accurate ML+MBD models. Where high accuracy is required, a subsequent re-optimization with DFT+MBD to reach an fmax of <0.01 eV Å−1 may be necessary. In this scenario, we find that the ML+MBD//DFT+MBD optimization sequence becomes computationally beneficial compared to standalone DFT+MBD optimization beyond a break-even point of between roughly 100 and 500 structure relaxations, depending on the model. In Fig. 3d, MLinit.–MLadapt.3 refer to models trained on increasingly large data sets. The break-even point in terms of computational cost for ML+MBD//DFT+MBD is similar for all models, but lowest for 'adapt.2' (about 100 structure relaxations) and highest for 'init.' (about 500 structure relaxations). This shows that there is a sweet spot in MLIP construction between the cost of creating an (overly) large training data set and the computational savings it enables.
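The break-even reasoning above amounts to simple amortization arithmetic, sketched below. All kCPUh numbers are illustrative placeholders, not the measured values from Fig. 3.

```python
# Back-of-the-envelope break-even estimate: ML+MBD//DFT+MBD beats
# standalone DFT+MBD once the one-off training cost is offset by the
# cheaper per-relaxation cost. All kCPUh numbers are illustrative.

def break_even(training_cost, ml_cost_per_relax, dft_refine_cost, dft_cost):
    """Smallest N with training_cost + N*(ml + refine) < N * dft_cost."""
    saving_per_relax = dft_cost - (ml_cost_per_relax + dft_refine_cost)
    if saving_per_relax <= 0:
        return None  # never pays off
    return int(training_cost / saving_per_relax) + 1

# e.g. 1000 kCPUh of training data, 0.01 kCPUh per ML relaxation,
# 4 kCPUh for the DFT refinement, 10 kCPUh for a full DFT relaxation
print(break_even(1000, 0.01, 4, 10))  # → 167
```

This also makes the "sweet spot" explicit: a larger training set raises the one-off cost but lowers the refinement cost per structure, shifting the break-even point in either direction.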
To validate the reliability of the structure and adsorption energy predictions of the ML+MBD models for Au@C, we selected three basin-hopping optimization runs that were carried out during the initial adaptive sampling for clusters of size n = 6, 15, and 40. From each basin-hopping run, the global minimum and two random local minima were selected. The basin-hopping run for a cluster size of n = 6 is shown in Fig. 4a. The three structures used for validation are denoted S1–S3 (yellow in panel b) and were re-optimized separately with DFT+MBD (blue) and MLadapt.3 (red). In Fig. 4c, the DFT+MBD structures are compared to those of MLadapt.3+MBD. The structures are very similar, with slight deviations visible in geometry S3.
Fig. 4 (a) Basin hopping run with MLinit. for Au@C with Au6 nanoclusters (NCs). Yellow circles indicate (b) 3 selected structures S1–S3 that include the energetically lowest geometry and two randomly selected structures according to MLinit. that are (c) reoptimized with DFT+MBD (blue) and MLadapt.3+MBD (red). (d) Relative energies reported with respect to the energetically lowest cluster for each method. In addition, energy rankings of the energetically lowest structures and two randomly selected structures from basin hopping runs with NC sizes of 15 and 40 atoms using MLinit.+MBD (yellow), MLadapt.3+MBD (red), and DFT+MBD (blue). Corresponding structures are shown for each method in Fig. S2.†
The energies of the three structures are plotted in Fig. 4d relative to the most stable structure. Even though the structures are not exactly the same, the relative energies and the ordering of the three structures are correctly predicted by each method. As expected, the energy ranking of MLadapt.3+MBD is closer to the relative energy ordering of DFT+MBD than that of the initial ML model. Panel d further shows the results of the same procedure carried out for cluster sizes of n = 15 and 40. The structures of all clusters as predicted by all methods are visualized in Fig. S2† of the ESI. As can be seen, for the Au NC with 15 atoms, the energies are ordered incorrectly by the initial model. The correct ordering of energies is established with the final model, MLadapt.3+MBD, and is similar to DFT. However, the highest-energy geometry is predicted to be more stable than in the reference. This could indicate that the least favorable 15-atom structure lies in a region of the potential energy surface that is under-represented in the training set. Indeed, the energy variance according to the query-by-committee approach is four times higher for this structure (around 30 meV) than for the other clusters (around 7 meV). For the Au NC with 40 atoms, the initial model suggests three energetically different structures, while the MLadapt.3+MBD and DFT+MBD methods suggest that the first two structures are identical in energy. To conclude, ML combined with a long-range dispersion correction (MBD in this case) can substantially reduce the computational effort of structure relaxations compared to DFT+MBD. Given the rich diversity of structures and cluster sizes and the relatively few data points required, the model can be utilized as a pre-optimizer that leads to radial atom distributions close to the DFT+MBD optimum and can facilitate fast global structure searches, including an approximate energy ranking of structures.
The test to validate the new method was carried out as follows: the same starting geometries were used for ML+vdWsurf geometry relaxations as in the DFT+vdWsurf reference optimizations. As can be seen from Fig. 5a, our method yields adsorption heights that are very similar to those obtained with DFT+vdWsurf. The structural similarity can be further assessed from panels b (P2O-2) and d (A2O-2), which show the ML+vdWsurf and DFT+vdWsurf structures with the worst agreement in adsorption heights between ML and DFT. The top images show ML+vdWsurf-optimized structures in red and DFT+vdWsurf-optimized structures in blue; the bottom images show the error of each atom in Å. The ML-predicted minimum energy structures are typically relatively close to the DFT-predicted structures, with the largest deviations in adsorption height per atom at about 0.2 Å; most deviations are below 0.05 Å. Noticeably, these are not differences in bond lengths (Fig. S4†) but in absolute positions along the z direction. Visualizations of the remaining structures presented in Fig. 5a and c are shown in Fig. S3† of the ESI.
In addition to the adsorption heights, we assessed the adsorption energies for the purpose of predicting the relative energies of adsorption phases with respect to each other. As can be seen from panel c, the trend observed in the reference data can mostly be reproduced when comparing different molecules. There is hardly any systematic over- or underestimation of adsorption energies, and the mean error on adsorption energies is around 0.10 ± 0.06 eV. While the model can distinguish the adsorption energies of different molecules, it fails to distinguish the adsorption energies of energetically favorable local minima of the same molecule at different symmetry sites. Achieving this would likely require more training data than was provided in the original NOMAD data repository.
As a more difficult challenge for the model, we generated an additional test set of 16 B2O structures on Ag(111) with DFT+vdWsurf, starting from geometries far from the surface. These structures required around five to six times more optimization steps than the calculations in the training set and thus provide a test set with initial structures that are much less favorable than those in the training set and the structures tested before. As mentioned briefly in the Methods section, geometry optimization algorithms struggle with geometries far away from the surface and require additional considerations. To counter this problem, a two-step optimization was conducted with our method: first, all atomic positions of the molecule were fixed apart from motion along the [111] direction, with the Ag(111) substrate fully constrained; after this initial relaxation, the molecule was allowed to relax in all directions and the top three Ag layers were also allowed to relax, while the remaining Ag layers were kept frozen, as in ref. 9. To initialize the optimizations, we used the Lindh Hessian,71,80 as was done in the DFT+vdWsurf optimizations. The results are shown in Fig. 6a. Our model gives fair adsorption heights for these systems compared to the DFT reference and can be used as a computationally efficient pre-relaxation procedure without ever learning from data of systems with large molecule–metal separation, as those are accounted for by the long-range dispersion correction. The mean error for adsorption heights is relatively low, around 0.04 ± 0.02 Å.
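The two-step protocol can be emulated with any optimizer that masks out force components on frozen degrees of freedom. The sketch below is not the actual ML+vdWsurf machinery: it uses a toy one-dimensional adsorption well (Gaussian-shaped, with an assumed minimum at 2.9 Å) and plain steepest descent, purely to illustrate the staged release of degrees of freedom.

```python
import numpy as np

def relax(positions, forces_fn, mask, fmax=0.01, step=0.05, max_steps=500):
    """Steepest-descent relaxation; `mask` (same shape as positions)
    zeroes force components on frozen degrees of freedom."""
    x = positions.copy()
    for _ in range(max_steps):
        f = forces_fn(x) * mask
        if np.abs(f).max() < fmax:
            break
        x += step * f
    return x

# Toy adsorption potential along z with an assumed minimum at z0 = 2.9 Å
z0 = 2.9
def forces_fn(x):
    dz = x[:, 2] - z0
    f = np.zeros_like(x)
    f[:, 2] = -2.0 * dz * np.exp(-dz**2)   # restoring force toward z0
    return f

mol = np.array([[0.0, 0.0, 4.5], [1.4, 0.0, 4.6]])   # molecule far from surface
# Stage 1: allow motion only along the surface normal (z here)
stage1 = relax(mol, forces_fn, mask=np.array([[0, 0, 1], [0, 0, 1]], float))
# Stage 2: release all molecular degrees of freedom
stage2 = relax(stage1, forces_fn, mask=np.ones_like(mol))
```

In the real workflow the first stage pulls the molecule down to a reasonable adsorption height before the lateral and substrate degrees of freedom are released, which keeps the optimizer out of the flat, poorly sampled long-range region.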
Fig. 6 (a) Adsorption heights of B2O molecules on Ag(111). (b) Adsorption heights of benzene,82 naphthalene,83 anthracene,84 pentacene,85 and azulene,83 computed with ML+vdWsurf and compared to DFT+vdWsurf. The same adsorption sites as in the cited references (Table 1) are used.
The final challenge was to test our model for transferability to organic molecules that it has never seen. This would open the possibility of generating a fully transferable MLIP for hybrid metal–organic interfaces, to be applied as a general structural pre-relaxation tool. We tested our approach on several organic molecules adsorbed on Ag(111) that have been experimentally and computationally characterized previously, namely benzene, naphthalene, anthracene, pentacene (all from the acene family), and azulene. According to the literature,24,83,84,86 the most stable symmetry site was selected (indicated in the first column of Table 1) to compare our results to available DFT and experimental data. We note that models trained on such sparse data will likely fail to reliably predict energy differences between different adsorption sites. The gas-phase optimized structure of each organic molecule was placed around 3.3 Å away from the surface, and a similar two-step optimization procedure was applied as before. As shown in Fig. 6b, the trend in adsorption heights across molecules found with DFT+vdWsurf (blue triangles) can be reproduced with ML+vdWsurf (red crosses). The deviations are within ±0.1 Å of vertical adsorption height. Considering that none of these molecules featured in the training data set, this demonstrates the increased transferability that the model inherits from the separate treatment of long- and short-range interactions. The molecules with the largest deviations in adsorption heights are azulene and anthracene. Besides low computational cost, a further advantage of the proposed method is that the vdW correction can be exchanged. To demonstrate this flexibility, we further relaxed structures at the ML+MBD level and computed the corresponding adsorption heights (dark-red stars). As can be seen from Fig. 6b, the adsorption heights are very close to those of ML+vdWsurf.
Larger deviations are only seen for benzene. However, the ML+MBD prediction is in line with the adsorption height of 2.97 Å reported in ref. 82 and 87.
Table 1 Adsorption energies [eV]

Molecule (symmetry) | DFT+vdWsurf | ML+vdWsurf | DFT+MBD | ML+MBD |
---|---|---|---|---|
Benzene (hcp0)24,82,87 | −0.75 | −0.81 | −0.57 | −0.77 |
Naphthalene (top30)88 | −1.08 | −1.19 | −0.77 | −1.10 |
Anthracene (hcp0)84 | −1.38 | −1.53 | −0.93 | −1.12 |
Pentacene (bridge60)84,86 | −2.40 | −2.12 | −1.65 | −1.79 |
Azulene (top30)88 | −1.37 | −1.22 | −0.91 | −1.07 |
In addition to adsorption heights, we sought to investigate whether the ML+vdWsurf method can be used to approximate adsorption energies. Table 1 shows the adsorption energies computed with both ML+vdWsurf and ML+MBD. The trend within the acene family, i.e., increasingly strong adsorption with increasing molecular size, is reproduced by both methods. However, some energies are overestimated while others are underestimated with respect to DFT+vdWsurf, which correlates with the adsorption heights being over- and underestimated, respectively, for all structures except anthracene. Nevertheless, given that these systems were never seen by the ML models and that the models are based on a small amount of data, the results are encouraging for the future development of fully transferable ML models for a wide range of physisorbed structures requiring only a small amount of additional data. Such models could be applied to large-scale screening studies of organic molecules on surfaces and used to perform structural pre-relaxations.
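For reference, the adsorption energies in Table 1 follow the usual definition E_ads = E(molecule@surface) − E(clean slab) − E(gas-phase molecule), with negative values indicating binding. The total energies below are hypothetical and serve only to illustrate the bookkeeping:

```python
def adsorption_energy(e_interface, e_slab, e_molecule):
    """Adsorption energy: E_ads = E(molecule@surface) - E(clean slab)
    - E(gas-phase molecule). Negative values indicate binding."""
    return e_interface - e_slab - e_molecule

# Hypothetical total energies (eV); chosen only to illustrate the sign
# convention, not taken from any calculation in this work
e_ads = adsorption_energy(-1234.56, -1120.21, -112.23)
```

Note that each of the three totals must be computed with the same functional and dispersion correction for the differences to be meaningful.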
To apply the method for structure pre-relaxations with ML models trained on little data, we additionally incorporated dynamic stopping criteria that take the variance of the ML predictions into account and ensure that the structure relaxation does not run into unreliable territory. The method was tested for fast (pre-)relaxations of complex hybrid systems. First, we demonstrated our framework on gold nanoclusters on a diamond (110) surface and showed that, by adaptively optimizing the ML models, global structure searches become feasible that would be computationally too expensive without the use of ML.
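A minimal sketch of such a dynamic stopping criterion, assuming a committee of ML models and a generic optimizer step: the relaxation is aborted as soon as the committee disagreement exceeds a threshold (an assumed 30 meV here), at which point the structure would be handed back to DFT or used to extend the training set. The one-dimensional toy potential and committee below are invented for illustration.

```python
import numpy as np

def relax_with_uncertainty_stop(x, step_fn, committee_fn,
                                spread_max=0.03, max_steps=200):
    """Run an optimizer, but abort as soon as the spread of the ML
    committee's energy predictions exceeds `spread_max` (eV), i.e.
    when the relaxation wanders into an under-sampled region."""
    trajectory = [x]
    for _ in range(max_steps):
        x = step_fn(x)
        if np.std(committee_fn(x)) > spread_max:
            break                      # unreliable: hand back to DFT
        trajectory.append(x)
    return trajectory

# Toy setup (assumed): scalar coordinate relaxed in fixed steps; the
# two committee members start disagreeing once x drops below 0.55
step_fn = lambda x: x - 0.1
committee = lambda x: [np.exp(-x), np.exp(-x) * (1.0 if x > 0.55 else 1.2)]
traj = relax_with_uncertainty_stop(1.5, step_fn, committee)
```

The trajectory stops at x ≈ 0.6, the last geometry for which the committee still agrees, rather than descending into the region where the models contradict each other.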
Second, we reused data from ref. 9 of three organic molecules (X2O) on Ag(111) surfaces. The goal of this study was to assess the applicability of ML models based purely on data reused from open repositories, without generating a tailor-made training data set. This reflects a realistic application scenario in which a small set of initial geometry optimizations is used to construct an ML+vdW model that computationally expedites structural pre-relaxation. The tests not only show the power of open data for developing new methods, but also demonstrate that the method can be used to semi-quantitatively predict adsorption heights and energies and to pre-relax challenging starting systems. Finally, we tested the transferability of our model to unseen organic molecules on Ag(111).
The approach we present is of general utility for the computational surface science community and has the potential to drastically reduce the computational effort of some of the most common tasks in this field. While our method accounts for long-range dispersion interactions, it does not explicitly treat electrostatic interactions. To account for these, the SchNet+vdW approach could be extended in a similar vein by learning partial atomic charges and using them to predict electrostatic long-range interactions, similar to SpookyNet54 or Behler's fourth-generation high-dimensional neural networks.56,89 Our data provide evidence that the construction of a more general and transferable structure-relaxation model of hybrid organic–metallic interfaces is feasible and potentially desirable, although small (and rough) system-specific models may be more advantageous in many cases.
Footnote
† Electronic supplementary information (ESI) available. See https://doi.org/10.1039/d2dd00016d
This journal is © The Royal Society of Chemistry 2022 |