Ketong Shao a, Aditya Dilip Lele‡ b, Zhiyu Shi b, Victor Von Miller a, Yiguang Ju bc and Ali Mesbah *a
a Department of Chemical & Biomolecular Engineering, University of California, Berkeley, USA. E-mail: mesbah@berkeley.edu
b Department of Mechanical and Aerospace Engineering, Princeton University, Princeton, USA
c Princeton Plasma Physics Laboratory, Princeton, USA
First published on 19th February 2025
Low-temperature plasma catalysis holds promise for electrification of energy-intensive chemical processes such as methane reforming and ammonia synthesis. However, fundamental understanding of plasma–catalyst interactions, which is essential for the design and screening of plasma catalysts, remains largely limited. Recent work has demonstrated the importance of first-principles studies, including density functional theory (DFT), for elucidating the role of electro- and photo-effects such as electric field and charge in plasma catalysis. The availability of increasing amounts of DFT data in thermal catalysis presents a unique opportunity for plasma catalysis research to efficiently leverage this existing first-principles knowledge of thermal catalysis towards investigating plasma–catalyst interactions. To this end, this paper investigates interpretable transfer learning from thermal to plasma catalysis, with a focus on the role of surface charge. Pre-trained attention-based graph neural networks (GNNs) from the Open Catalyst Project, trained using millions of thermal catalysis DFT data points, are structurally adapted to account for surface charge effects and fine-tuned using plasma catalysis DFT data of single metal atoms on an Al2O3 support and adsorbates involved in plasma-catalytic ammonia synthesis. Not only does the fine-tuned attention-based GNN model provide high test accuracy for predicting adsorption energies and atomic forces in plasma catalysis, but it also exhibits adequate extrapolation to single metal atoms that are unseen in the plasma catalysis data used for model fine-tuning. To distinguish the effects of surface charge from other dissimilarities in DFT data of thermal and plasma catalysis, a dual-model framework is presented that relies on two pre-trained GNNs, one of which is specifically tasked to capture surface charge effects using an attention mechanism that provides interpretable insights into their role. Lastly, it is demonstrated how the attention-based GNNs developed for single metal atoms can be efficiently adapted for predicting adsorption energies and atomic forces for metal clusters in plasma catalysis. This work highlights the vast potential of interpretable transfer learning from thermal catalysis to plasma catalysis to mitigate excessive computational requirements of first-principles studies in plasma catalysis, towards accelerating fundamental research in this domain.
Broader context
Low-temperature plasmas (LTPs) have received increasing attention for renewably electrified synthesis of chemicals, such as methane reforming, NOx generation, and ammonia synthesis, amongst others. This is due to the unique ability of LTPs to facilitate chemical reactions under atmospheric pressure and low temperatures. Additionally, LTPs are characterized by an abundance of high-energy electrons that can induce vibrationally-excited species, potentially resulting in new reaction pathways and reduced energy consumption. As such, LTP processes have the potential to enable decentralized and on-demand chemical production, as an alternative to large-scale and energy-intensive centralized chemical processes. The performance of LTP processes in terms of energy efficiency and productivity can be further enhanced via integration with catalysts. The availability of increasing amounts of DFT data in thermal catalysis presents a unique opportunity for plasma catalysis research to efficiently leverage this existing first-principles knowledge of thermal catalysis towards investigating plasma–catalyst interactions. This work highlights the vast potential of interpretable transfer learning from thermal catalysis to plasma catalysis to mitigate excessive computational requirements of first-principles studies in plasma catalysis, towards accelerating fundamental research in this domain.
The performance of LTP processes in terms of energy efficiency and productivity can be further enhanced via integration with catalysts.8–12 These improvements are postulated to arise from the intricate, but poorly-understood, synergies between the plasma and catalyst.13,14 Despite extensive experimental efforts on investigating the role of factors such as electric field,15 surface charges,16–18 surface reactions involving excited species,8 atoms, and photons,19,20 amongst others, there remain significant gaps in the fundamental understanding of plasma–catalyst interactions, let alone designing effective catalysts tailored for plasma catalysis.21 On the other hand, first-principles studies, particularly density functional theory (DFT), have proven useful for the investigation of plasma–catalyst interactions. Liu et al.22 used DFT to investigate the role of Eley–Rideal (E–R), Langmuir–Hinshelwood (L–H), and radical adsorption and dissolution processes in plasma catalysis across nine different metals, identifying a viable pathway for ammonia synthesis through the formation of NNH via radical reactions. Mehta et al.8 studied vibrational excitation of N2 and the resulting surface reactions with excited species via DFT, uncovering distinct routes for plasma-catalytic ammonia synthesis. Bal et al.23 introduced DFT methods for probing charged surfaces, which revealed an altered CO2 binding energy on γ-Al2O3 surfaces under the influence of surface charge. Lele et al.24 investigated the effects of surface charge on plasma-catalytic NH3 synthesis, showing that charged catalytic surfaces can enhance NH3 production. Shao and Mesbah25 used an integrated microkinetic-DFT model to investigate how the electric field, along with other LTP process parameters such as gas temperature, can influence plasma-catalytic ammonia synthesis, providing new insights into trade-offs between the NH3 production rate and energy consumption.
Despite these advances, the use of DFT for catalyst design and screening remains an open problem in plasma catalysis. The challenge is two-fold. First, there is a need for new theory, and possibly computational methods, to effectively account for the myriad of plasma-induced effects on surfaces via DFT. Second, the inclusion of these effects in DFT calculations can significantly increase their complexity, cost, and computational requirements. On the other hand, DFT is increasingly used to guide catalyst design and screening in thermal catalysis,26 which has led to an abundance of data generated from DFT calculations. These efforts are further facilitated by the advances in machine learning to learn computationally efficient surrogates for DFT, towards accelerating the discovery of thermal catalysts.27 Notably, DFT surrogates are trained on millions of data points that encompass various metal surfaces and adsorbates.28 These surrogates can then perform tasks such as rapid prediction of system energy and atomic forces, as well as fast geometry relaxation. By predicting atomic forces and thus the relaxed system energy, DFT surrogates can significantly speed up catalyst screening, enabling resource-efficient evaluation of potential catalysts without the need for costly full DFT calculations.
Yet, there are barely any similar efforts in the area of plasma catalysis. One notable work is by Wan et al.29 in which graph neural networks (GNNs) were used to study electric field-dipole effects in ammonia synthesis using a Ru catalyst, a topic closely related to plasma-catalytic ammonia synthesis. It was demonstrated that a pre-trained GNN model for Ru catalyst could be fine-tuned using a limited amount of DFT data for Fe catalysts to efficiently transfer acquired knowledge from Ru to Fe, while maintaining high accuracy in predicting adsorption energy. Another significant effort in this direction is by Zhang et al.,30 wherein an attention-based GNN was developed to explore the compositional space of Ni–Co–Fe–Pd–Pt for high-entropy electrocatalysis. The proposed GNN model successfully predicted adsorption Gibbs energies and atomic forces for OOH, O, and OH at surface sites across various compositions. These predictions in turn enabled identification of optimal compositions, including non-equal atomic compositions (e.g., Ni0.13Co0.13Fe0.13Pd0.10Pt0.50 and Ni0.10Co0.10Fe0.10Pd0.30Pt0.40), using volcano plots, which were subsequently validated through experiments. This study effectively showcased the utility of DFT surrogate models in accelerating catalyst design by avoiding the costly exploration of vast catalyst composition spaces.
Nonetheless, these works generally rely on training DFT surrogates from scratch, disregarding existing knowledge and data from thermal catalysis. Despite the intricacies of electro- and photo-effects such as electric field and charge in plasma catalysis, fundamental insights into atomic interactions and bonding can be akin to those in thermal catalysis. Leveraging existing DFT data for thermal catalysis can present a unique opportunity for enabling fundamental plasma–surface studies and accelerating catalyst design and screening in the plasma catalysis domain. Central to this is transfer learning,31 where knowledge from one task is systematically utilized to solve problems in related tasks with a limited amount of data. A recent study by Kolluru et al.32 illustrates the potential of transfer learning in thermal catalysis using an attention-based adaptor and pre-trained models derived from the Open Catalyst 2020 (OC20) dataset,28 which was generated based on extensive DFT calculations performed using the Vienna ab initio simulation package (VASP).33,34 The findings of this work revealed that not only does the transferred model excel in learning in-domain tasks similar to the OC20 dataset, but it also exhibits a remarkable performance for out-of-domain tasks. Meanwhile, transfer learning significantly mitigates the intensive computational requirements when compared to training the DFT surrogate model entirely from scratch. Furthermore, recent work by Wang et al.35 demonstrates that transfer learning can substantially reduce the number of required DFT calculations in out-of-domain transfer learning from inorganic to organic adsorbates in heterogeneous catalysis. 
Another useful concept is the attention mechanism, which has shown significant promise, particularly in natural language processing,36 since it can provide interpretability by automatically assigning weights to the importance of relationships between a central word/node and its neighbors.37 Zhang et al.30 demonstrated that the attention mechanism can reveal how variations in energy and atomic forces are confined to the third nearest atom of O in high-entropy electrocatalysis. This can be explained by the destabilization of the second-nearest-neighbor atoms of oxygen, as the binding strength of the first-nearest-neighbor atoms is shared by the adsorbed oxygen atoms. However, the use of the attention mechanism in thermal catalysis thus far generally lacks the incorporation of rich physical information, such as the angles formed by three atoms or the geometric configuration formed by multiple atoms, as demonstrated, e.g., in SchNet38 and GemNet.39 Most recently, Liao et al.40 introduced EquiformerV2, an attention-based GNN tailored specifically for catalysis, which is a promising development in this direction. This model currently shows the best prediction accuracy for system/adsorption energy, atomic forces and geometry relaxation, as can be seen in the Open Catalyst Project Leaderboard.28
Despite the rich body of knowledge on thermal heterogeneous catalysis, this knowledge remains underutilized in plasma catalysis due to a lack of effective tools for systematic and interpretable knowledge transfer in this domain. This paper addresses this gap by demonstrating the promise of attention-based transfer learning for leveraging the extensive DFT knowledge in thermal catalysis for first-principles plasma catalysis studies. To this end, we consider plasma-catalytic ammonia synthesis as the model system. We show how small amounts of plasma catalysis DFT data can be used to efficiently fine-tune existing pre-trained models of thermal catalysis to obtain accurate predictions of adsorption energy and atomic forces for single metal atoms and metal clusters. Moreover, transfer learning allows the model to have a strong extrapolation ability for unseen atoms in the plasma catalysis dataset. Thus, the fine-tuned model has the potential to enable rapid geometry relaxation, since it can be used to replace or reduce DFT calculations, as also shown in thermal catalysis.35 The ability to develop models for predicting adsorption energies and atomic forces in a resource-efficient way can in turn open new avenues for catalyst design and screening for plasma-catalytic systems, which remain grand open challenges in this field.21,41 Furthermore, integrating predictions of these quantities with microkinetic models serves as a critical step towards establishing a foundational understanding of plasma–catalyst interactions,21,41 which is a prerequisite for advancing theoretical and practical insights into plasma-catalytic processes.
We use two pre-trained GNNs, namely the EquiformerV2 model with the attention mechanism and the GemNet-dT model, both of which are trained using the OC20 dataset from the Open Catalyst Project.28 For model refinement, DFT calculations for NxHy species adsorbed onto single metal atoms supported on Al2O3 are performed using CP2K42 to account for plasma-induced charge effects, arguably one of the key contributors to plasma–catalyst synergy, on adsorption energies and atomic forces of the atoms. Although we are not aware of any experimental study combining single metal atom catalysts and plasma, single metal atom catalysts have been experimentally and theoretically studied for almost two decades,43 including on Al2O3 as a support. Nonetheless, the focus of this work is to isolate and systematically study the effect of surface charging across several common catalysts. Hence, we have adopted the single metal atom model for transfer learning. We demonstrate that by structurally adapting the pre-trained EquiformerV2 model and freezing a subset of its learnable parameters during transfer learning, the fine-tuned model can provide accurate predictions of adsorption energies and atomic forces for unseen single metal atoms. This indicates the ability of the fine-tuned model to effectively retain knowledge from thermal catalysis since the unseen single metal atoms were only a part of the OC20 dataset and not the plasma catalysis DFT data used for fine-tuning the EquiformerV2 model. Moreover, we show that the pre-trained EquiformerV2 model can be efficiently fine-tuned with only a limited amount of plasma catalysis DFT data for Pt metal clusters, along with the single-metal-atom data, to predict adsorption energies and atomic forces for unseen Ru metal clusters.
A standard practice in transfer learning is to use data acquired for a new task to fine-tune pre-trained models by adapting all their learnable parameters, typically without delineating various discrepancies that may exist between the old and new tasks.44 In this work, to enhance the interpretability of the fine-tuned attention-based EquiformerV2 model with respect to plasma-induced surface charge effects, we seek to delineate these effects from other dissimilarities between the OC20 dataset and the DFT data generated for plasma catalysis, namely the dissimilarities in atomic interactions and discrepancies between DFT calculations performed by VASP and CP2K. To this end, we propose a dual-model framework for interpretable transfer learning that combines the pre-trained GemNet-dT model39 for thermal catalysis and the above-described structurally-adapted pre-trained EquiformerV2 model, which is tasked to account for surface charge effects. The surface charge effects are encoded into the fine-tuned EquiformerV2 model via a loss function designed for this purpose. The attention scores extracted from the fine-tuned EquiformerV2 model in this dual-model framework exhibit strong correlations to surface charge distribution, providing useful insights into the important role of charge distribution in adsorption processes in plasma catalysis.
To account for the surface charge effects on the adsorption of different NxHy species on different catalysts, a set of single metal atoms and metal clusters are first adsorbed on the γ-Al2O3 surface. Then, the adsorption energies of the different adsorbates are calculated by:
$$E_{\mathrm{ads}} = E_{\mathrm{slab+adsorbate}} - E_{\mathrm{slab}} - E_{\mathrm{adsorbate}}$$
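The adsorption energy expression above can be illustrated with a minimal Python sketch. The function name and the numerical energies are hypothetical placeholders chosen only to show the arithmetic; they are not values from the DFT calculations in this work.

```python
def adsorption_energy(e_slab_adsorbate: float, e_slab: float, e_adsorbate: float) -> float:
    """Return E_ads = E_slab+adsorbate - E_slab - E_adsorbate.

    All three inputs must be total energies in the same unit (e.g. eV)
    and computed with the same DFT settings.
    """
    return e_slab_adsorbate - e_slab - e_adsorbate


# Hypothetical total energies in eV, used only to illustrate the formula.
e_ads = adsorption_energy(e_slab_adsorbate=-1052.5, e_slab=-1040.0, e_adsorbate=-11.0)
print(f"E_ads = {e_ads:.2f} eV")
```

With this sign convention, a negative value corresponds to energetically favorable adsorption.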
We consider 11 single metal atoms, namely Ag, Au, Cu, Re, Ru, Co, Ni, Pd, Fe, Pt and Rh, using the single metal atom model, where the last three metals (Fe, Pt and Rh) are only used for testing the generalization performance of the fine-tuned model for the single metal atoms. We consider the adsorption of seven different adsorbates, namely N, N2, H, H2, NH, NH2, and NH3, which are involved in NH3 synthesis. DFT calculations are also performed for metal clusters of Ru and Pt on the γ-Al2O3 surface. To ensure that the sensitivity of the adsorbates to surface structures is considered, we calculated the adsorption energies for all adsorbates on the seven unique adsorption sites offered by the γ-Al2O3(110) support, as well as their co-adsorption on the support and metal atom combined. Our analysis showed that adsorption on the metal atom/cluster was always favored for the adsorbates investigated in this work. Additional details about the DFT calculations performed in this work can be found in refs. 23 and 24.
A similar data structure is also used for Pt and Ru metal clusters on γ-Al2O3, yielding datasets of 3965 and 3627 data points for Pt and Ru clusters, respectively. The Pt cluster dataset is further divided into training, validation, and test sets using the same ratio as above to aid in model fine-tuning. The Ru cluster dataset is only utilized for testing the generalization performance of the fine-tuned model for the metal clusters.
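The hold-out scheme described above (Fe, Pt and Rh withheld among the single metal atoms, Ru withheld among the clusters) can be sketched as follows. This is only an illustration: the record schema with a `"metal"` key is hypothetical, and the 80/10/10 ratio is an assumed placeholder, since the actual ratio is not stated in this section.

```python
import random


def split_dataset(records, held_out, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split records into train/val/test plus a held-out extrapolation set.

    `records` is a list of dicts with a "metal" key (hypothetical schema).
    `held_out` is the set of metals never shown during fine-tuning.
    `ratios` is an assumed placeholder, not the ratio used in the paper.
    """
    extrapolation = [r for r in records if r["metal"] in held_out]
    seen = [r for r in records if r["metal"] not in held_out]
    rng = random.Random(seed)
    rng.shuffle(seen)
    n_train = int(ratios[0] * len(seen))
    n_val = int(ratios[1] * len(seen))
    return {
        "train": seen[:n_train],
        "val": seen[n_train:n_train + n_val],
        "test": seen[n_train + n_val:],
        "extrapolation": extrapolation,
    }


# Synthetic records: ten dummy frames per single metal atom.
metals = ["Ag", "Au", "Cu", "Re", "Ru", "Co", "Ni", "Pd", "Fe", "Pt", "Rh"]
records = [{"metal": m, "frame": i} for m in metals for i in range(10)]
splits = split_dataset(records, held_out={"Fe", "Pt", "Rh"})
print({name: len(part) for name, part in splits.items()})
```

For the clusters, the same function could be called with `held_out={"Ru"}`, so that the Ru cluster data only ever appear in the extrapolation set.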
Fig. 2 Architectures of the two graph neural network models, pre-trained using the OC20 dataset for thermal catalysis.28 The left is the EquiformerV2 architecture with eight Equiformer blocks.40 The right is the GemNet-dT architecture39 with a multi-head attention adaptor for improved transfer learning.32 The number of additional interaction blocks is set to one.
The construction of the initial part of the GemNet-dT model is similar to that of EquiformerV2, with the geometric information extracted and atoms converted into embeddings. Then, graph interaction blocks update these embeddings according to the geometric information. In the original GemNet-dT model without an adaptor, each interaction block, as well as the initial embedding, is followed by a feedforward block. The outputs of these feedforward blocks are added to predict the adsorption energy and atomic forces. In this work, however, we use a modified GemNet-dT model that utilizes a multi-head attention adaptor to balance information from the intermediate graph-based blocks for improved transfer learning.32 In the modified GemNet-dT model, the feedforward layers in the interaction blocks are removed and, instead, a weighted summation is performed in the adaptor to make predictions. To further enhance transfer learning ability, additional interaction blocks with feedforward layers are introduced.32 The outputs of these interaction blocks are directly added to the output from the multi-head attention adaptor, yielding the adsorption energy and atomic force predictions. In the modified GemNet-dT model, the parameters of the adaptor, the additional interaction blocks and their feedforward layers must be trained, whereas other parts of the model are based on the pre-trained GemNet-dT model of OC20 with three interaction blocks. We note that the modified GemNet-dT model is only used in the dual-model framework of the task “interpretation of surface charge effects” to capture discrepancies between the thermal catalysis and plasma catalysis datasets other than the surface charge effects.§
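The weighted summation performed by the adaptor can be illustrated with a heavily simplified sketch. It assumes a single attention head and scalar per-block outputs, whereas the actual adaptor of ref. 32 is multi-head and operates on embeddings; the function names, logits and numerical values are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptor_readout(block_outputs, logits, extra_block_outputs):
    """Combine block outputs into one prediction (simplified, single head).

    block_outputs: outputs of the frozen pre-trained blocks
                   (initial embedding plus three interaction blocks).
    logits: learnable scores that the softmax turns into mixing weights.
    extra_block_outputs: outputs of the newly added interaction blocks,
                         which are added directly to the weighted sum.
    """
    weights = softmax(logits)
    weighted = sum(w * o for w, o in zip(weights, block_outputs))
    return weighted + sum(extra_block_outputs)


# Hypothetical scalar contributions in eV; equal logits give equal weights.
prediction = adaptor_readout(
    block_outputs=[-1.0, -0.5, -0.2, -0.3],
    logits=[0.0, 0.0, 0.0, 0.0],
    extra_block_outputs=[0.1],
)
print(round(prediction, 3))
```

During fine-tuning, only the logits (standing in for the adaptor parameters) and the added blocks would be updated in this picture, while the pre-trained block outputs stay fixed.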
These adaptations are motivated by several considerations. First, the proton is placed far away from all atoms, whereas its impact may be appreciable on all atoms. Therefore, since the EquiformerV2 model only considers the nearest 20 atoms for each atom, the model is adapted to also account for the effects of the proton. Furthermore, atom embeddings remain constant during transfer learning. This is because the updated embeddings may affect the attention blocks of the EquiformerV2 model adversely, potentially hindering the extrapolation capability on unseen atoms. Earlier layers of the pre-trained model tend to capture structural knowledge, such as edges between atoms and rotational equivalence of the catalyst structure.32,40 Therefore, freezing these layers can also be beneficial to the extrapolation capability of the fine-tuned model. However, the number of initial layers to be frozen can have a significant influence on the model performance. Thus, we investigate the impact of freezing different numbers of initial layers of the pre-trained model on the extrapolation capability of the fine-tuned model. Additionally, freezing of the output energy and force blocks is also tested since these blocks are responsible for projecting the outputs from the eighth Equiformer block (B8) to the energy and force predictions. The proton embeddings are initialized using the hydrogen embeddings from the pre-trained EquiformerV2 model and their parameters are updated during transfer learning. In theory, hydrogen embeddings represent the closest approximation to those of protons. Table 1 summarizes all the adaptations of the pre-trained EquiformerV2 model used for transfer learning from thermal to plasma catalysis. An ablation study is performed to test the performance of these models.
Model abbreviation | Pretrained | Proton embedding | Atom embedding | Geometric info. | B1 | B2 | B3 | B4 | B5 | B6 | B7 | B8 | Energy & force |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
S | F | — | — | — | — | — | — | — | — | — | — | — | — |
H | T | H | — | — | — | — | — | — | — | — | — | — | — |
A | T | — | × | — | — | — | — | — | — | — | — | — | — |
HA | T | H | × | — | — | — | — | — | — | — | — | — | — |
L1 | T | H | × | × | × | — | — | — | — | — | — | — | — |
L3 | T | H | × | × | × | × | × | — | — | — | — | — | — |
L5 | T | H | × | × | × | × | × | × | × | — | — | — | — |
L7 | T | H | × | × | × | × | × | × | × | × | × | — | — |
L8 | T | H | × | × | × | × | × | × | × | × | × | × | — |
EF | T | H | × | — | — | — | — | — | — | — | — | — | × |
L1EF | T | H | × | × | × | — | — | — | — | — | — | — | × |
L3EF | T | H | × | × | × | × | × | — | — | — | — | — | × |
L5EF | T | H | × | × | × | × | × | × | × | — | — | — | × |
L7EF | T | H | × | × | × | × | × | × | × | × | × | — | × |
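The freezing patterns in Table 1 can be written down compactly as a configuration. The sketch below reads × as "frozen during fine-tuning", which is consistent with the surrounding text; the component labels and helper names are hypothetical and do not correspond to identifiers in the EquiformerV2 code base. The proton embedding is not listed because it is always trainable.

```python
# Component order follows the columns of Table 1; names are hypothetical labels.
COMPONENTS = (
    ["atom_embedding", "geometric_info"]
    + [f"B{i}" for i in range(1, 9)]
    + ["energy_force"]
)


def frozen_components(freeze_atom=True, n_blocks=0, freeze_output=False):
    """Return the set of frozen components for one model variant.

    n_blocks > 0 freezes the geometric-information layers together with
    Equiformer blocks B1..B{n_blocks}, mirroring the L1-L8 rows.
    """
    frozen = set()
    if freeze_atom:
        frozen.add("atom_embedding")
    if n_blocks > 0:
        frozen.add("geometric_info")
        frozen.update(f"B{i}" for i in range(1, n_blocks + 1))
    if freeze_output:
        frozen.add("energy_force")
    return frozen


VARIANTS = {
    "S": frozen_components(freeze_atom=False),  # trained from scratch
    "H": frozen_components(freeze_atom=False),
    "A": frozen_components(),  # differs from HA only in proton initialization
    "HA": frozen_components(),
    "EF": frozen_components(freeze_output=True),
}
for n in (1, 3, 5, 7, 8):
    VARIANTS[f"L{n}"] = frozen_components(n_blocks=n)
for n in (1, 3, 5, 7):
    VARIANTS[f"L{n}EF"] = frozen_components(n_blocks=n, freeze_output=True)


def trainable_components(variant):
    """List the components left trainable for a given model abbreviation."""
    return [c for c in COMPONENTS if c not in VARIANTS[variant]]


print(trainable_components("L7"))
print(trainable_components("L7EF"))
```

In a deep learning framework, the returned names would typically be mapped to parameter groups whose gradients are disabled for the frozen entries.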
To elucidate the role of surface charge, we propose a dual-model architecture that isolates the effects of proton-induced surface charges in the EquiformerV2 model. Meanwhile, to ensure that all other discrepancies are effectively captured, we employ the GemNet-dT + A architecture, as proposed in ref. 32, which has demonstrated strong transfer learning capabilities for out-of-domain tasks. As shown in Fig. 4, the proposed architecture consists of two pre-trained models operating concurrently: the GemNet-dT + A model, which is fine-tuned using CP2K single metal atom data with the proton removed, and the EquiformerV2 model, which is fine-tuned using CP2K data with the proton effects accounted for. For fine-tuning of the GemNet-dT + A model, the proton is removed before a single metal atom structure is fed to the model. This allows the fine-tuned GemNet-dT + A model to learn the discrepancies between the OC20 thermal catalysis data used for pre-training and the CP2K data generated in this work. In contrast, a single metal atom structure with the proton is fed to the EquiformerV2 model, which serves as a corrector to the predictions of the fine-tuned GemNet-dT + A model by accounting for surface charge effects. In this way, the dual-model architecture can delineate the role of surface charge from other discrepancies between the thermal and plasma catalysis data. The combined outputs of the two models yield the predictions for adsorption energy and atomic forces. To train the models, the following loss functions are devised. For single metal atom structures with the proton, the loss function Jplasma is defined as in eqn (1), whereas for structures without the proton, the loss function Jthermal takes the form of eqn (2), i.e.,
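[Eqn (1): loss function Jplasma for structures with the proton; equation image not recovered]
[Eqn (2): loss function Jthermal for structures without the proton; equation image not recovered]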
To fine-tune the GemNet-dT + A model, the initial atom embeddings and the existing interaction blocks in the pre-trained model are frozen. One additional interaction block is added and the number of heads in the multi-head attention adaptor is set to five, as in ref. 32. Since these newly added layers are not pre-trained, they are initialized randomly. For the EquiformerV2 model, we utilize the model architecture outlined in Fig. 3. However, we only allow fine-tuning of the proton embeddings and the eighth Equiformer block (B8)36,40 in the pre-trained EquiformerV2 model. Additionally, the output layers responsible for predicting energy and atomic forces remain unchanged during model fine-tuning. This is based on the consideration that each Equiformer block within the pre-trained EquiformerV2 model has its own atom embeddings. Allowing all Equiformer blocks to adapt during model fine-tuning could disperse the surface charge effects across various blocks, rendering the predictions uninterpretable.
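The forward pass of the dual-model framework for a structure with the proton can be sketched as below. This is a simplified illustration under stated assumptions: the two outputs are combined by simple addition, the structure is a list of dicts with a hypothetical `is_proton` flag, and the two models are toy stand-in callables rather than GemNet-dT + A or EquiformerV2. The loss functions of eqn (1) and (2) are not reproduced here.

```python
def strip_proton(structure):
    """Return a copy of the structure without the proton.

    The 'is_proton' flag is a hypothetical marker used for this sketch.
    """
    return [atom for atom in structure if not atom.get("is_proton", False)]


def dual_model_energy(structure, thermal_model, charge_model):
    """Combine a proton-free baseline with a surface-charge correction.

    thermal_model: stands in for the fine-tuned GemNet-dT + A model and
                   only ever sees the structure with the proton removed.
    charge_model:  stands in for the fine-tuned EquiformerV2 model and
                   sees the full structure, including the proton.
    Adding the two outputs is an assumption made for illustration.
    """
    baseline = thermal_model(strip_proton(structure))
    correction = charge_model(structure)
    return baseline + correction


# Toy stand-ins so that the sketch runs end to end.
def toy_thermal(structure):
    return -0.1 * len(structure)


def toy_charge(structure):
    n_protons = sum(1 for atom in structure if atom.get("is_proton", False))
    return -0.05 * n_protons


structure = [
    {"symbol": "Al"},
    {"symbol": "O"},
    {"symbol": "Ni"},
    {"symbol": "H", "is_proton": True},
]
print(round(dual_model_energy(structure, toy_thermal, toy_charge), 3))
```

Keeping the proton away from the baseline model is what lets the correction term be attributed to surface charge rather than to other dataset discrepancies.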
Strategy | Description |
---|---|
S1 (baseline) | Use the pre-trained EquiformerV2 model and metal cluster data to directly perform transfer learning. |
S2 | Use the pre-trained EquiformerV2 model and single metal atom data to perform transfer learning. Then, the updated model is further fine-tuned using the metal cluster data. |
S3 | Use the pre-trained EquiformerV2 model and the mixture of the single metal atom and metal cluster data to fine-tune the model in one step. |
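The three strategies in the table above differ only in which data are used at each fine-tuning stage, which can be expressed as a small schedule. The dataset labels and the function name are hypothetical.

```python
def fine_tuning_stages(strategy):
    """Return the ordered fine-tuning stages for a strategy.

    Each stage is the list of datasets mixed together in that stage.
    """
    stages = {
        "S1": [["metal_cluster"]],                         # direct transfer
        "S2": [["single_metal_atom"], ["metal_cluster"]],  # two sequential steps
        "S3": [["single_metal_atom", "metal_cluster"]],    # one mixed step
    }
    return stages[strategy]


for name in ("S1", "S2", "S3"):
    for step, datasets in enumerate(fine_tuning_stages(name), start=1):
        print(f"{name} stage {step}: fine-tune on {' + '.join(datasets)}")
```

Each stage would start from the pre-trained EquiformerV2 weights in its first step and from the previous stage's weights thereafter.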
Fig. 5 Transfer learning from thermal to plasma catalysis for single metal atoms. Accuracy of the different fine-tuned models, detailed in Table 1, in predicting the adsorption energy and atomic force is quantified by the R2 score. (a) The test accuracy of the fine-tuned models for all atoms in the test dataset. (b) The predictive accuracy of the fine-tuned models for the unseen Fe atom. (c) The predictive accuracy of the fine-tuned models for the unseen Rh atom. (d) The predictive accuracy of the fine-tuned models for the unseen Pt atom.
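For reference, the R2 score used as the accuracy metric can be computed as in the following sketch, which uses the standard coefficient-of-determination definition. The function name and the numerical values are illustrative and are not results from this work.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical adsorption energies in eV (reference vs. model prediction).
reference = [-1.0, -0.5, 0.0, 0.5]
predicted = [-0.9, -0.6, 0.1, 0.4]
print(round(r_squared(reference, predicted), 3))
```

A score of 1 indicates perfect agreement, while a score at or below 0 means the model does no better than predicting the mean of the reference values.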
A comparison of models H and HA, which differ solely in whether the atom embeddings are fixed, demonstrates that relaxing the pre-trained atom embeddings significantly diminishes the model's extrapolation capability. This is evident in predicting atomic forces for the unseen metals Rh and Pt, as shown in Fig. 5(c) and (d). Yet, both models exhibit relatively poor extrapolation for Fe, as in Fig. 5(b), likely due to its minuscule atomic forces near the optimal structure. The diminished extrapolation of model H stems from its transfer learning process, which also updates the embeddings of the metals present in the transfer learning data. As a result, the differences between thermal catalysis and plasma catalysis are absorbed into the embeddings of the seen metals in model H, while the embeddings of the unseen metals remain unchanged. Using these unchanged embeddings for extrapolation thus misses information on these differences. On the other hand, comparing the performance of models A and HA suggests that initializing the proton embedding using the pre-trained hydrogen embedding may not have a notable impact on the model's generalization performance. Model A outperforms model HA in terms of predicting the adsorption energy for Rh, while showing an inferior performance in predicting the atomic force, as can be seen in Fig. 5(c). This is reversed for the case of Pt, where model A performs better in predicting the atomic force and worse in predicting the adsorption energy (Fig. 5(d)). The reason that initializing the proton embedding from the hydrogen embedding does not yield superior predictions can be attributed to the inherent flexibility of the pre-trained EquiformerV2 model with fixed atom embeddings. Because the large number of weights in the other layers are unfrozen, the starting point of the proton embedding becomes unimportant. Given the chemical similarity between proton and hydrogen, we opted to initialize the proton embedding from the hydrogen embedding in the rest of the fine-tuned models in Table 1.
We now examine the impact of the number of frozen layers in the pre-trained EquiformerV2. Fig. 5(b) shows that the fine-tuned models with a greater degree of flexibility (i.e., fewer frozen layers) underperform in extrapolation in the case of Fe. This underscores the important role of the initial layers of the EquiformerV2 model shown in Fig. 5. In particular, in the case of atomic force predictions for the unseen atoms, freezing layers up to and including the seventh Equiformer block (B7) yields the best performing models, as seen in Fig. 5(b)–(d). Meanwhile, the extrapolation performance of models L7 and L7EF is comparable, suggesting that freezing the output energy and force blocks may not be critical. Note that these output blocks are responsible for converting the abstract output from the eighth Equiformer block (B8) to the adsorption energy and atomic force predictions. Hence, with the eighth Equiformer block unfrozen, allowing the energy and force blocks to be fine-tuned, as in model L7, can enable more effective transfer learning to plasma catalysis.
We now compare the performance of the fine-tuned model L7 to that of model S, i.e., an EquiformerV2 model architecture trained entirely from scratch using the same training dataset. Fig. 6 shows parity plots of the predicted adsorption energy and atomic force for the unseen metals Fe, Rh and Pt against their corresponding true values. Model L7 significantly outperforms model S, in particular for atomic force predictions, as depicted in Fig. 6(b), (d) and (f). Notice that model S tends to either overpredict the atomic forces, as in Fig. 6(b) and (d), or yield numerous zero predictions, as in Fig. 6(f). These parity plots imply that, via careful fine-tuning of the EquiformerV2 model pre-trained on thermal catalysis data, adequate generalization performance can be achieved for single metal atoms in the case of plasma catalysis. Additionally, for metals seen with a large amount of thermal catalysis data during the EquiformerV2 pre-training, the fine-tuned model provides satisfactory generalization performance even if these metals are not seen during transfer learning. We note that only 3614 plasma catalysis data points were used for fine-tuning the model, as compared to the millions of data points used to establish the pre-trained EquiformerV2 model. Therefore, transfer learning using a pre-trained model based on a large dataset and a large array of atom types can provide valuable extrapolative predictions for unseen catalysts in plasma catalysis with a much smaller amount of new DFT data, thus accelerating the catalyst discovery process.
Fig. 6 Parity plots of the adsorption energy and atomic force predictions of models S and L7, as detailed in Table 1, for the unseen metal atoms of Fe, Rh and Pt. (a) and (b) Adsorption energy against its corresponding predicted values for structures containing Fe, as well as the atomic forces experienced by Fe versus predicted atomic forces. (c) and (d) Adsorption energy against its corresponding predicted values for structures containing Rh, as well as the atomic forces experienced by Rh versus predicted atomic forces. (e) and (f) Adsorption energy against its corresponding predicted values for structures containing Pt, as well as the atomic forces experienced by Pt versus predicted atomic forces.
Fig. 7 Principal component analysis is applied to project the 168-dimensional attention scores of the eight blocks of the Equiformer model of Fig. 4 onto a 3-dimensional space. This figure shows the attention scores projected onto the 3-dimensional space. The 2-dimensional contours of the 3-dimensional space can be found in Fig. SP1, SP2 and SP3. |
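The projection described in the Fig. 7 caption can be sketched as follows. This is a minimal illustration rather than the procedure used in this work: it assumes the attention scores are arranged as a matrix with one row per sample and 168 columns, uses random placeholder data, and the function name `pca_project` is hypothetical.

```python
import numpy as np


def pca_project(scores, n_components=3):
    """Project rows of `scores` onto their leading principal components."""
    # Center each feature so the SVD yields principal directions.
    centered = scores - scores.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Fraction of the total variance captured by each retained component.
    explained = singular_values**2 / np.sum(singular_values**2)
    return projected, explained[:n_components]


# Placeholder data (assumption): 200 samples of 168-dimensional attention scores.
rng = np.random.default_rng(0)
scores = rng.random((200, 168))

projected, explained = pca_project(scores)
print(projected.shape)   # one 3-dimensional point per sample
print(explained.sum())   # variance fraction retained by the 3 components
```

The explained-variance fractions indicate how much of the structure in the attention scores survives the reduction to three dimensions, which is worth checking before interpreting clusters in the projected space.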
We now investigate the relationship between the attention scores and the surface charge distribution for the Al2O3–Ni–N system. It is observed that some of the eight attention heads assign large attention scores to the single metal atom and the proton. This is expected, as the single metal atom bonds with the N adsorbate and the proton imposes the additional negative charge on the surface. Conceivably, both of these atoms would play an important role in determining the adsorption energy and forces of the system. However, a notable correlation is also observed between some of the attention heads and the partial charges of the atoms. Fig. 8(a) shows the importance of the 19 atoms in the Al2O3 support closest to the N adsorbate, as captured by the third attention head of the Equiformer block B8 for the Al2O3–Ni–N system. An atom with a color closer to purple has a larger attention score, indicating a more significant contribution to the N adsorbate. The contributions of the Ni atom and the proton are not considered, as they both hold a large amount of charge. Fig. 8(b) illustrates the Mulliken charge distribution on the same 19 atoms. Here, a deeper red color indicates a stronger positive charge on aluminum, while a deeper blue color indicates a stronger negative charge on oxygen. The correlation between the attention scores and the absolute Mulliken charges tends to be inverse. That is, an oxygen atom with a deeper blue color (i.e., more negatively charged) in Fig. 8(b) shows a smaller attention score, as indicated by lighter purple in Fig. 8(a). Likewise, an aluminum atom with a more positive charge (deeper red) in Fig. 8(b) has a lower attention score, as shown in orange in Fig. 8(a). This can be attributed to the excess negative charge on the surface, which modifies the reactivity of the surface atoms. The adsorption energy of an adsorbate would thus be affected by the distribution of the excess negative charge on the surface.
Hence, the distribution of the excess surface charge introduced on the catalyst surface, calculated in terms of Mulliken charges, is a strong indicator of the effect of surface charge on adsorption energies. The fact that Al and O atoms with smaller absolute charge receive higher attention scores could mean that these atoms affect the distribution of additional charge on the surface more significantly than other Al and O atoms, as their Mulliken charges differ from those of the other Al and O atoms highlighted in Fig. 8. The inverse correlation between the attention score and the absolute surface charge distribution is also validated through Spearman correlation analysis,54 which measures the strength of association based on the ranking of values. This analysis results in a correlation coefficient of −0.68 and a p-value of 0.0021, indicating a strong inverse correlation between the attention score and the absolute surface charge distribution. Notably, the inputs to the dual-model framework shown in Fig. 4 are solely structural (atom types, edges and distances between atoms), without any explicit charge information. This highlights the ability of the attention mechanism to infer underlying physical concepts. Similar analyses for the other single metal atoms and adsorbates consistently show strong correlations between the attention scores and the Mulliken charges, with absolute values of the Spearman correlation coefficients ranging from 0.6 to 0.8 and p-values always less than 0.01, further demonstrating the model's interpretability. Such interpretable attention-based models can highlight the key atoms in a catalyst structure that have significant interactions with the adsorbate, beyond the Mulliken net charge effects considered in this study. These insights can in turn inform further targeted DFT studies on these surface atoms for catalyst design and discovery. Furthermore, Mulliken net charge effects could be isolated by treating them as additional learning targets, similar to atomic forces. This could enhance the interpretability of attention-based models, enabling a deeper understanding of how surface charge would impact the catalyst.
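As an illustration of the rank-based analysis above, the following minimal sketch computes a Spearman coefficient between attention scores and absolute Mulliken charges. The numerical values are made-up placeholders, not data from this study, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical. The coefficient is obtained as the Pearson correlation of the ranks, with tied values assigned their average rank; no p-value is computed here.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder values (assumption): Mulliken charges on a few Al (+) and O (-)
# atoms and the attention scores assigned to the same atoms.
mulliken = [1.42, -0.95, 1.10, -0.71, 1.35, -0.88]
attention = [0.02, 0.09, 0.07, 0.15, 0.05, 0.08]

abs_charge = [abs(q) for q in mulliken]
rho = spearman(attention, abs_charge)
print(f"Spearman coefficient: {rho:.2f}")
```

A negative coefficient corresponds to the inverse relationship discussed above: atoms with larger absolute charge tend to receive smaller attention scores.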
Fig. 8 (a) Visual representation of scores of the third attention head of the Equiformer block B8 of the dual-model framework in Fig. 4 for the Al2O3–Ni–N system. These scores weigh the influence of the closest 19 atoms in the Al2O3 support to the nitrogen adsorbate, while the attention scores for Ni and proton are omitted. (b) Visualization of the Mulliken net charge calculated by CP2K for the same 19 atoms. Darker red signifies a larger positive charge on Al, whereas a deeper blue denotes a larger negative charge on O. The color bar in (b) is a merged scale for both negative and positive charges, meaning no atom holds exactly zero charge. |
Fig. 9 Transfer learning from single atoms to metal clusters. Accuracy of the fine-tuned models of Table 2 in predicting the adsorption energy and atomic forces, as quantified in terms of mean absolute error (MAE). (a) Test accuracy of the fine-tuned models for the Pt cluster test data. (b) Predictive accuracy of the fine-tuned models for the unseen Ru cluster. |
Fig. 9(b) suggests that model L7 has a superior extrapolation performance for predicting adsorption energy, whereas model L7EF is superior for predicting atomic forces. Fig. 10 shows the parity plots for predictions made by these two models when fine-tuned via the three transfer learning strategies of Table 2 for metal clusters. The results indicate that strategies S2 and S3, which include single-atom data, outperform strategy S1 in extrapolating energy and force predictions, as seen with model L7 in Fig. 10(a) and (b). While strategy S3 excels in predicting atomic forces (Fig. 10(d)) under model L7EF with even lower MAE for force predictions, it does not consistently provide the best energy predictions (Fig. 10(c)), highlighting the trade-off between predicting system energy and atomic forces in extrapolation tasks. While incorporating single-atom data clearly enhances transfer learning for metal clusters, the optimal strategy may depend on whether the focus is on energy or force predictions. We note that the particularly large deviation for strongly negative adsorption energies in the case of Ru clusters (Fig. 10(a) and (c)) can be attributed to the limited amount of training data with adsorption energies below −10 eV for Pt clusters, as shown in Fig. SP4 (ESI†). As such, the extrapolation to Ru clusters becomes more challenging, leading to underfitting in this energy range due to insufficient data.
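For reference, the two MAE metrics compared above can be sketched as follows. This is a minimal illustration with made-up numbers. It assumes the force MAE is averaged over every Cartesian component of every atom, which is a common convention but is not confirmed by the text here, and the function names are hypothetical.

```python
def energy_mae(predicted, true):
    """Mean absolute error over per-structure adsorption energies (eV)."""
    return sum(abs(p - t) for p, t in zip(predicted, true)) / len(true)


def force_mae(predicted, true):
    """Mean absolute error over all Cartesian force components (eV/Å).

    Both arguments are lists of per-atom [fx, fy, fz] triples.
    Assumption: errors are averaged component-wise over all atoms.
    """
    errors = [
        abs(pc - tc)
        for p_atom, t_atom in zip(predicted, true)
        for pc, tc in zip(p_atom, t_atom)
    ]
    return sum(errors) / len(errors)


# Placeholder predictions and reference values (not data from this study).
e_pred, e_true = [-1.2, -3.4, -10.8], [-1.0, -3.5, -12.0]
f_pred = [[0.10, 0.00, -0.20], [0.05, 0.02, 0.00]]
f_true = [[0.12, 0.01, -0.25], [0.00, 0.00, 0.03]]

print(f"Energy MAE: {energy_mae(e_pred, e_true):.3f} eV")
print(f"Force MAE:  {force_mae(f_pred, f_true):.3f} eV/Å")
```

Because the two metrics are averaged over different quantities (one value per structure versus three values per atom), a model can rank better on one than on the other, consistent with the energy-force trade-off noted above.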
Our future work will focus on studying a broader range of plasma–catalyst interactions to further evaluate the effectiveness of transfer learning approaches for developing more comprehensive plasma–catalyst interaction models. Larger and more diverse plasma catalysis datasets will likely improve the quality of transfer learning. Additionally, we will incorporate Mulliken net charge as an extra prediction target to explore whether it enhances the GNN's learned representations. Furthermore, we will integrate predictions of atomic forces and adsorption energies with microkinetic models to enable holistic investigations of plasma–catalyst synergies and reaction mechanisms in plasma–catalytic systems, towards experimental validation of the presented approach.
Footnotes |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4ey00256c |
‡ A. D. Lele is currently with the Department of Mechanical Engineering, Rowan University, Glassboro, USA. |
§ Fine-tuning of all the pre-trained models uses the same settings as in OC20, except for a batch size of 4 and 100 training epochs. For GNNs, the batch size refers to the number of graphs used per training step. In this study, each graph consists of around 240 atoms and approximately 240 × 21 = 5040 edges, creating a substantial load on the GPUs. We utilized four GPUs, each with 12 GB of memory and capable of processing only one graph at a time. We observed that increasing the batch size would yield minimal improvement in the transfer learning results. Therefore, we opted to use a batch size of 4 for computational efficiency. |
This journal is © The Royal Society of Chemistry 2025 |