Xiaofeng Cao,
Wenjia Luo* and
Huimin Liu
School of Chemistry and Chemical Engineering, Southwest Petroleum University, Chengdu, 610500, P. R. China. E-mail: luowenjia@swpu.edu.cn
First published on 16th April 2024
Despite the rapid development of computational methods, including density functional theory (DFT), predicting the performance of a catalytic material merely from its atomic arrangement remains challenging. Although quantum mechanics-based methods can model ‘real’ materials with dopants, grain boundaries, and interfaces with acceptable accuracy, their high demand for computational resources can no longer keep pace with the needs of modern scientific research. Machine learning (ML) methods, on the other hand, can accelerate the screening of alloy-based catalytic materials. In this study, an ML model was developed to predict the CO2 and CO adsorption affinity on single-atom doped binary alloys from the thermochemical properties of the component metals. Using a greedy algorithm, the best combination of features was determined, and the ML model was trained and verified on a data set containing 78 alloys for which the adsorption energies of CO2 and CO were calculated by DFT. Comparison between predicted and DFT-calculated adsorption energies suggests that the extreme gradient boosting (XGBoost) algorithm has excellent generalization performance, with R-squared (R2) values for CO2 and CO adsorption energy prediction of 0.96 and 0.91, respectively. The errors of the predicted adsorption energies are 0.138 eV and 0.075 eV for CO2 and CO, respectively. This model can be expected to advance our understanding of structure–property relationships at the fundamental level and to be used in large-scale screening of alloy-based catalysts.
Compared with the traditional experimental “trial and error” method, theoretical methods such as density functional theory (DFT) calculations have apparent advantages in their ability to rapidly screen materials. Nevertheless, predicting adsorption energies on bimetallic surfaces is challenging because the number of possible alloy compositions grows exponentially, which makes computational screening too time- and resource-consuming even for methods like DFT.12–15 To this end, developing predictive adsorption models, for example based on machine learning (ML), is necessary to rapidly survey appropriate adsorption energies for reactions of interest.16 Although there are theoretical models for predicting the chemisorption energy of adsorbates on pure metal surfaces (for example, the d-band center model estimates adsorbate–metal interactions based on the coupling of the metal d-states with the adsorbate17,18), generalizing these simplified thermochemical models to bimetallic materials is impractical, as will be shown in this study.
In recent years, ML methods have emerged as a powerful approach for screening promising catalyst materials.19,20 Among the popular ML methods, decision trees, multilayer perceptrons, extreme gradient boosting, and support vector regression are the most well-known supervised learning approaches for data mining. They can use existing data to find regularities and map the correlations between a variety of properties and the desired prediction targets.21 Moreover, they can handle many irrelevant inputs because they incorporate internal feature selection as an integral part of the algorithm. Furthermore, ensemble learning can significantly improve the prediction accuracy of decision trees by aggregating multiple weak learners.22,23
A series of studies have been carried out utilizing ML to predict adsorption energies on material surfaces.24–27 Shi et al. investigated ML modeling of CO adsorption energy on surface-layered alloys doped with 23 metals, including Cr, Mn, and Fe, and screened out CO2RR catalysts based on a suitable CO adsorption energy range (−1.68 to −1.64 eV). In their layered alloy model, five-layer (2 × 2) surface cells are used to simulate the alloy surface, each layer consisting of only one metal; the bottom three layers are always composed of the same element, and only the composition of the first layer, which contains 20% or 40% doping atoms, is varied.28 Liu investigated the adsorption energies on PdnAu16−n alloy surfaces with different Pd content (n = 1–16) by ML prediction and concluded that isolated Pd top sites surrounded by Au atoms are stable adsorption sites.29 Nayak et al. predicted adsorption energies of H, O, N, OH, NO, and CO on fcc(111) top sites of 25 different transition metals, including Ir, Pt, and Au, with an average root-mean-square error (RMSE) of about 0.4 eV by random forest regression.30 Prediction of adsorption energies on metal and alloy surfaces has also been reported using XGBoost regression,31 artificial neural networks,32,33 random forests,34 and other methods.35,36 However, feature selection methods, which can quickly select the most suitable features for prediction from dozens or even hundreds of candidates, were seldom used in previous studies. Moreover, ML models aiming to predict adsorption energies on alloys often focus on binary alloys with layered structures only, i.e. the entire topmost layer of the metal is replaced by another metal element, whereas the actual configuration of alloys in real catalysts can be much more complex.29
In this study, we directly predict the adsorption energies of CO2 and CO on the surfaces of a wide range of binary alloys using ML methods without any assumption of linearity, i.e. we do not assume the adsorption energy on an alloy to be a linear combination of the adsorption affinities of its two component metal surfaces.35 We have chosen a feature selection method, the greedy algorithm,37 which iteratively builds up combinations of features and finally locates the optimal combination. From these, we select the best feature combinations to be used in the subsequent algorithms for prediction. The results show that the XGBoost algorithm works best. As an extension of previous literature,29 this study focuses on single-atom doped binary alloys rather than alloys with layered structures. We admit that this is still a significantly simplified model of realistic materials, but the ML method and models developed and validated in this study can be further extended to alloys with more complex structures. This approach can be used to rapidly predict adsorption energies with high accuracy. The root-mean-square errors (RMSE) over the entire dataset are 0.075 eV and 0.138 eV for the adsorption energies of CO and CO2, respectively, which are comparable to the accuracy of Batchelor's ML models for predicting *OH and *O adsorption energies,38 except that this study covers a much wider range of materials. This model goes beyond traditional strategies and can be used to facilitate the discovery of novel alloy catalytic materials.
(i) The multi-layer perceptron (MLP32) is a feedforward neural network and the basis of deep neural networks. It can optimize the objective function and improve model accuracy. The hyperparameters to be optimized are the learning rate (Lr), dropout (Dt), L2 regularization term (L2), number of hidden layers (Nl), number of hidden-layer neurons (Nn), and activation function.44 Commonly used activation functions include the Sigmoid,45 Tanh,46 and ReLU47 functions. A schematic of the MLP model is shown in Fig. 2a. If only one layer is included, the model reduces to a wide single-layer linear model and can be expressed as
$y = W_{\mathrm{wide}}^{\mathrm{T}}\{x, \phi(x)\} + b$  (1)
Fig. 2 Schematic diagram of the structure of different machine learning algorithms for (a) MLP, (b) DTR, (c) SVR, and (d) XGBoost. |
In cases where there is more than one layer, it is referred to as a multi-layer neural network (MLP), and can be defined as
$a^{l+1} = f(W_{\mathrm{deep}}^{l}\, a^{l} + b^{l})$  (2)
(ii) Decision tree regression (DTR48), shown in Fig. 2b, is an ML algorithm for predicting continuous numerical values. Its structure is specified by hyperparameters including max_depth (Dm), min_samples_leaf (Sl), and min_samples_split (Ss). It can be described mathematically by
$f(x) = \sum_{m=1}^{M} c_m\, \mathbb{I}(x \in R_m)$, where $R_m$ are the leaf regions of the tree and $c_m$ is the mean target value of the training samples falling in $R_m$  (3)
(iii) Support vector regression (SVR49), illustrated in Fig. 2c, solves a typical optimization problem; its mathematical model is a convex quadratic programming problem, which can be used for pattern classification, regression estimation, and density estimation. The SVR regressor aims to find a linear model f(x) = w·x + b that approximates the target values. The tunable hyperparameters are the kernel type (kernel), penalty factor (C), and precision (epsilon).
(iv) Extreme gradient boosting (XGBoost50), illustrated in Fig. 2d, is an ensemble learning model with high efficiency, flexibility, and portability. Assuming we are at the m-th iteration and ℓ is the defined loss function, the optimization step can be represented by the minimization of eqn (4); a minimal code sketch instantiating all four regressors is given after the equation.
$L^{(m)} = \sum_{i=1}^{n} \ell\big(y_i,\, \hat{y}_i^{(m-1)} + f_m(x_i)\big) + \Omega(f_m)$, where $\hat{y}_i^{(m-1)}$ is the prediction after m − 1 iterations, $f_m$ is the tree added at the m-th iteration, and $\Omega(f_m)$ is its regularization term  (4)
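The following minimal sketch instantiates the four regressors described above, assuming scikit-learn and the xgboost package; the hyperparameter values are illustrative placeholders rather than the tuned values reported later, and scikit-learn's MLPRegressor does not expose dropout, so Dt is not represented here.

```python
# Sketch only: the four regressors compared in this work, with placeholder
# hyperparameters (see Tables 2-4 for the tuned values actually reported).
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

models = {
    # (i) MLP: hidden-layer sizes (Nl/Nn), learning rate (Lr), L2 term
    "MLP": MLPRegressor(hidden_layer_sizes=(70, 25), activation="relu",
                        learning_rate_init=0.01, alpha=1e-3, max_iter=5000),
    # (ii) DTR: max_depth (Dm), min_samples_leaf (Sl), min_samples_split (Ss)
    "DTR": DecisionTreeRegressor(max_depth=8, min_samples_leaf=1,
                                 min_samples_split=2),
    # (iii) SVR: kernel type, penalty factor C, precision epsilon
    "SVR": SVR(kernel="rbf", C=16, epsilon=0.01),
    # (iv) XGBoost: regularized gradient boosting, used with near-default settings
    "XGB": XGBRegressor(n_estimators=300, objective="reg:squarederror"),
}
```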
Physicochemical property | Code name |
---|---|
Atomic number | AN |
Electronegativity | EN |
First ionization energy | FE |
Density | g
Period of the element | PN |
Radius | R |
Specific heat capacity | C |
IUPAC group number | GN |
First electron affinity | AE |
Gas phase standard entropy of formation | S |
Gas phase standard enthalpy of formation | H |
Gas phase standard Gibbs free energy of formation | G
Contrary to previous studies, the values of these 12 properties were not used directly in ML training.52 Instead, we defined ML features in a more general way. Here, a ‘feature’ is a numerical value associated with an alloy material that can be used as an input to the ML model. Features of an alloy are constructed from the properties listed in Table 1 as follows. Since we considered binary alloys, there are 12 property values for each element and 24 in total, giving the first 24 features. We then performed arithmetic operations (addition, subtraction, multiplication, and division) on each pair of corresponding properties of the two elements, giving 48 (12 × 4) additional features. Finally, we included the DFT-calculated adsorption energies of O and H on the alloy surface, bringing the final number of features to 74 for each alloy material. A complete list of these features and their code names in our model is given in ESI Table S1.†
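As a concrete illustration of this construction, the sketch below (assuming NumPy; the function name is ours, and the two property vectors are expected in the order of Table 1) assembles the 74 features of one alloy.

```python
import numpy as np

def build_features(props_m1, props_m2, eads_O, eads_H):
    """Assemble the 74 features of one binary alloy as described in the text:
    12 + 12 elemental properties (ordered as in Table 1), 4 x 12 arithmetic
    combinations of the two elements, and the DFT adsorption energies of O
    and H on the alloy surface."""
    p1 = np.asarray(props_m1, dtype=float)   # 12 properties of element M1
    p2 = np.asarray(props_m2, dtype=float)   # 12 properties of element M2
    feats = [p1, p2,                         # 24 raw elemental properties
             p1 + p2, p1 - p2,               # sums and differences
             p1 * p2, p1 / p2]               # products and ratios
    return np.concatenate(feats + [[eads_O, eads_H]])   # 74 features in total
```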
Contrary to intuition, an excessive number of features reduces the training efficiency of machine learning and adversely affects prediction accuracy.53 This means that not all 74 features defined in ESI Table S1† are equally important for the ML model. To find which feature, or which combination of features, is most effective in predicting the adsorption affinity of alloys, a greedy algorithm (Fig. 3a) was utilized, as described below.54,55
Fig. 3 Schematic diagram of optimization algorithm flow, (a) the greedy algorithm of feature screening, (b) acceleration algorithm utilizing parallel computing. |
Initially, a simple linear regression was used to correlate the DFT adsorption energy values with a single feature, and the feature with the lowest RMSE out of the 74 was selected as the optimal one. Next, one of the remaining 73 features was selected such that the prediction based on the two-feature combination produced the smallest RMSE. This process was repeated iteratively until all candidate features had been added, resulting in the optimal feature combinations.56 For example, in Fig. 3a the combination of features X1, X3, X73, X72, … was found to be the best.
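The following is a minimal sketch of this greedy screening loop, assuming scikit-learn; the function name and the choice of leave-one-out RMSE of a plain linear regression as the screening score follow the description above, while the parallelization shown in Fig. 3b is omitted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def greedy_forward_selection(X, y):
    """Greedy feature screening in the spirit of Fig. 3a: start from the single
    feature giving the lowest RMSE and, at every step, add the remaining
    feature that most reduces the leave-one-out RMSE of a linear regression."""
    remaining = list(range(X.shape[1]))
    selected, history = [], []
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            pred = cross_val_predict(LinearRegression(), X[:, cols], y,
                                     cv=LeaveOneOut())
            scores.append((np.sqrt(np.mean((pred - y) ** 2)), j))
        best_rmse, best_j = min(scores)          # best feature at this step
        selected.append(best_j)
        remaining.remove(best_j)
        history.append((best_rmse, list(selected)))
    # The combination with the overall lowest RMSE across all steps is kept.
    return min(history, key=lambda t: t[0])
```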
This feature selection process is also accelerated using multi-process concurrency, GPU acceleration, and multi-server operation, as shown in Fig. 3b. Initially, code 1 calls all servers simultaneously, and then code 2 is executed on each active server for multi-process optimization, thereby accelerating the search. It is worth noting that the GPU version shows a more pronounced acceleration effect. Generally, in any chemometrics-based approach, the performance of the techniques is evaluated using indices relating the simulated and actual values. The current work used three statistical error measures, namely mean absolute error (MAE), root mean square error (RMSE), and R-squared (R2),57,58 coupled with one fitness index, the Pearson correlation coefficient (P), as defined in eqn (5)–(8). Before the simulation stage, an external validation based on leave-one-out cross-validation (the 78 alloy data are divided into 77 training samples and 1 test sample, the training–prediction cycle is repeated over all possible splits, and the individual predictions for all 78 data points are finally collected) was conducted to optimize the models' performance, increase model integrity, and minimize errors.
The Pearson correlation coefficient (P) is primarily used to examine the correlation between feature values and parameters.
$P = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$  (5)
R-squared (R2) represents the degree to which a regression line fits the observed data points.
$R^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$  (6)
Root Mean Square Error (RMSE) indicates the extent of the differences between predicted values and actual values.
$\mathrm{RMSE} = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$  (7)
Mean Absolute Error (MAE) represents the average of the absolute differences between predicted and observed values.
$\mathrm{MAE} = \dfrac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$  (8)
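In eqn (5)–(8), $y_i$ are the DFT-calculated values, $\hat{y}_i$ the predicted values, $\bar{x}$ and $\bar{y}$ the corresponding means, and $n$ the number of samples. As a compact illustration, the sketch below (assuming scikit-learn and SciPy; the function name is our own) runs the leave-one-out cross-validation described above for one model and returns these four indices.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.metrics import mean_absolute_error, r2_score

def evaluate(model, X, y):
    """Leave-one-out evaluation producing the indices of eqn (5)-(8)."""
    y_pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return {
        "P": pearsonr(y, y_pred)[0],                       # eqn (5)
        "R2": r2_score(y, y_pred),                         # eqn (6)
        "RMSE": float(np.sqrt(np.mean((y - y_pred) ** 2))),# eqn (7)
        "MAE": mean_absolute_error(y, y_pred),             # eqn (8)
    }
```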
For example, as shown in Table S5,† at the earliest stage the greedy algorithm identified feature 2 (E_H: Eads of H on alloys) as the one most correlated with the Eads of CO. At step 136, the algorithm found that feature 2 together with feature 63 (EN_1: electronegativity of M1) is the best two-feature combination for correlating with the Eads of CO. The subsequent steps proceed in the same manner.
The lowest RMSE values for Eads of CO2 are observed at step number 1243, with a value of 0.11 eV. At this step, 24 features (E_O, FE_1, GN_1, E_H, FE_differ, EN_1, GN_product, G_differ, R_1, GN_2, C_1, R_2, GN_sum, g_differ, R_product, GN_division, R_sum, AN_1, R_division, GN_differ, H_1, AE_1, G_1 and g_1) are in the optimal combination. After this step, when more features are added to this combination, there is a decline in model performance, as indicated by an increase in RMSE. These results suggest that an increasing number of features leads to overfitting with the error reaching 0.18 eV at step 2775. In other words, if all 74 features were considered as the input of the ML model, the performance would be worse than just using the 24 features subset.
For CO the optimal features searching process is similar. At step 1456 the algorithm locates an optimal combination of 19 features (E_H, EN_1, C_1, GN_division, FE_1, E_O, AN_1, AE_1, GN_1, H_division, H_1, GN_product, R_1, C_product, FE_sum, G_1, g_1, GN_differ and S_1), with a minimal RMSE of 0.24 eV.
Fig. 5b and c show the correlations among the selected features as heat maps. If the correlation between features is too high, the data are redundant and learning resources are wasted, so these maps allow the relationships in the data to be inspected intuitively. The analysis of the selected features shows that the correlations among them are neither dense nor high, so the data did not need further cleaning.
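This correlation check is straightforward to reproduce; a minimal sketch (assuming pandas and matplotlib, with our own function name) is shown below.

```python
import pandas as pd
import matplotlib.pyplot as plt

def correlation_heatmap(features: pd.DataFrame):
    """Plot pairwise Pearson correlations of the selected features
    (columns of `features`, e.g. E_H, EN_1, ...) as a heat map."""
    corr = features.corr(method="pearson")
    fig, ax = plt.subplots(figsize=(6, 5))
    im = ax.imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    return corr
```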
Adsorption model | Nl | Nn | Lr | Dt | L2 | R2 |
---|---|---|---|---|---|---|
CO | 1 | 70 | 0.02 | 0.01 | 0.001 | 0.950 |
CO | 2 | 70/25 | 0.007 | 0.01 | 0.0008 | 0.953
CO | 3 | 70/25/15 | 0.006 | 0.01 | 0.0007 | 0.961
CO | 4 | 70/25/15/375 | 0.005 | 0.01 | 0.0006 | 0.932
CO | 5 | 70/25/15/375/215 | 0.03 | 0.01 | 0.0002 | 0.914
CO2 | 1 | 150 | 0.05 | 0.01 | 0.002 | 0.883 |
CO2 | 2 | 150/175 | 0.05 | 0.01 | 0.001 | 0.886
CO2 | 3 | 150/175/125 | 0.006 | 0.01 | 0.0008 | 0.908
CO2 | 4 | 150/175/125/230 | 0.0008 | 0.01 | 0.00001 | 0.910
CO2 | 5 | 150/175/125/230/400 | 0.006 | 0.01 | 0.00063 | 0.891
Adsorption model | Dm | Sl | Ss | R2 |
---|---|---|---|---|
CO | 1 | 6 | 1 | 0.522 |
CO | 4 | 1 | 0.3 | 0.854
CO | 8 | 1 | 0.3 | 0.864
CO | 11 | 1 | 0.1 | 0.867
CO | 14 | 1 | 0.2 | 0.861
CO2 | 1 | 4 | 0.6 | 0.621 |
CO2 | 4 | 1 | 0.2 | 0.814
CO2 | 8 | 1 | 0.2 | 0.796
CO2 | 11 | 1 | 0.3 | 0.790
CO2 | 14 | 1 | 0.3 | 0.785
Adsorption model | Kernel | C | Epsilon | R2 |
---|---|---|---|---|
CO | Linear | 32 | 0.5 | 0.651 |
CO | Poly | 8 | 0.1 | 0.968
CO | Rbf | 16 | 0.001 | 0.951
CO | Sigmoid | 2 | 0.1 | 0.215
CO2 | Linear | 3 | 0.2 | 0.762 |
CO2 | Poly | 5 | 0.01 | 0.944
CO2 | Rbf | 1 | 0.0001 | 0.945
CO2 | Sigmoid | 5 | 0.02 | 0.275
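As an illustration of how hyperparameter values such as those tabulated above can be obtained, the following sketch runs a grid search for SVR with leave-one-out cross-validation; scikit-learn is assumed, the grids shown are illustrative rather than the exact ones used in this work, and the same pattern applies to the MLP and DTR hyperparameters.

```python
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.svm import SVR

# Illustrative grids spanning the hyperparameter ranges discussed in the text.
param_grid = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "C": [1, 2, 4, 8, 16, 32],
    "epsilon": [1e-4, 1e-3, 1e-2, 0.1, 0.2, 0.5],
}
search = GridSearchCV(SVR(), param_grid, cv=LeaveOneOut(),
                      scoring="neg_root_mean_squared_error")
# search.fit(X, y)    # X: selected features, y: DFT adsorption energies
# print(search.best_params_, -search.best_score_)
```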
It should be added that XGBoost did not require hyperparameter optimization here, owing to its built-in regularized gradient-boosting mechanism. All the learning algorithms performed well in both the training and validation phases, which indicates the models' ability to capture and explain a reasonable portion of the variability in the dataset.
The learning algorithms all reach high R2 values, ranging from 0.867 to 0.968 for the CO models and from 0.814 to 0.945 for the CO2 models, indicating a strong correlation between the DFT-calculated and ML-predicted values for the optimized parameter sets. Fig. 6 shows that the absolute errors for individual materials range from 0.01 eV to 0.50 eV in the CO2 modeling and from 0.01 eV to 0.75 eV in the CO modeling, demonstrating only slight deviations between the DFT and predicted values, except for the DTR model. According to the objective indices (R2, MAE, and RMSE) used in the current study, all models performed well, with some performing better than others. The SVR model, built from the selected input features, performed best in most instances in both the training and validation stages, while the XGBoost and MLP techniques also show excellent prediction skill in both steps. The prediction skill of the algorithms can also be visualized with scatter plots, a popular data visualization style, as demonstrated in Fig. 6: each point plots the ML-predicted value against the DFT reference value, and points lying closer to the diagonal indicate better agreement with the reference dataset. The scatter plots therefore give a graphical illustration of the tabulated optimal results. Based on these criteria, the performance of the models is ordered as SVR > MLP > XGBoost > DTR.
Fig. 6 The predicted CO2/CO adsorption energies by ML algorithms versus DFT results for (a) SVR, (b) MLP, (c) DTR, and (d) XGBoost. |
Since the Pd-based alloy data did not participate in the learning and training, we adopted two prediction modes. Prediction 1: only the 78 alloy data are used to train the models, which are then applied directly to the Pd alloys. Prediction 2: adsorption data for two additional Pd-doped alloys are added to the training set of prediction 1, and the learning process is repeated before making predictions. In prediction 2, extra information about the Pd-based materials is therefore provided to the ML model to improve the prediction accuracy. The reason for using this method is that, if no data for the unknown base metal are provided, direct prediction (prediction 1) may require extrapolation beyond the learned feature ranges. For example, one of the features is the atomic number of the dopant atom; when the previously trained model is applied to a new alloy, the atomic number of the new system may lie outside the range of the training set. If the values of a significant number of features fall outside this range, the error can be considerable. Results of these two predictions for the Pd alloys are listed in Tables 5 and 6 and visualized in Fig. 7 and 8. Similar results for Rh alloys are provided in ESI Tables S5, S6, Fig. S4 and S5.†
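A minimal sketch of the two prediction modes is given below; variable and function names are our own, XGBoost is used as an example, and the two Pd-doped alloys added in prediction 2 correspond to the Pd/Mn and Pd/V data discussed later.

```python
import numpy as np
from xgboost import XGBRegressor

def predict_modes(X_78, y_78, X_pd, y_pd, extra_idx=(0, 1)):
    """Sketch of prediction 1 vs. prediction 2 for unseen Pd-doped alloys.
    X_78/y_78: the 78 training alloys; X_pd/y_pd: the Pd-doped alloys, of
    which the two at `extra_idx` are added to the training set in mode 2."""
    # Prediction 1: train only on the original 78 alloys.
    m1 = XGBRegressor(objective="reg:squarederror").fit(X_78, y_78)
    pred1 = m1.predict(X_pd)

    # Prediction 2: add two Pd-doped alloys and retrain, so that the
    # Pd-related feature values fall inside the learned range.
    X_aug = np.vstack([X_78, X_pd[list(extra_idx)]])
    y_aug = np.concatenate([y_78, y_pd[list(extra_idx)]])
    m2 = XGBRegressor(objective="reg:squarederror").fit(X_aug, y_aug)
    pred2 = m2.predict(X_pd)
    return pred1, pred2
```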
Algorithms and results | Co | Cr | Fe | Ir | Mn | Mo | Os | Re | Ru | Ta | Tc | V | W | MAE (eV) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a These cases are part of the training set; therefore, small errors are expected. b Error between DFT and predicted value.
DFT (eV) | 0.20 | −0.46 | 0.31 | 0.05 | 0.41 | −0.45 | 0.17 | 0.20 | 0.03 | −0.08 | 0.10 | −0.51 | −0.54 | ||
MLP | Prediction 1 (eV) | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.13 | 0.27 |
Error (eV)b | 0.07 | −0.59 | 0.18 | −0.08 | 0.28 | −0.58 | 0.04 | 0.07 | −0.10 | −0.21 | −0.03 | −0.64 | −0.67 | ||
Prediction 2 (eV) | 0.40 | 0.03 | 0.39 | 0.38 | 0.40 | −0.28 | 0.34 | 0.12 | 0.37 | −0.17 | 0.15 | −0.51 | −0.30 | 0.22 | |
Error (eV)b | −0.20 | −0.49 | −0.08 | −0.33 | −0.01a | −0.17 | −0.17 | 0.08 | −0.34 | 0.09 | −0.05 | 0.00a | −0.24 | ||
DTR | Prediction 1 (eV) | 0.14 | 0.14 | 0.14 | 0.14 | 0.14 | −0.02 | 0.14 | 0.14 | 0.14 | −0.02 | 0.14 | 0.14 | −0.02 | 0.24 |
Error (eV)b | 0.06 | −0.60 | 0.17 | −0.09 | 0.27 | −0.43 | 0.03 | 0.06 | −0.11 | −0.06 | −0.04 | −0.65 | −0.52 | ||
Prediction 2 (eV) | 0.17 | −0.01 | 0.17 | 0.17 | 0.41 | −0.01 | 0.17 | 0.17 | 0.17 | 0.00 | 0.17 | −0.51 | −0.01 | 0.16 | |
Error (eV)b | 0.03 | −0.45 | 0.14 | −0.12 | 0.00a | −0.44 | 0.00 | 0.03 | −0.14 | −0.08 | −0.07 | 0.00a | −0.53 | ||
XG | Prediction 1 (eV) | 0.12 | 0.02 | 0.12 | 0.07 | 0.14 | −0.02 | 0.03 | 0.03 | 0.03 | −0.11 | 0.03 | 0.01 | −0.05 | 0.22 |
Error (eV)b | 0.08 | −0.48 | 0.19 | −0.02 | 0.27 | −0.43 | 0.14 | 0.17 | 0.00 | 0.03 | 0.07 | −0.52 | −0.49 | ||
Prediction 2 (eV) | 0.34 | −0.13 | 0.35 | 0.09 | 0.41 | −0.34 | 0.09 | −0.14 | 0.05 | −0.27 | −0.13 | −0.51 | −0.26 | 0.14 | |
Error (eV)b | −0.14 | −0.33 | −0.04 | −0.04 | 0.00a | −0.11 | 0.08 | 0.34 | −0.02 | 0.19 | 0.23 | 0.00a | −0.28 | ||
SVR | Prediction 1 (eV) | 1.58 | 1.14 | 1.39 | 1.68 | 1.21 | 1.14 | 1.37 | 1.19 | 1.42 | 1.09 | 1.21 | 1.18 | 1.11 | 1.33 |
Error (eV)b | −1.38 | −1.60 | −1.08 | −1.63 | −0.80 | −1.59 | −1.20 | −0.99 | −1.39 | −1.17 | −1.11 | −1.69 | −1.65 | ||
Prediction 2 (eV) | 0.96 | 0.02 | 0.64 | 0.83 | 0.31 | −0.31 | 0.42 | 0.04 | 0.49 | −0.90 | 0.08 | −0.41 | −0.32 | 0.36 | |
Error (eV)b | −0.76 | −0.48 | −0.33 | −0.78 | 0.10a | −0.14 | −0.25 | 0.16 | −0.46 | 0.82 | 0.02 | −0.10a | −0.22 |
Algorithms and results | Co | Cr | Fe | Ir | Mn | Mo | Os | Re | Ru | Ta | Tc | V | W | MAE (eV) | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
a These cases are part of the training set; therefore, small errors are expected. b Error between DFT and predicted value.
DFT (eV) | −2.13 | −1.63 | −2.14 | −2.47 | −1.94 | −1.43 | −2.37 | −1.87 | −2.28 | −1.04 | −1.88 | −1.26 | −1.45 | ||
MLP | Prediction 1 (eV) | −1.97 | −1.69 | −2.03 | −1.99 | −1.88 | −1.55 | −2.12 | −1.93 | −1.98 | −1.08 | −1.85 | −1.16 | −1.61 | 0.15 |
Error (eV)b | −0.16 | 0.05 | −0.11 | −0.48 | −0.06 | 0.12 | −0.25 | 0.06 | −0.30 | 0.04 | −0.03 | 0.10 | 0.16 | ||
Prediction 2 (eV) | −2.05 | −1.72 | −2.06 | −2.06 | −1.92 | −1.63 | −2.21 | −1.97 | −2.02 | −1.20 | −1.92 | −1.24 | −1.74 | 0.14 |
Error (eV)b | −0.08 | 0.09 | −0.08 | −0.41 | −0.02a | 0.20 | −0.16 | 0.10 | −0.26 | 0.16 | −0.04 | −0.02a | 0.29 | ||
DTR | Prediction 1 (eV) | −2.19 | −2.19 | −2.19 | −2.36 | −1.88 | −1.63 | −2.69 | −2.69 | −2.36 | −0.66 | −1.88 | −1.29 | −1.63 | 0.22 |
Error (eV)b | 0.06 | 0.56 | 0.05 | −0.11 | −0.06 | 0.20 | 0.32 | 0.82 | 0.08 | −0.38 | 0.00 | 0.03 | 0.18 | ||
Prediction 2 (eV) | −2.13 | −2.13 | −2.13 | −2.41 | −1.88 | −1.63 | −2.41 | −2.41 | −2.41 | −0.66 | −1.90 | −1.29 | −1.63 | 0.18 | |
Error (eV)b | 0.00 | 0.50 | −0.01 | −0.06 | −0.06a | 0.20 | 0.04 | 0.54 | 0.13 | −0.38 | 0.02 | 0.03a | 0.18 | ||
XG | Prediction 1 (eV) | −2.11 | −1.65 | −2.10 | −2.25 | −1.90 | −1.63 | −2.18 | −1.88 | −2.21 | −1.00 | −1.85 | −1.28 | −1.53 | 0.08 |
Error (eV)b | −0.02 | 0.02 | −0.04 | −0.22 | −0.04 | 0.20 | −0.19 | 0.01 | −0.07 | −0.04 | −0.03 | 0.02 | 0.08 | ||
Prediction 2 (eV) | −2.14 | −1.67 | −2.12 | −2.26 | −1.94 | −1.64 | −2.19 | −2.00 | −2.23 | −1.04 | −1.85 | −1.26 | −1.53 | 0.07 | |
Error (eV)b | 0.01 | 0.04 | −0.02 | −0.21 | 0.00a | 0.21 | −0.18 | 0.13 | −0.05 | 0.00 | −0.03 | 0.00a | 0.08 | ||
SVR | Prediction 1 (eV) | −2.29 | −2.19 | −2.39 | −2.48 | −2.33 | −2.07 | −2.75 | −2.50 | −2.38 | −1.83 | −2.34 | −1.74 | −2.29 | 0.44 |
Error (eV)b | 0.16 | 0.56 | 0.25 | 0.01 | 0.39 | 0.64 | 0.38 | 0.63 | 0.10 | 0.79 | 0.46 | 0.48 | 0.84 | ||
Prediction 2 (eV) | −2.06 | −1.90 | −2.13 | −2.22 | −1.94 | −1.74 | −2.44 | −2.2 | −2.13 | −1.4 | −2.09 | −1.26 | −1.94 | 0.19 | |
Error (eV)b | −0.07 | 0.27 | −0.01 | −0.25 | 0.00a | 0.31 | 0.07 | 0.33 | −0.15 | 0.36 | 0.21 | 0.00a | 0.49 |
Fig. 8 Comparison of CO Eads between four ML predictions and DFT: prediction 1 (left); prediction 2 (right). |
According to the analysis of the CO2 adsorption energy data in Table 5, in prediction 1 the MLP model shows the largest deviation, with a maximum error of −0.67 eV and a minimum error of −0.03 eV for a single material and an MAE of 0.272 eV over all materials. The XGBoost model shows the smallest deviation, with a maximum error of −0.52 eV and a minimum error of 0 eV for a single material and an MAE of 0.223 eV over all materials. This behavior arises because machine learning models are derived from the learned data: if the feature values of a new material lie outside the learned range, the prediction may be biased. Unlearned Pd data can therefore degrade the prediction accuracy when there is a significant deviation between the Pd-based data and the learning set.
After the CO2 adsorption energy data for the Pd/Mn and Pd/V alloys were added to the learning and training, the prediction 2 results show significantly improved accuracy across all four machine learning models compared with prediction 1. Among them, XGBoost has the best prediction accuracy for the CO2 adsorption energy, with its MAE decreasing from 0.223 eV to 0.138 eV.
As visualized in Fig. 7, the five alloys with the lowest Eads calculated by DFT are Pd/W, Pd/V, Pd/Cr, Pd/Mo, and Pd/Ta, while the five lowest predicted by XGBoost are Pd/Ta, Pd/W, Pd/Mo, Pd/V, and Pd/Cr. Although the predicted adsorption energy values are somewhat skewed, the set of five lowest-energy alloys is identified with 100% accuracy.
According to the CO adsorption energy data analysis in Table 6 and Fig. 8, MLP and XGBoost perform excellently in both prediction 1 and prediction 2, while DTR and SVR show only average performance. The main reason for the difference between the CO and CO2 predictions is that the adsorption energy of the key feature H is positively correlated with the CO adsorption energy, so predictions even for the unlearned base metal follow this trend with small errors. The XGBoost model gives the best prediction of the CO adsorption energy. As shown in Fig. 8, the five alloys with the lowest energy calculated by DFT are Pd/Ir, Pd/Os, Pd/Ru, Pd/Fe, and Pd/Co, and the prediction 1 results of XGBoost give Pd/Ir, Pd/Ru, Pd/Os, Pd/Co, and Pd/Fe.
To sum up, the ranking for prediction 1 is XGBoost > MLP = SVR = DTR, and for prediction 2 it is XGBoost = MLP > SVR = DTR. Although the MLP and SVR models have strong learning abilities, their generalization to unseen data is poorer. Therefore, the XGBoost model remains the most stable machine learning model for the prediction of adsorption energy.
CO2(g) + * → CO2*  (R1)
CO2* + (H+ + e−) → COOH*  (R2)
COOH* + (H+ + e−) → CO* + H2O  (R3)
The ML model can predict the adsorption energy values of CO2 and CO on all alloy surfaces, and based on the stability of the reaction intermediates, a potential energy diagram along the CO2 reduction reaction pathway can be constructed, as shown in Fig. 9. It should be noted that our model predicts only the Eads of CO and CO2; the Eads of the intermediate COOH in Fig. 9 were taken directly from DFT calculations. In the future, our model can be extended to predict the Eads of all intermediates along the CO2RR pathway. Meanwhile, our model cannot estimate kinetic barriers, and the energies shown in Fig. 9 are electronic energies only, without entropy or zero-point corrections. Despite these limitations, the model provides useful information for the screening of materials. Specifically, it predicts that Pd/Mo may provide fast CO2-to-COOH conversion because of its strong binding to both CO2 and COOH. Although Pd/Os also binds COOH strongly, its weak binding of CO2 may make its conversion to CO slower than that on Pd/Mo.
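For completeness, a hedged sketch of how such a potential energy diagram can be assembled from the relative electronic energies of the pathway states is given below; the alloy labels and energy values are placeholders, not results from this work, and matplotlib is assumed.

```python
import matplotlib.pyplot as plt

# Electronic energies of each state along the CO2RR pathway relative to
# CO2(g) + *, in eV; the numbers below are placeholders only.
pathway = ["CO2(g) + *", "CO2*", "COOH*", "CO* + H2O"]
profiles = {
    "alloy A (placeholder)": [0.0, -0.3, -0.1, -0.5],
    "alloy B (placeholder)": [0.0, 0.2, 0.4, -0.2],
}

fig, ax = plt.subplots()
x = list(range(len(pathway)))
for name, energies in profiles.items():
    # Horizontal-tick markers give the familiar stepped energy-level look.
    ax.plot(x, energies, marker="_", markersize=25, linestyle="--", label=name)
ax.set_xticks(x)
ax.set_xticklabels(pathway)
ax.set_ylabel("Relative electronic energy (eV)")
ax.legend()
fig.tight_layout()
```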
To our knowledge, there are no reports of Pd/Mo alloy catalysts for CO2RR. However, a Pd/Mo alloy in the form of a highly curved, sub-nanometer-thick metal nanosheet has been reported as an efficient and stable electrocatalyst for the ORR and OER in alkaline electrolytes and has shown good performance in zinc–air and lithium–air batteries.63 Ciesielski et al. studied the diffusion of Pd adatoms on faceted Pd/Mo(111) surfaces with hill-and-valley structures using the kinetic Monte Carlo method.64 Cao et al.65 also described an efficient method for preparing highly dispersed carbon-supported Pd/Mo bimetallic nanoparticles. In other words, as a preliminary screening result, the selected alloy catalysts need further verification. Nevertheless, if the CO and CO2 adsorption energies and the stability of the alloys considered in this paper play a vital role in a process such as CO2 hydrogenation to CO, these results will be informative and valuable.
In the future, the ML model will be further improved by optimizing the descriptors, adding ensemble learning methods, and expanding the data set. The same approach can also be applied to the screening of electrocatalytic materials by predicting the adsorption energies of all intermediates along the alloy-catalyzed electrochemical CO2 reduction pathway.
Footnote
† Electronic supplementary information (ESI) available: Supplementary materials (including adsorption energies calculated by VASP, concrete values of the machine learning feature parameters, the full names and codes of the physicochemical parameters, and the feature screening results) are available with this article. Source codes of our ML models are available to readers upon request. See DOI: https://doi.org/10.1039/d4ra00710g
This journal is © The Royal Society of Chemistry 2024