Licheng Yu,
Wenwen Zhang,
Zhihao Nie,
Jingjing Duan and
Sheng Chen*
Key Laboratory for Soft Chemistry and Functional Materials (Ministry of Education), School of Chemistry and Chemical Engineering, School of Energy and Power Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China. E-mail: sheng.chen@njust.edu.cn
First published on 18th March 2024
Traditional design/optimization of metal–organic frameworks (MOFs) is time-consuming and labor-intensive. In this study, we use machine learning (ML) to accelerate the synthesis of MOFs. We built a library of over 900 MOFs with different metal salts, solvent ratios, reaction durations and temperatures, and used their zeta potentials as target variables for ML training. Four ML models were trained on the collected dataset and their convergence performance was compared; the Random Forest Regression (RFR) and Gradient Boosting Regression (GBR) models showed strong correlation and accurate predictions. We then used the RFR and GBR models to predict two MOFs. The experimental data of the synthesized MOFs closely matched the predicted results, and these MOFs exhibited excellent electrocatalytic performance for the oxygen evolution reaction. This study has general implications for using machine learning to accelerate the synthesis of MOFs for diverse applications.
The application of ML in materials science, particularly in catalysis, is advancing rapidly, because ML has enabled predictions of the elemental composition, crystal structure, microstructure images, surface reaction networks, and surface phase diagrams of catalysts.11–13 A key aspect of applying ML to materials development is the representation of input descriptors: selecting appropriate descriptors is crucial to the accuracy of the trained model. By harnessing data-driven methods, ML can serve not only as a powerful tool for discovering new electrocatalysts but also as a means of understanding the relationship between the inherent characteristics of MOF materials and their electrocatalytic performance. Recent advances14,15 in this field have produced ML models16 that further accelerate the design and discovery of MOFs. ML-assisted screening has proven successful in various applications, particularly H2 storage,17–19 CO2 separation/capture,20–22 and other gas storage and separation problems. These ML models leverage high-throughput density functional theory (DFT) workflows23–25 to construct large-scale electronic structure26 and performance databases covering materials from inorganic solids to molecular systems. Combining high-throughput DFT databases27–29 with ML has facilitated the discovery of materials with desirable properties across different domains,30–33 including materials for high-efficiency organic light-emitting diodes,34 super-hard inorganic materials,35 and thermally conductive polymers.36 To unlock the full potential of network chemistry37 and accelerate material discovery, it is crucial to develop a similar database of MOF properties calculated using DFT.
Establishing a comprehensive MOF property database, coupled with ML techniques, will help advance network chemistry and speed up material discovery.38–40 Overall, integrating ML models into MOF research has substantially changed the screening and discovery process: by combining ML algorithms with the wealth of available computational data, researchers can efficiently explore the vast space of MOF structures and identify promising candidates for various applications.
In this work, we used a hydrothermal method to synthesize a large number of MOF materials in batches. Although the dataset is small compared with those of most ML studies, it is substantial for studies that integrate experiments with ML. Dataset size directly affects model choice and generalization. With a small dataset, complex models are prone to overfitting, so the model should be simplified or regularized to improve generalization; feature selection also becomes more important, because too many features relative to the number of samples can likewise cause overfitting. A small dataset may not provide enough training samples, so the model can perform poorly on new, unseen data. A large dataset provides more training samples, allowing the model to learn the distribution and patterns of the data; more complex models, such as deep learning models or ensemble learning algorithms, can then be trained with a lower risk of overfitting, although a large number of features increases the complexity of feature selection and feature engineering. For small datasets, simpler algorithms or ensemble methods may be needed to improve generalization. In summary, achieving good generalization requires considering feature selection, model complexity, regularization, and algorithm choice in light of the dataset size.
The zeta potential of the MOF dispersion, which correlates with material size, was selected as the output descriptor. A higher absolute zeta potential indicates a more stable dispersion and often corresponds to smaller particles, which can influence the electrocatalytic performance of the MOFs. Using the zeta potentials measured for the experimentally synthesized materials as the dataset, we trained four ML models, of which the Random Forest Regressor (RFR) and Gradient Boosting Regressor (GBR) performed best. These two models were then used to predict the zeta potential of five-metal MOFs and to obtain the corresponding synthesis parameters. The two predicted MOFs were synthesized, and the measured zeta potentials of their dispersions were consistent with the ML predictions, supporting the choice of ML models. Electrocatalytic oxygen evolution reaction (OER) tests showed that the two predicted MOFs had smaller overpotentials at a current density of 10 mA cm−2 than MOFs synthesized with the same solvent ratio but other reaction times, suggesting that the ML-guided approach is promising for developing OER electrocatalysts. This work offers practical ideas for large-scale material development and shows how combining experimental synthesis, ML models, and validation by characterization and testing can accelerate the discovery and optimization of materials for specific applications such as electrocatalytic OER.
We achieved this by selecting input features according to their importance and potential impact on the predictions. Eight descriptors were chosen: the atomic radius (R), electronegativity (E), main group number (group), number of outermost electrons (n), and atomic number (N) of the metal in the material, together with the temperature (T), time (t), and solvent ratio (per) of the synthesis reaction. These values are readily available from the periodic table and the experimental conditions. In practice, choosing easily obtainable descriptors bypasses time-consuming DFT calculations while maintaining good prediction accuracy. We used the scikit-learn package to develop the ML models. First, the data were normalized, an important step in mitigating biases caused by differences in feature scales.
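The text does not say which normalization was applied. The sketch below assumes min–max scaling with scikit-learn's `MinMaxScaler`; the rows are placeholder values chosen only to show the eight-column layout, not the study's data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Column order follows the eight descriptors listed in the text.
FEATURES = ["R", "E", "group", "n", "N", "T", "t", "per"]

# Placeholder rows for illustration only (not the study's data).
X = np.array([
    [1.28, 1.90, 11, 1, 29, 65, 23, 0.2],
    [1.26, 1.83, 8, 2, 26, 65, 26, 0.4],
    [1.34, 1.65, 12, 2, 30, 65, 29, 0.6],
    [1.27, 1.55, 7, 2, 25, 65, 32, 0.8],
])

# MinMaxScaler maps every column to [0, 1]. A constant column such as T
# (all syntheses at 65 °C) has zero range and is mapped to 0 instead of
# raising a division-by-zero error.
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
```

In a real workflow the scaler would be fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into training.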
Next, 1–4-metal MOF materials (900 types) were synthesized experimentally, and the zeta potentials of their dispersions were measured to form the dataset. The 900 data points were randomly shuffled and split, with 25% as the training set and 75% as the test set. Four ML models, k-nearest neighbors (KNN), support vector regression (SVR), random forest regression (RFR), and gradient boosting regression (GBR), were trained on the zeta potentials of the training set and used to predict those of the test set. The models were evaluated comprehensively using the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2). In addition, we repeated the random training/test split 100 times (100 random leave-n-out trials) and used the average of the 100 RMSE estimates as the prediction accuracy of each ML model (Fig. 1).
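The repeated-split evaluation described above can be sketched with scikit-learn as follows. This is an illustration, not the authors' code: the hyperparameters are library defaults (the text does not report them), scaling is wrapped into the KNN and SVR pipelines as an assumption, and the demo uses synthetic stand-in data with only 3 repeats to keep it fast.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR


def make_models(seed):
    # Default hyperparameters; distance-based models get their own scaler.
    return {
        "KNN": make_pipeline(MinMaxScaler(), KNeighborsRegressor(n_neighbors=5)),
        "SVR": make_pipeline(MinMaxScaler(), SVR()),
        "RFR": RandomForestRegressor(n_estimators=100, random_state=seed),
        "GBR": GradientBoostingRegressor(random_state=seed),
    }


def repeated_split_scores(X, y, n_repeats=100, test_size=0.75):
    """Average MAE, RMSE and R2 over n_repeats random train/test splits.

    test_size=0.75 follows the 25% training / 75% test split stated in the text.
    """
    results = {name: {"MAE": [], "RMSE": [], "R2": []} for name in make_models(0)}
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, shuffle=True, random_state=seed
        )
        for name, model in make_models(seed).items():
            y_pred = model.fit(X_tr, y_tr).predict(X_te)
            results[name]["MAE"].append(mean_absolute_error(y_te, y_pred))
            results[name]["RMSE"].append(np.sqrt(mean_squared_error(y_te, y_pred)))
            results[name]["R2"].append(r2_score(y_te, y_pred))
    return {
        name: {metric: float(np.mean(vals)) for metric, vals in m.items()}
        for name, m in results.items()
    }


# Synthetic stand-in data (7 descriptors), not the measured zeta potentials.
rng = np.random.default_rng(0)
X_demo = rng.random((200, 7))
y_demo = -10 * X_demo[:, 0] + 5 * X_demo[:, 6] + rng.normal(0, 0.5, 200)
scores = repeated_split_scores(X_demo, y_demo, n_repeats=3)
```

On the real 900-sample table, calling `repeated_split_scores(X, y)` with the default `n_repeats=100` would follow the averaging procedure described above. Because RMSE is always at least as large as MAE for any set of residuals, the averaged RMSE of each model is also never smaller than its averaged MAE.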
In the experiment, five metal salts (copper acetate, iron nitrate, zinc acetate, manganese acetate, and cadmium acetate) were used as metal sources, with pyromellitic acid as the organic ligand. The reaction time and solvent ratio (water vs. ethanol) were varied, yielding 900 different MOF materials synthesized under hydrothermal conditions at 65 °C. The dispersion stability of each material was assessed by zeta potential measurements, and each dispersion was measured three times to reduce measurement error. The resulting dataset, shown in Fig. S1–S6,† contains the zeta potentials of the MOF materials synthesized under the different reaction conditions. It serves as the input for the ML algorithms used to predict and understand the relationship between synthesis parameters and the resulting properties of the MOF materials.
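The text states that each dispersion was measured three times but not how the readings were combined. A minimal sketch, assuming the mean of the three readings is used as the target value (with the sample standard deviation kept as a quality check); the sample IDs and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def average_triplicates(records, n_expected=3):
    """Collapse repeated zeta-potential readings into (mean, sample std) per sample."""
    grouped = defaultdict(list)
    for sample_id, zeta_mv in records:
        grouped[sample_id].append(zeta_mv)
    summary = {}
    for sample_id, values in grouped.items():
        # Flag samples that do not have exactly the expected number of readings.
        if len(values) != n_expected:
            raise ValueError(f"{sample_id}: expected {n_expected} readings, got {len(values)}")
        summary[sample_id] = (mean(values), stdev(values))
    return summary


# Hypothetical readings in mV, for illustration only.
records = [
    ("S1", -10.2), ("S1", -10.5), ("S1", -10.8),
    ("S2", -7.0), ("S2", -7.0), ("S2", -7.0),
]
summary = average_triplicates(records)
```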
To evaluate the impact of individual input features on the model output, we used the feature_importances_ attribute of scikit-learn's tree-based models to obtain the importance of each feature. In a decision tree, the importance of a feature reflects its role in the model's decision-making: a feature that contributes strongly to separating the data at the nodes of the tree is considered important. Feature importances derived from decision trees are therefore useful for feature selection and model optimization. Fig. S9,† which shows the importance scores of the 7 descriptors, indicates that R and per have the most prominent effects.
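A sketch of reading impurity-based importances from a fitted `RandomForestRegressor`. The model settings and the synthetic target (built so that R and per dominate) are assumptions for illustration; they do not reproduce Fig. S9.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Seven descriptors remain after T is excluded.
FEATURES = ["R", "E", "group", "n", "N", "t", "per"]


def ranked_importances(X, y, feature_names, seed=0):
    """Fit a random forest and return (feature, importance) pairs, largest first."""
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X, y)
    # feature_importances_ is the mean impurity decrease, normalized to sum to 1.
    pairs = zip(feature_names, model.feature_importances_)
    return sorted(pairs, key=lambda p: p[1], reverse=True)


# Synthetic stand-in data in which only R and per drive the target.
rng = np.random.default_rng(1)
X_demo = rng.random((300, 7))
y_demo = 5 * X_demo[:, 0] - 4 * X_demo[:, 6] + rng.normal(0, 0.1, 300)
ranked = ranked_importances(X_demo, y_demo, FEATURES)
```

Impurity-based importances can favor high-cardinality features; `sklearn.inspection.permutation_importance` is a common cross-check, though the text does not say whether one was used.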
In addition, we used the Pearson correlation coefficient to evaluate correlations between the input features. Temperature (T) showed a correlation of 0 (all syntheses were carried out at 65 °C), so it was not used as an input feature. The highest Pearson correlation coefficient between the remaining features is only 0.7, below the threshold of 0.8. The features are therefore not strongly interdependent and introduce little redundant information, which reduces the risk that they impair generalization or cause overfitting.
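The correlation screen can be sketched with NumPy as follows. Because T is constant, its Pearson coefficient is mathematically undefined (`np.corrcoef` would return NaN), so this sketch, as an assumption, drops zero-variance columns first and then checks the largest off-diagonal |r| against the 0.8 threshold from the text. The data are synthetic.

```python
import numpy as np


def screen_features(X, names, threshold=0.8):
    """Drop constant columns, then report the largest pairwise |Pearson r|."""
    X = np.asarray(X, dtype=float)
    keep = X.std(axis=0) > 0  # zero-variance columns carry no information
    kept = [name for name, k in zip(names, keep) if k]
    dropped = [name for name, k in zip(names, keep) if not k]
    if len(kept) < 2:
        return kept, dropped, 0.0, True
    r = np.corrcoef(X[:, keep], rowvar=False)
    off_diag = np.abs(r[~np.eye(len(kept), dtype=bool)])
    max_r = float(off_diag.max())
    return kept, dropped, max_r, max_r < threshold


# Synthetic stand-in: independent columns plus a constant temperature column.
rng = np.random.default_rng(2)
X_demo = rng.random((500, 4))
X_demo[:, 2] = 65.0
kept, dropped, max_r, ok = screen_features(X_demo, ["R", "E", "T", "t"])
```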
In the next step of the study, preliminary fitting was carried out on the structures of the 900 synthesized MOFs, and four ML models, k-nearest neighbors (KNN), gradient boosting regression (GBR), random forest regression (RFR), and support vector regression (SVR), were trained on the dataset and their convergence was assessed. Fig. 2 shows the convergence graphs of the four models. The training results of the RFR and GBR models agree more closely with the test results and converge better than those of the other models, which is further supported by the training errors of the different models shown in Fig. S7.† Among the four models, GBR and RFR have high R2 values of 0.940 and 0.948, respectively, relatively small mean absolute errors (MAE) of 1.32 and 1.34, respectively, and root mean square errors (RMSE) of 1.1 and 1.11, respectively. By contrast, the KNN and SVR models have slightly higher MAE values of 1.56 and 1.53 and RMSE values of 2.59 and 1.8, respectively. These metrics indicate that the GBR and RFR models predict the zeta potential of the MOF materials more accurately, with lower errors and better convergence during training.
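The text does not define exactly what the convergence graphs plot. One common way to compare how training and held-out errors converge is a learning curve; the sketch below assumes scikit-learn's `learning_curve` with RMSE scoring and an illustrative 75/25 `ShuffleSplit`, which is not the split protocol reported in the text. The data are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import ShuffleSplit, learning_curve


def rmse_learning_curve(model, X, y, seed=0):
    """Mean training and held-out RMSE as the training-set size grows."""
    # Illustrative 75/25 splits; not the protocol reported in the text.
    cv = ShuffleSplit(n_splits=5, test_size=0.25, random_state=seed)
    sizes, train_scores, test_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.2, 1.0, 5),
        cv=cv,
        scoring="neg_root_mean_squared_error",
    )
    # The scorer returns negated RMSE, so flip the sign; average over splits.
    return sizes, -train_scores.mean(axis=1), -test_scores.mean(axis=1)


# Synthetic stand-in data (7 features, as after dropping T).
rng = np.random.default_rng(3)
X_demo = rng.random((300, 7))
y_demo = -10 * X_demo[:, 0] + 5 * X_demo[:, 6] + rng.normal(0, 0.5, 300)

curves = {
    "GBR": rmse_learning_curve(GradientBoostingRegressor(random_state=0), X_demo, y_demo),
    "RFR": rmse_learning_curve(RandomForestRegressor(n_estimators=50, random_state=0), X_demo, y_demo),
}
```

A narrowing gap between the training and held-out RMSE as the training size grows is one reading of "better convergence".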
Fig. 2 Convergence graphs of the four ML models used for dataset training: (a) GBR, (b) KNN, (c) RFR and (d) SVR.
To verify the ML predictions, the GBR and RFR models were used to predict the zeta potential of five-metal MOFs. As shown in Fig. S10,† the values predicted by the GBR and RFR models agree well with the experimental values. Generally, a higher absolute zeta potential of a nanomaterial dispersion indicates smaller particles, which tends to improve catalytic performance. We therefore selected the conditions with the highest predicted absolute values from the GBR and RFR models for synthesis and tested the electrocatalytic OER performance of the products. The GBR model predicts a reaction time of 26 hours, a solvent ratio of DIW:Et = 3:2, and a reaction temperature of 65 °C, with a predicted zeta potential of −11.11 mV for the MOF dispersion. The RFR model predicts a reaction time of 29 hours, a solvent ratio of DIW:Et = 1:4, and a reaction temperature of 65 °C, with a predicted zeta potential of −11.86 mV. Based on these predictions, two five-metal MOFs were synthesized by a simple solvothermal reaction. Fig. 3 shows their surface morphology, consisting of linear microstrip structures with different boundaries, and energy dispersive X-ray spectroscopy (EDS) of both MOFs confirms the presence of all five metals in the structure. This validation experiment supports the reliability of the ML models in predicting the zeta potential of MOF dispersions and guiding the synthesis process: the synthesized MOFs show the characteristics expected from the ML predictions, supporting the choice of ML models and their application in material development.
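Selecting synthesis conditions from model predictions amounts to scoring a grid of candidate (reaction time, solvent ratio) pairs and keeping the one with the largest predicted |zeta|. A minimal sketch under stated assumptions: the feature order (metal descriptors, then t, then per), the encoding of per as the ethanol volume fraction, and the toy linear model are all hypothetical stand-ins for the trained GBR/RFR models; the candidate times match those tested in the OER experiments.

```python
import itertools

import numpy as np
from sklearn.linear_model import LinearRegression


def ethanol_fraction(diw, et):
    # Hypothetical encoding of "per": ethanol share of the DIW:Et volume ratio.
    return et / (diw + et)


def rank_conditions(model, metal_desc, times_h, ratios):
    """Predict zeta for every (time, DIW:Et) pair; largest |zeta| first."""
    labels, rows = [], []
    for t, (diw, et) in itertools.product(times_h, ratios):
        labels.append((t, f"{diw}:{et}"))
        # Assumed feature order: metal descriptors, then t, then per.
        rows.append(np.concatenate([metal_desc, [t, ethanol_fraction(diw, et)]]))
    zeta = model.predict(np.array(rows))
    order = np.argsort(-np.abs(zeta))
    return [(labels[i], float(zeta[i])) for i in order]


# Toy demo: a noise-free linear "model" stands in for the trained GBR/RFR.
rng = np.random.default_rng(4)
X_train = rng.random((50, 7))
X_train[:, 5] = rng.uniform(20, 36, 50)  # reaction time t (h)
X_train[:, 6] = rng.uniform(0, 1, 50)    # ethanol fraction per
y_train = -0.2 * X_train[:, 5] - 5 * X_train[:, 6]
model = LinearRegression().fit(X_train, y_train)

ranked = rank_conditions(
    model, np.full(5, 0.5), [23, 26, 29, 32, 35], [(3, 2), (1, 1), (1, 4)]
)
```

If the real models were trained on normalized features, the candidate rows would have to pass through the same fitted scaler (or a scikit-learn `Pipeline`) before prediction.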
Fig. 3 SEM and EDS images of the two five-metal MOFs predicted by the GBR and RFR models: (a and b) RFR; (c and d) GBR.
Fig. S10† shows the zeta potentials of dispersions of five-metal MOFs synthesized at different reaction times and solvent ratios. Interestingly, as the proportion of ethanol in the solvent increases, the zeta potential does not follow a consistent trend but fluctuates instead. Notably, the turning points in the trends predicted by the GBR and RFR models coincide. These findings show that the relationship between reaction conditions, solvent ratio, and the zeta potential of the resulting dispersion is complex. The ability of the ML models to capture and predict these complex patterns supports their usefulness in guiding material synthesis and understanding the factors that affect the performance of MOF materials.
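Whether two predicted curves share their turning points can be checked by locating where the discrete slope changes sign. A minimal sketch; the series below are hypothetical, not the values in Fig. S10, and flat segments (zero slope) are deliberately not counted as turning points.

```python
def turning_points(xs, ys):
    """Return the x values where the discrete slope of ys changes sign."""
    points = []
    for i in range(1, len(ys) - 1):
        left = ys[i] - ys[i - 1]
        right = ys[i + 1] - ys[i]
        if left * right < 0:  # local maximum or minimum
            points.append(xs[i])
    return points


# Hypothetical predicted zeta potentials (mV) versus ethanol fraction.
per = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
gbr_pred = [-8.0, -9.5, -8.7, -10.1, -9.0, -9.4]
rfr_pred = [-8.2, -9.8, -8.5, -10.4, -9.3, -9.6]
same_turns = turning_points(per, gbr_pred) == turning_points(per, rfr_pred)
```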
Next, the five-metal MOF materials were assembled into electrodes for the electrocatalytic oxygen evolution reaction (OER), as shown in Fig. 4. At a solvent ratio of DIW:Et = 1:4, the MOF synthesized with the 29-hour reaction time predicted by the RFR model shows the smallest overpotential, only 388 mV, compared with 420 mV at 23 hours, 464 mV at 26 hours, 440 mV at 32 hours, and 455 mV at 35 hours. Similarly, at a solvent ratio of DIW:Et = 3:2, the MOF synthesized with the 26-hour reaction time predicted by the GBR model shows the smallest overpotential, only 306 mV, compared with 403 mV at 23 hours, 423 mV at 29 hours, 406 mV at 32 hours, and 430 mV at 35 hours. In addition, the electrochemical impedance spectroscopy (EIS) curves (Fig. 4c and f) show that the materials selected for their larger predicted absolute zeta potentials by the GBR and RFR models exhibit smaller impedances, suggesting more efficient electron transfer at the electrode–electrolyte interface and improved catalytic activity. These results indicate the potential of ML models for predicting and optimizing the electrocatalytic OER performance of MOF materials: by considering various reaction conditions and solvent ratios, the models can guide synthesis toward MOF materials with enhanced electrocatalytic activity.
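The overpotentials above are compared at 10 mA cm−2. Using the standard definition η = E(j = 10 mA cm−2, vs. RHE) − 1.23 V, the value can be read from a linear sweep voltammetry (LSV) curve by interpolation. This sketch assumes the potentials are already converted to the RHE scale (and iR-corrected if required); the text does not describe those steps, and the three-point curve is hypothetical.

```python
import numpy as np

E_EQ_OER = 1.23  # V vs RHE: thermodynamic equilibrium potential of the OER


def overpotential_mV(e_rhe, j_ma_cm2, j_target=10.0):
    """Overpotential (mV) at j_target from an LSV curve already referenced to RHE."""
    e = np.asarray(e_rhe, dtype=float)
    j = np.asarray(j_ma_cm2, dtype=float)
    order = np.argsort(j)  # np.interp needs increasing x values
    e_at_target = np.interp(j_target, j[order], e[order])
    return 1000.0 * (e_at_target - E_EQ_OER)


# Hypothetical three-point curve, for illustration only.
eta = overpotential_mV([1.50, 1.60, 1.70], [0.0, 10.0, 20.0])
```

Under this definition, the 306 mV reported for the GBR-predicted MOF corresponds to 1.536 V vs. RHE at 10 mA cm−2.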
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d3ra08873a
This journal is © The Royal Society of Chemistry 2024