Optimal machine learning feature selection for assessing the mechanical properties of a zeolite framework
Abstract
In this study, 45 and 249 critical features were identified among 896 zeolite descriptors generated by the matminer package for estimating the shear and bulk moduli of zeolites, respectively. A database of the mechanical properties of 873 zeolite structures, calculated using density functional theory, was used to train the machine learning regression model. The results obtained with these critical features and the LightGBM algorithm were rigorously compared with those from other regressors and from different feature sets. The comparison shows that the surrogate model built on the critical features improves the prediction accuracy for the bulk and shear moduli of zeolites by 17.3% and 10.6%, respectively, and reduces the prediction uncertainty by one-third relative to that obtained with previously available features. The selected features span several physical and chemical groups and highlight the relationships uncovered between the features and the mechanical properties of zeolites. The robustness of the constructed model with 356 features was confirmed using a range of training-test set ratios. We believe that these critical features can help explain the mechanical behavior of half a million unlabeled hypothetical zeolite structures and accelerate the discovery of novel zeolites with unprecedented mechanical properties.
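The abstract does not spell out how the critical features are chosen, so the following is only a minimal sketch, assuming NumPy, of the general workflow it describes: pick a feature subset from a wide descriptor table, fit a regressor on it, and check robustness across several training-test ratios. It deliberately swaps in simple stand-ins, not the authors' method: a Pearson-correlation ranking in place of their feature selection and closed-form ridge regression in place of LightGBM. The helper names (`select_top_k`, `fit_ridge`, `robustness_check`) and the synthetic data are hypothetical and for illustration only.

```python
import numpy as np


def select_top_k(X, y, k):
    """Hypothetical criterion: rank columns by |Pearson r| with y, return top-k indices."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant (zero-variance) columns get r = 0 instead of dividing by zero.
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    return np.argsort(-np.abs(r))[:k]


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression (stand-in for LightGBM) with an unpenalized intercept."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)


def predict(w, X):
    return w[0] + X @ w[1:]


def r2(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def robustness_check(X, y, k, test_ratios, seed=0):
    """Return the test-set R^2 obtained for each training-test split ratio."""
    rng = np.random.default_rng(seed)
    scores = {}
    for ratio in test_ratios:
        idx = rng.permutation(len(y))
        n_test = int(round(ratio * len(y)))
        test, train = idx[:n_test], idx[n_test:]
        # Select features on the training split only, so the test set never leaks in.
        cols = select_top_k(X[train], y[train], k)
        w = fit_ridge(X[train][:, cols], y[train])
        scores[ratio] = r2(y[test], predict(w, X[test][:, cols]))
    return scores


# Synthetic stand-in sized like the abstract's data (873 structures x 896 descriptors);
# only the first 10 columns drive the hypothetical target.
rng = np.random.default_rng(42)
X = rng.normal(size=(873, 896))
y = X[:, :10] @ rng.uniform(2.0, 3.0, size=10) + 0.1 * rng.normal(size=873)

scores = robustness_check(X, y, k=10, test_ratios=[0.1, 0.2, 0.3])
for ratio, score in scores.items():
    print(f"test ratio {ratio:.1f}: R^2 = {score:.3f}")
```

Selecting features inside each split, rather than once on the full dataset, keeps the robustness check honest: a score that stays stable across ratios then reflects the selection procedure itself and not information borrowed from the held-out structures.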