Enhanced descriptor identification and mechanism understanding for catalytic activity using a data-driven framework: revealing the importance of interactions between elementary steps†
Abstract
Accurate identification of descriptors for catalytic activity has long been essential to an in-depth understanding of catalysis and, more recently, has become the basis for catalyst screening. However, commonly used methods suffer from low predictive accuracy. This study reports an enhanced approach that accurately identifies descriptors from a kinetic dataset using a machine learning (ML) surrogate model. CO hydrogenation to methanol over Cu-based catalysts is taken as a case study. Our model captures not only the contributions of individual elementary steps but also the interactions between relevant steps within the reaction network, which were found to be essential for high accuracy. As a result, six effective descriptors are identified. They are accurate enough for the trained gradient boosted regression (GBR) model to predict well the methanol turnover frequency (TOF) over metal (M)-doped Cu(111) model surfaces (M = Au, Cu, Pd, Pt, Ni). More importantly, going beyond a purely mathematical ML model, the catalytic role of each identified descriptor can be revealed using model-agnostic interpretation tools, which deepens insight into the promoting effect of alloying. The trained GBR model outperforms conventional derivative-based methods in both predictive accuracy and mechanistic understanding, opening alternative possibilities for accurate, descriptor-based rational catalyst optimization.
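To illustrate why interactions between steps can matter for a GBR surrogate, the following minimal sketch compares a strictly additive gradient boosted model (depth-1 trees) with one that can represent interactions (depth-3 trees), and then applies permutation importance as one example of a model-agnostic interpretation tool. Everything here is an assumption made for illustration: the data are synthetic rather than the kinetic dataset of this study, the feature names `E_step_a`, `E_step_b`, and `E_step_c` are hypothetical, and the hyperparameters and the choice of permutation importance are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical descriptors: three made-up "step energies" in arbitrary units.
feature_names = ["E_step_a", "E_step_b", "E_step_c"]
X = rng.uniform(-1.0, 1.0, size=(800, 3))

# Synthetic stand-in for an activity measure (NOT real kinetic data):
# steps a and b contribute only through their product, step c acts alone.
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def fit_gbr(max_depth):
    """Fit a GBR model; depth-1 trees give a purely additive model."""
    model = GradientBoostingRegressor(
        n_estimators=300, learning_rate=0.1, max_depth=max_depth, random_state=0
    )
    return model.fit(X_train, y_train)


additive_model = fit_gbr(max_depth=1)      # sum of one-feature functions
interacting_model = fit_gbr(max_depth=3)   # trees can combine features

r2_additive = r2_score(y_test, additive_model.predict(X_test))
r2_interacting = r2_score(y_test, interacting_model.predict(X_test))

# Model-agnostic interpretation: drop in held-out score when one feature
# is shuffled, averaged over repeated shuffles.
result = permutation_importance(
    interacting_model, X_test, y_test, n_repeats=10, random_state=0
)
importances = dict(zip(feature_names, result.importances_mean))

print(f"R2 additive (depth 1):    {r2_additive:.3f}")
print(f"R2 interacting (depth 3): {r2_interacting:.3f}")
for name, value in importances.items():
    print(f"{name}: permutation importance = {value:.3f}")
```

In this toy setting the additive model cannot represent the product term, so any gap between the two held-out scores comes from the coupling between the two hypothetical steps. The same comparison on real data would need the actual kinetic dataset and validated hyperparameters.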