A data-driven QSPR model for screening organic corrosion inhibitors for carbon steel using machine learning techniques†
Abstract
Machine learning (ML) techniques have shown great potential for screening corrosion inhibitors. In this study, a data-driven quantitative structure–property relationship (QSPR) model using the gradient boosting decision tree (GB) algorithm combined with the permutation feature importance (PFI) technique was developed to predict the corrosion inhibition efficiency (IE) of organic compounds on carbon steel. The results showed that the PFI method effectively selected the molecular descriptors most relevant to the IE. Using these descriptors, an IE predictive model was trained on a dataset encompassing various categories of organic corrosion inhibitors for carbon steel, achieving a root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) of 6.40%, 4.80%, and 0.72, respectively. Integrating GB with PFI in the ML workflow significantly enhanced IE predictive capability compared to previously reported ML models. The trained model was then applied to drug-based corrosion inhibitors, where it demonstrated robust predictive capability when validated against both previously available experimental results and our own. Furthermore, the model was used to predict the IE of more than 1500 drug compounds, suggesting five novel drug compounds with the highest predicted IE on carbon steel. The developed ML workflow and associated model will be useful in accelerating the development of next-generation corrosion inhibitors for carbon steel.
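The GB-plus-PFI workflow summarized in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, which the abstract does not name, and it uses a synthetic regression dataset as a stand-in for molecular descriptors and measured IE values. The helper names (`select_descriptors_by_pfi`, `fit_and_score`), the number of retained descriptors, and all hyperparameters are hypothetical choices for illustration, not the authors' settings, and the resulting metrics say nothing about the reported results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def select_descriptors_by_pfi(X_train, y_train, n_keep=5, seed=0):
    """Return the indices of the n_keep columns with the highest mean PFI.

    PFI is measured on a validation split carved out of the training data,
    so the final test set plays no part in descriptor selection.
    """
    X_fit, X_val, y_fit, y_val = train_test_split(
        X_train, y_train, test_size=0.25, random_state=seed
    )
    model = GradientBoostingRegressor(random_state=seed).fit(X_fit, y_fit)
    # Shuffle one column at a time and record the drop in validation score.
    pfi = permutation_importance(
        model, X_val, y_val, n_repeats=10, random_state=seed
    )
    ranked = np.argsort(pfi.importances_mean)[::-1]  # most important first
    return ranked[:n_keep]


def fit_and_score(X_train, y_train, X_test, y_test, columns, seed=0):
    """Retrain GB on the selected columns and report RMSE, MAE, and R^2."""
    model = GradientBoostingRegressor(random_state=seed)
    model.fit(X_train[:, columns], y_train)
    pred = model.predict(X_test[:, columns])
    return {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "mae": float(mean_absolute_error(y_test, pred)),
        "r2": float(r2_score(y_test, pred)),
    }


# Synthetic stand-in data (assumption): 20 "descriptors", 5 of them informative.
X, y = make_regression(
    n_samples=300, n_features=20, n_informative=5, noise=5.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

selected = select_descriptors_by_pfi(X_train, y_train, n_keep=5)
metrics = fit_and_score(X_train, y_train, X_test, y_test, selected)
print("selected descriptor columns:", sorted(selected.tolist()))
print(metrics)
```

Ranking descriptors on a validation split taken from the training data, rather than on the test set, keeps the final RMSE, MAE, and R² estimates free of selection leakage. With real data, `X` would hold computed molecular descriptors and `y` the measured IE in percent.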