A machine learning approach for predicting the nucleophilicity of organic molecules†
Abstract
Nucleophilicity provides important information about the chemical reactivity of organic molecules, but experimental determination of the nucleophilicity parameter is tedious and resource-intensive. Herein, we present a novel machine learning protocol that uses key structural descriptors to predict the nucleophilicities of organic molecules in good agreement with experimental values. In this data-driven approach, quantum mechanical molecular and thermodynamic descriptors were extracted for a wide range of structurally diverse nucleophiles and relevant solvents and modelled against the experimentally available nucleophilicity values. Despite the structural diversity of the nucleophiles, tree-based and neural network algorithms yield statistically robust models with high predictive power when trained on an in-house dataset of 752 nucleophilicity values and 27 molecular descriptors.
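For readers who want a concrete picture of this kind of workflow, the following is a minimal sketch of fitting a tree-based regressor to a descriptor matrix and scoring it by cross-validation. It is not the protocol used in this work: the random forest, its hyperparameters, the five-fold split, and the helper name `cross_validated_r2` are all assumptions made for illustration, and the data are synthetic placeholders that only mirror the stated dataset shape (752 values, 27 descriptors).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score


def cross_validated_r2(X, y, n_splits=5, seed=0):
    """Return the per-fold R^2 of a random forest regressor (hypothetical helper)."""
    # Assumed model choice and settings; the abstract only says "tree-based".
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv, scoring="r2")


# Synthetic placeholder data, NOT the paper's dataset: only the shape
# (752 samples x 27 descriptors) follows the abstract. The target is an
# arbitrary made-up function of two columns plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(752, 27))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=752)

r2_scores = cross_validated_r2(X, y)
print("mean cross-validated R^2:", r2_scores.mean())
```

With real data, `X` would hold the computed descriptors and `y` the experimental nucleophilicity values. Scoring on held-out folds rather than on the training set is what gives a usable estimate of predictive power.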