Hybrid QSPR models for the prediction of the free energy of solvation of organic solute/solvent pairs†
Abstract
The Gibbs free energy of solvation is central to understanding many physicochemical phenomena, including lipophilicity, phase equilibria, and liquid-phase reaction equilibria and kinetics, so there is a need for predictive models that can be applied across large sets of solvents and solutes. In this paper, we propose two quantitative structure-property relationships (QSPRs) to predict the Gibbs free energy of solvation, developed using partial least squares (PLS) and multivariate linear regression (MLR) methods on a data set of 1777 points covering 295 solutes in 210 solvents. Unlike other QSPR models, the proposed models are not restricted to a specific solvent or solute. Furthermore, while most QSPR models include either experimental or quantum mechanical descriptors, the proposed models combine both, using experimental descriptors to represent the solvent and quantum mechanical descriptors to represent the solute. Up to twelve experimental descriptors and nine quantum mechanical descriptors are considered in the proposed models. Extensive internal and external validation is undertaken to assess model accuracy in predicting the Gibbs free energy of solvation for a large number of solute/solvent pairs. The best MLR model, which includes three solute descriptors and two solvent properties, yields a coefficient of determination (R²) of 0.88 and a root mean squared error (RMSE) of 0.59 kcal mol⁻¹ for the training set. The best PLS model uses six latent variables and has an R² of 0.91 and an RMSE of 0.52 kcal mol⁻¹. The proposed models are compared to selected results based on continuum solvation quantum chemistry calculations. The proposed models enable the fast prediction of the Gibbs free energy of solvation of a wide range of solutes in different solvents.
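To make the reported statistics concrete, the following minimal sketch fits an MLR model with five descriptors (mirroring the three solute descriptors and two solvent properties of the best MLR model) and computes R² and RMSE on the training data. It is an illustration only, not the authors' implementation: the data are synthetic, the descriptor values are random placeholders, and the coefficients in `TRUE_COEF` are invented for the example. All function names are hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coef, rows):
    """Evaluate intercept + sum(coefficient * descriptor) for each row."""
    return [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean squared error, in the units of y (here kcal/mol)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


# Synthetic data (ASSUMPTION): 40 solute/solvent pairs, each described by
# 3 placeholder solute descriptors followed by 2 placeholder solvent properties.
rng = random.Random(0)
TRUE_COEF = [-3.0, 1.2, -0.8, 0.5, 2.0, -1.5]  # invented, intercept first
rows = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(40)]
y = [
    TRUE_COEF[0]
    + sum(c * v for c, v in zip(TRUE_COEF[1:], r))
    + rng.gauss(0.0, 0.1)  # small noise standing in for experimental error
    for r in rows
]

coef = fit_mlr(rows, y)
y_hat = predict(coef, rows)
r2 = r_squared(y, y_hat)
err = rmse(y, y_hat)
print(f"R^2 = {r2:.3f}, RMSE = {err:.3f} kcal/mol")
```

The normal equations are used here only to keep the sketch free of dependencies. For real descriptor sets, which are often strongly correlated, a library least-squares routine or PLS would be the more robust choice.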