Evaluating the density of organic compounds at variable temperatures by norm descriptors-based QSPR model
Abstract
Accurately predicting the density of organic compounds is essential in chemical engineering. This study develops a robust quantitative structure-property relationship (QSPR) model from a comprehensive dataset of 5478 organic compounds and 23866 data points to predict density over a broad temperature range (115.0 to 594.1 K). Notably, norm indices (NIs) are applied to QSPR modeling of organic compound density for the first time. The model demonstrates excellent predictive performance, with a squared correlation coefficient (R²) of 0.9953 and a mean absolute error (MAE) of 10.11 kg/m³. Rigorous internal, external, and extrapolation validations confirm the model’s reliability, accuracy, and generalization. In external validation, the model achieves an R² of 0.9951 and an MAE of 9.31 kg/m³; in internal validation by leave-one-out cross-validation, the corresponding values are 0.9951 and 10.51 kg/m³. Extrapolation validation, a recently introduced approach, further confirms the model’s extrapolation ability: for most descriptors, the root mean square error (RMSE) values of the EV test set are well below the training set’s standard deviation (σ95 = 140.89 kg/m³) and closely align with RMSEtest(Model). The RMSE of the forward test increases significantly for NI8 and NI27 once the extrapolation degree (ED) exceeds 0.02, so these two NIs are not recommended for extrapolation. Overall, the results validate the robustness and broad applicability of the ρ(NI, T)-QSPR model and support its use for organic compound density prediction in industrial applications.
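As an illustration of the error metrics quoted in the abstract, the following minimal Python sketch computes MAE, RMSE, and a squared correlation coefficient for a set of observed and predicted densities. The function names and the four data points are hypothetical and made up for demonstration; they are not taken from the study's dataset. The sketch also assumes that the squared correlation coefficient means the squared Pearson correlation between observed and predicted values, which the abstract does not specify.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def squared_correlation(y_true, y_pred):
    """Squared Pearson correlation (an assumed definition of R^2)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov ** 2 / (var_t * var_p)


# Hypothetical densities in kg/m^3 (illustrative only, not study data).
observed = [700.0, 800.0, 900.0, 1000.0]
predicted = [710.0, 790.0, 910.0, 990.0]

print("MAE  =", mae(observed, predicted), "kg/m^3")
print("RMSE =", rmse(observed, predicted), "kg/m^3")
print("R^2  =", squared_correlation(observed, predicted))
```

Because RMSE is in the same units as density, it can be compared directly against a spread measure such as the training set's standard deviation, which is the kind of comparison the extrapolation validation describes.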
- This article is part of the themed collection: Emerging Investigator Series