Predictive deep learning models for environmental properties: the direct calculation of octanol–water partition coefficients from molecular graphs
Abstract
As an essential environmental property, the octanol–water partition coefficient (KOW) quantifies the lipophilicity of a compound and can be further used to predict its toxicity. It is therefore an indispensable factor in the screening and development of green solvents, particularly for unconventional and novel compounds. Herein, a deep-learning-assisted predictive model is developed to calculate log KOW values for organic compounds accurately and reliably. An embedding algorithm was established to automatically generate signatures for molecular structures, which express structural information and connectivity. The tree-structured long short-term memory (Tree-LSTM) network was then used in conjunction with the signature descriptors for automatic feature selection, and was coupled with a back-propagation neural network to form a deep neural network (DNN) for modeling the quantitative structure–property relationship (QSPR) used to predict log KOW. Compared with an authoritative estimation method, the proposed DNN-based QSPR model exhibited better predictive accuracy and greater discriminative power for structural isomers and stereoisomers. As such, the proposed deep learning approach is a promising tool for developing environmental property prediction methods to guide the development or screening of green solvents.
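To illustrate how a signature can express structural information and connectivity, the following is a minimal sketch of a height-1 atom signature computed on a heavy-atom molecular graph. It is not the embedding algorithm of this work: the function name `atom_signatures`, the signature string format, and the toy inputs (hydrogens omitted, bond orders ignored) are assumptions made purely for illustration.

```python
from collections import Counter


def atom_signatures(atoms, bonds):
    """Return a Counter of height-1 atom signatures (illustrative only).

    atoms: list of element symbols, indexed by atom id (hydrogens omitted).
    bonds: list of (i, j) pairs of bonded atom ids (bond order ignored).
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for i, j in bonds:
        neighbors[i].append(atoms[j])
        neighbors[j].append(atoms[i])
    # Signature = central atom followed by its sorted neighbor symbols.
    return Counter(
        f"{atoms[i]}({''.join(sorted(neighbors[i]))})" for i in range(len(atoms))
    )


# Hypothetical toy inputs: heavy-atom graphs of two structural isomers of C3H8O.
propan_1_ol = atom_signatures(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
propan_2_ol = atom_signatures(["C", "C", "C", "O"], [(0, 1), (1, 2), (1, 3)])

print(dict(propan_1_ol))
print(dict(propan_2_ol))
print(propan_1_ol != propan_2_ol)  # the isomers yield different signature sets
```

Even at this shallow depth, the two structural isomers produce different signature multisets, which is the kind of structural distinction a connectivity-aware descriptor is meant to capture. Stereoisomers would need additional information (for example, chirality labels) that this sketch does not encode.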