A cocrystal prediction method of graph neural networks based on molecular spatial information and global attention
Abstract
This study presents a cocrystal prediction model based on molecular point cloud information and a graph attention network (GAT). We first expanded our experimental dataset for training. The dataset comprises 6368 positive samples obtained from the Cambridge Structural Database (CSD) and 1104 negative samples. The negative samples come from failed cocrystallization experiments conducted in our laboratory and from failed experiments reported in the relevant literature. We then explored a GAT framework that integrates molecular point cloud information into end-to-end learning for cocrystal prediction: a graph network learns the three-dimensional information of each molecular structure, and a global attention mechanism improves the prediction. The proposed model achieved an average accuracy of 98.87% on the training dataset and an accuracy of 99.35% on five challenging independent test sets, demonstrating its robustness and generalization ability. Furthermore, because the model merges the adjacency matrices of the two molecules in its feature representation, it is insensitive to the order of the input molecules, which reduces computation by nearly 50%. Finally, we conducted a cocrystal screening experiment on edaravone to verify the validity of the model. An edaravone–p-aminobenzoic acid cocrystal was obtained, and its structure was determined by single-crystal X-ray diffraction. Our model correctly predicts the formation of cocrystals, demonstrating its validity in practice.
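The order-insensitivity claim can be illustrated with a minimal NumPy sketch. It assumes a single-head GAT-style layer followed by softmax-gated global attention pooling. The distance-cutoff adjacency built from 3D coordinates, the 1.6 Å threshold, the random node features, and all function names are hypothetical choices for illustration, not the authors' implementation. Merging the two molecules' adjacency matrices block-diagonally yields one disconnected graph. Swapping the input order then only permutes that graph's nodes, and a permutation-equivariant layer followed by an attention-weighted sum over all nodes cannot detect a permutation.

```python
import numpy as np

rng = np.random.default_rng(0)

def adjacency_from_coords(coords, cutoff=1.6):
    """Hypothetical: connect atoms closer than `cutoff` (a stand-in for point-cloud edges)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return ((d < cutoff) & (d > 0)).astype(float)

def toy_molecule(n_atoms, n_feat):
    """Hypothetical toy molecule: random 3D coordinates and random node features."""
    coords = rng.normal(scale=1.5, size=(n_atoms, 3))
    feats = rng.normal(size=(n_atoms, n_feat))
    return feats, adjacency_from_coords(coords)

def merge_pair(x1, a1, x2, a2):
    """Stack node features and merge adjacency matrices block-diagonally (no cross edges)."""
    n, m = a1.shape[0], a2.shape[0]
    adj = np.zeros((n + m, n + m))
    adj[:n, :n] = a1
    adj[n:, n:] = a2
    return np.vstack([x1, x2]), adj

def gat_layer(x, adj, w, a_src, a_dst):
    """Single-head GAT-style layer: e_ij = LeakyReLU(a_src.Wh_i + a_dst.Wh_j), softmax over neighbours j."""
    h = x @ w
    scores = (h @ a_src)[:, None] + (h @ a_dst)[None, :]
    scores = np.where(scores > 0, scores, 0.2 * scores)        # LeakyReLU, slope 0.2
    mask = (adj + np.eye(adj.shape[0])) > 0                    # add self-loops
    scores = np.where(mask, scores, -np.inf)                   # attend only to neighbours
    scores -= scores.max(axis=1, keepdims=True)                # numerical stability
    alpha = np.exp(scores)
    alpha /= alpha.sum(axis=1, keepdims=True)
    return np.tanh(alpha @ h)

def global_attention_pool(h, gate):
    """Softmax-weighted sum over all nodes of the merged graph (order-invariant)."""
    g = h @ gate
    g = np.exp(g - g.max())
    g /= g.sum()
    return g @ h

def predict(x1, a1, x2, a2, params):
    """Hypothetical cocrystal score in (0, 1) for a molecule pair."""
    x, adj = merge_pair(x1, a1, x2, a2)
    h = gat_layer(x, adj, params["w"], params["a_src"], params["a_dst"])
    pooled = global_attention_pool(h, params["gate"])
    return 1.0 / (1.0 + np.exp(-(pooled @ params["out"])))

F, H = 8, 16
params = {
    "w": rng.normal(size=(F, H)) * 0.3,
    "a_src": rng.normal(size=H) * 0.3,
    "a_dst": rng.normal(size=H) * 0.3,
    "gate": rng.normal(size=H) * 0.3,
    "out": rng.normal(size=H) * 0.3,
}
xa, aa = toy_molecule(5, F)
xb, ab = toy_molecule(7, F)
p_ab = predict(xa, aa, xb, ab, params)   # order (A, B)
p_ba = predict(xb, ab, xa, aa, params)   # order (B, A)
print(p_ab, p_ba)                        # equal up to floating-point rounding
```

Because the (A, B) and (B, A) inputs produce the same score, a model built this way never has to evaluate both orderings of a pair. This is one plausible reading of the reported ~50% computation saving, not a statement of how the authors measured it.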