MMSSC-Net: multi-stage sequence cognitive networks for drug molecule recognition
Abstract
In the growing body of scientific literature, drug structures and their associated information are usually depicted as two-dimensional vector graphics. Compound structures in this form are difficult for computers to recognize and use. Although the current optical chemical structure recognition (OCSR) paradigm performs well, most existing work treats recognition as a single, monolithic step. This paper proposes a multi-stage cognitive neural network model that recognizes molecular vector graphics at a finer granularity. Drawing on cognitive methods, we build a fine-grained perceptual representation of molecular images from the bottom up, in stages. The first stage represents atoms and bonds as a latent discrete label sequence (atom type, bond type, functional group, etc.). The second stage constructs the molecular graph from the label sequence, and the final stage converts the molecular graph into a machine-readable sequence in an extensible manner. Experimental results show that MMSSC-Net outperforms current advanced methods on multiple public datasets, achieving 75–94% accuracy on cognitive recognition at different resolutions. Its sequence cognitive method makes MMSSC-Net more reliable in terms of interpretability and transferability, and offers new ideas for drug information discovery and for exploring unknown chemical space.
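The data flow between the three stages can be illustrated with a toy sketch. This is not the MMSSC-Net implementation: the neural recognition step is replaced by a hand-written label list, and the label format, the function names `labels_to_graph` and `graph_to_smiles`, and the choice of SMILES as the machine-readable sequence are all assumptions made for illustration. The sketch handles only acyclic, connected molecules built from simple elements such as C, N, and O.

```python
from collections import defaultdict

# Stage 1 (assumed output format): a discrete label sequence.
# ("atom", index, element) or ("bond", index_a, index_b, bond_order)
ACETIC_ACID_LABELS = [
    ("atom", 0, "C"), ("atom", 1, "C"), ("atom", 2, "O"), ("atom", 3, "O"),
    ("bond", 0, 1, 1), ("bond", 1, 2, 2), ("bond", 1, 3, 1),
]

# SMILES bond symbols; single bonds are written implicitly.
BOND_SYMBOL = {1: "", 2: "=", 3: "#"}


def labels_to_graph(labels):
    """Stage 2: build a molecular graph (atoms + adjacency) from labels."""
    atoms = {}
    adjacency = defaultdict(dict)
    for label in labels:
        if label[0] == "atom":
            _, index, element = label
            atoms[index] = element
        elif label[0] == "bond":
            _, a, b, order = label
            adjacency[a][b] = order
            adjacency[b][a] = order
        else:
            raise ValueError(f"unknown label kind: {label[0]!r}")
    return atoms, adjacency


def graph_to_smiles(atoms, adjacency, start=0):
    """Stage 3: depth-first traversal emitting a SMILES-like string.

    Simplification: ring closures and disconnected fragments are rejected.
    """
    visited = set()

    def visit(node, parent):
        visited.add(node)
        text = atoms[node]
        children = []
        for neighbor in sorted(adjacency[node]):
            if neighbor == parent:
                continue
            if neighbor in visited:
                raise ValueError("ring closures are not handled in this sketch")
            children.append(neighbor)
        for position, child in enumerate(children):
            piece = BOND_SYMBOL[adjacency[node][child]] + visit(child, node)
            # Every child except the last is written as a parenthesized branch.
            is_last = position == len(children) - 1
            text += piece if is_last else f"({piece})"
        return text

    smiles = visit(start, None)
    if len(visited) != len(atoms):
        raise ValueError("disconnected fragments are not handled in this sketch")
    return smiles


atoms, adjacency = labels_to_graph(ACETIC_ACID_LABELS)
print(graph_to_smiles(atoms, adjacency))  # CC(=O)O
```

Keeping the label sequence and the graph as explicit intermediate results is what lets each stage be inspected separately, which is the interpretability argument the abstract makes for a staged design.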