Carbohydrate NMR chemical shift prediction by GeqShift employing E(3) equivariant graph neural networks
Abstract
Carbohydrates, vital components of biological systems, are well known for their structural diversity. Nuclear magnetic resonance (NMR) spectroscopy plays a crucial role in elucidating their intricate molecular arrangements and is essential for assessing and verifying the molecular structure of organic molecules. An important part of this process is predicting NMR chemical shifts from the molecular structure. This work introduces a novel approach that uses E(3)-equivariant graph neural networks to predict carbohydrate NMR spectral data. Notably, our model reduces the mean absolute error by up to a factor of three compared with traditional models that rely solely on two-dimensional molecular structure. The model also performs well when training data are limited, which highlights its robustness and generalization capabilities. The model, dubbed GeqShift (geometric equivariant shift), uses equivariant graph self-attention layers to learn NMR chemical shifts, which is particularly relevant because stereochemical arrangements are defining characteristics of carbohydrate structures.
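To illustrate the symmetry that E(3) equivariance refers to, the following minimal sketch checks that interatomic distances are unchanged when a set of three-dimensional coordinates is rotated, mirrored, and translated. It is not part of GeqShift: the coordinates are arbitrary made-up values rather than a real carbohydrate conformer, and the helper names `transform` and `pairwise_distances` are hypothetical. A scalar quantity such as a chemical shift should likewise not depend on how the molecule is positioned or oriented in space.

```python
import math

def transform(points, angle, shift, reflect=False):
    """Rotate about the z-axis, optionally mirror through the xy-plane, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in points:
        xr, yr = c * x - s * y, s * x + c * y   # rotation about z
        zr = -z if reflect else z               # optional reflection
        out.append((xr + shift[0], yr + shift[1], zr + shift[2]))
    return out

def pairwise_distances(points):
    """All interatomic distances, in a fixed pair order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

# Hypothetical coordinates (arbitrary units), chosen only for illustration.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.1, 1.3, 0.4), (0.2, -1.1, 0.9)]
moved = transform(coords, angle=0.7, shift=(3.0, -2.0, 5.0), reflect=True)

before = pairwise_distances(coords)
after = pairwise_distances(moved)
max_diff = max(abs(a - b) for a, b in zip(before, after))
print(max_diff < 1e-9)  # distances agree up to floating-point error
```

The transformation used here is only one example of an E(3) operation (a rotation about a single axis, a mirror plane, and a shift); the same distance check would hold for any combination of rotations, reflections, and translations.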