Capturing molecular interactions in graph neural networks: a case study in multi-component phase equilibrium†
Abstract
Graph neural networks (GNNs) have been widely used to predict molecular properties, especially for single molecules. For multi-component systems, however, GNNs have mostly relied on simple data representations (concatenation, averaging, or self-attention over the features of individual components) that may fail to capture molecular interactions and may therefore limit prediction accuracy. In this work, we propose a GNN architecture that captures molecular interactions explicitly by combining atomic-level (local) graph convolution with molecular-level (global) message passing through a molecular interaction network. We tested the architecture, which we call SolvGNN, on a comprehensive phase equilibrium case study that aims to predict activity coefficients for a wide range of binary and ternary mixtures; we built this large dataset using the COnductor-like Screening MOdel for Real Solvation (COSMO-RS). We show that SolvGNN predicts composition-dependent activity coefficients with high accuracy and that it outperforms a previously developed GNN that predicts only infinite-dilution activity coefficients. A counterfactual analysis of the SolvGNN model allowed us to explore the impact of functional groups and composition on equilibrium behavior. We also used SolvGNN to develop a computational framework that automatically creates phase diagrams for a diverse set of complex mixtures. All scripts needed to reproduce the results are shared as open-source code.
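To make the two-level idea concrete, the following is a minimal NumPy sketch of local graph convolution followed by global message passing over a mixture. It is an illustration under stated assumptions, not the SolvGNN implementation: the function names (graph_conv, embed_molecule, mixture_forward), the random untrained weights, the mean pooling, the sum aggregation over components, and the choice to append the mole fraction to each molecular embedding are all hypothetical simplifications.

```python
import numpy as np


def graph_conv(adj, feats, weight):
    """One mean-aggregation graph convolution with self-loops, followed by ReLU."""
    a_hat = adj + np.eye(adj.shape[0])                # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalize
    return np.maximum(a_hat @ feats @ weight, 0.0)


def embed_molecule(adj, feats, weight):
    """Local step: convolve over atoms, then mean-pool to one molecular vector."""
    return graph_conv(adj, feats, weight).mean(axis=0)


def mixture_forward(molecules, fractions, w_atom, w_self, w_nbr, w_out):
    """Global step: message passing on a fully connected graph of components."""
    # Node features (assumption): molecular embedding with the mole fraction appended.
    nodes = np.stack([
        np.append(embed_molecule(adj, feats, w_atom), x)
        for (adj, feats), x in zip(molecules, fractions)
    ])
    n = len(molecules)
    inter_adj = np.ones((n, n)) - np.eye(n)  # each component sees every other one
    messages = inter_adj @ nodes             # sum of the other components' features
    hidden = np.maximum(nodes @ w_self + messages @ w_nbr, 0.0)
    return (hidden @ w_out).ravel()          # one scalar per component


# Toy binary mixture: a 3-atom chain and a 2-atom molecule, 4 features per atom.
rng = np.random.default_rng(0)
adj_a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
adj_b = np.array([[0, 1], [1, 0]], dtype=float)
mols = [(adj_a, rng.normal(size=(3, 4))), (adj_b, rng.normal(size=(2, 4)))]
fracs = [0.3, 0.7]

w_atom = rng.normal(size=(4, 8))  # untrained, random weights for illustration only
w_self = rng.normal(size=(9, 8))  # 8 embedding dims + 1 mole fraction
w_nbr = rng.normal(size=(9, 8))
w_out = rng.normal(size=(8, 1))

out = mixture_forward(mols, fracs, w_atom, w_self, w_nbr, w_out)
print(out.shape)  # (2,)
```

Because each component's output depends on its own embedding and on the messages it receives from the other components, reordering the components simply reorders the outputs. In a trained model of this kind, each output could stand for a per-component quantity such as the logarithm of an activity coefficient; here the numbers are meaningless because the weights are random.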