Data-driven generation of perturbation networks for relative binding free energy calculations†
Abstract
Relative binding free energy (RBFE) calculations are increasingly used to support ligand optimisation in early-stage drug discovery. Because RBFE calculations frequently rely on alchemical perturbations between ligands in a congeneric series, practitioners must estimate an optimal combination of pairwise perturbations for each series. An RBFE network consists of a collection of edges, each representing a pairwise RBFE calculation, chosen such that all ligands (nodes) are included in the network. Because the number of possible configurations is vast, selecting an optimal perturbation network is not trivial. Current approaches rely on human intuition and rule-based expert systems to propose RBFE perturbation networks. This work presents a data-driven alternative to rule-based approaches that uses a graph siamese neural network architecture. A novel dataset, RBFE-Space, is presented as a representative and transferable training domain for RBFE machine learning research. The workflow presented in this work matches the performance of state-of-the-art programmatic RBFE network generation while offering several key benefits. The network generator is fully transferable because RBFE-Space is open source and can be applied to other RBFE software. Additionally, the deep learning model represents the first machine-learned predictor of perturbation reliability in RBFE calculations.
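To make the network-selection problem concrete, the following minimal sketch treats ligands as nodes and candidate perturbations as scored edges. It is an illustration under stated assumptions, not the workflow presented in this work: the ligand names and reliability scores are invented toy values, and `count_spanning_trees` and `build_network` are hypothetical helper names. The sketch counts only the minimally connected (tree-shaped) networks using Cayley's formula and then selects one such tree greedily with Kruskal's algorithm, preferring edges with a higher score.

```python
from itertools import combinations


def count_spanning_trees(n_ligands: int) -> int:
    """Number of distinct tree-shaped networks on n labelled ligands.

    Uses Cayley's formula, n ** (n - 2). Networks with extra (redundant)
    edges are not counted, so this is a lower bound on the search space.
    """
    if n_ligands < 1:
        raise ValueError("need at least one ligand")
    return 1 if n_ligands <= 2 else n_ligands ** (n_ligands - 2)


def build_network(ligands, score):
    """Greedy maximum spanning tree (Kruskal) over pairwise scores.

    `score(a, b)` is a hypothetical callable returning a reliability
    estimate for the perturbation a <-> b (higher is better).
    """
    parent = {lig: lig for lig in ligands}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider the most reliable candidate perturbations first.
    candidates = sorted(combinations(ligands, 2),
                        key=lambda edge: score(*edge), reverse=True)
    network = []
    for a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two previously separate groups
            parent[root_a] = root_b
            network.append((a, b))
    return network


# Toy data: invented ligand names and scores, for illustration only.
ligands = ["lig_a", "lig_b", "lig_c", "lig_d"]
toy_scores = {
    ("lig_a", "lig_b"): 0.9, ("lig_a", "lig_c"): 0.2, ("lig_a", "lig_d"): 0.4,
    ("lig_b", "lig_c"): 0.8, ("lig_b", "lig_d"): 0.3, ("lig_c", "lig_d"): 0.7,
}
network = build_network(ligands, lambda a, b: toy_scores[(a, b)])
print(network)
print(count_spanning_trees(10))
```

Even for a series of ten ligands there are 10^8 tree-shaped networks, which illustrates why exhaustive evaluation is impractical. A spanning tree is only the minimal connected case; practical networks often include additional edges for redundancy, which this sketch does not attempt.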