MacroConf – dataset & workflows to assess cyclic peptide solution structures
Abstract
Knowing the solution structures of cyclic peptides is essential for predicting their pharmacokinetic properties in drug discovery. Here, we report the MacroConf dataset, together with computational workflows, to evaluate how well current in silico methods reproduce experimental cyclic peptide solution structures. The dataset was compiled from the literature and contains 68 cyclic peptides and macrocycles with available solution NMR data. We provide a reproducible, automated computational workflow to rapidly compare different cyclic peptide (CP) conformer generators with one another and against NMR-derived nuclear Overhauser effect (NOE) distance restraints. For the CP subset of compounds, we found that enhanced sampling molecular dynamics (MD) methods, such as Gaussian accelerated MD, reproduced experimental NOEs well. Conventional MD suffered from insufficient sampling, especially for compounds undergoing proline isomerisation, and did not always match the reference data. Across all compounds studied here, however, conventional and Gaussian accelerated MD were statistically indistinguishable in the percentage of NOE distance restraints satisfied. Cheminformatics-based conformer generators such as OMEGA and RDKit ETKDG often generated diverse, plausible structures that matched the sampling observed with MD-based methods, but they do not yield relative populations or thermodynamic insight. Bundles of conformers produced by cheminformatics methods reproduced experimental NOE values to a similar level as the MD-based methods, and high-quality structures were contained in the cheminformatics outputs. The presented computational workflow can easily be extended to include new compounds or different simulation methods. We envisage that this work will serve as a benchmark to help improve cyclic peptide conformer generators and standardise their assessment.