Hebatallah
Mohamed
a,
Hongzhao
Shao
a,
Madoka
Akimoto
a,
Patrick
Darveau
a,
Marc R.
MacKinnon
b,
Jakob
Magolan
b and
Giuseppe
Melacini
*ab
aDepartment of Chemistry and Chemical Biology, McMaster University, Hamilton, Ontario, L8S 4L8, Canada. E-mail: melacin@mcmaster.ca
bDepartment of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, L8S 4L8, Canada
First published on 3rd August 2022
Exchange proteins directly activated by cAMP (EPAC) are guanine nucleotide exchange factors for the small GTPases, Rap1 and Rap2. They regulate several physiological functions and mitigation of their activity has been suggested as a possible treatment for multiple diseases such as cardiomyopathy, diabetes, chronic pain, and cancer. Several EPAC-specific modulators have been developed, however studies that quantify their structure–activity relationships are still lacking. Here we propose a quantitative structure–activity relationship (QSAR) model for a series of EPAC-specific compounds. The model demonstrated high reproducibility and predictivity and the predictive ability of the model was tested against a series of compounds that were unknown to the model. The compound with the highest predicted affinity was validated experimentally through fluorescence-based competition assays and NMR experiments revealed its mode of binding and mechanism of action as a partial agonist. The proposed QSAR model can, therefore, serve as an effective screening tool to identify promising EPAC-selective drug leads with enhanced potency.
I942 functions by binding to the cyclic nucleotide binding domain (CNBD) of EPAC, which serves as the critical allosteric element7 regulating EPAC activation. Recent NMR studies on how I942 interacts with the EPAC1 CNBD have provided initial insight on the I942 mechanism of action.8 The unbound EPAC form (apo) exhibits a dominant inactive state where the phosphate-binding cassette (PBC), which is the region that binds the cAMP phosphate group, is in the ‘out’ conformation (Fig. 1A). The C-terminal helix of the CNBD, α6 helix, commonly called the hinge region as it controls the relative rotation of the regulatory and catalytic regions, also exhibits an ‘out’ conformation (Fig. 1A). Upon cAMP binding the PBC shifts to the ‘in’ conformation and the hinge, consequently, moves to the ‘in’ conformation since it is allosterically coupled to the PBC.9–16
Fig. 1 (A) The cAMP-bound EPAC2-CNBD (holo) state (blue, PDB 3CF6)10 is overlayed on the unbound (apo) state (yellow, PDB 1O7F).9 The phosphate binding cassette (PBC), the base binding region (BBR) and the hinge region are marked by black circles. The PBC and the hinge are in an ‘out’ conformation in the apo state and adopt an ‘in’ conformation in the holo state. cAMP, illustrated in sticks, forms key interactions between its phosphate group and residues in the PBC whereby equivalent residues in EPAC1 are written in brackets. The base of the nucleotide is oriented towards the BBR. (B) Map of residues (cyan surfaces) in the EPAC1-CNBD that showed NOE peaks to I942 protons.8 NOE peaks to protons in the phenyl group of I942 (2 and 3) originated from residues at the PBC, whereas the NOE peak towards the naphthalene proton (9) originated from the BBR residue, T261. (C) A scheme for the cAMP vs. I942 mimicry whereby regions of the molecular structures that are highlighted by the same shape are proposed to interact with similar regions of EPAC1-CNBD. |
A binding interface was proposed for I942 and the EPAC1-CNBD based on measurements of intermolecular nuclear Overhauser effects (NOEs).8,17,18 Such NOEs (Fig. 1B) are between the PBC residues and the I942 protons in the dimethylbenzene and between the base binding region (BBR) residues and the I942 naphthalene moiety. Based on those NOEs as well as chemical shift perturbation analyses, I942 was proposed to mimic cAMP (Fig. 1C), whereby the adenine base of cAMP or the naphthalene group in I942 interact with the BBR, whereas the cAMP's ribose ring or the I942 dimethylbenzene group interact with the PBC region.8 Moreover, the I942 binding induces a partial shift in the PBC and hinge regions and results in a partial ‘in-to-out’ conformational transition in those two regions. The observed effect is more dominantly seen in the hinge region, and together with the chemical shift covariance analysis (CHESCA) results that reported weakened allosteric couplings, a mixed intermediate state was proposed for the I942: EPAC1-CNBD complex, whereby the PBC is in the ‘in’ conformation and the hinge is in the ‘out’.8
Since I942 showed promising EPAC1-selective activity, Wang et al.19 evaluated several I942 analogs for their EPAC binding affinities. Here, we show how it is possible to develop a quantitative structure–activity relationship (QSAR) based on such affinities and how such QSAR can be used to design new I942 analogs with improved affinity for EPAC1. QSAR models aim at establishing a statistically significant correlation between a target property, such as the inhibition of an enzyme function, and molecular descriptors of ligands.20,21 The concept of QSAR modeling was originally introduced in 1964 by Hansch and Fujita.22 Since then, it has been extensively applied in the computer-aided drug discovery process, yet rarely to CNBDs.23
QSAR modeling relies on the notion that compounds which share structural similarity often display comparable biological activities, and this is known as the similarity-property principle (SPP).21 The SPP proposes that slight structural modifications of a compound correspond to slight variations in a biological property, such as potency, of that compound which, in turn, creates the foundation for linear relations that QSAR models aim to generate. From these linear relations, QSAR models can then be utilized for potency predictions of new ligands with a conserved scaffold and varying substituents.
Although several notable advancements in QSAR methods have been developed in the last three decades, only a limited number of QSAR studies are currently available for proteins involved in the cAMP-mediated signaling cascade and none for EPAC.24–28 Here, we report a reliable QSAR model generated using the aforementioned series of I942 analogues and their respective affinity measurements towards the EPAC1-CNBD. We then validated the QSAR model and used it to predict affinities for a series of I942 analogues that were ‘unknown’ to the model. The affinity for the most promising candidate, known as MLGM-2013, as predicted by our validated QSAR model, was confirmed through fluorescence competition assays. In addition, we investigated the mechanism of action of MLGM-2013 using NMR experiments,29 revealing a new avenue to design I942 analogs with enhanced potency through modifications of its phenyl moiety.
Training setbc | Test setd | Cross-validation (CV) | Threshold36,37 | |
---|---|---|---|---|
a Standard deviations were computed using data from eleven different partitioning of training vs. test sets. b Number of molecules in the training set was 45. c Number of descriptors were ≤9. d Number of molecules in the test set was 11. | ||||
R 2 | 0.972 ± 0.008 | 0.929 ± 0.021 | 0.772 ± 0.055 | R 2 > 0.600 and > 0.500 for CV |
σ | 27.69 ± 0.60 | 28.69 ± 2.40 | — | — |
RMSE | 11.83 ± 1.81 | 19.09 ± 3.13 | 12.78 ± 1.50 | RMSE < σ |
k | 0.972 ± 0.008 | 0.960 ± 0.090 | — | 0.850 ≤ k ≤ 1.150 |
Fig. 3 Predicted vs. measured relative fluorescence intensities (RFI) correlation plots, with zero intercepts, for the (A) training and (B) test sets of I942 analogues shown in Fig. 2. Representative molecules are marked with black arrows and labeled in their assigned compound names as in Fig. 2. |
Interestingly, the QSAR models obtained from the 11 distinct partitions, exhibited classes of recurring molecular descriptors in the multiple linear regressions. The shared descriptors are 2D in nature and fall in the ‘autocorrelation’ category, which essentially captures the distribution of physicochemical properties across the spatial arrangement of atoms.33 They describe the correlation between values of specific functions placed at intervals referred to as ‘lag’. The specific functions are the atomic physicochemical properties and ‘lag’ represents the topological distance at which these properties are distributed.34 In our particular model, the main physicochemical properties are (a) the intrinsic state, represented by the GATS5s, AATS5s and MATS5s descriptors, and these report on the electronegativity of the atom in its valence state, as well as (b) the Sanderson electronegativity35 represented by descriptors such as ATSC8e and AATS5e. It was interesting to observe consistently positive coefficients for the descriptors in the linear regression equations, which reflects a positive correlation between these descriptors and the RFI values.
Fig. 4 Independent validation of the QSAR model. (A) Molecular structures of new synthesized I942 analogues that are ‘unknown’ to the QSAR model. MLGM-2013, which was predicted to have the highest affinity towards EPAC1-CNBD (Table 2), is highlighted in blue and MLGM-2017, the compound with the lowest predicted affinity (Table 2) is shown in red. (B) EPAC1-CNBD binding isotherm for I942 (grey), MLGM-2013 (blue), and MLGM-2017 (red) measured through the 8-NBD-cAMP fluorescence-based competition assay. The resulting measured dissociation constants are included in the top right corner. The percentage of 8-NBD-cAMP bound to EPAC1 is represented by 〈v〉 on the y-axis. (C) The chemical shift differences between 50 μM apo EPAC1-CNBD (green) and 50 μM EPAC1-CNBD bound to 350 μM of MLGM-2013 (blue) were monitored by 15N-1HSQC spectra. (D) The compounded chemical shift variations between 50 μM apo EPAC1-CNBD and 50 μM EPAC1-CNBD bound to MLGM-2013 (350 μM), and 100 μM apo EPAC1-CNBD and 100 μM EPAC1-CNBD bound to cAMP (1 mM) are plotted as blue and green bars, respectively. The secondary structure is shown on the top of the plot in boxes and key regions are highlighted in grey. (E) Compounded chemical shift differences of 50 μM EPAC1-CNBD in the presence of MLGM-2013 (350 μM) are plotted against the compounded chemical shift differences of 50 μM EPAC1-CNBD in the presence of I942 (350 μM). (F) 1D saturation-transfer reference (STR, blue) spectrum of EPAC1-CNBD: MLGM-2013 overlayed with the scaled saturation transfer difference (STD, red) spectrum. The assigned protons are marked in green and represented as circles on the structure of MLGM-2013 where the size of the circles reflects the relative STD/STR ratios (normalized to proton j with the highest STD/STR ratio). The structure of I942 with the previously determined STD/STR ratios8 are shown for comparison whereby the STD/STR ratios with the most significant differences are reported near the corresponding proton. |
Compound name | Predicted RFI (%) |
---|---|
MLGM-2013 | 25.87 |
MLGM-2010 | 48.73 |
MLGM-2016 | 52.32 |
MLGM-2011 | 59.36 |
I942 | 69.97 |
MLGM-2012 | 79.08 |
MLGM-2017 | 82.22 |
To confirm our predictions, a competition assay was preformed using the fluorescently tagged cAMP known as 8-(2-[7-nitro-4-benzofurazanyl] aminoethylthio) adenosine-3′,5′-cyclic monophosphate (8-NBD-cAMP).30 The displacement of 8-NBD-cAMP by a competing ligand at increasing concentrations was used to measure the dissociation constant (KD) of MLGM-2013, I942 as well as MLGM-2017, as a negative control. The assay clearly showed a significant enhancement of the binding affinity of MLGM-2013 relative to I942 (Fig. 4B), while MLGM-2017 resulted in a significantly higher KD value compared to that of I942, as expected, further confirming the validity of our QSAR model's predictions. To gain structural insight into the enhanced affinity of MLGM-2013 and its mechanism of action, we also investigated the interactions of this I942 analog with the EPAC1 CNBD using NMR.38,39
To further elucidate the difference in binding affinity between I942 and MLGM-2013, saturation transfer difference (STD) experiments were performed to map the binding epitopes of MLGM-2013 and assess the proximity of ligand protons to the EPAC1-CNBD (Fig. 4F).8,41 Interestingly, we found that the STD/STR ratios for MLGM-2013 are higher for several phenyl protons compared to I942 with the most significant increase observed for the tertiary butyl protons located at the para position of the phenyl group (Fig. 4F and Fig. S4, ESI†). As opposed to the single methyl group at that location in I942, the additional methyls of the tertiary butyl offer more extensive contacts with the protein as seen through STD/STR ratios of 0.62 vs. 0.39 for MLGM-2013:EPAC1-CNBD vs. I942:EPAC1-CNBD, respectively (Fig. 4F). Based on the N275 outlier observed in Fig. 4E, we hypothesized that the enhanced contacts of the tertiary butyl in MLGM-2013 are with the PBC of the EPAC1-CNBD.
To test our hypothesis, we measured the EPAC1-CNBD compounded chemical shift changes (ΔCCS) between MLGM-2013 and MLGM-2014 which lacks any phenyl substituents, and therefore, serves as a useful reference ligand to capture the effect of the MLGM-2013 tertiary butyl para substituent (Fig. 5A). Despite the absence of phenyl substituents, MLGM-2014, previously referred to as I178,5 was shown to bind EPAC1 and result in an IC50 of ∼40 μM.5Fig. 5A reports the residue-specific MLGM-2013 vs. MLGM-2014 ΔCCS values as well as the corresponding I942 vs. MLGM-2014 ΔCCS values as a control. Although the ΔCCS values of the EPAC1 CNBD in the presence of MLGM-2013 or I942 relative to MLGM-2014 are overall similar, the most evident difference is observed in the PBC. MLGM-2013 yields a markedly higher ΔCCS, reflecting additional perturbations in that region due to the bulkier, tertiary butyl moiety at the para phenyl position. These results confirm our hypothesis that the tertiary butyl group of MLGM-2013 interacts with the PBC and that such contacts are unique of MLGM-2013, possibly explaining the enhanced affinity of MLGM-2013 relative to the parent I942 compound.
Fig. 5 (A) Residue specific compounded chemical shift variations between MLGM-2013-bound (green) or I942-bound (blue) and MLGM-2014-bound EPAC1 CNBD. Structural differences between the ligands are highlighted with corresponding color codes. (B) Vector representation of the CHESPA analysis. (C) Fractional activation X and (D) cosθ values of MLGM-2013-bound EPAC1 relative to cAMP-bound EPAC1 where values greater than 1 or less than −1 are not within the scale of the plot. (E) Fractional activation X and F) cosθ values of I942-bound EPAC1 measured under the same conditions as that of MLGM-2013-bound EPAC1. The asterisks correspond to the residues in the PBC which are more negative in the MLGM-2013-bound structure and the red asterisk marks N275, which exhibits the greatest change. The secondary structure of EPAC1-CNBD is shown in the same way as Fig. 4D. |
When the CHESPA profiles of MLGM-2013 (Fig. 5C and D) are compared to those of I942 (Fig. 5E and F), one of the most notable differences is observed for PBC residues such as A272, N275 and A277 (asterisks in Fig. 5E). These sites exhibit markedly more negative X values for MLGM-2013 than I942, suggesting a more significant shift towards the inactive state in that region compared to I942. Additionally, MLGM-2013 demonstrates a more negative average X value at the PBC region compared to I942, whereas the average X value for the hinge region is slightly less negative compared to I942 (dotted lines in Fig. 5C and E). These average X values suggest that MLGM-2013-bound EPAC samples an inactive state with PBC out, hinge out with a population of around 60%, while the population of the mixed intermediate with the PBC in, hinge out is negligible. Based on these results, MLGM-2013 promises to serve as a more potent EPAC1-CNBD modulator than I942 with a distinct inhibitory mechanism.
The QSAR model proposed here serves as an effective tool to virtually screen compound libraries for EPAC1 binding, thus aiding the identification of novel EPAC1-selective drug candidates. Such tool was not previously available given the absence of prior QSAR studies on EPAC. However, the applicability of such models highly depends on the size of the original datasets utilized to build and train the model. Such datasets should be sufficiently large to generate reliable models and additionally, the molecules should also maintain a conserved structural core for the SPP21 to hold. Nevertheless, the protocols and approaches illustrated here can be extended to other ligand databases to develop similar QSAR models for EPAC or other signaling systems, facilitating the design of novel allosteric modulators with enhanced potencies.
(1) |
RapidMiner Studio49 was used to narrow down the number of descriptors for the QSAR model by applying the forward selection method50 on the training set. The method entails sequential addition of molecular descriptors that improve the performance of the model, i.e., descriptors leading to enhanced linear regression correlations. The stopping criteria for the sequential addition are either (1) there is no improvement in model performance or (2) the maximum number of descriptors that satisfy a 5:1 ratio for number of molecules vs. number of descriptors was reached.51 The descriptors chosen were then fed into RapidMiner to generate the linear regression model, which was applied to both the training and test sets to generate a coefficient of multiple determination31 (R2) for each. R2 is calculated as:
(2) |
An additional parameter reporting on the QSAR quality, known as the root mean squared error (RMSE)31 describes the range of error in the model's predictions and is defined as:
(3) |
As an initial mean of validating the QSAR model, we relied on cross-validation (CV), which is a form of internal validation of the model's predictivity utilizing an approach called the ‘Leave-Many-Out’ (LMO)52 method. LMO holds back a portion of the training set as a small test set and applies the model without that test set. The process was repeated for 10 iterations and the squared correlation obtained was represented as an average value of the multiple iterations. The descriptor selection process and QSAR workflow outlined above were repeated for each of the eleven different training and test partitions and average statistical parameters across these partitions were computed.
The Y-randomization coefficient was additionally obtained to validate the robustness of the correlation between the descriptors and the RFI percentages and is described as:
(4) |
The compounded chemical shift differences (ΔCCS) between ligand bound EPAC1 and the apo form were calculated using the following equation:
(5) |
Samples for 1D saturation transfer difference (STD) were prepared using a 50 μM EPAC1-CNBD solution that was buffer-exchanged with a 20 mM sodium phosphate buffer containing 50 mM NaCl, pH 7.4 and 99.9% D2O. PD-10 Desalting columns (GE Healthcare) were used to facilitate the exchange by a gravity protocol. 350 μM of MLGM-2013 (final concentration) was added to 50 μM of EPAC1-CNBD and the saturation frequency in the STD experiments was set to 0.8 ppm to saturate the region of protein peaks (i.e., methyl region) that is further away from the MLGM-2013's signal. An off-resonance saturation of 30 ppm was applied to the STR experiments and the STD/STR ratios normalized to the largest value were compared to those acquired for I942.8 The spectra were referenced to DMSO (2.48 ppm) and the assignments of MLGM-2013 were obtained by comparison with previously established assignments for I942.8 The STD spectra were acquired at 298 K with a time domain of 32768 points and number of scans of 128 and 1024 for STR and STD experiments, respectively. Eight dummy scans were used for both STD and STR. The spectral width was 11.7057 ppm, and the transmitter frequency was set to 4.697 ppm.
The chemical shift projection analysis (CHESPA) was implemented according to previous protocols8,42,43 and using the NMRFAM-SPARKY plugin.44 The reference vector (A) is defined from the apo to the cAMP-bound EPAC1 state, while the perturbation vector (B) is defined from the cAMP-bound form to the I942-analog ligand-bound form. The minimum cut-off for the ΔCCS values of both vectors was set to 0.02 ppm and the cosθ and fractional activation (X) values were computed according to the following formulae:
(6) |
(7) |
cAMP | Cyclic adenosine monophosphate |
EPAC | Exchange protein activated by cAMP |
GEF | Guanine nucleotide exchange factor |
GDP | Guanosine diphosphate |
GTP | Guanosine triphosphate |
CNBD | Cyclic nucleotide binding domain |
PBC | Phosphate binding cassette |
NOE | Nuclear Overhauser effect |
BBR | Base binding region |
QSAR | Quantitative structure activity relationship |
SPP | Similarity-property principle |
RFI | Relative fluorescence intensity |
8-NBD-cAMP | 8-(2-[7-Nitro-4-benzofurazanyl] aminoethylthio) adenosine-3′,5′-cyclic monophosphate |
KD | Dissociation constant |
NMR | Nuclear magnetic resonance |
HSQC | Heteronuclear single quantum coherence |
CCS | Compounded chemical shift |
STD | Saturation transfer difference |
STR | Saturation transfer reference |
CHESPA | Chemical shift projection analysis |
CHESCA | Chemical shift covariance analysis |
Footnote |
† Electronic supplementary information (ESI) available: Fig. S1–S4 and Tables S1, S2, synthetic chemistry procedures and compound characterization data including HPLC traces of compounds MLGM-2013, MLGM-2014, MLGM-2017. (PDF) Molecular Formula Strings. (CSV). See DOI: https://doi.org/10.1039/d2cb00106c |
This journal is © The Royal Society of Chemistry 2022 |