Kiran
Kumar‡
a,
Shin M.
Woo‡
a,
Thomas
Siu
a,
Wilian A.
Cortopassi
a,
Fernanda
Duarte
*b and
Robert S.
Paton
*a
aChemistry Research Laboratory, University of Oxford, 12 Mansfield Road, Oxford OX1 3TA, UK. E-mail: robert.paton@chem.ox.ac.uk
bEaStCHEM School of Chemistry, University of Edinburgh, Joseph Black Building, David Brewster Road, Edinburgh EH9 3FJ, UK. E-mail: fernanda.duarte@ed.ac.uk
First published on 31st January 2018
We have studied the cation–π interactions of neutral aromatic ligands with the cationic amino acid residues arginine, histidine and lysine using ab initio calculations, symmetry adapted perturbation theory (SAPT), and a systematic meta-analysis of all available Protein Data Bank (PDB) X-ray structures. Quantum chemical potential energy surfaces (PES) for these interactions were obtained at the DLPNO-CCSD(T) level of theory and compared against the empirical distribution of 2012 unique protein–ligand cation–π interactions found in X-ray crystal structures. We created a workflow to extract these structures from the PDB, filtering by interaction type and residue pKa. The gas phase cation–π interaction of lysine is the strongest by more than 10 kcal mol−1, but the empirical distribution of 582 X-ray structures lies away from the minimum on the interaction PES. In contrast, 1381 structures involving arginine match the underlying calculated PES with good agreement. SAPT analysis revealed that underlying differences in the balance of electrostatic and dispersion contributions are responsible for this behavior in the context of the protein environment. The lysine–arene interaction, dominated by electrostatics, is greatly weakened by a surrounding dielectric medium and causes it to become essentially negligible in strength and without a well-defined equilibrium separation. The arginine–arene interaction involves a near equal mix of dispersion and electrostatic attraction, which is weakened to a much smaller degree by the surrounding medium. Our results account for the paucity of cation–π interactions involving lysine, even though this is a more common residue than arginine. Aromatic ligands are most likely to interact with cationic arginine residues as this interaction is stronger than for lysine in higher polarity surroundings.
Cation–π interactions are vital for several physiological processes. Notable examples include acetylcholine binding,14–17 the recognition of post-translationally modified histones by epigenetic proteins18–21 and ion selectivity in K+ channels.22 These are examples of positively charged ligands interacting favorably with the aromatic sidechains of histidine, phenylalanine, tryptophan and tyrosine residues. On the other hand, the interaction of aromatic ligands with positively charged residues – arginine, histidine, lysine – can also occur. Given the abundance of aromatic and heteroaromatic rings in the structures of drug-like molecules, this type of cation–π interaction is the focus of this paper. For example, sorafenib, approved for the treatment of liver cancer, forms a cation–π interaction with a positively charged lysine residue of the human p38 MAP kinase (Fig. 1A).23 Lapatinib, approved for breast cancer treatment, forms a cation–π interaction with a lysine residue of ErbB4 kinase (Fig. 1B).24 Dihydroquinoxalinone derivatives selectively inhibit the CBP/p300 bromodomain over the closely-related BRD4 receptor, attributed to the formation of a cation–π interaction with a positively charged arginine residue close to the active site (Fig. 1C).25–27 Predictable structure–activity relationships were established according to the electrostatic interaction strength.27
Cation–π interactions are dominated by the electrostatic attraction between an electron-rich arene and electron-deficient cation.28,29 Houk30 and Wheeler31,32 showed that ring substituents augment this interaction strength with additional electrostatic interactions between the substituents and cation. Differences in binding strength can be explained quantitatively in terms of these additive, through-space electrostatic interactions, more so than by considering any polarizing effect on the π system. Nevertheless, non-electrostatic effects such as induction feature in accurate physical descriptions of cation–π interaction energies.33,34
Analyses of the structures deposited in the PDB provide information about the occurrence and geometric characteristics of cation–π interactions in biological systems, including protein–DNA complexes,35 protein–protein interactions,8,9 metal cation–π interactions36 and cation–π–cation interactions.37 Quantifying the nature and magnitude of these interactions remains a challenge. Combined studies employing both small models and crystal structure analyses have proven useful in understanding these interactions in naturally occurring systems. The aromatic–aromatic interactions of nitroarenes with amino acid side chains of histidine, phenylalanine, tryptophan, and tyrosine have been studied in this way. Wheeler constructed CCSD(T)/aug-cc-pVTZ//B97-D/TZV(2d,2p) theoretical models and compared the computed energies and geometries against nitroarene binding sites from the PDB.38 Chipot studied model systems of intra-residue cation–π interactions between lysine/arginine and phenylalanine/tyrosine/histidine complexes at the MP2/6-311++G**//MP2/6-31G** level of theory and compared them to results obtained with 1718 cation–π protein–protein complexes found in the PDB.8
Over 40% of small molecule drugs launched in clinical trials or under development in 2009 contained a (hetero)-aromatic ring.39 Given the prevalence of aromatic rings in drug-like molecules, we reasoned there should be many crystallographically characterized cation–π interactions in which aromatic rings of ligands interact with positively-charged amino acids of protein active sites. A large collection of structures provides a statistically significant empirical dataset against which to test the relevance of theoretical model complexes. The foci of previous studies exploring the presence of cation–π interactions in biomolecular databases3 have centered on cationic ligands interacting with aromatic residues8 and intra-residue cation–π interactions inside proteins40a or at protein–protein interfaces.9 Comparative studies on aromatic ligands are restricted to nucleic acid bases.40b To the best of our knowledge, a meta-analysis of the interactions between aromatic ligands and cationic residues has not been described previously. Herein, we study the occurrence, geometrical features and magnitude of cation–π interactions between positively charged amino acid sidechains: arginine (Arg), lysine (Lys), and histidine (His), and (non-protein) ligands featuring an aromatic ring. The distance and angular dependence of these interactions has been characterized at the DLPNO-CCSD(T) level of theory and interpreted with symmetry adapted perturbation theory (SAPT). These results are compared to naturally occurring interactions between ligands and proteins found through systematic data-mining of non-redundant protein structures from the PDB.
Fig. 2 Cation–π complexes analyzed in this work. Models of (A)/(B) lysine; (C)/(D), arginine; and (E)–(H), histidine sidechains interacting with benzene. |
Based on gas phase ab initio calculations, a bidentate orientation of NH4+ above aromatic rings is favored over those with one or three N–H atoms facing the aromatic ring.41 This binding mode is most prevalent in protein inter-residue interactions.42 Guanidinium–benzene complexes can adopt at least two conformations: perpendicular (T-shaped) or parallel. While T-shaped geometries are preferred in gas-phase, parallel-complexes are preferred in solution and have been observed more frequently in protein structures.7,43 We studied the parallel configuration, based on its biological relevance.7,44 For [C6H6][Imi]+ complexes, both perpendicular and parallel arrangements are found in protein interactions.45 We included both interaction types in our analysis.
We studied the interaction energy (Eint) of each model cation–π complex as a function of intermolecular separation and of horizontal displacement parallel to the aromatic plane. Distances were calculated between the center of mass of both species in the complex. Cartesian (x,y) displacements are defined with respect to the benzene ring as shown (Fig. 3).
Potential energy curves (PECs) were generated at the domain-based local pair-natural orbital coupled cluster with perturbative triple excitations, DLPNO-CCSD(T),46 level of theory. An augmented, correlation consistent basis set, aug-cc-pVTZ, was used. The convergence of valence DZ, TZ, and QZ quality basis sets was examined for the [benzene][Na]+ complex (Fig. S1†). The aug-cc-pVTZ equilibrium separation closely matches that obtained with a larger aug-cc-pVQZ basis, with an interaction energy within 0.5 kcal mol−1. DLPNO-CCSD(T) energies achieve an accuracy of 1 kcal mol−1 or better compared to CCSD(T),47 while CCSD(T)/CBS values are generally considered benchmark values for intermolecular interaction energies.48 DLPNO-CCSD(T) thermochemistry of small organic molecules is accurate to within 3 kJ mol−1 against experimental data.49 For complexes (A), (C), (E), and (G) CCSD(T) calculations were also carried out using a dielectric constant of 4.2 (diethyl ether) and 78.4 (water).
In relation to the D6h symmetry of benzene, two vectors in the plane of the ring represent extreme scenarios of displacement – one towards a C–H bond (angle displacement) and the other towards a C–C bond (side displacement). These vectors are related by a rotation of μ = 30° about the C6 axis (Fig. 3). By plotting a potential energy curve (PEC) with vertical distance against displacement in each vector, behaviour for 12 planes (360°/30°) about the C6 axis of benzene can be achieved for a minimum of 12 geometries per complex. MP2/aug-cc-pVTZ optimized geometries of monomers were obtained imposing Td, D3h, C2v and D6h symmetry restraints for the NH4+, Gdm+, Imi+, and benzene monomers, respectively.
E int was then calculated as a difference between the energy of the complex and the sum of the energies of the optimized monomers (eqn (1)):
Eint = Ecomplex − Ecation − Eπ | (1) |
MP2 optimizations were performed with Gaussian50 and DLPNO-CCSD(T) energy calculations with ORCA.51 Computed structures and absolute energies are available as ESI (Tables S2–S9†).
The DLPNO-CCSD(T) potential energy curves were interpolated (spline interpolation in 1-d with SciPy) to obtain minimum energy separations and corresponding geometries, on which symmetry adapted perturbation theory computations (SAPT)52 were performed with PSI4.53 The aug-cc-pVDZ-JKFIT basis set was used for these calculations. Briefly, SAPT provides a rigorous partitioning of the intermonomer interaction energy (Eint) into various physical contributions. These include short-range exchange–repulsion (Eee), electrostatics (Eele, e.g. charge–charge, charge–dipole, dipole–dipole, etc.), induction polarization (Eind, e.g. dipole/induced-dipole) and London dispersion forces (Edisp, e.g. instantaneous dipole/induced dipole). This approach has been used to analyze non-covalent interactions including π–π and cation–π interactions.54 SAPT (at orders above SAPT2) gives interaction energies close to benchmark CCSD(T)/CBS values.55,56 Our results are consistent with this (Table S1†), with all SAPT2+3 interaction energies lying within ±0.6 kcal mol−1 of corresponding DLPNO-CCSD(T) values (see Fig. S5† for full comparison).
(i) X-ray crystal structures of proteins with non-covalent bound ligands at a resolution of ≤2.0 Å were retained for further analysis. This returned 30053 structures.
(ii) The Protein–Ligand Interaction Profiler (PLIP)58 was used to identify cation–π interactions by imposing geometric criteria. OpenBabel was used to identify rings by Smallest Set of Smallest Rings (SSSR) perception and assign aromaticity.59 Based on previous works,7,8 two thresholds were applied: a 6.0 Å cut-off from the centroid of the aromatic ring and a 2.3 Å horizontal cut-off in the plane of the ring (Fig. S2†).
(iii) PROPKA 3.1 (ref. 60) was used to determine the protonation state of all residues at a physiological pH of 7.4. This considers the local environmental perturbation of sidechain intrinsic pKa values. This is most pertinent for histidine residues, whose protonation state is influenced by nearby residues, although instances of neutral lysine and arginine residues were also predicted. We removed all interactions where these sidechains were predicted to be neutral. Equally, we found instances initially characterized as a π–π interaction that were reassigned as cation–π interactions once the sidechain was predicted to be protonated.
(iv) Two sources of duplicate records were considered and removed; homologous protein chains within the same protein structure, and polycyclic aromatic ligands that were initially counted as two or more distinct interactions by satisfaction of the geometric cutoffs. To address this, only the first homologous chain containing the cation–π interaction was retained. In the case of polycyclic aromatic ligand interactions, these were further classified as either ‘fused’, ‘sandwich’, or ‘bridge’ complexes (Fig. S3†). For fused complexes the shortest distance was used and described as one cation–π complex, while sandwich and bridge complexes containing n aromatic rings were treated as n occurrences. Following this automated workflow, we obtained a total of 1827 unique protein structures, with 2012 cation–π interactions (Table 1).
Stage | Filter | PDBs | His | Arg | Lys | Total |
---|---|---|---|---|---|---|
a | 2.0 ≥ Å resolution, containing ligand | 30053 | — | — | — | 30053 |
b | Geometric thresholds | 3248 | 4959 | 3141 | 930 | 9030 |
c | Residue pKa >7.4 | 1827 | 68 | 2954 | 848 | 3870 |
d | Non-redundant chain and polycyclic ligand | 1827 | 49 | 1381 | 582 | 2012 |
The strongest interaction energies occur for complexes in which N–H bonds are oriented towards the aromatic ring: for model complexes (A)/(B) of lysine (−19.2 & −18.7 kcal mol−1) and the T-shaped histidine complex G (−14.0 kcal mol−1). In the other T-shaped complex a C–H bond is oriented towards the ring, giving a smaller interaction energy of −11.5 kcal mol−1. Interaction energies for the stacked complexes, arginine (C)/(D) (−7.5 & −7.4 kcal mol−1) and parallel histidine (E)/(F) (−7.7 & −7.7 kcal mol−1) are weaker still. The more strongly interacting complexes ((A), (B), (G) and (H)) have shorter equilibrium intermolecular separations around 2.9–3.0 Å. Correspondingly, the Non-Covalent Interaction (NCI) isosurfaces61 (Fig. 4) show focused, strongly attractive regions due to the polar X–H–π contacts and more diffuse, weakly attractive regions associated with the stacked systems. Molecular van der Waals surfaces using Bondi radii (Fig. S4†) show that X–H⋯π contacts occur at distances less than the sum of atomic van der Waals radii, whereas the stacked complexes are separated by interatomic distances slightly longer than the sum of these radii.
These interactions depend negligibly on the orientation of each complex. Very similar interaction energy profiles (within 0.5 kcal mol−1) were obtained for the two rotamers of each of the lysine, arginine, and parallel histidine complexes. For the T-shaped [C6H6][Imi]+ complexes, there is a more pronounced difference, depending on whether an N–H or C–H bond is oriented towards the arene. An N–H⋯π interaction is 2.5 kcal mol−1 (complex (F)) is more stable than C–H⋯π interaction (complex (G)) and gives a shorter equilibrium separation by 0.25 Å. This is due to the increased polarity of the N–H bond.62
Comparison with the available gas-phase experimental thermochemistry is promising. Our DLPNO-CCSD(T) interaction energy for [C6H6][NH4]+ of −19.2 kcal mol−1 matches the experimental ΔH° (380 K) of −19.3 ± 1.0 kcal mol−1,41b as well as [C6H6][K]+, being −19 kcal mol−1.54 In order to further rationalize these quantitative results, SAPT2+3/aug-cc-pVDZ analysis was used to partition the interaction energy (Eint) into various physical contributions (exchange repulsion Eee, induction Eind, dispersion Edisp, and electrostatic Eele) (Fig. 5) at the equilibrium separation of each complex. In line with previous studies,56 the SAPT2+3 Eint methods gave values within ±0.6 kcal mol−1 of the DLPNO-CCSD(T) energies (Fig. S5†).
The SAPT2+3 decomposition shows that [C6H6][NH4]+ complexes (A)/(B) are very different from the others. The dominant favorable terms are electrostatic and induction (polarization). In the remaining complexes the terms are more evenly balanced: for T-shape [C6H6][Imi]+ complexes (G)/(H) the electrostatic term is the most favorable, whereas for [C6H6][Gdm]+ complexes (C)/(D) and parallel [C6H6][Imi]+ complexes (E)/(F) the dispersion term is most favorable. Hobza has proposed that noncovalent complexes can be classified based on the ratio of SAPT Edisp/Eelec terms.63 Empirical observation suggests that dispersion/electrostatics ratio less than 0.59 should be categorized as electrostatic; greater than 1.7 (1/0.59) as dispersion bound; between 0.59 and 1.7 as mixed. Adopting this convention, [C6H6][NH4]+ complexes (A) and (B) are electrostatics dominated (Edisp/Eelec = 0.49–0.50), whereas all others are mixed (Edisp/Eelec = 0.82–0.83 for complexes (G) and (H); Edisp/Eelec = 1.13–1.27 for complexes (C)–(F)).
Finally, the large energy difference between (H) and (G), with the former being 2.5 kcal mol−1 more stable, arises from favorable Eee, Eind, which operate cooperatively. In this case, the N–H bond closest to the ring is more polar, and therefore induces both a stronger electrostatic attraction and a greater polarization response from the π system. This brings [Imi]+ closer to the ring of the benzene by 0.18 Å, which increases both Eee and, to a lesser extent, Edisp. The favorable interactions from this closer contact negate the increase in repulsive Eee.
In addition to the magnitude of the interaction strengths, SAPT analysis illustrates a continuum of interaction type. The [C6H6][NH4]+ cation–π interactions are dominated by the favorable electrostatic attraction, whereas the other complexes feature a balanced mixture of electrostatic and dispersion interactions. Within this mixed category, the electrostatic term is the most favorable in the [C6H6][Gdm]+ T-shape complexed, and is 3–4 kcal mol−1 larger in magnitude than those experienced by [C6H6][Gdm]+ and [C6H6][Imi]+ stacked complexes. The stacked complexes have the smallest electrostatic terms and dispersion is the most favorable attractive term. The distinct nature of this interaction has been additionally classified as a π+–π interaction.64
For arginine, the empirical distribution from X-ray and computed PES agree well. The 1381 geometries are clustered empirically in the minimum energy well of the PES, with a vertical offset of 3.5 Å. Additionally, interplanar angles between the guanidium and aromatic group are predominantly clustered below 30° – i.e. closer to a parallel stacked geometry, as considered by our model calculations, than to a T-shaped conformation (Fig. S6†). The computed PES is relatively flat with respect to horizontal displacement and accordingly the interactions are evenly spread across this range. Additionally, most of the interactions with secondary rings of bicyclic aromatics are found within the potential well. Much fewer (59) X-ray structures were available for cationic His-aromatic complexes, making statistical inference more difficult. 70% of the complexes are found close to the minimum energy well centered at Rx,y = 1.25, Rz = 3.25. The computed well for lysine is deeper and narrower, however, the empirical distribution of these interactions is scattered widely (Fig. 6). Unlike arginine, the empirical distribution of lysine's cation–π interactions is more scattered than predicted by theory. A large cluster of interactions occurs at Rx,y = 2.0, Rz = 4.0, far from the PES minimum at (0.0, 3.0).
Why do the X-ray structures of cationic arginine–arene interactions match theory well, whereas cationic lysine–arene interactions do not? Recalling our SAPT results, dispersion interactions prevail for arginine while for lysine, electrostatics interactions dominate. This led us to consider how each interaction type is influenced by the surrounding medium. We compared the empirical distance dependence of each interaction type against the DLPNO-CCSD(T) energy profile calculated in the gas-phase, and with a surrounding conductor-like polarizable continuum model (CPCM)65 with dielectric constant of 4.2 and 78.4 (Fig. 7). These values reflect the average polarity of the relatively hydrophobic protein interior and that of bulk water.66 Solvation corrections were computed at the MP2/cc-pVTZ level of theory.
Each cation–π interaction is weakened by the presence of a surrounding dielectric medium. For lysine, the interaction strength decreases to 19% of its gas-phase value (to 3.6 kcal mol−1) for ε = 4.2, and 7% (1.3 kcal mol−1) for ε = 78.4. For arginine, the decrease is less pronounced at these values of dielectric constant, to 47% (3.2 kcal mol−1) and 34% (2.3 kcal mol−1) – in water this interaction is stronger than lysine's even though it is less favorable by 11.5 kcal mol−1 in the gas-phase. With the SMD solvation model,66 arginine's cation–π interaction is also stronger than lysine's in water, although the interaction strengths were greater. A previous computational estimate of the lysine–benzene interaction strength in water is larger (5.5 kcal mol−1 with SM5.42R/HF/6-31+G*)7b compared to the values of 1.3/2.8 kcal mol−1 (CPCM/SMD) we have obtained with correlated wavefunction theory and a larger basis set.
The scattered distribution of lysine–arene interactions found empirically is consistent with a small interaction strength. This also explains the relative paucity of this interaction type, in relation to the abundance of lysine residues in proteins. In contrast, the arginine–arene interaction strength is less strongly influenced by the presence of a surrounding polar medium and a clear minimum remains on the PEC. The high frequency of this interaction type and the empirical distribution is consistent with the position of this minimum energy separation. The distinct behavior of lysine's and arginine's interactions results from the predominantly electrostatic character of the former interaction and mixed character of the latter, as shown by SAPT analysis. The cation–π interactions of T-shaped histidine are weakened to 21% of the gas-phase value (3.0 kcal mol−1) for ε = 4.2 and 9% (1.3 kcal mol−1) for ε = 78.4. For stacked histidine the reduction is smaller, to 44% (3.4 kcal mol−1) and 32% (2.5 kcal mol−1) at these values of dielectric constant. Again, the interaction with a greater electrostatic character (by SAPT) is diminished more severely by the surrounding dielectric medium.
We considered as large a data set possible, but the interactions we found are still biased by the proteins and ligands studied experimentally and crystallized. Few lysine–arene interactions were found empirically above the center of the aromatic ring, although the PES minimum is found at Rx,y = 0. Fig. 8 illustrates how several intermolecular interactions influence ligand position, such that the cation–π interaction itself between lysine–arene ligands is not a determinant of this pose. For example, a high proportion (63%) of the lysine–arene interactions found around Rx,y, Rz ≈ (2.0, 4.0) involve GTP/ATP/NAD/FAD related ligands. In such cases, the ligand's binding position is also influenced by a hydrogen bond interaction involving the ribose ester and the interactions between the phosphate group and the surroundings (Fig. 8A). It is interesting to notice that even when the systems involving these cofactors are removed from the analysis (Fig. S7†), the scattered nature of Lys-aromatic complexes is still evident. This is more likely due to additional interactions, such as salt bridges or hydrogen bonds, that Lysine can establish with nearby residues and solvent molecules. In the case of arginine–arene complexes, some long distance (6 Å) cation–π interactions are observed as a result of a π–π interaction with another aromatic residue (Fig. 8B). Cation–π interactions may also be shortened due to cooperative effects, such as those which occur when an N–H aromatic interaction between Arg101 and His98A inductively enhances the positive charge on arginine, thereby strengthening the electrostatic components of the cation–π interaction and salt bridge (Fig. 8C).
The gas phase cation–π interaction of lysine is significantly stronger than arginine by more than 10 kcal mol−1. However, the distribution of 582 empirical structures is scattered away from the minimum on the interaction PES, with a much longer intermolecular separation. In contrast, the 1381 structures involving arginine (69% of the total found) reflect the underlying calculated PES well. This reflects the different electrostatic contributions to each of these residues' interactions with an aromatic ring, which was established by SAPT analysis. The electrostatics dominated lysine–arene interaction is greatly diminished by a surrounding dielectric medium, such that it becomes essentially negligible in strength and without a well-defined equilibrium separation. The arginine–arene interaction involves a near equal mix of dispersion and electrostatic attraction, which is weakened to a much smaller degree by the surrounding medium.
Our results account for the relative paucity of cation–π interactions involving lysine, despite its prevalence in protein structures. Particularly in protein active sites of medium to high polarity, such as those which are solvent exposed, the lysine–arene interaction is predicted to be weakened to the point that it has a negligible effect on an aromatic ligand binding mode. In contrast, the cation–π interaction made by arginine residues with aromatic ligands is more robust to changes in the surrounding environment. This interaction is the most frequent found empirically and is also computed to be stronger than for lysine in higher polarity surroundings. There are relatively few cation–π interactions involving positively charged histidine residues, although the stacked π+–π interaction is predicted to be of similar magnitude to that of arginine.
Systematic analysis of crystal structures showed that other factors, such as competitive hydrogen bonding interactions and solvent accessibility, unconnected to the cation–π interaction may also affect the cation–π interaction geometry. This investigation has characterized the intrinsic properties of biologically relevant cation–π interactions and the effect of the protein environment in an average way using a surrounding polarizable dielectric medium. Future work will consider the inhomogeneous nature of the local environment on these interactions.67
Footnotes |
† Electronic supplementary information (ESI) available: DLPNO-CCSD(T)/aug-cc-pVnZ benchmarks; PLIP geometric criteria; polycyclic aromatic ring visualizations; van der Waals surfaces of model complexes; absolute energies from DLPNO-CCSD(T)/aug-cc-pVTZ and SAPT2+3/aug-cc-pVDZ calculations; Cartesian coordinates; Table of PDB IDs and corresponding cation–π interactions identified (.xls). See DOI: 10.1039/c7sc04905f |
‡ These authors contributed equally to this work. |
This journal is © The Royal Society of Chemistry 2018 |