Jon I. Quintana†a, Unai Atxabal†a, Luca Unione*ab, Ana Ardá*ab and Jesús Jiménez-Barbero*abcd
a CICbioGUNE, Basque Research & Technology Alliance (BRTA), Bizkaia Technology Park, Building 800, 48160 Derio, Bizkaia, Spain. E-mail: aarda@cicbiogune.es; jjbarbero@cicbiogune.es
b Ikerbasque, Basque Foundation for Science, Plaza Euskadi 5, 48009 Bilbao, Bizkaia, Spain
c Department of Organic Chemistry, II Faculty of Science and Technology, EHU-UPV, 48940 Leioa, Spain
d Centro de Investigación Biomédica En Red de Enfermedades Respiratorias, Madrid, Spain
First published on 8th February 2023
Nuclear Magnetic Resonance (NMR) has been widely employed to assess diverse features of glycan–protein molecular recognition events. Different types of qualitative and quantitative information at different degrees of resolution and complexity can be extracted from the proper application of the available NMR-techniques. In fact, affinity, structural, kinetic, conformational, and dynamic characteristics of the binding process are available. Nevertheless, except in particular cases, the affinity of lectin-sugar interactions is weak, mostly at the low mM range. This feature is overcome in biological processes by using multivalency, thus augmenting the strength of the binding. However, the application of NMR methods to monitor multivalent lectin–glycan interactions is intrinsically challenging. It is well known that when large macromolecular complexes are formed, the NMR signals disappear from the NMR spectrum, due to the existence of fast transverse relaxation, related to the large size and exchange features. Indeed, at the heart of the molecular recognition event, the associated free-bound chemical exchange process for both partners takes place in a particular timescale. Thus, these factors have to be considered and overcome. In this review article, we have distinguished, in a subjective manner, the existence of multivalent presentations in the glycan or in the lectin. From the glycan perspective, we have also considered whether multiple epitopes of a given ligand are presented in the same linear chain of a saccharide (i.e., poly-LacNAc oligosaccharides) or decorating different arms of a multiantennae scaffold, either natural (as in multiantennae N-glycans) or synthetic (of dendrimer or polymer nature). From the lectin perspective, the presence of an individual binding site at every monomer of a multimeric lectin may also have key consequences for the binding event at different levels of complexity.
The interactions mediated through carbohydrates occur in multiple ways and with different kinds of entities. Glycans can interact between themselves, for instance, in cell–cell recognition.1 However, the most studied and relevant systems involve protein–carbohydrate interactions. These interactions are essential for cell adhesion,2 signalling events,3 host-pathogen interactions,4 cancer development,5 and many more.
Carbohydrates are also present in pathogens such as viruses, bacteria, parasites, and fungi. The glycans on the surface of these entities are often the first interface with the host cell. As a result, targeting those carbohydrates can be useful to avoid infection. As many viruses display a dense coat of glycans, another approach that has been proposed to battle pathogens is the use of glycan binding proteins.6,7 Fittingly, several glycan binding proteins have shown the ability to neutralise various viruses, including HIV, and these proteins are therefore in the pipeline to be used to treat and prevent infections.
A relevant feature to take into consideration is the conformation of the carbohydrates. As stated before, glycans are rather flexible molecules. On many occasions, this flexibility is detrimental for binding: the glycan might need to adopt a specific conformation and presentation for the binding event to take place, which results in a high entropic penalty. Thus, the design of synthetic molecules (glycomimetics) that are already preorganized for binding may be a suitable strategy to target a biologically relevant sugar-binding protein.8
Among the entities that interact with glycans, lectins are sugar binding proteins that have no catalytic function (they are not enzymes) and do not provide a direct immune response (they are not antibodies). There are fourteen different types or groups of lectins in the animal kingdom.9 Three of the most relevant lectin families found in humans are C-type, I-type and S-type lectins. C-type (Ca2+-dependent) lectins are found both as transmembrane and as soluble proteins. Some of the lectins of this family, like DC-SIGN, langerin, and MGL, have key roles in pathogen recognition and have become targets in the field of drug discovery.10
Within I-type lectins, the study of sialic acid binding immunoglobulin-type lectins (Siglecs) is nowadays a topic of major interest.11 The Siglec family of transmembrane lectins comprises 15 members, which contain an N-terminal domain that recognizes sialic acids.12 All Siglecs (except for Siglec-4 and 6) are expressed in immune cells and help our immune system to distinguish between self and non-self signals.
Galectins (earlier dubbed S-type lectins) are β-galactoside binding proteins. This family is formed by 16 members, which are found ubiquitously in the human body.13 These lectins are expressed in the cytoplasm and then secreted through a non-classical pathway or transported to the nucleus. Through their ability to bind β-galactosides, galectins participate in cell–cell interactions, and they are also involved in immune responses, inflammation and signalling events, among many others. Due to their involvement in several diseases, galectins have been targeted for inhibitor development, for which different approaches have been used. Glycomimetics have been employed,14 ranging from monovalent molecules with multiple chemical decorations to relatively simple molecules displayed along multivalent scaffolds. One of the main drawbacks in developing mimetics for galectins is the difficulty of designing a molecule that is specific for just one galectin. However, there are various promising molecules that are fairly specific for Gal-1,15 Gal-3,16 and Gal-8.17
Given all these structural and dynamic features, the affinity of most individual protein–carbohydrate interactions is rather weak (KD values in the mM–μM range).18 In biological systems, however, this low-affinity binding is usually overcome through the engagement of simultaneous synergic interactions between the receptor and the ligand, a phenomenon known as multivalency.19 Therefore, multivalency is commonly used by nature to enhance the innately weak interactions occurring during carbohydrate–lectin molecular recognition.
There are diverse ways in which multivalent presentations can enhance affinity: chelation, subsite binding, statistical rebinding, steric stabilization and clustering effects.20–22 However, in the development of multivalent ligands there are various factors that should be taken into consideration. The first is the nature of the scaffold. Rigidity is a crucial feature to consider, since it is directly related to entropy. Flexible linkers may display a large entropic penalty upon binding. However, flexibility can also be advantageous, as a flexible scaffold might adopt the proper conformation for favourable interactions to take place.23,24 The chemical nature of the linker is also relevant, since it might establish additional interactions with secondary binding sites and therefore improve the affinity.25 Undoubtedly, the choice of the ligand is a key factor in the outcome, as well as its effective concentration within the scaffold. Usually, the higher the effective concentration of the ligand is, the higher the affinity. However, at high concentrations of the ligand, steric clashes may take place and the effectiveness of the approach decreases.
Other biophysical techniques, such as surface plasmon resonance (SPR) or, more recently, biolayer interferometry (BLI), have also been widely employed to monitor protein–carbohydrate interactions.32,33 These techniques may yield the thermodynamics, kinetics and binding energy of the interaction and can be performed without any type of labelling. Another powerful label-free technique is isothermal titration calorimetry (ITC), which enables obtaining the thermodynamic profile of the binding event.34 These methods, which are extremely useful, provide key energy and thermodynamic data, but no direct information on the epitope and the paratope of the binding is obtained.
The NMR methods employed to analyse interactions are classified into two groups. In ligand-based methods, changes in the NMR signals of the ligand (the glycan, herein) are observed, whereas in receptor-based methods changes in the NMR signals of the macromolecule (the lectin, herein) are monitored.38
For small or medium-sized glycans, as depicted in Fig. 1, the motional properties (fast Brownian motion, slow relaxation, fast diffusion, and positive NOE values) are different from those of the receptor (slow Brownian motion, fast relaxation, slow diffusion, and negative NOE values). However, when the glycan binds to the protein, its rotational motion properties change and become similar to those of the large macromolecule.
Fig. 1 Motional properties of a receptor, a ligand, and the corresponding binary complex. The structures are taken from the PDB code 4YM0. |
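The change of NOE sign between the fast-tumbling ligand and the slow-tumbling receptor can be illustrated numerically. The following is a minimal sketch, assuming isotropic tumbling of an isolated homonuclear proton pair, for which the cross-relaxation rate is proportional to 6J(2ω) − J(0) with J(ω) = τc/(1 + ω²τc²). The function names, the 600 MHz field and the correlation times used in the usage note are illustrative assumptions, not values taken from the works discussed here.

```python
import math


def reduced_cross_relaxation(tau_c, larmor_hz):
    """Return 6*J(2w) - J(0), which carries the sign of the 1H-1H NOE.

    tau_c     : rotational correlation time in seconds (assumed isotropic)
    larmor_hz : proton Larmor frequency in Hz (e.g. 600e6)
    Constant prefactors (dipolar coupling, distance) are omitted because
    they do not affect the sign.
    """
    omega = 2.0 * math.pi * larmor_hz

    def spectral_density(w):
        # Lorentzian spectral density for isotropic rotational diffusion
        return tau_c / (1.0 + (w * tau_c) ** 2)

    return 6.0 * spectral_density(2.0 * omega) - spectral_density(0.0)


def noe_sign(tau_c, larmor_hz):
    """Classify the NOE as 'positive' (fast tumbling) or 'negative' (slow)."""
    return "positive" if reduced_cross_relaxation(tau_c, larmor_hz) > 0 else "negative"
```

Under these assumptions, a correlation time of tens of picoseconds (a free small glycan) gives a positive NOE at 600 MHz, whereas a correlation time of about 10 ns (a protein-sized complex) gives a negative one. The sign changes where ωτc = √5/2 ≈ 1.12.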
Selection of the NOESY mixing time is also very relevant, as exemplified by Weimar and Peters when studying the interaction of α-Fuc-(1-6)-β-GlcNAc-OMe with Aleuria aurantia agglutinin.42 Whereas for the free disaccharide NOEs are positive and small, for the complex NOEs are negative and ca. ten times larger in absolute value (see Fig. 3). Transferred NOESY (trNOESY) experiments are usually performed with short mixing times (ca. 100 ms).43 Under these conditions, the contribution of the free ligand is almost negligible, meaning that the obtained information basically describes the conformation of the bound ligand.
Fig. 3 Dependence of the NOE on the mixing time and the diverse molecular motion regimes for the bound and free states. NOEs for the free ligand are displayed as blue dots. Transferred NOEs are displayed as orange dots. Adapted from ref. 42. |
One of the drawbacks of trNOESY experiments is the existence of spin-diffusion. Nevertheless, this effect can be quantitatively taken into account through full relaxation matrix calculations.44 Moreover, it can also be identified in ROESY experiments, which give rise to positive cross peaks for directly related proton pairs, whereas the three-spin (spin-diffusion-mediated) signals are negative (Fig. 2).45 Additionally, the existence of chemical exchange events between the free and bound species can also be assessed through ROESY experiments, since chemical exchange peaks are always negative (Fig. 2).46
In STD-NMR, two different spectra are recorded. In the first one, the reference (off-resonance spectrum), a selective saturation is applied in a region devoid of any NMR signal, usually 100 ppm. In the second one (on-resonance spectrum), only protons of the protein are selectively saturated. As the protein is saturated, the magnetization is rapidly spread throughout the protons of the polypeptide chain. Fittingly, if a given ligand binds to the saturated protein, the ligand will also receive this magnetization (Fig. 4). As a result, the signals of those ligand protons that are closer in space to the protein will show a decrease in their intensities. The subtraction of the on-resonance spectrum from the off-resonance one results in a spectrum in which only the signals of those protons that are close to the protein are present. This difference spectrum, defined as the STD-NMR spectrum, reveals the binding epitope of the ligand.
The difference between the off-resonance (I0) and on-resonance (Ion) intensities is the STD (ISTD) intensity. Usually, the intensity for each proton (ISTD,i) is shown as the relative value compared to the proton that displays the most intense STD (ISTD,max).
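The epitope-mapping arithmetic described above can be sketched in a few lines. This is a minimal illustration, assuming that peak intensities have already been integrated for each ligand proton in both spectra; the function name and the proton labels in the usage note are hypothetical. In practice many protocols first divide each difference by I0 to obtain a fractional STD, a step that is left out here to follow the definition given in the text.

```python
def relative_std(i_off, i_on):
    """Return relative STD values (percent of the strongest STD) per proton.

    i_off : dict mapping proton label -> off-resonance intensity (I0)
    i_on  : dict mapping proton label -> on-resonance intensity (Ion)
    """
    # Absolute STD intensity for each proton: I_STD,i = I0,i - Ion,i
    std = {proton: i_off[proton] - i_on[proton] for proton in i_off}

    std_max = max(std.values())
    if std_max <= 0:
        raise ValueError("no positive STD effect detected")

    # Scale so that the proton with the most intense STD equals 100 %
    return {proton: 100.0 * value / std_max for proton, value in std.items()}
```

For example, with off-resonance intensities of 100, 100 and 50 and on-resonance intensities of 80, 90 and 45 for three hypothetical protons H1, H2 and H3, the absolute STD values are 20, 10 and 5, so the relative STD values are 100%, 50% and 25%.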
One of the drawbacks of STD-NMR is the overlap of numerous signals, especially for carbohydrates. Most of their 1H-NMR signals appear between 3.5 and 4.5 ppm. In this narrow chemical shift range, many signals overlap, which hinders the precise analysis of the binding epitope. To address this, 2D STD-NMR experiments have been developed in which the second dimension provides a significant enhancement of the spectral dispersion. For instance, the synthesis of 13C-labelled carbohydrates or carbohydrates with fluorine tags has enabled performing STD-HSQC and STD-TOCSY experiments.61,62
Fig. 5 DOSY spectra of a free ligand (left) and of the ligand in the presence of its receptor (right). |
Indeed, the use of 13C (usually 13C-labelled glucose) or 15N (usually 15NH4Cl) containing precursors allows the introduction of 13C and/or 15N atoms into recombinant proteins. The presence of 15N nuclei in proteins permits obtaining 1H–15N correlations in which every proton attached to a 15N provides a cross peak.65
Two different types of experiments can be performed: 1H,15N-HSQC (Heteronuclear Single Quantum Coherence) for small single domain proteins and 1H,15N-TROSY (Transverse relaxation optimized spectroscopy) variant for the larger proteins.66
Since these spectra reflect the unique structure of each protein, they are considered a fingerprint, and any modification on the protein, such as ligand binding, can be monitored in the spectrum.
This sensitivity of the chemical shift towards changes in the chemical environment of the nuclei can be exploited to monitor binding events.67 When a ligand binds to a protein, the nearby protein nuclei suffer changes in their chemical shifts, defined as Chemical Shift Perturbations (CSP), which can be applied to monitor the binding events (Fig. 6). The equilibrium of the system, i.e. the concentration of the protein (P), ligand (L), and protein–ligand complex (PL) is defined by the following equations:
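For a simple 1:1 binding model, the equilibrium and the resulting CSP take the standard form sketched below. This is a generic formulation given under stated assumptions: the subscript T (total concentration), δ_obs and Δδ_max are notation introduced here for illustration, and the last expression holds only in the fast exchange regime, where the observed shift is the population-weighted average of the free and bound shifts.

```latex
\begin{align}
  \mathrm{P} + \mathrm{L} &\rightleftharpoons \mathrm{PL},
  \qquad
  K_\mathrm{D} = \frac{[\mathrm{P}]\,[\mathrm{L}]}{[\mathrm{PL}]} \\
  [\mathrm{P}]_\mathrm{T} &= [\mathrm{P}] + [\mathrm{PL}],
  \qquad
  [\mathrm{L}]_\mathrm{T} = [\mathrm{L}] + [\mathrm{PL}] \\
  [\mathrm{PL}] &= \frac{\left([\mathrm{P}]_\mathrm{T} + [\mathrm{L}]_\mathrm{T} + K_\mathrm{D}\right)
    - \sqrt{\left([\mathrm{P}]_\mathrm{T} + [\mathrm{L}]_\mathrm{T} + K_\mathrm{D}\right)^{2}
    - 4\,[\mathrm{P}]_\mathrm{T}\,[\mathrm{L}]_\mathrm{T}}}{2} \\
  \Delta\delta_\mathrm{obs} &= \Delta\delta_\mathrm{max}\,
    \frac{[\mathrm{PL}]}{[\mathrm{P}]_\mathrm{T}}
\end{align}
```

Fitting Δδ_obs against the total ligand concentration along a titration then yields K_D and Δδ_max for each perturbed cross peak.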
Having explained these NMR methods, it is worth mentioning that their application to study multivalent effects is intrinsically challenging. It is well known that when large macromolecular complexes are formed, the NMR signals disappear from the NMR spectrum, due to fast transverse relaxation related to the large size and exchange features. Indeed, at the heart of the molecular recognition event, the associated free-bound chemical exchange process for both partners takes place on a particular timescale. Depending on the time frame of the exchange, lines may be sharp, broadened, or may even disappear due to fast transverse relaxation. This fact is also usually associated with the binding affinity. For tight binding, the exchange rate will be slow and, provided that the generated supramolecule is very large, the NMR peaks will be very broad and will not be detected. Therefore, approaches other than direct detection should be employed. Usually, competition experiments with small (labelled) and well-defined binders are employed to unravel details of the binding event. Moreover, precipitation of the formed complexes in solution may also take place. Under these circumstances, usually only partial information can be extracted, for the individual components or for specific cases. Consequently, experiments with the components of the multivalent entity (either glycan or protein monomer) are also performed, with the aim of later extending the findings achieved with this reductionist approach to the whole system. Obviously, this has pros and cons. Moreover, even when the NMR approach provides some information, any multivalent architecture contains several repeated “monomers”. Given the features of NMR spectroscopy, these monomers cannot be directly distinguished, since their chemical environment is identical and they will display identical chemical shifts.
Some methodologies to circumvent this initial problem are given below (specific isotope labelling, paramagnetic NMR).
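The dependence of the line shape on the exchange timescale discussed above can be made concrete with a rough classification. The sketch below assumes a two-site 1:1 exchange in which k_ex = k_on[L] + k_off is compared with the chemical shift difference between the free and bound states, Δω = 2πΔν. The function names and the tenfold threshold separating the regimes are illustrative assumptions, not a rigorous line-shape analysis.

```python
import math


def exchange_rate(k_on, k_off, free_ligand):
    """k_ex (s^-1) for a 1:1 equilibrium: k_on*[L]free + k_off.

    k_on in M^-1 s^-1, k_off in s^-1, free_ligand in M.
    """
    return k_on * free_ligand + k_off


def exchange_regime(k_ex, delta_nu_hz, factor=10.0):
    """Classify the exchange regime on the chemical shift timescale.

    delta_nu_hz : free/bound chemical shift difference in Hz
    factor      : arbitrary margin (assumption) separating the regimes
    """
    delta_omega = 2.0 * math.pi * abs(delta_nu_hz)
    if k_ex > factor * delta_omega:
        return "fast"          # one averaged, usually sharp signal
    if k_ex < delta_omega / factor:
        return "slow"          # separate free and bound signals
    return "intermediate"      # strong broadening, signals may vanish
```

With a shift difference of 100 Hz, for instance, an exchange rate of 10⁵ s⁻¹ falls in the fast regime, 10 s⁻¹ in the slow regime, and 600 s⁻¹ in the intermediate regime, where broadening is most severe.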
Using a natural polysaccharide backbone to provide the multivalent presentation of the interacting epitopes,69 a dextran skeleton decorated with LacNAc (Fig. 7) moieties has been employed to target human galectin-3 (hGal3). The multivalent presentation of the epitopes in the dextran backbone was achieved through propargylation and bioconjugation with lactose, using maltose and mannobiose as controls.70 Binding studies of the multivalent conjugates were performed from the receptor's perspective by using 1H,15N-HSQC based titrations. Despite the multivalent presentation, KDs in the medium–high μM range were measured, similar to those described for free lactose.71,72 The lack of affinity enhancement suggests that the individual binding events do not act cooperatively. Nevertheless, chemical-shift perturbations were also observed for cross peaks belonging to residues that are not located at the canonical galactose binding site of hGal3, but at the opposite face. Curiously, this locus had been previously reported to interact with β-mannans, as described below.72
Fig. 7 Chemical structure of lactose substituted propargylated dextran molecules employed for targeting hGal3. |
Alternatively, a series of HPMA-based glycopolymers (Fig. 8) bearing different LacNAc contents were designed and tested through ELISA assays.73
Fig. 8 Glycopolymers designed for targeting hGal-1 and hGal-3.73 For each scaffold 5 polymers were synthesized with different carbohydrate contents. From left to right polymers 7a, 7b, and 7c. |
The selectivity of the designed multimeric compounds for hGal-1 versus hGal-3 was remarkable. Herein, we should mention that Gal-1 is a non-covalent homodimer74 while the CRD of Gal-3 is a monomer.75 The phenomenon has also been investigated by NMR,76 from both the ligand and receptor's perspectives. STD-NMR experiments highlighted that the recognized epitope for the monomer presentations of di- and tri-LacNAc moieties is that of the LacNAc (ligand 1) moiety.
Then, the binding events were followed from the protein perspective by 1H,15N-HSQC titrations, with 15N-labeled Gal-1 and Gal-3 (Fig. 9). The observed chemical shift perturbations (CSP) generated by the di- and tri-LacNAc monomers again matched those induced by simple LacNAc, although a clear decrease in the cross-peak intensities in the HSQC spectra was observed, which did not occur when LacNAc was used.
Fig. 9 Schematic representation of 1H,15N-HSQC spectra of human galectin-1 (top) and galectin-3 (bottom) with the monomer of ligand 7b (left) and ligand 7c (right), adapted from Bertuzzi et al.72 The 1H,15N-HSQC spectra of both apo Gal-1 (top) and Gal-3 (bottom) are colored in blue. Top: Upon the addition of the ligands to Gal-1 (red spectra), a significant reduction of signal intensities is observed, especially in the presence of the monomer of ligand 7c. Bottom: Signal reductions are observed when the ligands were added to Gal-3 (green spectra), being more noticeable at the right-hand side. |
It is tempting to guess that statistical rebinding events take place when the di- or tri-valent ligands are employed, producing an increase of the transverse relaxation rate of the protein nuclei, which is further enhanced by the free-bound exchange process. Thus, the HSQC cross-peak intensities are decreased. Fittingly, the decrease was much larger in the presence of Gal-1 than with Gal-3. It was also larger for the tri-LacNAc than for the di-LacNAc analogue.
Receptor-based 1H,15N-HSQC experiments were employed to monitor the binding of the glycopolymers to both galectins. The addition of just small amounts of the polymers triggered the disappearance of the lectins’ cross-peaks. However, the addition of several equivalents of LacNAc to the NMR tubes containing the lectin/polymer mixtures permitted the recovery of the HSQC signals of the 15N-labeled lectin. Thus, the molar equivalents of LacNAc required to recover the cross peaks were a marker of the relative binding affinities for the different partners. The analysis of the results permitted concluding that the glycopolymer with just one LacNAc entity per monomer showed the largest potency versus Gal-1 per active LacNAc moiety, while no selectivity among the three glycopolymers was deduced for Gal-3. Dynamic light scattering and cryo-electron microscopy experiments allowed deducing the existence of supramolecular and cross-linked entities, further supporting the NMR results.76
The interaction of natural ligands with galectins has also been studied by NMR. Typical multivalent presentations can be found in polysaccharides, multiantennae N-glycans, or polyLacNAc chains. Regarding polysaccharides, it is obvious that they may display multiple repeating units of the same oligosaccharide, which may provide numerous contact points to the partner lectins. As an example, the interaction of Davanat (Fig. 10), a galactomannan (GM) composed of β1–4-linked D-mannopyranosyl units periodically decorated with Galα1–6 moieties (59 kDa average molecular weight), with the Gal-1 homodimer has been studied.77,78 The use of 1H,15N-HSQC experiments allowed distinguishing an alternative binding site for long galactomannans, other than the canonical β-galactoside-binding region. This report evidenced the possible existence of simultaneous binding sites for galectins. The existence of a non-canonical binding site was demonstrated by the fact that simple lactose is indeed able to bind the preformed Gal-1/Davanat complex. Moreover, DOSY experiments also showed that Gal-1 binding alters the Davanat conformation, likely perturbing the putative glycan–glycan interactions that take place between the Davanat saccharide chains. Alternatively, a detailed characterization of the recognition phenomena at the canonical and alternative sites was achieved by using two small galactomannans as models.
In a publication from the same research group, the binding of GMs, with diverse Gal/Man molar ratios, to Gal-3 was also scrutinized. In fact, following the same methodology, it was observed that the intensities of the HSQC cross peaks from the carbohydrate recognition domain (CRD) and N-terminal domain (NTD) of galectin-3 were differentially affected, showing diverse degrees of broadening.72
Addition of simple lactose to the NMR tube containing the mixture of Gal-3 bound to GM partially recovered the intensities, strongly suggesting that the binding of lactose at the β-Gal site competes with Gal-3 binding to GM, but that there are additional binding events. Indeed, a fraction of GM still interacts with Gal-3 as shown by the still observable cross peak broadening. The effective KD was estimated in the low micromolar range (per Gal-3 binding site), one order of magnitude stronger than that for LacNAc, thus assessing the existence of multiple binding events at different sites, but also suggesting the existence of statistical rebinding processes. Intriguingly, although the effective binding affinity depended on the Gal/Man ratio, no clear structural explanation at the supramolecular level could be deduced.
Nature also provides spectacular multiantennae N-glycan structures that can interact with their receptor lectins. The experimental demonstration of the type of multivalent effects that these molecules may display, mediated by clustering, cross-linking, or statistical rebinding processes, remains a challenge. NMR has also been applied to approach this scientific problem. In particular, one alternative approach to monitor the interaction of multiantennae glycans with lectins is the use of paramagnetic nuclei.79 It is well known that the presence of a paramagnetic lanthanide nucleus (Fig. 11A and B) attached to a lanthanide-binding tag linked to the reducing end of N-glycans provides80 pseudocontact shifts (PCS) that depend on the inverse cube of the distance between the sugar nuclei and the metal (1/r³). In this manner, it is possible to distinguish the NMR resonance signals of the nuclei belonging to sugars at equivalent positions in the different arms, to estimate their specific distances to the lanthanide, and therefore to decipher the conformational features of the glycan.81 Thus, this methodology makes it possible to distinguish between signals that overlap in the presence of a diamagnetic metal, such as those belonging to the same residues in the different arms of multiantennae N-glycans.80–82 Moreover, since cross peaks for every particular monosaccharide moiety are identified, 1H–13C HSQC experiments of the N-glycan, decorated with the lanthanide-binding tag, recorded in the presence and in the absence of Ricinus communis agglutinin and Datura stramonium lectins (Fig. 11C and D) allowed observing a differential signal intensity decrease for each anomeric peak of the diverse Gal and GlcNAc units, thus revealing the preferences of these lectins for each arm of the N-glycan.81
Fig. 11 (A) Overlapped HSQC of the tetra-antennary glycan in the presence of diamagnetic metal. (B) HSQC of the tetra-antennary glycan with a paramagnetic metal. (C) HSQC of the previous molecule in the presence of a lectin. (D) Structure of the tetra-antennary N-glycan and the name of each branch. (E) Plot of the difference in intensity between spectra (C and D). Adapted from Canales et al.81 |
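The distance dependence of the PCS can be illustrated with the standard point-dipole expression, δPCS = [Δχax(3cos²θ − 1) + (3/2)Δχrh sin²θ cos 2φ]/(12πr³). The sketch below is a generic illustration: the function name is hypothetical, and the default axial anisotropy of 35 × 10⁻³² m³ is only an assumed order of magnitude for a strongly anisotropic lanthanide such as Dy3+, not a value taken from the studies discussed here.

```python
import math


def pcs_ppm(r_angstrom, theta_deg, dchi_ax=35.0, dchi_rh=0.0, phi_deg=0.0):
    """Pseudocontact shift in ppm for a nucleus at polar coordinates (r, theta, phi).

    r_angstrom       : metal-nucleus distance in angstrom
    theta_deg/phi_deg: polar angles in the frame of the susceptibility tensor
    dchi_ax, dchi_rh : axial and rhombic anisotropies in units of 1e-32 m^3
                       (illustrative assumptions)
    """
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    angular = (dchi_ax * (3.0 * math.cos(theta) ** 2 - 1.0)
               + 1.5 * dchi_rh * math.sin(theta) ** 2 * math.cos(2.0 * phi))
    # Unit conversion: 1e-32 m^3 / (1e-30 m^3 per A^3) * 1e6 ppm = 1e4
    return 1.0e4 * angular / (12.0 * math.pi * r_angstrom ** 3)
```

Doubling the distance reduces the shift eightfold. With the assumed tensor, a nucleus 30 Å from the metal along the main axis would still experience a shift of several tenths of a ppm, which is consistent with the idea that residues in distant arms remain distinguishable.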
A similar strategy has been applied to study the interaction between a bivalent sialylated N-glycan and the hemagglutinin from the strain HK/68 of the influenza virus.83 The Neu5Acα2-6Gal units are recognized by the hemagglutinin on the surface of the virus, enabling the attachment of the virus to the host cell through binding to this epitope. Interestingly, microarray and infection studies have postulated that H3N2 human influenza viruses have evolved to recognize Neu5Acα2-6Gal at the non-reducing end of long polyLacNAc structures, whereas the early viruses preferred these terminal epitopes with shorter LacNAc structures.84 In this context, the paramagnetic NMR approach was applied to the study of the interaction of two biantennae N-glycans with either one or two LacNAc units capped with α2,6 linked sialic acids. First, the conformational features of the glycans were determined by PCS analysis. Then, the combined analysis of the PCS and of STD-NMR experiments in the presence of HK/68 allowed deducing that the sialic acids at both arms of the N-glycan, which still present PCS despite being located far away from the Dy3+ (over 30 Å), interact with the hemagglutinin. Intriguingly, in the STD-NMR experiments with Dy3+, STD signals arising from both sialic acids are present, showing that both participate in the binding, a feature that had not been proved before.
Despite the proven efficacy of this methodology, it involves chemically modifying the ligand, meaning that a non-natural modification needs to be introduced, which could affect the binding properties. An alternative to the inclusion of a paramagnetic nucleus in the epitope is the labelling of fragments of the molecule with 13C. In the context of protein–carbohydrate interactions, Moure et al. shed light on the interaction between several galectins (β-galactoside binding lectins) and polylactosamine molecules.61 The molecules analysed in this work consist of hexasaccharides containing three repeating units of LacNAc (tri-LacNAc), for which the galactose units were labelled with 13C. The use of 13C enables STD-HSQC experiments with high sensitivity and no protein background, which benefit from a higher signal dispersion compared to traditional STD.85 This approach provides a broad chemical shift dispersion, as the paramagnetic tag does. However, the advantage of 13C labelling is that the modification is minimal, and the ligand is chemically identical to the natural structure.
Despite the broader signal dispersion afforded by working in 2D, spectral overlap is still a bottleneck on many occasions. In the case of the tri-LacNAc, the internal galactose and the one at the reducing end are isochronous, and thus cannot be differentiated. A solution to this issue is to synthesize molecules with the same structure but in which the labelled residue is different (Fig. 12). This approach was previously applied to decipher the conformational features of a series of selectively labelled linear β1–6 linked glucose hexasaccharides.86 The selective labelling of galactoses proved to be very useful to characterize the binding to galectins. The interaction with five different galectins was analysed, each with its own preferences towards the three epitopes. The case of galectin-7 is very representative of the usefulness of the selective labelling. STD-HSQC experiments with the tri-LacNAc 1, which is labelled in the three galactoses, showed STD intensities mainly for the terminal galactose, although the internal and reducing-end units also showed STD. In order to differentiate between these galactose units, STD-HSQC experiments were performed with molecules 2 and 3, in which the galactoses are selectively labelled. In the STD with molecule 3 no STD effects are detected, whereas for tri-LacNAc 2 clear STD signals arise. These experiments clearly showed that Gal-7 exclusively recognizes the terminal and internal LacNAc and not the one at the reducing end.61
Fig. 12 Left: Structure of the polyLacNAc and the naming of each LacNAc epitope. Right: Molecules synthesized by Moure et al.61 |
The existence of multiple binding modes for particular ligands has also been demonstrated in the investigations of age-related macular degeneration, paying attention to the structural basis of the interaction of diverse modules of complement factor H with sulfated glycosaminoglycans (GAG).89 A receptor-based NMR approach, including site-directed mutagenesis, allowed demonstrating that the GAG interacting site occupies the centre of an extended binding groove, with multivalent recognition of the sulfated GAGs.
Fig. 14 (A) The key employed bivalent molecules used to target Shiga-like toxin.91 (B) Structure of a Shiga-like toxin (PDB 4M1U), the A subunit is represented in green, whereas the B subunits are depicted with its surface in red, orange, blue, yellow, and purple. |
A different NMR approach, measuring residual dipolar couplings (RDC), allowed describing the binding mode of the saccharide fragment of globotriaosylceramide to the B-subunit homopentamer of verotoxin 1 (VTB).92 The analysis of the RDC for the free and bound saccharide showed that the oligosaccharide binds at a single binding locus per monomer (site II). Fittingly, this is one of the three possible sites deduced by X-ray crystallography for the same molecular complex. No NMR experimental evidence was found for binding at the other two possible sites, which are likely low affinity sites. Interestingly, the paradigmatic STARFISH inhibitor invented by Bundle and co-workers90 was designed to bridge sites I and II, although it exclusively binds to site II in the two adjacent molecules of VTB in the crystal structure. Nevertheless, it cannot be ruled out that the low affinity sites I and III may contribute to the molecular recognition events in physiological conditions.
In a similar context, the synergic application of inter-ligand NOE and STD NMR experiments, using the DEEP protocol, permitted demonstrating the presence of a cryptic binding subsite on the ganglioside recognition site of cholera toxin-B.53 The combination of the experiments with computational data acquired through Hamiltonian replica exchange molecular dynamics (HREMD) revealed that, although the subsite could not be deduced by inspection of the X-ray crystal structure of the GM1/CTB complex, the MD simulations predicted that it can be easily generated by simple rearrangements of the orientation of Lys138 and Ile59, close to the known Neu5Ac and Gal binding subsites.
In the same subject, the binding of the histo blood group antigens (HBGA) to the CTB-pentamers and the El Tor variant has been investigated by STD-NMR and trNOESY experiments.93 Interestingly, no significant differences were observed, and similar binding affinities were deduced for both toxin genotypes. However, the HBGA antigens interact at a binding site distinct from that of GM1, the canonical binder. Indeed, the blood group H tetrasaccharide and the GM1-oligosaccharide simultaneously bind to the classical CTB.
It has been described that the HBGA are related to the life cycle of noroviruses. More specifically, it had been previously demonstrated that α-L-fucose (the common part of the A, B and H blood group antigens) is necessary for binding.95 The authors employed STD-NMR spectroscopy combined with T2-filtering experiments to screen a commercial small-compound library (Maybridge Ro5 500 fragment library) against norovirus virus-like particles (VLP).
The protocol is fairly robust. The initial screening by STD-NMR and spin-lock filtered experiments (VLP:ligand ratio, 1:10) led to a very high hit rate. These hits were then subjected to a competition STD-NMR experiment, using an excess of Fuc as competitor, to identify those that indeed bind to the Fuc subsite of the HBGA binding locus. A decrease in the STD intensity of a putative ligand indicated that it was a competitor. Using small mixtures of just 9 molecules with a small molar excess of Fuc, the hits were ranked according to their relative binding ability.
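The competition step described above can be illustrated with a minimal sketch. It assumes that each hit has one integrated STD intensity measured without Fuc and one measured with excess Fuc; the fragment names, the intensities, and the 20% cut-off are hypothetical and are not taken from the cited work. The sketch only flags competitors and sorts them by the size of the signal drop. How that drop maps onto affinity depends on the concentrations used and is not modelled here.

```python
from typing import Dict, List, Tuple


def std_reduction(std_without: float, std_with: float) -> float:
    """Fractional loss of STD intensity when the competitor (Fuc) is added."""
    if std_without <= 0:
        raise ValueError("reference STD intensity must be positive")
    return (std_without - std_with) / std_without


def rank_competitors(
    intensities: Dict[str, Tuple[float, float]], threshold: float = 0.2
) -> List[Tuple[str, float]]:
    """Keep hits whose STD signal drops by at least `threshold`, largest drop first."""
    reductions = {
        name: std_reduction(without, with_fuc)
        for name, (without, with_fuc) in intensities.items()
    }
    competitors = [(n, r) for n, r in reductions.items() if r >= threshold]
    return sorted(competitors, key=lambda item: item[1], reverse=True)


# Hypothetical STD intensities (arbitrary units): (without Fuc, with excess Fuc)
example = {
    "fragment_A": (1.00, 0.35),
    "fragment_B": (0.80, 0.76),
    "fragment_C": (0.50, 0.30),
}
ranking = rank_competitors(example)
print(ranking)
```

In this made-up data set, fragment_B barely responds to Fuc and would be treated as binding elsewhere, or non-specifically, rather than at the Fuc subsite.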
Since the VP1 proteins of VLP are dimers (25 Å distance between binding sites, with the dimers placed at ca. 75 Å from each other), a multivalent polymer was synthesized, placing one Fuc moiety and one identified ligand every 30 propionylamide units. It was hypothesized that this geometry would allow simultaneous binding to the HBGA binding clefts at the dimer and also at the vicinal dimers.96 Fittingly, an outstanding 1000-fold gain of potency over Fuc was obtained. The binding mode and bioactive conformation of the heterobifunctional moiety of the polymer were deduced (Fig. 15) by a combined STD-NMR and trNOE approach.97,98
Indeed, the interaction of different ligands with noroviruses has been extensively studied by NMR. Recently, the results obtained through NMR have been compared with those obtained by other techniques, highlighting the pros and cons of the diverse experimental approaches and providing explanations for the observed differences.99 The authors conclude that the combination of mass spectrometry techniques and NMR experiments provides the best insights for understanding the HBGA binding events by norovirus capsid proteins, providing reliable and reproducible binding affinities.
The molecular details of the recognition of the HBGA by a Human Norovirus had been previously determined by STD-NMR. The binding specificity was obtained as well as details on the bioactive conformation of the glycans.100
Moreover, the binding of HBGAs and sialoglycans to a variety of human and murine norovirus capsid proteins has been extensively studied by NMR experiments.101 Interestingly, on top of the usually employed STD-NMR experiments, the use of chemical shift perturbation NMR experiments allowed redefining the glycan recognition code for noroviruses. In particular, the norovirus P-domains from both species did not bind to the sialyl-containing glycans. Moreover, the murine P-domains did not bind to the HBGA either, while the infection through MNV-1 of cells deficient in sialoglycans did not present any difference to other cells that were expressing the corresponding glycans.
Additionally, glycomacromolecules functionalized with Fuc moieties (Fig. 16) have been developed to target the human norovirus capsid protein in a precise manner.102 The design was based on the fact that the P domain dimer (P-dimer) contains two distinct HBGA binding loci, although two additional sites have been recently found between the two outer canonical binding sites. The distances between the different sites were assessed through X-ray crystallography, being 11 Å (between Fuc sites 1 and 3), 17 Å (Fuc sites 1 and 4) and 27 Å (Fuc sites 1 and 2), as can be seen in Fig. 17.
Fig. 16 Left: Scheme of the histoblood group antigens (HBGA 0, A and B). Right: The glycomacromolecules synthesized by Bücher et al., adapted from ref. 98.
Fig. 17 X-Ray crystal structure of the human norovirus capsid protein dimer in complex with four L-Fuc molecules (red). Binding to the four binding sites is a dose-dependent and stepwise process, where the order followed is indicated according to the Fuc numbering in the image. Distances between different Fuc pockets are indicated. PDB ID: 4Z4R.
Ligand and receptor-based NMR experiments were employed to disentangle the challenging multivalent interaction. The multivalent nature of the Fuc moieties presented on the polymer led to precipitation of the ligands in the presence of the protein dimers. Nevertheless, the obtained NMR spectra were still useful to provide the binding epitope on the glycomacromolecules: the Fuc moieties were in close contact with the protein while the polymer backbone was not. However, no information could be extracted on the number of units of an individual glycomacromolecule that bind to the protein.
Although precipitation problems were also observed during the TROSY-based chemical shift perturbation (CSP) experiments, information on the binding mode of one particular glycomacromolecule to the GII.4 P-dimers could be extracted, showing that the CSP were basically identical to those observed when simple Fuc was added. This fact also supports that the scaffold does not provide major interactions. Nevertheless, additional CSPs were observed at positions remote from the binding site, which were interpreted in terms of allosteric effects.
The quest for allosteric regulation of DC-SIGN and its selective recognition versus langerin has also led to the design of heteromultivalent molecules.116 In this context, Rademacher and coworkers have described an elegant and multidisciplinary approach to discriminate DC-SIGN and langerin, continuously increasing the complexity of the employed molecules in different publications over the years. Initially, a library of mannosides derivatized at C1 and C6 was screened for langerin using a 19F NMR reporter displacement assay. A ligand with micromolar affinity (KI = 0.23 ± 0.03 mM and KD = 0.5 ± 0.2 mM) was then discovered.
This ligand was conjugated to DSPE-PEG2kDa lipids to be displayed on liposomes. Fittingly, no meaningful binding was observed for langerin+ cells, while the interaction with DC-SIGN+ cells occurred, especially when the hetero-multivalent liposomes were employed.117 Based on this evidence, it was hypothesized that the glycomimetic-bearing liposome might target a secondary binding pocket on DC-SIGN. Then, the binding of glycomimetic 48 was scrutinized by different NMR techniques to provide a structural perspective of the findings (Fig. 18). The dissociation constant was estimated independently by 19F-based CSP and 19F R2 filtered experiments,117 assisted by 1H,15N-HSQC based titrations, yielding almost identical values within the micromolar range (KD ∼ 0.46 mM). Moreover, 19F R2 filtered experiments were carried out under inhibitory conditions, observing that neither high Man concentrations nor EDTA addition completely abrogated DC-SIGN binding to the glycomimetic, while the same experiments with langerin resulted in complete inhibition. In fact, in the STD-NMR experiments, the signals arising from the Man residue were substantially reduced in the presence of EDTA, while those of the biphenyl aglycone strongly increased (Fig. 18). The addition of deuterated Man further enhanced the saturation received by the protons of the aromatic system. Thus, the data suggest the presence of a Ca2+-independent, secondary binding site on DC-SIGN, which displays specific interactions with the glycomimetic, driven by the biphenyl system.9 1H,15N-HSQC based titrations showed the characteristic CSPs at the carbohydrate binding domain (N344, N365, E358, N366, N367, S360 and F313), together with others at residues located far away.
Interestingly, these contacts had previously been identified by Aretz et al.118 Additional experiments carried out in Ca2+-free buffer and competition experiments with high Man concentrations demonstrated the increment of the CSP of remote residues, while those at the sugar binding site were abolished in the absence of Ca2+. Fittingly, it was finally demonstrated that targeting the putative allosteric binding pocket potentiated glycan recognition and allosteric activation (Fig. 19), selectively for DC-SIGN over langerin. Therefore, although formally no NMR methods were applied to monitor the interaction of the multivalent system with the lectins, they were instrumental in providing the rationale for the observed functionality. This investigation provided ground-breaking information on the differentiation of ligands targeting DC-SIGN versus langerin, as a continuation of previous investigations of the research group on those systems using similar NMR protocols.
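As an illustration of how a dissociation constant can be extracted from a protein-observed CSP titration such as the 1H,15N-HSQC experiments mentioned above, the following minimal sketch fits the standard 1:1 binding isotherm by a simple grid search. It is not the analysis used in the cited work: the single-site model, the protein concentration, the ligand concentrations and the grid are all assumptions, and the "observed" shifts are synthetic, noise-free values generated with KD = 0.46 mM purely to show that the procedure recovers the input.

```python
import math
from typing import Sequence, Tuple


def bound_fraction(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of protein bound for a 1:1 equilibrium (all concentrations in mM)."""
    s = p_total + l_total + kd
    return (s - math.sqrt(s * s - 4.0 * p_total * l_total)) / (2.0 * p_total)


def fit_kd(
    p_total: float,
    l_totals: Sequence[float],
    csp_obs: Sequence[float],
    kd_grid: Sequence[float],
) -> Tuple[float, float]:
    """Grid search over KD; for each KD the best maximal CSP follows from linear least squares."""
    best = (float("inf"), 0.0, 0.0)
    for kd in kd_grid:
        f = [bound_fraction(p_total, l, kd) for l in l_totals]
        dmax = sum(fi * yi for fi, yi in zip(f, csp_obs)) / sum(fi * fi for fi in f)
        sse = sum((yi - dmax * fi) ** 2 for fi, yi in zip(f, csp_obs))
        if sse < best[0]:
            best = (sse, kd, dmax)
    return best[1], best[2]


# Synthetic titration (hypothetical conditions): KD = 0.46 mM, maximal CSP = 0.10 ppm
protein = 0.1  # mM
ligand = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 4.0]  # mM
observed = [0.10 * bound_fraction(protein, l, 0.46) for l in ligand]
grid = [0.01 * i for i in range(1, 301)]  # candidate KD values, 0.01 to 3.00 mM
kd_fit, dmax_fit = fit_kd(protein, ligand, observed, grid)
print(round(kd_fit, 2), round(dmax_fit, 3))
```

With real data, a non-linear least-squares routine and an error estimate would be preferable to a grid. A secondary site, as proposed here for DC-SIGN, would require a model beyond the single-site isotherm.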
Fig. 18 STD NMR epitope mapping of the glycomimetic with DC-SIGN. Left: Epitope map under Ca2+ containing buffer conditions, where the Ca2+-dependent binding mode of the glycomimetic was determined. Right: Epitope map in the presence of EDTA to determine the Ca2+ independent binding mode. Image adapted from Wawrzinek et al.112
Fig. 19 Scheme of the avidity enhancement mechanism for DC-SIGN in the presence of the glycomimetic and Fuc-bearing liposome particles. The binding of the glycomimetic to the DC-SIGN allosteric binding site causes structural rearrangements that potentiate the binding at the canonical sugar binding site resulting in a cooperative avidity enhancement.112
Langerin has also been used as target for glycosaminoglycans. In particular, the combination of experimental data obtained through STD-NMR and trNOESY experiments allowed deducing that while small heparin-like oligosaccharides bind to langerin in a Ca2+-dependent way in the canonical site, a long hexasaccharide, with an extra O-sulfate moiety at the non-reducing end, interacts with the lectin in a previously identified Ca2+-independent binding site. Indeed, the extra sulfate abolishes the interaction at the Ca2+ locus. Curiously, HEP-like oligosaccharides can also bind to the Ca2+-dependent binding site, in contrast to large heparin (6 kDa) that is bound at the multimerization interface between langerin monomers.119
Indeed, further investigations combining glycan array screening with NMR spectroscopy allowed deducing that the interaction of heparin hexasaccharides with the elusive secondary site did not require the presence of Ca2+ ions, while it activated an intradomain allosteric network of langerin that had previously been identified, although it had only been linked to the affinity and release of Ca2+ ions.120
As a landmark in the field, combining multivalent presentation with NMR, an elegant series of asymmetrically branched precision glycooligomers has been built using chemical synthesis to study multivalent lectin–sugar interactions, such as the one shown in Fig. 20.121
Fig. 20 Schematic view of one of the key glycooligomers employed to target multivalent lectin–glycan interactions.121
The binding features of the glycomacromolecules to langerin were monitored via a 19F-NMR based T2-filter competition assay, using the basic monovalent N-acetylmannosamine analogue decorated with 19F as a spy molecule. The 19F-NMR based strategy revealed a clear correlation between the glycooligomer architecture and the resulting binding affinity. Again, this work showed how the combined use of NMR and multivalent presentation provides structural evidence on the way towards tuning and modulation of key glycan–lectin interactions. Further elaboration of this concept has allowed the generation of a specific glycomimetic ligand for langerin that is able to specifically target human Langerhans cells in the human skin, when conjugated to liposomes. In this case, the ligand was designed and built based on previous knowledge acquired for the interaction of heparin oligosaccharides with langerin.122
The use of 19F-NMR-based experiments in fragment-based screening has also been employed in the search for druggable pockets in multimeric lectins, as in β-propeller lectins.123 The hits identified by 19F-NMR were further validated by orthogonal methods, such as SPR and TROSY NMR experiments. In that sense, the NMR approach identified druggable pockets in a bacterial β-propeller lectin, which could be used in the design of allosteric inhibitors.
Given its relevance in modulating the immune response, DC-SIGN has been one of the key targets for drug discovery campaigns. It has been shown that DC-SIGN binds Man and Fuc-containing glycans from viral proteins, including the gp120 glycoprotein from the HIV envelope. Different NMR studies have demonstrated that the DC-SIGN Man/Fuc binding site shows a large plasticity and can indeed bind these sugar residues in different modes. For instance, the interaction of DC-SIGN with a glycomimetic pseudotrisaccharide (Fig. 21) was deduced by a combined ligand-based NMR approach, using STD-NMR and trNOESY experiments.120 The use of molecular modelling protocols together with CORCEMA-ST calculations showed that the experimental data can only be explained by using an ensemble of binding poses, which can account for the large inhibition provided by the glycomimetic. Indeed, it was also demonstrated that the pseudomannotrioside is able to promote clustering without any multivalent presentation.124
Fig. 21 Scheme of the pseudomannotrioside employed to target DC-SIGN.124
The existence of multiple binding modes has also been assessed for DC-SIGNR (also dubbed L-SIGN), a C-type lectin highly related to DC-SIGN.125 In particular, the interaction of Man9GlcNAc with the carbohydrate-recognition-domain of the lectin was investigated by receptor-based NMR techniques. Interestingly the lectin displays micro- to millisecond dynamics in the presence of the Man9 glycan, with extensive line broadening. The data strongly suggest the existence of multiple binding modes, which can interconvert over a range of time scales.125
As a further step, diverse oligomannosides were used to prepare glyconanoparticles targeting 2G12.127 The use of STD-NMR methods allowed demonstrating that the Man glycans, when clustered onto gold nanoparticles, are able to interact with 2G12 with high affinity, and even inhibited the binding between 2G12 and the gp120 glycoprotein. The observed affinity was dependent on the particular Man-glycan and the density on the gold surface, as also demonstrated by SPR. Moreover, some glyconanoparticles were able to restrict the interaction of 2G12 with a recombinant virus.
Also related to the battle against HIV, the interaction of cyanovirin-N (CV-N) with Man glycans has been extensively studied by NMR, especially using receptor-based methods,128 since it is relatively small and provides very well defined 1H–15N spectra.129 CV-N is a potent antiviral lectin with two binding sites.130 However, the fine details of the interaction of the glycan and the lectin within the complex, including the involvement of the saccharide hydroxyl groups, have been deciphered by employing a 13C-labelled oligosaccharide. 13C-based methods had previously been used to study the interaction between CV-N and a linear mannose trisaccharide Manα(1-2)Manα(1-2)ManαOMe (Man3).131 In this system, the interaction between the trimannoside and CV-N is too strong, resulting in very low quality STD-NMR experiments, since the off-rate is too low. However, this dynamic regime enables observing the bound state of the 13C-labelled glycan in the 1H–13C HSQC spectrum in the presence of equimolar amounts of the lectin (Fig. 22), directly observing the chemical shift perturbation of the signals of the ligand generated by the lectin. As a result, changes in the linewidth and intensity of the peaks are also observed, yielding information on the changes in the dynamics of the sugar between the free and bound states.
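The dependence on the off-rate noted above reflects the usual comparison between the exchange rate and the chemical shift difference between the free and bound states. A minimal sketch of that comparison follows. The rates, the shift difference, the Larmor frequency and the factor-of-ten margin separating the regimes are all hypothetical choices for illustration; none are values reported for the CV-N/Man3 system.

```python
import math


def exchange_regime(
    k_ex: float, delta_ppm: float, larmor_mhz: float, margin: float = 10.0
) -> str:
    """Classify the chemical-shift exchange regime.

    k_ex is the exchange rate in s^-1, delta_ppm the free/bound shift difference,
    larmor_mhz the Larmor frequency of the observed nucleus in MHz.
    """
    delta_omega = 2.0 * math.pi * abs(delta_ppm) * larmor_mhz  # ppm * MHz = Hz, then rad/s
    if delta_omega == 0.0:
        raise ValueError("a non-zero shift difference is needed to define the regime")
    if k_ex >= margin * delta_omega:
        return "fast"
    if k_ex <= delta_omega / margin:
        return "slow"
    return "intermediate"


# Hypothetical case: 0.5 ppm 13C shift difference observed at 150 MHz (about 471 rad/s)
slow_case = exchange_regime(k_ex=5.0, delta_ppm=0.5, larmor_mhz=150.0)
mid_case = exchange_regime(k_ex=500.0, delta_ppm=0.5, larmor_mhz=150.0)
fast_case = exchange_regime(k_ex=1.0e5, delta_ppm=0.5, larmor_mhz=150.0)
print(slow_case, mid_case, fast_case)
```

In the slow regime, separate signals are expected for the free and bound ligand, which is consistent with the direct observation of the bound state described above. The boundaries between regimes are gradual, so the fixed margin here is only a convenience.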
Fig. 22 (A) Scheme of Man3. (B) In black, 1H–13C HSQC of the 13C labelled Man3, and in blue, in the presence of CV-N. The shift of the signal is indicated with arrows. (C) Sugar region of the 13C filtered NOESY-HSQC. (D) Side chain region of the 13C filtered NOESY-HSQC. NOEs between the glycan and CV-N are indicated. Adapted from ref. 131.
The use of 13C labelled carbohydrates and the study of their complexes with a 15N or 13C/15N labelled lectin opens up the possibility of using a large variety of NMR experiments to characterize the binding events. The main advantage is that the complexity of the spectra can be significantly simplified by filtering the magnetization through 13C or 15N, yielding information on the 1H nuclei attached to those labels. For instance, CNH-NOESY experiments can be performed, observing direct intermolecular NOEs between protons attached to 13C labelled nuclei (sugar) and those attached to 15N labelled nuclei (lectin). Another possibility that labelling generates is detecting intermolecular NOEs between protons attached to 13C labelled nuclei of the sugar and other protons of the protein (Fig. 22C and D). Additionally, interresidual NOEs within the sugar can be detected, reducing the spectral complexity. Finally, through 2D 1H–13C-HSQC-NOESY experiments, the conformation of the free and bound ligand could be elucidated. In this case, slight differences in the conformation between the two states were encountered.
Fig. 23 Glycosylation of the MUC1 antigen by GalNAc-T2 analysed through 1H,15N-HSQC of a 20-amino acid mucin tetramer, adapted from ref. 132.
Generally speaking, on-cell NMR methods (Fig. 24) could use either ligand-based or receptor-based NMR approaches. The motional properties and the kinetics of the free-bound chemical exchange process are essential to define the approach. In the glycan–protein interaction field, ligand-based approaches have usually been employed. In particular, STD-NMR has been widely applied to probe the interaction of membrane-associated lectins with a variety of glycans. In this case, it is absolutely essential that blank experiments are carried out in the absence of the ligand and in the absence of the membrane receptor, to be able to disregard possible STD signals that may arise from non-specific binding between the ligands and the cell. Therefore, the experiment should be repeated with the same cells but without the expression of the target protein. If some signals are still observed, the STD NMR spectrum obtained in the absence of the receptor should be subtracted from that acquired in its presence to give the saturation transfer double difference (STDD) spectrum, which should only show the STD signals arising from specific interactions.
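The double-difference step can be sketched as a point-by-point subtraction of two STD spectra recorded on the same frequency axis. This is a minimal illustration only: it assumes that both spectra have already been processed, referenced and scaled identically, and the intensity values are hypothetical.

```python
from typing import List, Sequence


def std_double_difference(
    std_with_receptor: Sequence[float], std_without_receptor: Sequence[float]
) -> List[float]:
    """Subtract the STD spectrum of control cells from that of receptor-expressing cells."""
    if len(std_with_receptor) != len(std_without_receptor):
        raise ValueError("both spectra must share the same number of points")
    return [a - b for a, b in zip(std_with_receptor, std_without_receptor)]


# Hypothetical STD intensities at four chemical shift positions (arbitrary units)
cells_expressing_lectin = [0.0, 0.8, 0.3, 0.1]
control_cells = [0.0, 0.1, 0.3, 0.1]
stdd = std_double_difference(cells_expressing_lectin, control_cells)
print(stdd)
```

In this made-up example, only the second position keeps a sizeable signal after subtraction. The other responses are equally strong in the control cells and would be attributed to non-specific binding.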
This approach was first applied to investigate the interaction of membrane bound DC-SIGN with a mannan polysaccharide.134 The quality of the STD-NMR spectrum was very high, displaying signals only when the employed K562 cells were transfected with DC-SIGN, showing the high specificity of the interaction.
The methodology has also been applied to disentangle the interaction of the Neu5Ac-α-(2,6)-Gal-β-(1–4)-GlcNAc trisaccharide with H1 and H5 influenza hemagglutinins (HAs) from human and avian strains. The trimeric lectins were expressed on the surface of transfected HEK 293T human cells. Interestingly, under these conditions, the HAs keep their native trimeric geometry and binding features.135 The authors demonstrated, through STD-NMR methods, that the glycan epitopes recognized by the two HA variants were different.
On-cell STD NMR methods have also been applied as a fast and reliable method to screen diverse ligands136 targeting the FimH lectin, a mannose-binding bacterial adhesin that is a virulence factor and therefore, a therapeutic target for treating infections of the urinary tract. In this case, the binding epitopes of a series of dendrimers decorated with Man were deduced, while the ability of the multivalent molecules to prevent FimH-mediated yeast agglutination was also determined (Fig. 25).
Fig. 25 The multivalent molecules employed to target FimH.136
Footnote
† Equal contribution.
This journal is © The Royal Society of Chemistry 2023