Eva Dehling‡, Jennifer Rüschenbaum‡, Julia Diecker, Wolfgang Dörner and Henning D. Mootz*
Institute of Biochemistry, Department of Chemistry and Pharmacy, University of Muenster, D-48149 Münster, Germany. E-mail: Henning.Mootz@uni-muenster.de
First published on 11th August 2020
Nonribosomal peptide synthetases (NRPSs) are large, multi-modular enzyme templates for the biosynthesis of important peptide natural products. Modules are composed of a set of semi-autonomous domains that facilitate the individual reaction steps. Little is known about the existence and relevance of a higher-order architecture in these mega-enzymes, for which contacts between non-neighboring domains in three-dimensional space would be characteristic. Similarly poorly understood is the structure of communication-mediating (COM) domains that facilitate NRPS subunit docking at the boundaries between epimerization and condensation domains. We investigated a COM domain pair in a minimal two-module NRPS using genetically encoded photo-crosslinking moieties in the N-terminal acceptor COM domain. Crosslinks into the C-terminal donor COM domain of the partner module resulted in protein products with the expected migration behavior on SDS-PAGE gels corresponding to the added molecular weight of the proteins. Additionally, an unexpected crosslink product of apparently high molecular weight was revealed by mass spectrometric analysis to represent a T-form isomer with branched connectivity of the two polypeptide chains. Synthesis of the linear L-form and branched T-form isomers by click chemistry confirmed this designation. Our data revealed a surprising spatial proximity between the acceptor COM domain and the functionally unrelated small subdomain of the preceding adenylation domain. These findings provide insight into three-dimensional domain arrangements in NRPSs in solution and suggest the described photo-crosslinking approach as a promising tool for the systematic investigation of their higher-order architecture.
In a typical linear NRPS,7 the modules are arranged in a co-linear fashion with the amino acid sequence of the NRP product. While this arrangement suggests an ordered and largely repetitive spatial organization of modules and domains, it remains unclear if and how multi-modular NRPS templates are organized in a three-dimensional, higher-order superstructure.8–12 Crystal structures were obtained of several didomain truncation constructs13–16 and of some mono- and di-modules with multi-domain arrangements.8,9,17–19 These structural insights have confirmed significant conformational domain mobility between neighboring domains. In particular, the PCP domain as a carrier of covalently bound substrates and intermediates travels large distances to reach the various catalytic centers of the domains it has to interact with. In-solution FRET studies have correlated these motions with catalysis.20 A recent cryo-electron microscopy study suggested that the domain and module arrangements in multi-domain constructs might be very flexible, on top of the mobility of individual domains.8 Crystal structures of a dimodular fragment revealed contacts between a formylation domain of module 1 and a C domain of module 2, the first example of contacts between domains that are not neighboring in the primary sequence.9 Such contacts might be indicative of a superstructure; however, because of crystal packing effects, their occurrence as crystallization artefacts cannot be ruled out. For these reasons, new and complementary techniques to probe the structure of multimodular and entire NRPS templates in solution are needed.
Another important aspect is the structural organization of NRPSs with more than one subunit, which is the predominant arrangement among bacterial NRPSs. For example, the five modules of the gramicidin S synthetase are distributed over two enzymes, the gramicidin S synthetase A (GrsA) with one module and the gramicidin S synthetase B (GrsB) with four modules (Fig. 1A). The interaction between subunits is facilitated by docking or communication-mediating (COM) domains.21,23–28 Multiple such interactions are required in NRPS templates with more than two subunits (Fig. 1B). Combinatorial exchange of docking and COM domains holds the potential to reprogram the biosynthetic pathways in order to obtain new peptide products.21,23 COM domains represent one specific class of docking domains that are found at NRPS subunit interfaces with an E domain at the upstream subunit and a C domain at the downstream subunit (Fig. 1A and B). This class was initially defined as C- and N-terminal tails of approx. 15–20 aa, referred to as donor and acceptor COM domains, respectively.21,29 Examples include the protein interaction interfaces between the GrsA and GrsB enzymes as well as between the pairs of TycA–TycB and TycB–TycC subunits in the tyrocidine NRPS (Fig. 1A–C). Initial biochemical and swapping studies suggested that these COM domains are the only mediators of the interaction and that the rather short peptide sequences possibly form helical structures.21,29,30
Fig. 1 Representative NRPSs with COM domains. (A and B) Domain organization of the gramicidin S and tyrocidine synthetases, respectively. (C) Magnified illustration of the GrsA and TycB1 module with sequence alignment of the terminal COM domains. Numbering refers to the sequences of GrsA/TycA and GrsB1/TycB1 and indicated positions refer to domain boundaries according to initial studies.21 Amino acids in red were replaced with photo-crosslinking BzF (p-benzoyl-phenylalanine) in this study. (D) Model of the upside-down helix-hand model.22 Positions of the hand region are with respect to TycB1 numbering: aa2–8 (thumb), aa9–14 (first β sheet of palm), aa79–83 (second β sheet of palm), aa68–71 (third β sheet of palm), aa72–78 (fingers). The helix shown represents GrsA residues aa1081–1098. The illustration was created using pdb file 2VSQ, representing SrfA-C with an artificial tag helix.17 (E) Structure of BzF. |
However, a different model was suggested by the crystal structure of a C domain with its N-terminal acceptor COM domain, as a part of the structure of the surfactin A synthetase C (SrfA-C) module. This structure showed by serendipity the binding of an unrelated tag sequence at the protein's C terminus to the acceptor COM domain in the crystal lattice.17 The tag sequence had reasonable sequence homology to the cognate donor COM domain, suggesting it mimicked the binding of the latter in the COM complex. The tag sequence adopted an α helix and formed contacts not only with the acceptor COM domain, but also with an extended surface on the C domain grasping around the helix like a hand, suggesting that additional sequences outside the COM regions might be important for the interactions. Based on this structure, a helix-hand model was proposed for the interaction (Fig. 1D).17 We could further support the general architecture of a helix-hand motif by a mutational and photo-crosslinking study of the donor COM domain of GrsA.22 However, based on spatial constraints obtained from mapped crosslinks, we predicted an upside-down orientation of the donor COM helix in the acceptor COM hand motif (Fig. 1D).22 Consistent with this revised model, a crystal structure of a C domain of the fungal TqaA NRPS later showed that the hand motif can bind an extended sequence as an α helix in such reverse orientation.31 Despite these studies, the actual structure of a COM domain complex remains elusive. Other swapping attempts showed mixed successes and thereby further underline that COM domains are not yet sufficiently understood to reliably reprogram NRPS templates.32,33
Genetically encoded photo-crosslinking amino acids have enabled the probing of protein–protein interaction interfaces in a position-dependent manner.34 The benzophenone moiety of the unnatural amino acid p-benzoyl-phenylalanine (BzF) is a widely used photo-crosslinker that can be repeatedly activated with light of approx. 350–365 nm.35 The formed diradical is short-lived and typically inserts into C–H bonds of side chains and the peptide backbone in a distance range of 3.1 Å,36 although larger labeling radii are possible due to rotations and flexibility at the BzF side chain and the surrounding environment.35,37 Methionine side chains have been observed as preferred crosslink partners.38
In this work, we have further investigated the architecture of a COM domain complex by using photo-crosslinking and mass spectrometry (MS) mapping of the crosslinks. We performed a positional scan with BzF39 in the acceptor COM domain of TycB1 in the dimodular GrsA–TycB1 system (Fig. 1C). We report the discovery of an unusual type of crosslink that produced a protein band with aberrantly slow migration behavior in SDS gel electrophoresis. By MS mapping and defined conjugate synthesis using bioorthogonal chemistry we show the importance of L-form and T-form crosslink isomers of the >250 kDa complex to explain the unusual migration behavior. Furthermore, our data suggests the spatial proximity of an unrelated catalytic adenylation domain to the COM interaction interface and thereby highlights the photo-crosslinking approach as a new method to study the higher-order architecture of the giant multi-domain NRPS.
We determined a dissociation constant Kd = (5.0 ± 0.9) μM for the interaction between GrsA and TycB1 using microscale thermophoresis (MST) (Fig. 2).
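To put the measured Kd into perspective, the following minimal sketch estimates which fraction of one partner is in complex at a given titrant concentration. It assumes a simple 1:1 binding model, which the text does not state explicitly; the function name, the argument names and the example concentrations are hypothetical, and only the value of 5.0 μM is taken from the measurement above.

```python
import math


def fraction_bound(labeled_total, titrant_total, kd):
    """Fraction of the labeled partner in complex, assuming 1:1 binding.

    All three arguments must share one concentration unit (here: uM)
    and labeled_total must be greater than zero.
    """
    total = labeled_total + titrant_total + kd
    # Exact solution of the quadratic mass-action equation for the complex.
    complex_conc = (total - math.sqrt(total * total - 4.0 * labeled_total * titrant_total)) / 2.0
    return complex_conc / labeled_total


KD_UM = 5.0  # dissociation constant reported in the text (uM)

if __name__ == "__main__":
    # Hypothetical titrant concentrations, chosen only for illustration.
    for titrant_um in (0.5, 5.0, 50.0):
        occupancy = fraction_bound(0.05, titrant_um, KD_UM)
        print(f"{titrant_um:5.1f} uM titrant: {occupancy:.2f} bound")
```

Under this assumed model, a Kd in the low micromolar range means that a substantial share of the proteins remains unbound unless the partner is supplied well above 5 μM.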
We incorporated the photo-crosslinking amino acid BzF into the first 13 positions (S2 to A14) of the acceptor module TycB1. These residues represent the “thumb” and the first β sheet of the “palm”-motif in the helix-hand model (Fig. 1D).22 M12 is the central hydrophobic amino acid in the β sheet that faces the donor-COM helix according to our proposed model.22 To monitor the crosslinking ability of all these constructs, each TycB1(BzF) protein was UV-irradiated either in the absence or presence of the GrsA partner protein and then analyzed by SDS-PAGE and immunoblotting against the SBP-tag on GrsA (Fig. 3A). Control experiments with wildtype TycB1 lacking BzF showed that no new bands were produced by UV-irradiation (Fig. 3B), whereas most TycB1(BzF) constructs formed new bands even in the absence of GrsA (except those with D11BzF and Y13BzF), suggesting various forms of intra- or intermolecular crosslinking (see exemplary Fig. 3C and all lanes without GrsA in Fig. 3A). Depending on the protein batch, additional GrsA-independent bands as photo-crosslink products could become more pronounced, possibly through partially misfolded TycB1 species. Further control experiments exploring varying protein concentrations, irradiation times and buffer conditions were performed and showed that the observed crosslinks, which are discussed in the following, were reproducible over a wide range of conditions (Fig. S1–S3†).
In total, three types of new bands with very different migration behavior and apparent molecular weights exceeding those of the individual GrsA (132.5 kDa) and TycB1 (125.2 kDa) proteins became visible. We refer to these as low (l), middle (m) and high (h) bands (Fig. 3A). The low bands (∼160 kDa), when present, always appeared also in the absence of GrsA and did not stain in the GrsA-specific anti-SBP immunoblot. Their migration behavior corresponded to a molecular weight clearly below the calculated size of two TycB1 molecules. Together, these findings suggested the low bands represented (a) monomeric form(s) of TycB1 with an intramolecular crosslink. The middle bands migrated at >200 kDa, which potentially fitted with the calculated molecular weight of both the crosslinked GrsA–TycB1 heterodimer (257.7 kDa) and a TycB1–TycB1 homodimer (250.4 kDa). They could be observed without or with GrsA, but were more pronounced in its presence, and they stained in the SBP immunoblot in the latter cases. These findings suggested that the middle bands represented a form of a TycB1–TycB1 homodimer in samples lacking GrsA, and additionally a GrsA–TycB1 heterodimer in samples that included GrsA. Finally, the high bands were only observed in the presence of GrsA and only for the V3, F4, S5, E7, Q8 and V9 positions of the TycB1(BzF) mutants (see Fig. S4† for densitometric analysis of band intensities), suggesting they represented GrsA–TycB1 hetero-crosslinked species. The finding that the high bands always stained in the SBP immunoblot is consistent with this interpretation.
Control experiments with gradually truncated COM domains on GrsA or TycB1 confirmed that the appearance of the GrsA-dependent high and middle bands was dependent on the intact COM regions and became weaker with their gradual deletion (Fig. S5†).
Since the migration behavior of these bands was difficult to determine precisely on our standard acrylamide Tris–glycine gels (6%) with a standard molecular weight marker (highest marker band at 200 kDa) as shown in Fig. 3A, we turned to a Tris–acetate gel (6%) using a special high-molecular weight marker (Fig. 3D). Using TycB1(S5BzF) as one example that showed all three bands, this analysis suggested the high band migrated well beyond 300 kDa (at ∼400 kDa). The middle band was determined more accurately to run at approx. 270–280 kDa and the low band migrated at 130–140 kDa (Fig. 3D). The calculated 257.7 kDa of a GrsA–TycB1 crosslink thus fits the middle band best. The middle band is also similar in size to crosslinks previously obtained using GrsA(BzF) with BzF in the donor COM domain.22
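The comparison between calculated and apparent molecular weights can be retraced with a few lines of arithmetic. The sketch below uses only the masses and migration estimates quoted in the text; the helper name `relative_deviation` and the use of the midpoint of the 270–280 kDa range are illustrative assumptions.

```python
GRSA_KDA = 132.5   # calculated mass of GrsA, from the text
TYCB1_KDA = 125.2  # calculated mass of TycB1, from the text

# Calculated masses of the two 1:1 crosslinked species discussed in the text.
heterodimer_kda = round(GRSA_KDA + TYCB1_KDA, 1)  # GrsA-TycB1
homodimer_kda = round(2 * TYCB1_KDA, 1)           # TycB1-TycB1


def relative_deviation(apparent_kda, calculated_kda):
    """Relative difference between apparent (gel) and calculated mass."""
    return (apparent_kda - calculated_kda) / calculated_kda


# Apparent masses on the Tris-acetate gel (middle: midpoint of 270-280 kDa).
middle_band_kda = 275.0
high_band_kda = 400.0

if __name__ == "__main__":
    print(f"heterodimer: {heterodimer_kda} kDa, homodimer: {homodimer_kda} kDa")
    for name, apparent in (("middle", middle_band_kda), ("high", high_band_kda)):
        dev = relative_deviation(apparent, heterodimer_kda)
        print(f"{name} band: {dev:+.0%} relative to the GrsA-TycB1 heterodimer")
```

The middle band lies within a few percent of the calculated heterodimer mass, whereas the high band runs more than 50% above it, which is the aberrant migration that the subsequent mapping experiments address.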
The finding that the presence and intensity of the photo-crosslink products were clearly dependent on the BzF position (Fig. 3A and S4†), suggested that structural information on the architecture of the interface could be derived from this data.
To probe the two different models, we mapped the crosslink positions by tandem mass spectrometry (MS/MS). The middle and high bands of a photo-crosslink experiment using GrsA and TycB1(S5BzF) were excised from the SDS gel, digested with trypsin and analyzed by LC-MS/MS. In the middle band digest, at least two chromatographically distinct isobaric peptides with m/z 811.14 were identified, both corresponding to crosslinks to the GrsA donor COM helix (downstream of E1080). In one case, the fragmentation data quality allowed us to pin down the crosslink site to S1096 (Fig. 5A and S6A†), whereas for the second peptide, either I1089 or F1090 is the target (Fig. S6B†). Close proximity of these residues with S5 in the acceptor COM domain is consistent with our structural helix-hand model of the COM domain interface.22 According to our second hypothesis, the resulting shape of these crosslinked GrsA–TycB1 species would resemble the L-form with a terminus-to-terminus crosslink (Fig. 4B).
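The precursor assignments rest on the agreement between observed and expected m/z values, reported as ppm deviations in the caption of Fig. 5. The following sketch shows the conventional definitions, assuming protonated ions of the form [M + zH]z+. The function names are hypothetical, and the theoretical m/z is back-calculated from the reported deviation rather than taken from the source.

```python
PROTON_MASS_DA = 1.007276466  # mass of a proton in Da


def ppm_error(observed_mz, theoretical_mz):
    """Mass deviation in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def theoretical_mz_from_ppm(observed_mz, ppm):
    """Theoretical m/z implied by an observed m/z and its ppm deviation."""
    return observed_mz / (1.0 + ppm * 1e-6)


def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]z+ ion."""
    return charge * (mz - PROTON_MASS_DA)


if __name__ == "__main__":
    observed = 811.1434  # middle-band precursor, [M + 4H]4+ (Fig. 5A)
    implied = theoretical_mz_from_ppm(observed, 0.7)
    print(f"implied theoretical m/z: {implied:.4f}")
    print(f"neutral mass: {neutral_mass(observed, 4):.4f} Da")
```

At m/z values near 800, a deviation of 1 ppm corresponds to less than 0.001 m/z units, which illustrates how tightly the reported precursors match their assigned crosslinked peptides.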
Fig. 5 MS/MS mapping analysis of photo-crosslinked peptides. The MS/MS spectra are consistent with the expected fragmentation patterns of the illustrated crosslinked peptides, identified using StavroX 3.6.6.44 (A) Assignment of a cross-link between GrsA (α) and TycB1(S5BzF) (β) peptides from the middle band of the photo reaction. x denotes BzF. The precursor ion [M + 4H]4+ at m/z 811.1434 matches the expected mass of the cross-linked peptide with a deviation of 0.7 ppm. The GrsA fragment encompasses amino acids E1080LELEEMDDIFDLLADSLT1098 and the additional residues GSR from the fused tag. (B) Assignment of a crosslink between GrsA (α) and TycB1(S5BzF) (β) peptides from the high band of the photo reaction. x denotes BzF. The precursor ion [M + 4H]4+ at m/z 817.4054 matches the expected mass of the cross-linked peptide with a deviation of −0.3 ppm. The GrsA fragment encompasses amino acids Q489FSSEELPTYMIPSYFIQLDK509. (C) Assignment of a cross-link between GrsA(Y498AzF) (α) and TycB1 (β) peptides from the high band of the photo reaction. O denotes AzF. The precursor ion [M + 4H]4+ at m/z 779.6391 matches the expected mass of the cross-linked peptide with a deviation of 0.4 ppm. The TycB1 fragment encompasses amino acids S2VFSK6. The associated MS/MS spectra are presented in the ESI (Fig. S6 and S11†).
Interestingly, the crosslinks identified in the high band mapped to a markedly different position. The amino acid stretch P496TYMI500 of GrsA was recovered with M499 as the crosslink site (Fig. 5B and S6C†). Surprisingly, this internal crosslink site is located outside of the terminally located COM-interface (compare Fig. 1C for GrsA numbering). The biochemical conclusions from this finding are discussed below. Notably, the crosslinking to this interior position of GrsA would result in the T-form shape of two polypeptides, as postulated in our second hypothesis (Fig. 4B). Similar results were obtained for the middle and high bands of a photo-crosslink experiment using GrsA and TycB1(V3BzF) (Fig. S7†).
To rule out the possibility that the identified crosslinks were artefacts, which might be conceivable due to the non-native pairing of GrsA with TycB1, we also analyzed photo-crosslink products in all possible combinations of the first two modules of the gramicidin S and tyrocidine synthetases. BzF was incorporated at the corresponding position of the acceptor COM domain of GrsB1 (K5BzF). Indeed, the bands representing the L- and T-form crosslinks were observed in all native and non-native combinations, albeit with varying relative intensities (Fig. S8 and S9†). TycB1(S5BzF) was more prone to the formation of the T-form crosslink, both with GrsA and with its native partner TycA. On the other hand, GrsB1(K5BzF) resulted mostly in formation of the L-form crosslink with both protein partners, but the T-form crosslink with the internal position could also be mapped (Fig. S9†). A TycA–TycB1 fusion construct as a control migrated similarly to the middle bands, thus providing further confirmation for their assignment as L-form isomers (Fig. S8†).
While these results supported the second hypothesis to explain the middle and high bands as L-form and T-form crosslink products (Fig. 4B), they did not strictly rule out the first hypothesis because a second crosslink leading to a potential GrsA–(TycB1)2 heterotrimer might have escaped the detection. However, since the terminus–terminus crosslinks (L-form) were exclusively found in the middle bands and the terminus-internal crosslinks (T-form) exclusively in the high bands, the heterotrimer model of the first hypothesis appeared very unlikely (Fig. 4A). Nevertheless, given the difficulty to prove the absence of a possible heterotrimeric reaction product, which would be necessary to disprove the first hypothesis, we aimed at collecting direct evidence to prove the second hypothesis.
Fig. 6 CuAAC-mediated synthesis of standards of L- and T-form isomers. (A) Scheme of the reactions with AzF and PrY at the indicated positions. (B) Synthesized click-L and click-T isoforms are illustrated. Analysis of the CuAAC reactions by coomassie-stained SDS-PAGE gels is shown with a photo-crosslink reaction of TycB1(S5BzF) with GrsA for comparison (BzF control). (C) Assignment of a cross-link between GrsA(Y503PrY) (α) and TycB1(S5AzF) (β) peptides from the high band of the CuAAC reaction. O and J denote AzF and PrY, respectively. The precursor ion [M + 4H]4+ at m/z 815.1499 matches the expected mass of the cross-linked peptide with a deviation of −2.7 ppm, and the MS/MS spectrum is consistent with the expected fragmentation pattern. The associated MS/MS spectra of this click-T standard as well as the other synthesized click-L and click-T standards are presented in Fig. S10.† |
To further validate the unexpected spatial proximity suggested by the T-form crosslinks we asked whether the proximity could also be observed in a ‘reverse’ photo-crosslinking experiment. To this end, photo-crosslinking amino acids BzF and AzF were incorporated at position Y498 of GrsA right next to M499 that was identified by MS-mapping. Indeed, following incubation of GrsA(Y498AzF) with TycB1 and UV-irradiation we mapped the crosslink to peptide S2VFSK6 of TycB1 (Fig. 5C and S11†). These findings independently confirmed the spatial proximity of the terminal TycB1 region and the internal GrsA regions. They also further supported the notion that the T-form crosslink did not result from potential structural artefacts caused by the unnatural amino acid BzF or the preference of BzF for crosslinking with methionine residues.38
To rationalize the captured proximity between the AC domain of GrsA and the acceptor COM domain of TycB1 we sought to infer the most likely underlying conformation of the GrsA–TycB1 complex. Notably, this endeavor is complicated by the fact that no structures are available of the donor COM domain, of a native COM domain complex or of a PCP-E-C sequence of domains. The E domain of GrsA is an additional binding site for the PCP that is not represented in known structures of entire NRPS modules. We reasoned that the AC subdomain of GrsA will be partially dragged on the backside of the PCP to the catalytic domains, as observed in several crystal structures.17,18 Besides possible open structures with the PCP not being in functional domain contact, the three expected positions the PCP can adopt are those in contact with the catalytic centers of the A and E domains of GrsA and the C domain of TycB1. We term the respective conformations transfer, epimerization and donor condensation conformations (Fig. 7A, B, and D). The PCP of the TycB1 module could adopt transfer and acceptor condensation conformations (Fig. 7C–E). Crystal structures from other NRPS systems are known for the transfer9,13,14,18,19 and the donor condensation conformations.9 Despite the lack of the E domain in these structures, we hypothesized that they would allow us to estimate whether the relative orientation and distance of the A9 motif in the AC subdomain to the other domains would be compatible with the proximity of the acceptor COM observed in this study. Interestingly, crystal structures of both the transfer and the donor condensation conformations showed the sequence corresponding to the P496TYMI500 sequence of the GrsA A9 motif to be at the center of the domain contacts with the PCP and the respective catalytic domain, AN or C (Fig. 7A and D).
These arrangements result in the A9 motif being almost completely engulfed and therefore very likely not available for any further domain contacts with the COM domain (Fig. S12A and B†). Furthermore, the obviously conserved structural role of the A9 motif in domain contacts is also found when the AC-PCP unit binds to the C domain in the acceptor position (illustrated in Fig. 7D and E for TycB1, not shown in detail).18,19 While these observations support the propensity of the A9 motif for contacts with other domains, they appear to rule out the transfer and donor condensation conformations as the conceivable domain constellations for the proximity with the COM domain. Another argument against the donor condensation conformation can be construed from the estimated location of the photo-crosslinking side chain in the COM domain complex, which would be ∼44 to 55 Å away from the A9 motif, on the opposite side of the C-domain (illustrated in Fig. 7D).
We therefore attempted to evaluate whether the unknown epimerization conformation could be compatible with a close proximity between the acceptor COM domain of TycB1 and the A9 motif of the GrsA AC subdomain. An isolated PCP-E structure is known (pdb: 5ISX).16 To project possible localizations of the AC domain relative to the PCP-E ensemble (Fig. 7B) we overlaid the PCP-E structure with AC-PCP units from several other structures. This modeling suggested that the contact between the AC subdomain and the acceptor COM domain in three-dimensional space is plausible with the PCP binding the E domain (Fig. S12C†), although these findings do not constitute conclusive proof. Together, we assume that the mapped crosslink of the T-form isomer most likely captured the GrsA–TycB1 complex in the epimerization conformation as illustrated in Fig. 7E.
Furthermore, all BzF positions that gave rise to T-form crosslinks were in the ‘thumb’ region of the hand motif17,22 from aa3–9 (Fig. 3 and S4†). A protrusion of the thumb away from the compactly folded C-domain,17 as observed in pdb-file 2VSQ, may explain why it can be in contact with both the donor COM helix and the unrelated AC domain. The similarly observed L-form crosslinks from the ‘thumb’ positions reflect the simultaneous interaction in the COM–COM pair.
BzF positions at aa10–14 of TycB1 are located in the first β sheet of the ‘palm’ in the hand motif. The residues of the β sheet facing one side are expected to be completely covered when binding the donor COM helix. Consequently, only L-form crosslinks with the COM donor domain were observed. The finding that D11BzF and Y13BzF failed to produce crosslinks, whereas Q10BzF, M12BzF and A14BzF did, is consistent with the alternating orientation of these side chains in the β sheet such that only every second residue would face the helix of the COM donor motif and the others are turned towards the interior of the C domain. Importantly, these results provide the first direct evidence that residues of the ‘palm’ region are involved in the COM–COM interaction and thus further strengthen the helix-hand model. A more comprehensive photo-crosslinker-scanning and crosslink mapping analysis will therefore likely reveal a more detailed view of the COM domain structure.
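The alternation argument for the β sheet can be written down as a small consistency check. The sketch below encodes only the crosslinking outcomes stated above for positions 10 to 14; the dictionary layout and the function names are hypothetical and serve purely as an illustration of the reasoning.

```python
# Crosslink formation with GrsA for BzF at each 'palm' position, as stated in the text.
crosslink_observed = {
    "Q10": True,
    "D11": False,
    "M12": True,
    "Y13": False,
    "A14": True,
}


def residue_number(label):
    """Extract the sequence position from a label such as 'M12'."""
    return int(label[1:])


def alternates(observed):
    """True if the outcome flips between every pair of consecutive positions."""
    ordered = [observed[label] for label in sorted(observed, key=residue_number)]
    return all(a != b for a, b in zip(ordered, ordered[1:]))


if __name__ == "__main__":
    facing_helix = [label for label, hit in crosslink_observed.items() if hit]
    print("strict alternation:", alternates(crosslink_observed))
    print("side chains inferred to face the donor COM helix:", ", ".join(facing_helix))
```

A strictly alternating pattern is what a β strand with one face towards the donor COM helix would be expected to produce, although the check by itself cannot exclude other explanations for the two silent positions.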
By photo-crosslinking and peptide mapping we have shown that a functionally unrelated AC domain of the NRPS template, non-neighboring in primary sequence, can be localized in spatial proximity to the interaction-mediating COM domain interface of two subunits. To our knowledge, this is the first non-neighboring domain contact in three-dimensional space revealed for NRPSs in solution. Our results suggest a rational approach to investigating the three-dimensional packing of domains in multimodular NRPSs at the molecular level by photo-crosslinking, in order to unravel their higher-order architecture, which is mostly uncharted territory.
DataAnalysis 4.4 (Bruker Daltonik GmbH, Bremen, Germany) was used for chromatogram processing and ProteinScape 4.0.3 (Bruker Daltonik GmbH, Bremen, Germany) was used to search our in-house database and for further analysis of MS/MS data. Crosslinked peptides were identified using StavroX 3.6.6 (Michael Götze, University of Halle-Wittenberg).44
Footnotes |
† Electronic supplementary information (ESI) available: Supplementary figures and tables. See DOI: 10.1039/d0sc01969k |
‡ Contributed equally. |
This journal is © The Royal Society of Chemistry 2020 |