Songya Zhang‡ac, Shuai Fan‡d, Haocheng He‡ac, Jing Zhu ac, Lauren Murray efg, Gong Liang ac, Shi Ran d, Yi Zhun Zhu h, Max J. Cryle *efg, Hai-Yan He *d and Youming Zhang *abc
aCAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
bHelmholtz International Lab for Anti-infectives, Shandong University-Helmholtz Institute of Biotechnology, State Key Laboratory of Microbial Technology, Shandong University, Qingdao, Shandong 266237, China. E-mail: zhangyouming@sdu.edu.cn
cShenzhen Key Laboratory of Genome Manipulation and Biosynthesis, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
dNHC Key Laboratory of Biotechnology for Microbial Drugs, Institute of Medicinal Biotechnology, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100050, China
eDepartment of Biochemistry and Molecular Biology, Monash Biomedicine Discovery Institute, Monash University, Clayton, Victoria 3800, Australia
fEMBL Australia, Monash University, Clayton, Victoria 3800, Australia
gARC Centre of Excellence for Innovations in Peptide and Protein Science, Monash University, Clayton, Victoria 3800, Australia
hSchool of Pharmacy & State Key Lab. for the Quality Research in Chinese Medicine, Macau University of Science and Technology, Macau, China
First published on 25th November 2024
Cyclic compounds are generally preferred over linear compounds for functional studies due to their enhanced bioavailability, stability towards metabolic degradation, and selective receptor binding. This has led to a need for effective cyclization strategies for compound synthesis and hence increased interest in macrocyclization mediated by thioesterase (TE) domains, which naturally boost the chemical diversity and bioactivities of cyclic natural products. Many non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) derived natural products are assembled to form cyclodimeric compounds, with these molecules possessing diverse structures and biological activities. There is significant interest in identifying the biosynthetic pathways that produce such molecules given the challenge that cyclodimerization represents from a biosynthetic perspective. In the last decade, many groups have pursued the characterization of TE domains and have provided new insights into this biocatalytic machinery: however, the enzymes involved in formation of cyclodimeric compounds have proven far more elusive. In this review we focus on natural products that involve macrocyclization in their biosynthesis and chemical synthesis, with an emphasis on the function and biosynthetic investigation on the special family of TE domains responsible for forming cyclodimeric natural products. We also introduce additional macrocyclization catalysts, including butelase and the CT-mediated cyclization of peptides, alongside the formation of cyclodipeptides mediated by cyclodipeptide synthases (CDPS) and single-module NRPSs. Due to the interdisciplinary nature of biosynthetic research, we anticipate that this review will prove valuable to synthetic chemists, drug discovery groups, enzymologists, and the biosynthetic community in general, and inspire further efforts to identify and exploit these biocatalysts for the formation of novel bioactive molecules.
Cyclization, a crucial chemical reaction, modifies the activity of natural products by forming a ring structure, the size and shape of which can significantly impact the activity of the molecule, particularly in the design of peptides.5–7 The cyclic nature of the ring is also typically essential for activity, proving highly beneficial in enabling specific interactions between the molecule and its biological target and in improving pharmacokinetic properties such as half-life and permeability.7 This makes cyclic peptides highly promising for drug development. Cyclic peptides also have low structural Gibbs free energy, increasing binding affinity and selectivity for target molecules. They can selectively bind to unstable protein surfaces and penetrate cell membranes, thus facilitating targeted drug development.8,9
The polymerization of molecules as a biosynthetic strategy can bring many advantages,10 for example by generating compounds with high structural complexity and chemical diversity from a much smaller biosynthetic assembly line. Dimerization (as well as higher order multimerization) reactions can also be used to introduce specific functional groups or structural features, whilst at the same time generating enlarged molecules with improved stability or selectivity, which are crucial factors in target binding.11,12 Larger molecules are also able to access multiple binding sites, with examples showing how two separate binding sites on a receptor molecule can be targeted by a dimeric drug, creating much tighter binding than would be possible with two isolated monomeric compounds.13,14 To date, many dimeric natural products have been shown to exhibit enhanced bioactivities when compared to their monomeric form.15–18 Indeed, in some cases, dimeric compounds have been shown to be active whilst the monomer remains inactive19 (e.g. dimeric lipopeptide fusion inhibitors exhibit significantly greater antiviral potency compared to monomeric derivatives,20,21 and cyclic peptide oligomers display more potent antimicrobial effects than their linear counterparts).22 Due to their broad range of biological activities, some dimeric NPs have been investigated as potential therapeutic agents or have inspired the creation of structural mimics.23–25
This review will focus primarily on cyclodimeric (as well as higher order multimeric) products formed through cyclisation of a dimeric precursor (referred to as macrocyclization), with a subset formed through direct cyclodimerization (the dimerization of two identical monomers to form a cyclic product). Cyclopeptides can be cyclized in various ways, including head-to-tail, head-to-side chain, side chain-to-tail, and side chain-to-side chain. Both de novo design and in vitro evolution are major strategies utilized for current cyclopeptide development.26 A detailed review of cyclic peptide cyclization strategies has been recently published, and will not be discussed in detail here.27–29
In natural product biosynthesis, TE domains play a crucial role in enhancing the chemical diversity and bioactivities of these compounds through cyclization. Typically, TE domains control the final chain length after the assembly of monomers from precursors. Macrocycle formation, often mediated by TE domains, is a common strategy in PKS- and NRPS-mediated natural product assembly lines.30–32 Various types of reactions, including TE-mediated cyclization, cycloaddition, cycloisomerization, esterification, and radical reactions, contribute to the formation of dimeric natural products.11,33 Understanding the enzymes catalysing these reactions within biosynthetic pathways is essential for drug discovery and the development of new biocatalysts for macrocyclization.
Moreover, research has shown that TE domains can mediate diverse atypical cyclization. For example, in teixobactin biosynthesis, the tandem TEs (TE1 and TE2) within the NRPS terminal module (Txo2) collaborate to create an active site for substrate cyclization, with biochemical experiments confirming their essential roles through serine residue acetylation.34 Similarly, the ObiF1 TE domain in obafluorin biosynthesis catalyzes β-lactone ring closure via transthioesterification, influenced by intramolecular interactions with the C-terminal integrated MbtH-like protein (MLP).35 Additionally, the TE domain within curacin A's polyketide synthase mediates a unique termination reaction within module CurM, preceded by sulfonation catalysed by an integrated sulfotransferase (ST) domain.36 Furthermore, the recombinant TE domain StmC/ACP–TE, discovered by the Ge group, showcases tandem reactions, acting as both a thioesterase and cyclase during streptoseomycin biosynthesis.37 Notably, Kobayashi et al. developed a chemoenzymatic approach to synthesize cyclic peptides by exploiting SurE, a protein homologous to a penicillin-binding protein rather than a traditional TE, thereby expanding the scope of chemoenzymatic synthesis of cyclopeptides.38,39 YsfF and YsfFHBT, both with thioesterase activities, are involved in complex biochemical reactions leading to the formation of youssoufene A1, which shows increased inhibition against multidrug-resistant bacteria. It was found that YsfF/YsfFHBT catalyse various reactions, including 6π-electrocyclic ring closure and dimerization.40 These examples collectively highlight the diverse and pivotal roles TE domains play in orchestrating the biosynthesis of natural products.
Currently, many non-ribosomal peptide synthetase (NRPS) and polyketide synthase (PKS) derived cyclodimeric NPs have been identified, with these molecules possessing diverse structures and biological activities. The identification of biosynthetic pathways that produce such molecules is of great interest due to the challenge that cyclization represents: specifically, how do the TE domains in NRPS or PKS assembly lines recognize and accommodate large substrates during catalysis? In the last decade, several groups have pursued identification of such TE domains from NRPS or PKS assembly lines, which has led to new insights to this biocatalytic machinery.41–43 Whilst these have typically not been involved in cyclodimer formation, our team has been one of those that has sought to overcome this by extensively investigating PKS/NRPS synthetases related to such dimerizing TE domains (including those involved in mohangamide and disorazole biosynthesis). Recently, several excellent reviews about TE-related biosynthesis have been published.31,32 However, these works only describe a single example of dimer cyclization. This review focuses on the special class of TE domains involved in the cyclization of dimeric (as well as higher order multimeric) natural product intermediates during NRPS and PKS assembly, including discussion concerning their mechanism.
In this review, we examine key examples of cyclic dimeric (as well as higher order multimeric) natural products. We illustrate the synthetic and biosynthetic routes to these cyclodimeric compounds, including the catalytic mechanism of key biosynthetic enzymes. We also strive to emphasize the gaps in both knowledge and methodology that must be addressed to effectively access these cyclic scaffolds. We limit our discussions to focus primarily on works that generate cyclodimeric natural products including novel analogues, thus largely excluding approaches that lead to the production of high polymer products through cyclo-oligomerization. This review covers the period until July 2023.
The phylogenetic analysis specifically covers TEs from reported bacterial PKS or NRPS pathways. We excluded the TEs from fungal PKS/NRPSs, which clearly have different evolutionary histories. In some fungal peptide synthetases, the oligomerization is catalysed by the CT domain, which will be elaborated in the next section. Interestingly, TE domains with reported iterative catalytic function do not gather in a single cluster (Fig. 2), and the mechanism underpinning their catalysis still requires investigation. It is also apparent that a large number of uncharacterized TE domains remain spread across these different clusters and await characterization.
Fig. 2 Phylogenetic analysis of the TE domain family sequence. Colours represent TE domain function and substrate type (type I or II from NRPS or PKS pathway and others). The characterized TE domains with iterative catalytic function mentioned in this paper are labelled with an asterisk. This phylogenetic analysis was based on TE proteins from representative phyla involved in reported biosynthetic pathways, using protein data sourced from the reference collections of the Boddy group52 and the Deng group.53 The detailed protein information can be found in the ESI† (Table S1). The sequence alignment and phylogenetic tree were generated with MEGA, and the tree was modified with iTOL.54,55 For a more detailed phylogenetic analysis, refer to Caswell et al. (2022).56
Generally, these thioesterases can be grouped into type I or type II TE domains from NRPS or PKS pathways (Fig. 2). Type I TEs are typically integrated into the final module of the biosynthetic assembly line and catalyze product release through hydrolysis or cyclization, whereas type II TEs function independently, often playing a role in proofreading or editing the assembly line intermediates.52,57 For example, the TEs from DptD, EndC, Sas19, GlmI and GrsB group together in one cluster, as they are all domains that are found in the final module of an NRPS and are responsible for peptide cyclization (in daptomycin, enduracidin, WS9326, glidomide and gramicidin biosynthesis, respectively). In contrast, PKS-derived intra-module TE domains group together in a different cluster (e.g. TE domains from pikromycin (pikAIV TE), erythromycin (DEBS TE), amphotericin (AmphK TE) and FR-008 (FscF TE) biosynthesis).
Many PKS and NRPS pathways contain stand-alone enzymes known as type II TEs (TEII), which group into one cluster in the phylogenetic tree (Fig. 2).52–56 Type II TEs are not covalently bound to the multienzyme, and recent research suggests that they are capable of diverse catalytic functions. Type II TEs belong to the α/β hydrolase family, where their role is typically to hydrolyze thioester-bound residues on the 4′-phosphopantetheine (4′-PPant) arms of acyl carrier proteins (ACPs) or peptidyl carrier proteins (PCPs), such as the TEII BarC from the barbamide pathway58,59 or Tcp39 from teicoplanin biosynthesis.51 Some type II TEs have been characterized with the function of chain translocation, such as the type II TEs WS5 (Cal5 or Sas5) and WS20 (Cal20 or Sas20) from the WS9326 pathway and TEII LgnA from the legonindolizidin A pathway, where they work as shuttling enzymes for substrates during peptide assembly.52,60 In the case of PnG, this TEII enzyme performs multiple functions in the biosynthesis of phoslactomycin PKS (Pn PKS), including release of an ACP-tethered intermediate as well as proofreading.61
The TE domain of NRPS machineries is responsible for catalyzing the hydrolysis or cyclization reactions of the NRPS polypeptide chain. Studies have shown that NRPS type I thioesterases recognize specific linear peptide-PCP substrates (or even peptide-S-CoA/SNAC substrates) and utilize internal nucleophiles to cyclize the peptide via carbon-to-nitrogen intramolecular lactonization or lactamization.68 As a crucial NRPS catalytic domain, research into the protein structure and enzymatic catalytic function of TE domains has been steadily increasing.
Structural analysis has revealed that TE domains belong to the α/β hydrolase superfamily, which includes enzymes such as lipases, proteases, and esterases.71 The typical α/β hydrolase fold is comprised of a repeating β/α/β motif that forms a seven- or eight-stranded parallel β-sheet wrapped by α-helices on both sides, while several helices on top form the lid region.74,75 In general, the first N-terminal β-strand of the hydrolase fold is absent in TE domains. Instead, the second β-strand serves as the only anti-parallel strand among the remaining six or seven strands in the structure. Sequence alignments showed that the lid region differs significantly between TE domains. The differences observed in TE structures mainly lie between the sixth and seventh β-strands, with the α-helices linking these strands displaying diversity in structure, including the insertion of an extra structural domain.76 It has been hypothesized, based on crystal structures and MD simulations, that the lid region opens to accommodate the presentation of thioester substrates. These studies offer detailed insights into the dynamics of the lid region, which is thought to play a role in substrate loading and release.77–79
The structures of SrfTE and FenTE confirm that thioesterases belong to the α/β hydrolase superfamily.71,72 The asymmetric unit of SrfTE reveals the presence of two distinct molecules, each exhibiting closed and open conformations. While SrfTE exists as a dimer in the asymmetric unit, it has been shown to exist as a monomer in a solution environment. In the SrfTE, the lid structure covers three α helices and has two different conformations. When in the “open” conformation, the lid structure folds back, thus exposing the catalytic center. Conversely, in the “closed” conformation, the catalytic center is almost completely covered by the lid. In the Fen TE, the lid structure is the shortest seen for NRPS TE domains, which causes the catalytic center of Fen TE to be exposed, possibly serving to make this TE highly promiscuous. Fully or partly exposed catalytic triads were observed in the related lysosomal palmitoyl protein thioesterases,80 whose substrates are similarly sterically demanding when compared to NRPS peptides.
In terms of TE structure, EntF TE and Vlm2 TE are of particular interest because both perform cyclodimerization (Fig. 3A and B). The structure of Vlm2 TE adopts the α/β-hydrolase fold typical of type I TE domains, with a canonical Ser-His-Asp catalytic triad covered by the lid structure.70 The lid of Vlm2 TE is a mobile component that includes an extended loop, three helices (α4–6), a five-residue helix (α7), a long helix (α8), and another short helix (α9), which is thought to have various functions including substrate positioning and solvent exclusion (Fig. 3E). The structure of the EntF PCP–TE di-domain represents the physiological structure before CoA addition to the PCP domain. The PCP and TE domains have a well-defined relative orientation with a primarily hydrophobic interface. The globular core of the TE domain displays two α-helices (α6–α7) protruding from this core, resembling webbed fingers that form a lid covering the TE active site (Fig. 3F). This allows the 4′-PPant arm, which is tethered to Ser 48 of the PCP domain in the holo state and can span up to ∼20 Å, to reach Ser 180 in the EntF TE domain.
Fig. 3 Overall structure of TEs. Structures of Vlm2 TE (A), EntF TE (B), DEBS TE (C) and PICS TE (D). Secondary-structure elements of Vlm2 TE (E), EntF TE (F), DEBS TE (G) and PICS TE (H).
The typical thioesterase mechanism belongs to the Bi–Bi model, where a covalent intermediate is formed between the substrate and the enzyme during the reaction process.56 The first step involves the nonribosomal peptide substrate being loaded onto the TE domain by transesterification of the PCP thioester, forming the acyl–enzyme intermediate.78,81 The second step involves a histidine residue acting as a base to help deprotonate a nucleophile within the substrate (such as the side chain amine, hydroxy and thiol groups of amino acids), which can then attack the acyl–enzyme complex, forming the tetrahedral intermediate. The tetrahedral intermediate is stabilized in the oxyanion hole by hydrogen bonding from two backbone amide NH groups. Finally, the cyclic product is released from the TE domain, and as a result, the active site of the TE domain is regenerated (Fig. 4).
The catalytic core of the α/β hydrolase region is highly conserved, usually consisting of a Ser-His-Asp catalytic triad comprising a nucleophilic serine adjacent to the β5 fold and belonging to the GXSXG motif,82 an aspartic acid usually located at the loop following the β7 strand according to its canonical site in α/β hydrolase folds, and finally a histidine positioned on the loop after the last β-fold, which is also highly conserved (Fig. 4).69–71 Interestingly, some TE domains contain a Cys in place of the Ser in the catalytic triad.83,84 The backbone amides of the amino acid following the nucleophilic amino acid, as well as another usually located between the β3 fold and the α-helix, form the oxyanion hole, which stabilizes the negatively charged intermediate during the reaction.85
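The GXSXG motif described above, together with the Cys-for-Ser variant, lends itself to a simple sequence scan. The following is a minimal sketch rather than an established annotation tool: the function name `find_nucleophile_motifs` and the example fragment are hypothetical, and a motif hit alone does not establish that a residue is catalytic.

```python
import re

# G-X-[S/C]-X-G, wrapped in a lookahead so overlapping hits are all reported.
# [SC] covers the Cys-in-place-of-Ser variant mentioned in the text.
NUCLEOPHILE_MOTIF = re.compile(r"(?=(G.[SC].G))")


def find_nucleophile_motifs(sequence):
    """Return (1-based position of the Ser/Cys, matched motif) pairs."""
    hits = []
    for match in NUCLEOPHILE_MOTIF.finditer(sequence.upper()):
        motif = match.group(1)
        # The candidate nucleophile is the third residue of the motif.
        nucleophile_position = match.start() + 3
        hits.append((nucleophile_position, motif))
    return hits


# Hypothetical fragment for illustration only, not a real TE sequence.
example_fragment = "MKLVGHSMGGTLA"
print(find_nucleophile_motifs(example_fragment))
```

In practice, candidate positions found this way would still need to be checked against a structure or alignment, since the motif also occurs in other α/β hydrolases.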
The formation of a linear dimerization intermediate is a prerequisite for cyclodimerization mediated by TE domains. Two possible pathways for the oligomerization of NRPS intermediates by TE domains in valinomycin synthesis have been postulated.70 In the first scenario, ‘forward transfer’, the distal hydroxyl group of the peptidyl-O-TE complex attacks the thioester group in the peptidyl-S-PCP enzyme intermediate, directly forming dipeptidyl-O-TE as a product. In the second scenario, ‘reverse transfer’, the distal hydroxyl group of the peptidyl-S-PCP complex attacks the ester group in the peptidyl-O-TE enzyme intermediate, forming dipeptidyl-S-PCP as a product, which would then be transferred onto the serine of TE domain (Fig. 5). The analysis of the synthetic intermediates identified during valinomycin synthesis has revealed that Vlm TE catalyzes oligomerization through the “reverse transfer” pathway.70
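The difference between the two postulated routes is simply which carrier site holds the elongated chain immediately after the coupling step. A minimal bookkeeping sketch is given below; the `DiDomain` class and the function names are illustrative assumptions, and chain lengths are counted in abstract monomer units rather than actual valinomycin intermediates.

```python
from dataclasses import dataclass


@dataclass
class DiDomain:
    """Monomer units held on each carrier site (0 means the site is free)."""
    pcp_units: int  # peptidyl-S-PCP (thioester)
    te_units: int   # peptidyl-O-TE (ester on the active-site serine)


def forward_transfer(state):
    # The hydroxyl of peptidyl-O-TE attacks the PCP thioester,
    # so the joined chain ends up directly on the TE.
    return DiDomain(pcp_units=0, te_units=state.te_units + state.pcp_units)


def reverse_transfer(state):
    # The hydroxyl of peptidyl-S-PCP attacks the TE ester,
    # so the joined chain ends up on the PCP and the TE serine is freed.
    return DiDomain(pcp_units=state.pcp_units + state.te_units, te_units=0)


def reload_te(state):
    # The PCP-bound chain is transferred onto the free TE serine.
    if state.te_units != 0:
        raise ValueError("TE serine is already occupied")
    return DiDomain(pcp_units=0, te_units=state.pcp_units)


start = DiDomain(pcp_units=1, te_units=1)
print(forward_transfer(start))             # dimer directly on the TE
print(reverse_transfer(start))             # dimer on the PCP first
print(reload_te(reverse_transfer(start)))  # then moved onto the TE
```

Both routes reach the same TE-bound dimer; they differ only in the intermediate that accumulates, which is why identifying the intermediates can discriminate between them.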
Notably, Gln125 of Skyxy-TE is not conserved among other TE enzymes. In Skyxy-TE, His254 of the catalytic triad might abstract an α-proton from the substrate, and the oxyanion hole can stabilize the enolized intermediate, thereby facilitating the spontaneous epimerization reaction. Nocardicin is a monocyclic β-lactam antibiotic biosynthesised by an NRPS assembly line of five modules, despite only three residues being contained in the final peptide product. In nocardicin biosynthesis, the TE domain NocTE (PDB: 6OJD) has two functions, catalyzing the C-terminal epimerization of an L-Hpg residue to D-Hpg before performing the hydrolytic release of the mature product.86,87 The peptide binding pocket has been resolved by high-resolution crystal structures of the thioesterase domain in ligand-free form and trapped with a fluorophosphonate mimic of the substrate, which has aided in rationalizing the stereoinversion performed by this TE domain.88 The spontaneous epimerization reaction mechanism of NocTE is similar to that of Skyxy-TE. In the substrate of NocTE, the substitution of the hydroxyphenyl moiety stabilizes the carbanion; consequently, the substrate of NocTE can spontaneously epimerize to its stereoisomer (Fig. 6).
Both the DEBS TE and PICS TE are classified within the α/β hydrolase family and exhibit a hydrophobic cavity that traverses the protein, along with a hydrophobic dimer interface enriched in leucine residues. The catalytic triad is located in the center of the active site pocket, indicating that substrates must access this pocket during the reaction. The DEBS TE consists of a central β-sheet composed of seven strands, with β2 oriented in antiparallel manner to the remaining strands. Notably, the length of the β7 strand is shorter than β8, and the C-terminus of β7 is twisted 90° to form part of the substrate channel. The β strands are flanked on either side by α helices, two on one side and four on the other. Additionally, at the N-terminal extremity of the TE, two supplementary α-helices constitute the dimer interface.95,98
A two-step mechanism has also been proposed for the TE-mediated formation of macrocyclic polyketides. The first step involves the acylation of a serine residue, conserved within the catalytic triad of PKS TE domains, by the linear polyketide thioester. This step generates an acyl–enzyme intermediate which can be stable for an extended period of time.56 The second step is a nucleophilic attack of an intramolecular hydroxyl group leading to cyclization (Fig. 7), or hydrolysis of the acyl–enzyme intermediate when no efficient intramolecular nucleophile is available.
In the biosynthetic pathway of pikromycin (PIK), PIK-TE-1 demonstrated a higher frequency of “active” conformations, enabling easier transformation into the pre-reaction state necessary for macrocyclization. This transformation was influenced by a hydrogen-bonding interaction with His268 and essential hydrophobic interactions for substrate recognition. Notably, Tyr178 was proposed to widen the exit and facilitate product release in PIK-TE-1. The substrate channel of PIK-TE-1 is crucial for the mechanism and selectivity of macrolactonization, featuring a hydrophobic spacious chamber within the substrate channel, while a hydrophilic barrier at the distal end of the substrate channel induces the long-chain substrate to curl towards the catalytic residue Ser148, thus facilitating macrolactonization. A similar phenomenon exists in Pim TE, where its substrate channel is bifunnel-shaped, with a highly hydrophobic cavity near the catalytic triad, and a hydrophilic region specifically inducing the linear substrate to curl and complete cyclization. Thus, the substrate channel interrupted by a hydrophilic barrier represents a universal mechanism that induces essentially linear hydrophobic substrates to curl into conformations suitable for forming macrolactones.
Computational analysis supported the notion that macrocyclization catalyzed by PIK-TE via re-face nucleophilic attack was thermodynamically and kinetically more favorable than si-face attack.100 Chen et al. combined MD simulations with QM and QM/MM calculations for complexes of DEBS TE and its substrates. The DEBS TE and substrate undergo induced fit aided by hydrogen bonding and hydrophobic interactions. This transition is accompanied by a conformational change in the active site pocket of the enzyme, shifting from a “closed” state to an “open” state. Proton abstraction from the substrate is followed by nucleophilic attack on the carbonyl carbon at C1, resulting in the formation of a charged tetrahedral intermediate. The energy barrier for this reaction step is 9.9 kcal mol−1. Finally, the macrocyclic product is released with a very low barrier of 0.3 kcal mol−1.101
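To give a feel for what the two computed barriers imply, they can be converted into order-of-magnitude rate constants using the Eyring equation, k = (k_B T / h) exp(−ΔG‡ / RT). This is a rough sketch under stated assumptions: the reported barriers are treated as activation free energies, the temperature is taken as 298.15 K, and the transmission coefficient is set to 1, none of which is specified in the text above.

```python
import math

BOLTZMANN = 1.380649e-23    # J/K
PLANCK = 6.62607015e-34     # J*s
GAS_CONSTANT = 1.987204e-3  # kcal/(mol*K)


def eyring_rate(barrier_kcal_per_mol, temperature_k=298.15):
    """Eyring estimate of a first-order rate constant in s^-1."""
    prefactor = BOLTZMANN * temperature_k / PLANCK
    exponent = -barrier_kcal_per_mol / (GAS_CONSTANT * temperature_k)
    return prefactor * math.exp(exponent)


# Barriers quoted in the text; treating them as free energies is an assumption.
k_attack = eyring_rate(9.9)   # tetrahedral intermediate formation
k_release = eyring_rate(0.3)  # macrocyclic product release

print(f"nucleophilic attack: {k_attack:.2e} s^-1")
print(f"product release:     {k_release:.2e} s^-1")
```

On this estimate the release step is many orders of magnitude faster than the nucleophilic attack, consistent with the attack being the slower of the two steps considered here.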
For NRPS-derived peptides, cyclodimerization mediated by the TE domain further boosts the chemical diversity of the peptide products of these assembly lines. Beyond this, nature employs diverse strategies in constructing macrocyclic molecular scaffolds. For example, P450 oxidases and radical SAM enzymes can also catalyse macrocycle formation, as in the construction of the macrocycles of vancomycin and streptide.106,107 Diels–Alder enzymes utilize a cycloaddition reaction to construct the macrocyclic scaffold in pyrroindomycin, whilst the bifunctional oxidase LkcE catalyses a unique combination of amide oxidation and a Mannich reaction to assemble the macrocyclic system in lankacidin.108 Cyclodimerization in NRPS pathways greatly increases the possibilities to access peptides with high structural complexity and diverse bioactivity. Moreover, NRPS modules can be used iteratively to generate larger peptide rings, following the principle of molecular efficiency and atom economy. Understanding how nature generates cyclooligomer peptides from acyclic precursors can provide inspiration for the construction of functional macrocyclic molecules. In this section, we will focus on NRPS-derived NPs that include cyclodimerization mediated by a TE domain in their biosynthesis, with particular focus on the structural characterization and catalytic mechanism of these TE domains.
Scheme 1 The gramicidin S biosynthesis pathway. The dotted lines are used to indicate the four intramolecular hydrogen bonds found in gramicidin S.
Tyrocidine is a cyclic decapeptide first isolated from the soil bacterium Bacillus brevis in 1940, and is the major constituent of tyrothricin, an antibiotic mixture also containing gramicidin.118 Tyrothricin was an important milestone in antibiotic development, as it was the first antibiotic commercialized for clinical use. Tyrocidine A is active against a broad spectrum of Gram-positive bacteria and acts by disrupting the synthesis of the peptidoglycan layer of bacterial cell walls.119 Tyrocidine has also been shown to be effective at controlling fungal pathogens.120
Further experiments then demonstrated that the ability of the GrsB TE domain to catalyse the cyclodimerization of peptide substrate decreases with increasing peptide length when these are loaded onto PCP–TE didomain constructs. The GrsB TE was shown to efficiently dimerize or even trimerize 6–15 amino acid containing linear peptides, demonstrating significant flexibility and substrate tolerance in the catalytic pocket of this domain.124 Additional chemoenzymatic studies demonstrated that the GrsB TE can cyclize immobilized linear decapeptide precursors, suggesting its potential use in chemoenzymatic analogue synthesis.125
In 1997, the Marahiel lab reported the biosynthetic gene cluster of tyrocidine in Bacillus brevis ATCC 8185. The tyrocidine NRPS synthetase enzymes TycA, TycB and TycC are responsible for peptide assembly of the linear decapeptide.117 The C-terminal TE domain found in TycC (TycC TE) is then responsible for the macrocyclization of the decapeptide to generate tyrocidine A.127 The TycC TE domain was the first excised TE domain (35 kDa) shown to retain cyclization activity. Moreover, it has been demonstrated that the recombinant TycC TE domain is a versatile macrocyclization catalyst, which can generate tyrocidine variants with altered biological activity.127 Whilst the TycC TE domain typically catalyses the cyclization of a decapeptide-thioester to form tyrocidine A, it has been shown that the TycC TE can also catalyse the cyclodimerization of the gramicidin S pentapeptide-SNAC (D-Phe-Pro-Val-Orn-Leu-SNAC) to generate gramicidin S (Scheme 2).126
The TycC TE can further catalyse the macrocyclization of peptide substrates that differ significantly from the linear precursor of tyrocidine. Tyrocidine and gramicidin share structural similarity in both their N-terminal (D-Phe-Pro) and C-terminal residues (Val-Orn-Leu). Biochemical results suggest that TycC TE can cyclize the gramicidin S precursor decapeptide-SNAC (D-Phe-Pro-Val-Orn-Leu-D-Phe-Pro-Val-Orn-Leu-SNAC) and a tyrocidine A precursor (D-Phe-Pro-Phe-D-Phe-Asn-Gln-Tyr-Val-Orn-Leu-SNAC) at a comparable rate to the wildtype peptides. These results suggest that the TycC TE can cyclize different substrates provided that the necessary “recognition residues” near each end of the substrate are present.126 Furthermore, the TycC TE tolerates the replacement of residues 5–8 in peptidyl thioesters, and it shows comparable catalytic efficiency towards various artificial tyrocidine variants, including glycosylated peptide thioesters.128
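The relationship between the substrates quoted above can be checked at the sequence level: two head-to-tail copies of the gramicidin S pentapeptide give exactly the decapeptide precursor, and the resulting ring has twofold rotational symmetry, whereas the tyrocidine A ring does not. The sketch below is purely illustrative; the helper names `parse_peptide` and `ring_symmetry` are hypothetical, and only the sequences themselves are taken from the text.

```python
def parse_peptide(text):
    """Split a hyphenated peptide string into residues, dropping the SNAC tag."""
    residues = []
    pending_d = False
    for token in text.split("-"):
        if token == "D":
            pending_d = True  # stereo prefix belongs to the next residue
        elif token == "SNAC":
            continue          # thioester leaving group, not a residue
        else:
            residues.append(("D-" if pending_d else "") + token)
            pending_d = False
    return residues


def ring_symmetry(residues):
    """Count the rotations that map a head-to-tail ring onto itself."""
    n = len(residues)
    return sum(
        1 for shift in range(n)
        if residues[shift:] + residues[:shift] == residues
    )


pentapeptide = parse_peptide("D-Phe-Pro-Val-Orn-Leu-SNAC")
decapeptide = parse_peptide(
    "D-Phe-Pro-Val-Orn-Leu-D-Phe-Pro-Val-Orn-Leu-SNAC"
)
tyrocidine_a = parse_peptide(
    "D-Phe-Pro-Phe-D-Phe-Asn-Gln-Tyr-Val-Orn-Leu-SNAC"
)

print(pentapeptide * 2 == decapeptide)  # dimer of the pentapeptide
print(ring_symmetry(decapeptide))       # symmetry order of the gramicidin S ring
print(ring_symmetry(tyrocidine_a))      # symmetry order of the tyrocidine A ring
```

The same comparison also makes the shared D-Phe-Pro and Val-Orn-Leu termini of the two decapeptides easy to see.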
To further explore the substrate specificity of the TycC TE, Xie et al. established positional-scanning libraries based on SNAC substrate mimics to determine the catalytic activity of TycC TE towards these libraries (kcat and Km). In general, TycC TE is less efficient in cyclizing such libraries, by a factor of 2- to 82-fold, compared with the wild-type substrate sequence. Analysis of the catalytic efficiency (kcat/Km) of these reactions suggested that the N-terminal D-Phe1 and C-terminal Orn9 residues are essential for substrate recognition by the TycC TE domain,129 which is consistent with other reported results.130 It was proposed that the efficient occurrence of TycC TE-mediated cyclization requires the formation of key hydrogen bonding partners between the amine group of the Orn side chain and the carbonyl group of the N-terminal aromatic residue.31 In addition, the results obtained using sub-library O3, where the catalytic efficiency showed the greatest reduction (82-fold), suggest that the side chain of the 4th residue significantly influences TE domain activity (Scheme 2).131
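The fold reductions quoted above are ratios of catalytic efficiencies (kcat/Km). As a minimal sketch of that arithmetic, the following uses hypothetical kinetic parameters chosen only to reproduce an 82-fold drop; they are not the measured values from the cited study, and the function names are illustrative.

```python
def catalytic_efficiency(kcat, km):
    """Return kcat/Km; units follow from the inputs (e.g. min^-1 / uM)."""
    if km <= 0:
        raise ValueError("Km must be positive")
    return kcat / km


def fold_reduction(wild_type_efficiency, variant_efficiency):
    """How many times less efficient the variant is than wild type."""
    return wild_type_efficiency / variant_efficiency


# Hypothetical numbers for illustration only (not data from the source).
wild_type = catalytic_efficiency(kcat=60.0, km=3.0)
variant = catalytic_efficiency(kcat=6.0, km=24.6)

print(f"fold reduction: {fold_reduction(wild_type, variant):.1f}")
```

Because the ratio combines both parameters, the same fold reduction can arise from a lower kcat, a higher Km, or both, so the individual values remain informative.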
In 2006, Wadhwani et al. synthesized gramicidin S utilizing solid-phase Fmoc chemistry. The sequential synthesis starts from the 2-chlorotrityl (2CT) resin loaded with Fmoc-protected D-phenylalanine, and proceeds using typical SPPS conditions until the linear decapeptide 1 was cleaved from the solid support using TFA/iPr3SiH/H2O. An overall yield of 69% was obtained after peptide cyclisation mediated by PyBOP/HOBT in DCM solution and hydrazine-mediated Dde deprotection (Scheme 3).132
Rivera and colleagues developed an on-resin Ugi reaction system for the synthesis of gramicidin S analogues. 2CT resin was used as the solid support for peptide synthesis by solid phase peptide synthesis (SPPS). Next, the intermediate product resin-bound acyclic decapeptide 2 was directly subjected to Ugi macrocyclization to generate a range of gramicidin S analogues (Scheme 4).133,134 Grotenbreg et al. designed an alternate synthesis of gramicidin S via stepwise SPPS using HMPB-resin. After the linear nonapeptide 3 was generated using standard Fmoc-based SPPS, cyclization was performed through the incorporated azide functionality (Scheme 5). After deprotection, the cyclopeptide gramicidin S analogue was obtained in 96% yield.135
TE domains can be combined with SPPS of linear peptide precursors to cyclize the immobilized peptide, which can greatly enrich the structural diversity of the peptides obtained. A key feature of the enzymatic synthesis of cyclic peptides via NRPS-mediated biosynthesis is the attachment of an activated linear peptide intermediate to a carrier protein domain via a thioester linkage, which resembles SPPS approaches.136 TE domains can act as isolated cyclization catalysts and can also act on solid-phase linked peptides that contain biomimetic linkers to complete peptide cyclisation, thus enabling the integration of biocatalysis with combinatorial SPPS.137 Kohli and colleagues also synthesized linear peptide SNACs (that mimic the thioester attachment of peptides to the NRPS) to investigate the catalytic function of the TycC TE domain; the results of their study further demonstrate the versatility of this approach for peptide cyclization (Scheme 2).137
Yoon and colleagues recently investigated the macrocyclization of mohangamide A by the TE domain found in the mohangamide NRPS assembly line (Scheme 6). Sequence alignment revealed few differences between the TE domains from Streptomyces sp. SNM55 (WS19T-TE) and S. calvus (Cal19T-TE) (Fig. 2).142 Their biochemical experiments suggested that the mohangamide TE domain displays strict specificity for its substrate, and is only able to catalyse peptide cyclization when both native precursors of mohangamide A are present. This suggests that the structure of the unsymmetrical monomeric peptides is restricted in order to lead to a competent reaction for the WS19T-TE/Cal19T-TE. Further structural investigation is needed to illustrate the detailed catalytic mechanism of this unusual pseudo-dimeric scaffold.143,144
Recently, it was demonstrated that Tyr dehydrogenation occurs during the WS9326A peptide assembly, with the cytochrome P450 Sas16 catalysing the dehydrogenation of the PCP-dipeptide intermediate (Z)-2-pent-1′-enyl-cinnamoyl-Thr-N-Me-Tyr. The P450Sas-mediated incorporation of the double bond follows N-methylation of the Tyr by the N-methyl transferase domain found within the NRPS, and P450Sas appears to be specific for substrates containing the (Z)-2-pent-1′-enyl-cinnamoyl group. The P450Sas structure reveals differences with other P450s involved in the modification of NRPS-associated substrates, including the substitution of the canonical active site alcohol residue with a phenylalanine (F250), which in turn is critical to P450Sas activity and WS9326A biosynthesis. This discovery expanded the repertoire of P450 enzymes that can be used to produce biologically active peptides,145 and raises further questions regarding the specificity of the mohangamide TE domain for the acyl side chains of its substrates.
Catechol-type siderophores are highly diverse in their structural composition. In contrast to enterobactin, streptobactin has a cyclic triester-bonded macrocycle scaffold (L-Thr-L-Arg-DHBA)3, which generates a C3 symmetry axis (Fig. 8). Streptobactin was discovered from the marine actinomycete Streptomyces sp. YM5-799, with the corresponding dimer dibenarthin, trimer tribenarthin and monomer benarthin all discovered from the same strain.153 The derivative corynebactin is assembled from the corresponding tri-lactonic scaffold containing the units L-Thr, Gly and DHBA (Fig. 8). Corynebactin was discovered from Brevibacterium sp., which are obligate aerobic Gram-positive bacteria found in milk and on human skin.154,155
The enterobactin biosynthetic pathway in E. coli is a well-researched example of a non-linear NRPS assembly line. The enterobactin gene cluster encodes six functional enzymes EntA-F.156,157 The formation of 2,3-dihydroxybenzoic acid (DHBA) requires the activities of EntC, EntB and EntA.156 The three core NRPS enzymes EntE, EntB and EntF comprise the two-module NRPS assembly line, which is responsible for the iterative condensation of three DHBA and three L-serine residues to produce the final iron-chelating product enterobactin.158
EntE is an AMP ligase that activates DHBA through adenylation. EntB, consisting of two domains (C-PCP), accepts the activated dihydroxybenzoate (DHB) and loads this aryl acid on its PCP domain. EntF catalyses the final step in the pathway and comprises four domains (A-C-PCP–TE). In this protein, the A domain initially adenylates an L-Ser residue and transfers it to the adjacent PCP domain. The subsequent C domain catalyses amide bond formation between the 2,3-DHB bound to the upstream EntB Ar-CP domain and the PCP-bound L-Ser residue to generate DHBA-Ser-S-PCP in EntF. The EntF C domain has been demonstrated to possess specificity for the L-Ser residue as the downstream (acceptor) substrate.159,160
Cyclotrimerization is required to create the trilactone backbone of enterobactin, which is performed by the final two domains of EntF (PCP–TE) via sequentially transferring oligomer intermediates between the 4′-PPant arm of the PCP domain and the active site serine residue of the TE domain (Scheme 7). The EntF TE domain therefore fulfils roles as both a terminal domain for acyl transfer from the PCP domain and an extended catalyst for enterobactin trimerization. Through three cycles of iterative (2,3-dihydroxybenzoyl)-L-serine (DHB-Ser) synthesis, three DHB-Ser moieties are connected whilst remaining linked to the TE domain by an ester bond. Finally, the linear DHB-Ser trimer is cyclized and released from the TE domain (Scheme 7).160,161 Structurally, EntF TE is similar to homologous thioesterase domains, such as AB3403-TE, Vlm-TE, FenTE and SrfTE.162 Using solution phase nuclear magnetic resonance (NMR) spectroscopy, Frueh et al. have also successfully elucidated the structure of the apo-PCP–TE didomain (PDB: 2ROQ, 37 kDa) of the enterobactin NRPS synthetase EntF (vide infra).69
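The iterative logic described above can be made concrete with a small bookkeeping sketch in Python. It is a toy model only: the function name `iterative_te_assembly`, the string labels, and the assumption that each round simply adds one PCP-delivered unit to the TE-bound chain are illustrative choices, not a description of the enzyme's kinetics or of the exact order of acyl transfer.

```python
# Toy bookkeeping model of TE-mediated iterative oligomerization and cyclization.
# Names and data structures are hypothetical; no chemistry or kinetics is simulated.

def iterative_te_assembly(unit: str, rounds: int):
    """Track the chain held on the TE active-site Ser over `rounds` PCP deliveries."""
    te_chain = []        # units currently ester-linked to the TE domain
    intermediates = []   # snapshot of the TE-bound chain after each round
    for _ in range(rounds):
        pcp_unit = unit              # one unit is assembled on the PCP 4'-PPant arm
        te_chain.append(pcp_unit)    # the TE-bound chain grows by one unit
        intermediates.append(tuple(te_chain))
    # After the last round the linear oligomer is cyclized and released.
    product = {"units": tuple(te_chain), "cyclic": True}
    return intermediates, product


intermediates, enterobactin_core = iterative_te_assembly("DHB-Ser", rounds=3)
for chain in intermediates:
    print(len(chain), "x DHB-Ser held on the TE")
print("released:", enterobactin_core)
```

In this simplified picture the TE domain holds the monomer, dimer and trimer in turn, and only the trimer is released as the cyclic product.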
Even though some factors that influence TE-mediated cyclization and hydrolysis have been investigated for NRPSs, the fundamental structural basis for both iteration and cyclization remains elusive. Whilst multiple sequence alignment of triscatechol siderophore TE domains provides possible clues, the exact mechanism (iteration, hydrolysis, and cyclization) awaits further elucidation.163 Structurally, the conserved Ser-His-Asp catalytic triad (Ser1138, His1271, Asp1165) is vital for EntF-TE thioesterase activity. Ser1138 acts as the nucleophile that binds DHB-Ser precursors. Mutation of Ser1138 to Ala results in complete disruption of substrate oligomerization. Mutation of His1271 to Ala can still generate enterobactin but with an approximately 10000-fold decrease in yield.158,164 Interestingly, the residue Pro1073 is conserved in EntF and other PCP–TE homologues, and it was recognized to play an important role in stabilizing the oxyanion hole in these structures (Fig. 9).165
Iterative biosynthesis facilitates the repeated use of modules and necessitates a “waiting room” for reaction intermediates.164 According to solution NMR and protein interaction analysis, the catalytic pocket formed by PCP–TE di-domains is dynamic. The lid region of the EntF holo-PCP–TE (PDB 3TEJ) domain needs to fluctuate to accommodate various intermediates in the hydrophobic pocket and to remain open to the 4′-PPant arm of the corresponding PCP domain bearing the tripeptide.69 In addition, the process of cyclotrimerization, which is facilitated by the EntF TE, presumably involves an interaction between the two domains, allowing the structure bound to the PCP domain to be transferred into the substrate pocket of the TE domain.165
Protein–protein interactions are also hypothesized to influence the catalytic function of TE domains from specific systems. The Cryle group reported the characterization of the TE domain from the teicoplanin NRPS.51 Based on the results of a series of PCP-substrate cleavage activity assays, they found that the activity of the Tcp12-TE domain is dependent on the presence of an unusually long N-terminal linker region, which appears to be exclusive to the NRPS machinery seen in GPA biosynthesis. In addition, the longer Tcp12-TE (L-TE) domain of the teicoplanin NRPS has a strong selectivity for the PCP-bound peptide in its optimal cross-linking condition,51 indicating that the TE in GPA biosynthesis acts as a logic gate for the complex X-domain/P450 mediated peptide cyclisation cascade.126,166,167
TE domains are relatively flexible and possess significant conformational diversity. To trap a PCP complex with a TE, Bruner and colleagues utilised the chemical probe α-chloroacetyl-amino-coenzyme A, which is a mimic of the natural thioester substrate targeting the Ser residue in the catalytic triad (Ser1138, His1271, and Asp1165) of TE domains (Fig. 9). After bioconjugation with the PCP domain via the activities of a phosphopantetheinyl transferase, the probe allowed a stable complex of the PCP–TE didomain protein to be trapped, with the resultant structure demonstrating the interaction between the TE and PCP domains.165,168 The Gulick lab revealed that the EntF-TE domain also exhibits significant conformational mobility, as it can adopt multiple positions with respect to other domains based on the analysis of negative-stain electron microscopy.162,169 Based on these structures, it is postulated that the mature PCP domain-bound peptide chain can be transferred to the TE domain via rotation of the linker attached to PCP.162
The total synthesis of enterobactin was first reported in 1977.170 Shanzer et al. developed a synthetic route to the enterobactin cyclic trilactone backbone (4) using an organotin template. This method required few synthetic steps, but delivered a relatively low yield of cyclic trilactone (23%).171 In 1997, Ramirez et al. designed a strategy to improve the synthetic yield of enterobactin. Using methyl N-triphenyl-methyl-L-serinate (5) as starting material and a variation on Shanzer's organotin template, they also synthesized the cyclic trilactone (4) intermediate with a much-improved yield (81%) (Scheme 8).172
Through reconstitution and heterologous expression, it was demonstrated that the NRPS enzyme AebF catalyses amide bond formation and cyclization of the tetralactone backbone. The C domain in this single module NRPS performs bifunctional condensation catalytic activity (Scheme 9). The long-chain fatty acid CoA ligase (FACL) AebG is required to activate the fatty acid as a fatty acid CoA thioester. In contrast to the trilactone backbone of enterobactin, the AebF-TE can accommodate the larger 16-membered tetralactone backbone of amphi-enterobactin.164,173 Based on the “waiting room” model, the AebF-TE serves as a temporary attachment site for L-Ser-DHBA during amphi-enterobactin assembly.
Many studies have reported the genes related to vicibactin biosynthesis and proposed the steps of the biosynthetic pathway.180,181 The biosynthesis of vicibactin produced by the symbiotic bacterium Rhizobium leguminosarum involves a cluster of eight genes: vbsG, vbsS, vbsO, vbsA, vbsD, vbsL, vbsC and vbsP.163 Among them, VbsS, an NRPS enzyme (C-A-PCP–TE), plays a crucial role in the assembly of the vicibactin scaffold. VbsS activates L-N5-hydroxy-ornithine and attaches it to the PCP domain as an acyl thioester. Further, the assembly of the vicibactin siderophore involves acylation catalyzed by VbsA, followed by epimerization and acetylation mediated by VbsL. In the last step, trimerization and cyclization are mediated by the TE domain of VbsS (Scheme 11). The cyclotrimerization reaction, catalyzed by VbsS, is similar to that catalyzed by EntF in the biosynthesis of enterobactin.181
The siderophore synthetase DesD, a member of the NIS synthetase superfamily, catalyses the biosynthesis of desferrioxamines in Streptomyces coelicolor M145. DesD iteratively catalyses the ATP-dependent condensation of the repeating units of N-hydroxy-N-succinyl-cadaverine to form the dimer or trimer.182,183 In the biosynthesis of bisucaberin, the BibC multidomain enzyme from Vibrio salmonicida has been identified as catalyzing the macrocyclization of the monomer N-hydroxy-N-succinyl-cadaverine to produce bisucaberin.184 The closely related analogue PubC catalyzes specific macrocyclization and cyclotrimerization in putrebactin biosynthesis (Scheme 12).185
Cheng et al. first reported the complete biosynthetic gene cluster of valinomycin (termed vlm) in Streptomyces tsusimaensis ATCC 15141.191 The vlm gene cluster encodes two NRPS enzymes: Vlm1 and Vlm2. Vlm1 (370 kDa) includes two modules: A1-KR1-PCP1-C2-A2-PCP2-E2, and Vlm2 (284 kDa) includes another two modules: C3-A3-KR3-PCP3-C4-A4-PCP4-TE, a total of 16 domains constituting 4 modules, each of which is responsible for the incorporation of one of the building blocks D-α-hydroxyisovaleric acid (D-Hiv), D-valine (D-Val), L-lactate (L-Lac) and L-valine (L-Val). The NRPS domain system is homologous to that assembling the related cyclopeptide cereulide (Scheme 13).192
In the valinomycin biosynthesis pathway, the tetrapeptide precursor is synthesised three times, with each monomer synthesis halting at the TE domain. The Vlm2 TE domain catalyses the final oligomerization, cyclization and the release of valinomycin (Scheme 13).193 Montanastatin was discovered as a shunt product of valinomycin. These molecules both comprise the same tetradepsipeptide basic unit, but montanastatin's basic unit only repeats twice, while valinomycin's basic unit repeats three times. Studies have shown that both valinomycin and montanastatin follow the same biosynthetic pathway, which implies that the TE domain loosely controls the basic unit repetition in the valinomycin system.191 It remains unclear how the TE domain can regulate the number of iterations during cyclo-oligomerization.
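The relationship between the two products can be illustrated with a short Python sketch. It only encodes what is stated above (one shared tetradepsipeptide unit, repeated twice or three times); the names `TETRADEPSIPEPTIDE`, `REPEATS` and `linear_precursor` are hypothetical, and the unit order simply follows the order in which the building blocks are listed in the text.

```python
# Minimal sketch: the same tetradepsipeptide unit repeated a different number of times.
# Names are illustrative; the sketch says nothing about how the TE counts iterations.

TETRADEPSIPEPTIDE = ("D-Hiv", "D-Val", "L-Lac", "L-Val")
REPEATS = {"montanastatin": 2, "valinomycin": 3}


def linear_precursor(repeats: int) -> tuple:
    """Return the residues of the linear oligomer before cyclization."""
    return TETRADEPSIPEPTIDE * repeats


# Number of residues in each macrocycle (octa- vs dodecadepsipeptide).
residue_counts = {name: len(linear_precursor(n)) for name, n in REPEATS.items()}
print(residue_counts)
```

The only parameter that differs between the two products in this sketch is the repeat count, which is the quantity the TE domain appears to control only loosely.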
In 2023, Konno et al. created a substrate peptide-based p-nitrophenyl phosphonate that forms a stable covalent complex with recombinant TycC TE. This method is useful for comprehending substrate identification and substrate-TE interaction during macrocyclization.194 Similarly, recent work by the groups of Schmeing and Chin has provided further structural insights into TE mechanisms, focusing on the Vlm2 TE. By employing genetic code expansion to install the amine-containing analogue 2,3-diaminopropionic acid (Dap) in the active site, they were able to stabilize and trap acyl-TE intermediates. Using this method, the first and last acyl-TE intermediates in the catalytic cycle as Dap conjugates were trapped and resolved, which provides structural insight into the conformational changes in this TE domain that control the oligomerization and cyclization of linear substrates. Moreover, this Dap substitution strategy can be widely used for protein–ligand related characterization or assays. According to this study, the Vlm TE (PDB: 6ECE) lid region could contribute by changing the shape of the depsipeptide chain to encourage intramolecular cyclization. By comparing the protein structure between tetradepsipeptidyl-TEDAP (6ECD) and the dodecadepsipeptidyl-TEDAP (6ECE and 6ECF), significant conformational changes could be observed in the lid region (α4–α8) (Fig. 12). It was proposed that this lid rearrangement is necessary for reorganizing the substrate during the cyclo-oligomerization.70 In addition, by capturing the key intermediate (octadepsipeptidyl-SNAC), their investigation confirmed that Vlm TE mediates peptide oligomerization in the “reverse transfer” mode (Scheme 13), and other NRPS-TEs would also be expected to catalyse the peptide ring dimerization in this manner.
Shemyakin et al. first reported the chemical synthesis of valinomycin in 1963. The open-chain depsipeptide was synthesized using SPPS, which was subsequently cyclized into valinomycin using acid chloride after being cleaved from the resin. Many other reported synthetic routes follow essentially the same strategy.188,195–197 In 2020, Li and colleagues developed a new biosynthetic method for valinomycin, in which they used a cell-free protein synthesis (CFPS) system to express the entire valinomycin BGC, giving rise to about 37 g L−1 of valinomycin. This demonstrated for the first time that a single-pot CFPS reaction could enable the complete biosynthesis of a complex nonribosomal peptide.198
Cereulide consists of three monomers each comprising the four residues D-O-Leu, D-Ala, L-O-Val and L-Val. The BGC of cereulides and isocereulides has been characterized as the ces NRPS from Bacillus cereus.204–206 The cereulide synthetase gene cluster (ces, 24 kb) contains 7 genes: cesH, cesP, cesT, cesA, cesB, cesC and cesD. The biosynthesis of cereulide is closely controlled via post-transcriptional and post-translational regulation.207 The NRPSs CesA and CesB complete the assembly of three repeating tetrapeptide motifs (D-O-Leu-D-Ala-L-O-Val-L-Val) as the cereulide backbone. In addition, CesB2 is a termination module containing the TE domain, which releases the mature nonribosomal peptide by cyclization (Scheme 14).191,208 CesB2 catalyzes the trimerization and macrocyclization of tetrapeptide substrates to finally generate cereulide. Biochemical studies have shown that this TE domain is selective for the nucleophilicity of hydrolyzed deaminopeptidyl-TE intermediates.209
Bisintercalator depsipeptides feature a C2-symmetric macrocyclic scaffold, which is a cyclic structure formed by various amino acids, including two identical planar bicyclic heteroaromatic chromophore moieties that are found at opposite ends of this symmetric peptide scaffold. These chromophore units are the first moieties incorporated during the assembly of these compounds by NRPSs and are also the reason why these compounds are known as bisintercalators (Fig. 13).211 The chromophore subunits employed in bisintercalators are 3-hydroxy-quinaldic acid (3HQA) and quinoxaline-2-carboxylic acid (QXCA).
This general structure enables bisintercalators to position the two chromophores on the peptide scaffold in such a manner that they interact deeply with the minor groove of DNA.212 This interaction leads to the unwinding and extension of the DNA helix, causing changes in a variety of cellular processes related to replication and transcription that ultimately result in the death of the affected cells.213
Many of these compounds (thiocoraline, triostin, SW-163 and echinomycin) share high degrees of similarity in terms of chromophores and the number and type of amino acids, which suggests that they originate from similar biosynthetic pathways.214
Echinomycin B, C, D and E have been identified as being produced by the same assembly line, and occur as a consequence of the alternate incorporation of a variety of branched amino acids such as isoleucine, indicating a relaxed specificity for some A-domains within the NRPS enzymatic assembly line.218,219 Echinomycin (Quinomycin A) is the most widely studied member of the quinoxaline/quinoline antibiotic group and comprises a C2-symmetric scaffold containing the cross-bridged cyclic octapeptide dilactone core linked by twin quinoxaline/quinoline 2-carbonyl chromophores. Like other bisintercalators, echinomycins show significant inhibitory activity against cancer stem cells and hypoxia-inducible factor-1 (HIF-1) by inhibition of DNA replication and RNA synthesis.220
The quinomycin biosynthetic gene cluster encodes the enzymes that perform its biosynthesis in two key steps. The first is the synthesis of quinoxaline-2-carboxylic acid (QXCA), which involves not only the secondary metabolic pathway but also the ACP protein from fatty acid metabolism.221–223 Next, the assembly of the echinomycins is performed by an NRPS (Ecm6, Ecm7) that iteratively catalyses the assembly of the linear tetrapeptide precursor. Ecm1 and FabC assist the loading of the chromophore quinoxaline-2-carboxylic acid. The echinomycin TE domain (Ecm TE), located in the final module of Ecm7, is responsible for the dimerization, final cyclization and release of echinomycin (Scheme 15).224
Echinomycin and triostin A possess an identical cyclic depsipeptide core that contains C2 symmetry.225 Genome sequencing has demonstrated that triostin A and echinomycin are produced by the same BGC, with the thioacetal bridge-forming enzyme encoded by Ecm18 being responsible for the conversion of triostin A to echinomycin (Scheme 15).222
In the Ecm-TE-mediated cyclization, the reaction has been shown to be limited by the hydrolysis of the product.226,227 Koketsu and colleagues designed a DNA-binding strategy to increase the catalytic efficiency of Ecm TE-catalyzed cyclization.228 This was based on the binding preference of tetrapeptide unit products to a specific DNA sequence, which can inhibit unnecessary product hydrolysis and significantly improve the yield of cyclization products.225
Koketsu et al. synthesized a series of octapeptide substrates with various substituted amino acid residues and chromophores. Use of these in biochemical assays suggests that the Ecm TE has a relatively broad substrate specificity, and further that the excised Ecm7 TE domain can also catalyse the dimerization of the tetrapeptidyl-SNAC substrates, generating triostin A.228 In addition, these assays showed that the Ecm TE mediates the cyclization of pyrazine ring-containing analogues at a slower rate (kcat/Km = 0.046 mM−1 min−1) compared to a higher rate (kcat/Km = 3.87 mM−1 min−1) for the naphthalene ring-containing analogue. This suggests that the ring size of the chromophore has a major influence on substrate recognition by the Ecm TE domain.
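To put the two catalytic efficiencies quoted above on a common scale, the fold difference can be computed directly. The short Python sketch below uses only the two kcat/Km values given in the text; the variable and function names are illustrative.

```python
# Fold difference between two catalytic efficiencies (kcat/Km, in mM^-1 min^-1).
# The two values are those quoted in the text; everything else is illustrative.

KCAT_KM_PYRAZINE = 0.046      # pyrazine ring-containing analogue
KCAT_KM_NAPHTHALENE = 3.87    # naphthalene ring-containing analogue


def fold_difference(higher: float, lower: float) -> float:
    """Return how many times larger `higher` is than `lower`."""
    if lower <= 0:
        raise ValueError("catalytic efficiency must be positive")
    return higher / lower


fold = fold_difference(KCAT_KM_NAPHTHALENE, KCAT_KM_PYRAZINE)
print(f"naphthalene vs pyrazine analogue: about {fold:.0f}-fold")
```

The naphthalene-containing analogue is thus processed roughly 84-fold more efficiently than the pyrazine-containing analogue.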
Triostin A, by virtue of being a symmetric bicyclic depsipeptide, is an easier synthetic target than echinomycin and a variety of total synthesis methods in solid or liquid phases have been developed to generate triostin A.229–232 In all of these methods, the amino acids are first linked into linear tetradepsipeptides by SPPS, which is followed by a double cyclisation reaction in which the two tetrapeptide chains are first linked to form a linear octapeptide, and then macrocyclization is carried out with a catalyst to obtain the cyclodimeric peptide skeleton.
Sable et al. reported a different method for triostin A synthesis. Here, allyl and alloc-protecting groups were used in the generation of the linear tetradepsipeptides by SPPS, which were then removed using [Pd(PPh3)4] and PhSiH3 in CH2Cl2. Cyclisation was then achieved using N,N′-diisopropylcarbodiimide (DIC)/HOAt in a DMSO/DMF (3:2) solvent mixture to obtain the cyclized intermediate 6, which was cleaved from the resin by treatment with TFA/TIS/CH2Cl2 (10:5:85) in 4.0% overall yield (Scheme 16A). In an alternative route, the 2-quinoxaline carboxylic acid was coupled to the peptide after the cyclization. The desired product was then isolated in 9.1% overall yield (Scheme 16B).230
Nagasawa et al. developed a 13-step solution-phase synthesis of triostin A, with cyclization achieved through a macrocyclization reaction. Using iodine for oxidative deprotection of the Bam group on the linear octapeptide forms a disulfide bond, and subsequently, when mixed with EDCI/HOAt at high dilution (0.001 M), the anticipated macrolide 7 was formed with a yield of 48%. Triostin A was then synthesised by removing the Cbz groups on 7 with thioanisole/TFA. This was followed by coupling two 2-quinoxaline carboxylic acid residues, which gave a 17.5% yield of the desired product (Scheme 16C).231
The presence of the thioacetal bridge makes the synthesis of echinomycin and quinomycin more challenging and delayed the first complete synthesis of echinomycin until 2020. In their synthesis, the Ichikawa group prepared the ester 8 and sulfide 9 as key precursors based on their retrosynthetic analysis. Simultaneous cyclization and two-directional peptide chain elongation were used to create the C2-symmetrical bicyclic octadecadepsipeptide.233 The sulfur of 10 was oxidized to sulfoxide with m-CPBA, and the sulfoxide was converted into the corresponding chloride using AcCl through the Pummerer rearrangement. Finally, using the nucleophile TMSSMe and ZnCl2 the methylthio group was introduced to generate echinomycin (Scheme 17).233
SW-163C-E (Fig. 13) is produced by Streptomyces sp., and its biosynthetic pathway is similar to that of triostin A, thiocoraline and echinomycin. The SW-163 BGC was first annotated in Streptomyces sp. SNA15896, where fifteen genes were shown to be involved in the biosynthesis of SW-163s.223 Amongst these, two NRPSs (Swb16, Swb17) are involved in the assembly of the tetrapeptide chain and the final cyclisation to generate SW-163C. The SAM-dependent methyltransferase Swb8 and radical SAM protein Swb9 are responsible for the conversion of SW-163C to SW-163D and SW-163E-G, respectively.138 In contrast with echinomycin, whose chromophore is quinoxaline, quinoline is found in SW-163-type compounds. In addition to the three common amino acids in the tetrapeptide chain of SW-163-type compounds, they also contain the rare amino acid N-Me-norcoronamic acid.234
The solution-phase total synthesis of sandramycin and its analogues was first completed in 1996 and used a strategy similar to the synthesis of other bisintercalators such as quinomycin. The symmetrical pentadepsipeptides were synthesized first, followed by coupling and macrocyclization of a 32-membered decadepsipeptide at a single secondary amide site to afford the target compound.236
Unlike the sequential peptide coupling approach, Ichikawa et al. recently performed the total synthesis of sandramycin through an Ugi three-component (11, 12, 13) reaction as the key step to obtain a linear pentadepsipeptide 14. Sandramycin was generated by the stepwise coupling of two pentapeptide monomers, with subsequent introduction of the quinaldin chromophores (Scheme 18).237 Chemical synthesis methods mostly use C2 symmetry to synthesize the dimerized backbone of sandramycin, which limits derivatives to those containing C2 symmetry. In 2023, the Ichikawa group further optimized the SPPS approach based on the previous coupling strategy and could synthesize desymmetrized analogues of sandramycin (Scheme 18).238
The Du group identified the biosynthetic gene cluster of luzopeptin A (luz, 48 kb) from Actinomadura luzonensis DSM 43766 and that of its analogue korkormicins from Micromonospora sp. ATCC 55011 (Scheme 19). Two NRPS enzymes Luz1 and Luz2 are responsible for the assembly of the peptide using the serine, tetrahydropyridazine-3-carboxylic acid (Thp), glycine (2x) and threonine units. The TE domain of Luz2 performs the final release of the product via head-to-tail cyclization (Scheme 19).241 After release, a series of post-NRPS modifications are required for the generation of mature luzopeptins. The tailoring enzymes Luz25 (cytochrome P450), Luz26 (cytochrome P450) and Luz27 (membrane acyltransferase) are responsible for the modification of the cyclic peptide to generate the non-proteinogenic amino acid residues β-OH-N-methyl-L-Val and the acyl-substituted tetrahydropyridazine-3-carboxylic acid (Thp), respectively.241
As with the synthesis of sandramycin, the synthesis of the luzopeptins commences with the preparation of the linear monomer decadepsipeptide, which is achieved through the coupling of pentadepsipeptides 15 and 16, mediated by EDCI-HOAt (CH2Cl2, 0 °C, 2 h). By using transfer hydrogenolysis (25% aqueous HCO2NH4, 10% Pd/C, EtOH/H2O), the Fmoc group and benzyl ester were cleaved, with macrocyclization (EDCI-HOAt, CH2Cl2, 0 °C, 2 h) providing the 32-membered cyclic decadepsipeptide, and finally generating the luzopeptins (Scheme 20).240
TioR and TioS are the NRPS enzymes involved in thiocoraline biosynthesis, with TioR comprising two modules (C1-A1-PCP1-E1-C2-A2-PCP2) and TioS two modules (C3-A3-M3-PCP3-C4-A4-M4-PCP4-TE).247 Studies suggest that 3HQA, which is activated by TioJ, is not directly loaded onto the NRPS module but rather onto the fatty acid synthase ACP domain FabC located outside the gene cluster; this phenomenon is similar to that found in the biosynthesis route of echinomycin (Scheme 22).221,248
The TioS TE domain is the first NRPS-derived thioesterase reported to be capable of catalysing the formation of macrothiolactone and macrolactone functionalities in an iterative fashion, which is typical for the assembly of quinoline or quinoxaline carrier compounds. Robbel et al. used TioS PCP–TE to synthesise thiocoraline analogues and showed that the TioS PCP–TE could catalyse the attachment and subsequent cyclisation of tetrapeptide sulphate substrates. By expressing the TE domain of thiocoraline independently and utilizing SNAC-thiophenol mimics, it was also possible to synthesize a range of thiocoraline analogues in vitro using various activated tetrapeptides. These studies highlighted the importance of D-stereochemistry for specific residues to promote cyclization, as opposed to hydrolysis.249 In addition, substrate specificity studies using the TioS PCP–TE showed that the spatial requirement and polarity of the C-terminal amino acid are essential for substrate recognition, ligation and macrocyclization by this TE domain.
Boger and coworkers have reported the total synthesis of thiocoraline using a convergent strategy to prepare the precursor tetradepsipeptide. After deprotection of the amine and carboxylic acid, the key intermediate octadepsipeptide 19 was generated through the coupling of monomer tetradepsipeptides 17 and 18. The ring was closed through a second round of BOC deprotection, Tce ester deprotection and amide formation. 3-Hydroxyquinoline-2-carboxylic acid residues were attached to the scaffold after removing the Cbz-protecting group using TFA-thioanisole at 25 °C for 4 h (Scheme 23).250 Other thiocoraline analogues have been synthesized using a similar strategy.251
The five genes tdiA–tdiE were identified in Aspergillus nidulans and annotated as the biosynthetic gene cluster of terrequinone A. Walsh and colleagues reconstituted all five proteins (TdiA–TdiE) in E. coli and in doing so achieved the production of terrequinone A.258 The bisindolylbenzoquinone backbone is constructed by indole pyruvic acid (IPA) through dimerization, with the initial substrate IPA derived from tryptophan through the actions of the PLP-dependent transaminase TdiD.
In this pathway, TdiA comprises a single-module NRPSs, which contains three domains responsible for adenylation, substrate tethering and thioester cleavage. The single-module TdiA ends with a thioesterase (TE) domain. Biochemical investigations have suggested that the TdiA TE domain facilitates an intramolecular Claisen condensation reaction, resulting in the symmetric cyclization of two IPA molecules to generate the final product (Scheme 24). The TdiA TE domain can therefore facilitate the formation of carbon–carbon bonds. The mutagenesis of S774A, located within the conserved GXSXGG motif in the TE domain of TdiA, prevented the generation of didemethylasterriquinone D from IPA. This result provides evidence that the TE domain plays a crucial role in facilitating the cyclization of IPA monomers.258
Dimeric polyketides exhibit a wide range of distinct structural characteristics and have a significant potential to form several new medicinally relevant molecules.11,262,263 Their applications include immunosuppressive drugs, antibiotics, herbicides, fungicides, and active medications against cancer, HIV.264,265 Dimeric polyketides are mainly generated through a dimerization process, including radical reactions, cycloadditions, esterification, and acetal formation.33
Macrodiolides are compounds primarily synthesized via the PKS or hybrid PKS/NRPS pathway and feature a symmetrical or unsymmetrical polyketide scaffold. In PKS assembly lines, the TE domain is responsible for iteratively processing two monomeric precursors and linking these together in a head-to-tail orientation, thus creating a dimeric product that serves as the molecular scaffold for the cyclic diolide structure. In this section, we will mostly examine and review polyketides involving macrocyclization in their biosynthesis, with particular interest in how the TE domain functions in these processes.
Due to its structural complexity and promising biological activity, several groups have completed the synthesis of elaiophylin or aglycon derivatives.273–275 The main challenge in such syntheses was the low stereoselectivity of the aldol condensation. Evans et al. developed a cyclic protecting group to restrict the conformational flexibility of the ketone, thereby achieving high aldol diastereoselectivity. They first synthesized the hydroxy acid 20 in five steps with carboximide as the starting compound, with cyclodimerization of 20 performed using a modified Yamaguchi macrocyclization strategy (Scheme 26). The dimer 21 was converted to 23 in two steps, ready for the crucial aldol coupling. Dialdehyde 23 was treated with the chlorophenylboryl enolate of protected ethyl ketone 24, providing the bis-aldol adduct 25 as the only detectable product. Finally, deprotection and subsequent cyclization of 25 completed the synthesis of the elaiophylin aglycon elaiolide (Scheme 26).276
In 2004, Haydock et al. identified the putative biosynthetic gene cluster of elaiophylin from the elaiophylin-producing strain Streptomyces sp. DSM4137, which was found to contain five PKS genes encoding eight modules, consistent with the assembly of the elaiophylin polyketide chain. The C-terminal TE domain in Ela5 (Ela-TE) was shown to catalyse polyketide chain release and intramolecular lactonization to generate the matured C2 symmetrical macrodiolide (Scheme 27).277
To study the mechanism and selectivity of macrodiolide formation, Leadlay et al. attempted to reconstitute the in vitro activity of the TE domain. Two alternative mechanisms for the formation of the symmetrical diolide were proposed, similar to the cyclization steps seen in nonribosomal peptide synthetases. They synthesized SNAC thioesters of tetraketide 26 and pentaketide 27 as substrate mimics, demonstrating that incubation of pentaketide 27 with Ela-TE formed a homodimerized 16-membered decaketide diolide 28. The intermediate 29, a linear dimer linked with SNAC, and the hydrolysis product 30 were also detected in these Ela-TE in vitro assays. Ela-TE was also shown to catalyze the conversion of the purified intermediate 29 into the final product 28. When pentaketide 27 and tetraketide 26 were mixed with Ela-TE, the C2-symmetric product 28, together with a novel “hybrid” macrodiolide 31, could be observed, demonstrating that Ela-TE exhibits substrate flexibility. These results further confirm that the elaiophylin TE iteratively catalyzes two acylation and two deacylation reactions to form the diolide (Scheme 28).278
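The two acylation and two deacylation steps described above can be sketched as a toy state machine. This is only an illustrative model, not the authors' kinetic scheme: the class `ToyTE`, the function `dimerize`, and the string labels are hypothetical names, and the step ordering (monomer loading, intermolecular ester formation to give a SNAC-linked linear dimer such as 29, reloading, ring closure) is an assumption chosen to be consistent with the intermediates reported in the assays.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTE:
    """Toy thioesterase that tracks the chain esterified to its active-site serine."""
    acyl: Optional[str] = None

    def acylate(self, thioester: str) -> None:
        # Acylation: a SNAC thioester hands its chain over to the serine.
        if self.acyl is not None:
            raise RuntimeError("active site already occupied")
        if not thioester.endswith("-SNAC"):
            raise ValueError("expected a SNAC thioester label")
        self.acyl = thioester[: -len("-SNAC")]

    def deacylate(self, nucleophile: Optional[str] = None) -> str:
        # Deacylation: an external hydroxyl gives a linear ester; with no
        # external nucleophile, the chain's own hydroxyl closes the ring.
        if self.acyl is None:
            raise RuntimeError("nothing bound to the active site")
        chain, self.acyl = self.acyl, None
        if nucleophile is None:
            return f"cyclo({chain})"
        return f"{chain}-O-{nucleophile}"


def dimerize(first: str, second: str) -> List[str]:
    """Return the species formed after each of the four assumed steps."""
    te = ToyTE()
    trace = []
    te.acylate(f"{first}-SNAC")                 # acylation 1
    trace.append(f"TE-O-{te.acyl}")
    linear = te.deacylate(f"{second}-SNAC")     # deacylation 1: linear dimer-SNAC
    trace.append(linear)
    te.acylate(linear)                          # acylation 2: reload linear dimer
    trace.append(f"TE-O-{te.acyl}")
    trace.append(te.deacylate())                # deacylation 2: macrodiolide
    return trace


if __name__ == "__main__":
    for step in dimerize("27", "27"):
        print(step)
```

Calling `dimerize("26", "27")` models a mixed incubation in the same way; which monomer loads first in the hybrid case is not specified in the text and is chosen arbitrarily here.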
Interestingly, a recent study has revealed that the biosynthetic pathway of elaiophylin also generates a non-dimerized product known as pteridic acid. This compound exhibits entirely different biological functions from elaiophylin, as it enhances the resistance of plants to abiotic stressors. The biosynthesis of pteridic acid also relies on the TE domain (Scheme 29), possibly due to the broad substrate specificity and catalytic versatility of this domain.279
6-Deoxyerythronolide B synthase (DEBS) is a modular PKS that catalyses the biosynthesis of 6-deoxyerythronolide B, the macrocyclic core of the antibiotic erythromycin.280 The DEBS assembly line has been used to generate more than 50 erythromycin derivatives through combinatorial biosynthesis, which suggests promiscuity of the corresponding TE domain towards different substrates.98,281 Boddy et al. reported that the incubation of the DEBS-TE with the non-natural SNAC thioester mimic 32 generated the 14-membered macrolactone and the hydrolysis product as major products, together with minor amounts of the C2-symmetrical, head-to-tail dimerized macrodiolide (Scheme 30).53 The observation of the linear dimer and the 28-membered macrodiolide is consistent with the reactions catalyzed by Ela-TE.
In 2015, Leadlay et al. sequenced the conglobatin producer Streptomyces conglobatus and elucidated its biosynthetic gene cluster, suggesting that conglobatin is biosynthesized by a PKS-NRPS hybrid. Based on this biosynthetic gene cluster, they reconstituted conglobatin macrodiolide formation in vitro.264 SNAC thioester 33 was synthesized as a substrate, which when incubated with Cong-TE was converted into the linear dimer 34 and conglobatin. Time course assays showed that the production level of conglobatin increased continuously whilst 34 reached a plateau after 2 hours. This suggested that 34 could be an intermediate in diolide formation, which was then demonstrated by the conversion of purified 34 by Cong-TE into the final product conglobatin (Scheme 31).
In an attempt to generate hybrid polyketides with Cong-TE, SNAC thioesters 35 and 36 previously synthesized for use with the elaiophylin TE were also tested as substrates. These experiments demonstrated that when the substrates 34 and 35/36 were co-incubated with Cong-TE, linear dimeric or trimeric products and their corresponding carboxylic acids could be detected. These results reveal that significant subtleties exist in the iterative mechanism controlled by macrodiolide TE domains (Scheme 32).
The soil bacterium Streptomyces pactum ATCC 27456 is the producer of many secondary metabolites, including conglobatin, the aromatic polyketide NFAT-133 and pactamycin. Several new NFAT-133 derivatives were identified from a mutant of S. pactum, with structural elucidation confirming two of these (TM-127 and TM-128) as hybrids of NFAT-133 and conglobatin (Scheme 33). In vivo gene inactivation confirmed that the biosynthetic gene clusters of both NFAT-133 and conglobatin are necessary to produce the hybrids TM-127 and TM-128. The Cong-TE from S. pactum ATCC 27456 was then purified and assayed with NFAT-133 and the SNAC thioester of the conglobatin monomer; compound TM-127 was successfully formed in this assay. In contrast, the assays using inactivated Cong-TE or without Cong-TE were unable to generate TM-127. When Cong-TE was incubated with NFAT-133 and the conglobatin monomer, TM-127 was not produced, demonstrating that both ACP and TE domains are required for such hybrid formation and further showing that Cong-TE displays broad substrate scope.290
Recently, Zhou et al. isolated a series of compounds 38–43 from gene-inactivated mutants of the conglobatin producer S. conglobatus ATCC 31005. Two of these compounds (38/39) are new oxazole-containing esters, whilst 40 is a biosynthetic precursor of 38 and 39 (Fig. 17). Inspired by the formation of 38/39, together with TM-127/TM-128 and considering the substrate scope of Cong-TE, 69 alcohol-containing compounds bearing benzylic, allylic, propargyl, primary, secondary, tertiary, and sulfhydryl hydroxyl groups were tested as substrates for Cong-TE, which was able to form 43 new hybrid esters. When the 43 alcohols that had been accepted by the Cong-TE were individually fed to the strain, 36 of these were successfully incorporated by the Cong NRPS-PKS to yield the corresponding oxazole-containing esters 44 in vivo (Fig. 18).291
Samroiyotmycin A, a C2-symmetric macrodiolide displaying activities against the MDR malarial strain P. falciparum K1 and the lung carcinoma cell line NCI-H187, was isolated from crude extracts of Streptomyces sp. BCC33756.292 Although the biosynthesis of the samroiyotmycins has not been investigated to date, the dimerization process appears similar to that found in conglobatin biosynthesis. In 2021, Hulme et al. completed the first total synthesis of samroiyotmycin A, relying not on ester-mediated dimerization but rather on a one-pot alkyne cross metathesis–ring-closing alkyne metathesis (ACM–RCAM) reaction to generate the desired 20-membered macrocyclic framework (Scheme 34).293
This reaction employed two molybdenum alkylidynes (47a/b) endowed with a privileged tripodal silanolate ligand. Treatment of monomer 45 with the complex 47a provided the dimer 46 in good yield on raising the temperature. Under these conditions, higher-order oligomers were also formed via concentration-induced polymerization, and the yield was significantly improved when using a more dilute solution of 45. Treatment of 45 with complex 47b gave full conversion to the desired product 46, but purification of 46 was hindered by fragmentation of the catalyst ligands. Hydrostannation of 46 catalyzed by ruthenium catalyst 48 and subsequent protodestannation in the presence of copper(I) salt 49 led to the successful synthesis of samroiyotmycin A, with the desired (5E,21E) isomer as the major stereoisomer.
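The benefit of dilution noted above follows from a general kinetic argument: ring closure of a linear dimer is unimolecular, whereas chain extension to higher oligomers needs a second molecule and therefore scales with concentration. The sketch below is a minimal illustration of that competition only; the function name and the rate constants are hypothetical and are not taken from the reported synthesis.

```python
def ring_closure_fraction(k_ring: float, k_chain: float, conc: float) -> float:
    """Instantaneous fraction of linear dimer that ring-closes rather than chain-extends.

    k_ring  : first-order rate constant for ring closure (1/s), illustrative
    k_chain : second-order rate constant for chain extension (1/(M s)), illustrative
    conc    : concentration of reactive chain ends (M)
    """
    if k_ring <= 0 or k_chain < 0 or conc < 0:
        raise ValueError("k_ring must be positive; k_chain and conc non-negative")
    ring_rate = k_ring            # unimolecular: independent of concentration
    chain_rate = k_chain * conc   # bimolecular: grows with concentration
    return ring_rate / (ring_rate + chain_rate)


if __name__ == "__main__":
    # Arbitrary constants chosen only to show the trend with dilution.
    for conc in (0.1, 0.01, 0.001):
        frac = ring_closure_fraction(k_ring=1.0, k_chain=100.0, conc=conc)
        print(f"{conc:.3f} M -> ring-closure fraction {frac:.2f}")
```

Under this simple model the ring-closed share rises monotonically as the solution is diluted, which matches the qualitative trend described for monomer 45.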
Disorazoles are biosynthesized by a trans-AT PKS-NRPS hybrid pathway, with the trans-acyltransferase encoded by a separate gene, dszD (disD). Analysis of the domain organization of this PKS/NRPS reveals that DszC (DisC) contains one NRPS module, utilizing serine to form the oxazole, and two non-extending PKS modules located between the NRPS module and the TE domain. It has been proposed that a monomer is transferred to the serine residue of the TE domain from the acyl carrier protein before being dimerized with the other monomer loaded on the adjacent acyl carrier protein. Since both non-extending modules contain ACP domains and the NRPS module contains a PCP domain, it is unclear which carrier proteins are involved in the dimerization of monomers (Scheme 35).297
In 2004, Wipf and Graham reported the first asymmetric total synthesis of disorazole C1.298 The dimer was separated into four segments and both segments 50 and 51 were utilized twice for the convergent synthesis. Sonogashira cross-coupling of 50 and 51 formed the protected monomer 52 and subsequent acylation and a second Sonogashira coupling afforded seco-disorazole C1. Selective mono-saponification of 54, followed by a Yamaguchi lactonization provided macrocycle 55. Finally, double alkyne reduction with Lindlar catalyst afforded disorazole C1 (Scheme 36).
Nicolaou et al. also employed a convergent strategy to synthesize disorazoles A1 and B1.299 Monomers 56, 57 and 58 were constructed by the connection of different building blocks through a Wittig reaction, Suzuki coupling and Stille coupling, with a Sharpless asymmetric epoxidation exploited to synthesize the epoxy vinyl moiety in 57 and 58. Condensation of 56 and 57 by a Yamaguchi esterification provided linear diester 59. Removal of TMS from 59, followed by a Yamaguchi macrolactonization afforded the final product disorazole A1. Using the same strategy, the epoxy monomers 56 and 58 were used to synthesize the C2-symmetrical disorazole B1 (Scheme 37).
In 2012, He et al. reported the biosynthetic gene cluster of quartromicins from the producer A. orientalis No. Q427-8. The QmnA1-QmnA2-QmnA3 PKS assembly is responsible for the synthesis of two polyketide chains through a ‘module skipping’ pathway, after which 3-oxoacyl-ACP synthase III QmnD5 catalyzes the condensation of a glycerate unit and the mature polyketide chains to form tetronates 60 and 61.301 Dehydration of 60–61 relies on QmnD3 and QmnD4 to afford 62 and 63.302 Further biosynthetic studies concerning natural spirotetronate compounds have revealed enzymatic [4+2] reactions are used to form the spirotetronate intermediate.303 A similar intramolecular [4+2] route is anticipated as the source of the four chiral spirotetronate centers found in the quartromicins (Scheme 38).
Roush et al. completed the diastereoselective synthesis of the endo- and exo-spirotetronate subunits 66 and 67 of the quartromicins through an enantioselective Diels–Alder reaction of an acyclic (Z)-1,3-diene and partially assigned the stereochemistry of quartromicins A3 and D3.304,305 They further synthesised the vertical and horizontal bis-spirotetronate (68) units of quartromicins A3 and D3 (Fig. 21).306 However, due to the structural complexity, the total synthesis of the quartromicins has not been completed to date.
Menisporopsin A, 15G256ι and 15G256ω all contain 3-hydroxybutyric acid and 3,5-dihydroxy-7-(β-hydroxypropyl)-benzoic acid moieties, while menisporopsin A also contains an extra 2,4-dihydroxy-6-(2,4-dihydroxy-n-pentyl)-benzoic acid unit. The menisporopsins exhibit a broad spectrum of cytotoxic, antimycobacterial and antimalarial activity.308,309
Concerning their biosynthesis, a 13C-labeling experiment has demonstrated that the pentalactone scaffold of menisporopsin A is assembled by a polyketide synthase, with all the carbons of menisporopsin A derived from acetate as the sole building block.310
Two polyketide synthases, Men1 (HR-PKS) and Men2 (NR-PKS), were identified in Menisporopsis theobromae BCC 4162 and were shown to be responsible for the biosynthesis of menisporopsin derivatives. These two enzymes are involved in the iterative biosynthesis of the polyketide backbone and the subsequent cyclization of the molecule.
The TE domain in Men2 is responsible for the release of the final product from the PKS assembly line (Scheme 39). The esterification and cyclolactonization reactions required for the synthesis of the menisporopsins could therefore reside in this TE domain of the NR-PKS, which is similar to that of the NRPS catalyzing the elongation and cyclization of trilactone in enterobactin biosynthesis and that of modular PKSs catalyzing macrodiolide formation in elaiophylin (from Streptomyces violaceusniger DSM 4137)21 and conglobatin biosyntheses (from Streptomyces conglobatus).311
The boromycins were discovered from the strains Streptomyces antibioticus ETH 28829 and Streptomyces sp. MA4423 and are notable for being the first natural products found to contain the element boron.318,319 The boromycin analogues tartrolon A and B were isolated from the culture broth of Sorangium cellulosum So ce 678 and contain a boron-binding region similar to that found in boromycin and aplasmomycin. The tartrolons exhibit significant antibacterial and cytotoxic activity.314,320
Feeding experiments with 13C-labeled precursors have demonstrated that boromycin and aplasmomycin are biosynthesized through similar assembly pathways.321,322 Recent studies have revealed the biosynthesis of boromycin-related natural products at the genetic level. Elshahawi et al. first identified the tartrolon BGC (trtA–trtJ) from the strain Teredinibacter turnerae T7901.323 Within the trt cluster, trtDEF comprise three large genes that encode trans-acyltransferase (AT) type I PKSs. These three multimodular PKS ORFs contain 11 modules in addition to the loading module. No detailed research concerning the mechanism of macrocyclization in this pathway has yet been performed, with the intra-module TE domain in TrtF postulated to be involved in release and cyclization (Scheme 40).324 It has been hypothesized that natural products related to boromycin undergo cyclization catalysed by a TE domain in a process similar to that observed in disorazole biosynthesis, owing to their similar cyclodimeric scaffolds.324
In 2014, Avery et al. completed the chemical synthesis of the symmetrical 36-membered macrodiolide aplasmomycin A via Mukaiyama lactonization strategies. A base-promoted Chan rearrangement method was used to generate the symmetrical 36-membered diolide intermediate during the synthesis. The final product, aplasmomycin A, was generated by biomimetic modification of desboroaplasmomycin A, followed by boration reaction with trimethyl borate. This synthetic strategy is also relevant for the synthesis of boromycin (Scheme 41).325
The pamamycins are a group of macrodiolides with antifungal activity isolated from Streptomyces alboniger.329–332 Investigation of S. alboniger IFO 12738 revealed that the derivative pamamycin-607 is the main component in the pamamycin mixture.333,334 The biosynthesis of pamamycin-607 has been investigated by isotopic feeding, which suggested that the building blocks of pamamycin-607 are derived from acetate, propionate, succinate and amino acids (Fig. 24).335
Luzhetskyy et al. recently reported the identification of the nonactin and pamamycin BGCs in S. alboniger DSMZ40043 and Streptomyces sp. HKI 118.329 The nonactin BGC contains five KS-encoding genes and was characterized from Streptomyces griseus.329 Two KSs, NonJ and NonK, which are highly homologous to the KSs PamJ and PamK in pamamycin biosynthesis, catalyse the C–O condensation reaction of acyl-CoA substrates and result in the closure of the macrodiolide ring (Scheme 42).326 Interestingly, L-valine supplementation increased the production of pamamycins, including larger derivatives, due to the increased availability of CoA thioesters, but negatively affected growth and repressed pamamycin formation in the heterologous host S. albus J1074/R2.334
Scheme 42 The putative key step in pamamycin biosynthesis.329
The total syntheses of the macrotetrolide family antibiotics have been realized through a variety of methods. The key macrocyclic reaction can be performed through macro-lactonization reaction using different catalysts (Scheme 43). Based on retrosynthetic analysis, diverse esterification routes have also been developed to realize the formation of the pamamycin dilactone backbone by coupling two precursor “fragments”.336–342
Scheme 44 The cyclic imine natural products. (A) Total synthesis of scytonemide A; (B) the structures of koranimine and scytonemide A.
Fig. 25 Examples of macrocyclic peptidyl products mediated by the terminal condensation-like (CT) domain.
In comparison with TE domains, CT domains show a different protein fold and a different catalytic mechanism. In TE domains, the catalytic triad mediates transfer of the peptidyl intermediate to the active site serine, and an internal nucleophile of the peptide then attacks the resulting acyl-enzyme ester in an intramolecular reaction to complete cyclization. In contrast, CT-catalyzed cyclization does not generate a covalent adduct with the enzyme. Instead, the catalytic histidine in the CT domain (conserved “HHxxxDG” motif) deprotonates the amine nucleophile, which attacks the thioester carbonyl bound to the PCP domain to cyclize and liberate the product.358 In addition, bacterial TE-catalyzed reactions differ from fungal CT domain-catalyzed reactions in that the TE domain can accept peptidyl-S-N-acetylcysteamine (SNAC) mimics, whereas the CT domain requires specific protein–protein interactions with its upstream PCP domain partner to be competent for catalysis. Gao et al. demonstrated that CT domains only recognized peptide substrates tethered to the native PCP domains, while peptidyl-SNAC and peptidyl-CoA substrates were not cyclized.359 The interactions between the PCP and CT domains appear to act as a “proofreading mechanism” that ensures correct peptide cyclization and prevents spontaneous hydrolysis.358
Despite sharing the conserved motif with canonical C domains, CT domains are placed in a separate evolutionary clade due to functional divergence. The structural basis of the unique macrocyclization activity of CT domains has been identified through comparison of their structural elements with those of canonical C domains.359 Specifically, although the protein sequence of CT domains exhibits low similarity to that of canonical C domains (VibH,360 TycC-C6,361 PCP2-C3 didomain362 and SrfA-C363), their overall fold is strikingly similar to that of these C domains, as well as to that of heterocyclization (Cy) and epimerisation (E) domains. CT domains comprise two largely separate subdomains, with the N- and C-subdomains arranged in a V-shaped manner and adopting the chloramphenicol-acetyltransferase fold (Fig. 26).
Fig. 26 The overall structure of the CT domain. H3766, D3770, and D3773 are drawn as green sticks, while the α2 helix of the CT domain is coloured in red.
However, the main structural feature that distinguishes CT domains from classic C domains is the replacement of the N-terminal loop in C domains with an α2 helix in CT domains. The insertion of this α2 helix results in the compaction of the two subdomains of the CT domain close to the acceptor site, ultimately blocking this substrate access channel (Fig. 26). This structural difference from canonical C domains is noteworthy, as it enables CT domains to have distinctive macrocyclization activities.359
In some fungal depsipeptide synthetases, CT domains are also involved in the catalysis of cyclo-oligomerization. Enniatin (cyclic trimer), beauvericin (cyclic trimer), and bassianolide (cyclic tetramer) are representative cyclo-oligomeric depsipeptides (CODs) that are biosynthesized as dimers, trimers, or tetramers through iterative cyclo-oligomerization by CT domains. There are numerous excellent reviews on these types of peptides from fungi.364–367 In this article, we will briefly summarise some of the previously reviewed material and describe some of the defining characteristics of enniatin, beauvericin, bassianolide, and PF1022A. These cyclopeptides are all assembled via the actions of iterative NRPSs (type B), in which individual modules within the NRPS assembly lines are reused for iterative cycles of peptide assembly.368
The enniatins and beauvericin are mycotoxin CODs found in many fungi, such as Fusarium spp., Verticillium hemipterigenum, Halosarpheia sp., and the entomopathogen Beauveria bassiana, amongst others, and exhibit a wide range of biological activities, including antibacterial, insecticidal, and anticancer activity.369–371 Beauvericin, due to its ionophoric properties, exhibits a similar range of diverse biological activities, including antibiotic, antifungal, insecticidal, and cancer cell antiproliferative and antihaptotactic activity.372–374 As a broad-spectrum anthelmintic, the semisynthetic PF1022A derivative emodepside (bis-para morpholino-PF1022A) has even been brought to market.375
The assembly of enniatin, beauvericin, and bassianolide is performed by an NRPS in each pathway. These NRPSs resemble each other, consisting of two modules (C1-A1-PCP1 and C2-A2-MT-PCP2a-PCP2b-C3) responsible for peptide synthesis (Scheme 47). As an example, the gene bbBeas, which encodes the nonribosomal peptide synthetase BbBEAS, was isolated from the strain B. bassiana and confirmed to be responsible for beauvericin biosynthesis by targeted disruption. BbBEAS utilizes D-2-hydroxyisovalerate (D-Hiv) and L-phenylalanine (Phe) for the iterative synthesis of a predicted N-methyl-dipeptidol intermediate and forms the cyclic trimeric ester beauvericin from this intermediate in an unusual recursive process.376
Scheme 47 (A) The gene cluster organization of bassianolide, beauvericin, PF1022A and enniatin, and (B) their biosynthetic pathway.
Interestingly, these NRPSs use a specific iterative assembly pathway to form the final cyclic oligomer. It is proposed that the A1 and A2 domains adenylate the requisite substrates (D-α-hydroxycarboxylic acids and various L-amino acids, respectively) and transfer these to PCP1, or PCP2a/PCP2b. PCP2a and PCP2b play alternating roles as acceptor and donor in the condensation process. The MT domain catalyses the methylation of the amino acid residue once bound to a PCP domain. The tandem PCP domains in COD synthetases are presumed to store the depsipeptide during the process of elongation (Scheme 47). In this process, the N-terminal C domain (C1) is inactive, while the C2 domain catalyses the condensation of the amino acid acceptor monomers (peptidyl thioester intermediate) attached to PCP2a/PCP2b and the elongated intermediate donors attached to PCP1. The CT domain controls the length of the peptide chain and the formation of the ester bond, as well as the macrocyclization and release of the cyclic peptide from the assembly line. As the peptide chain reaches a certain length, the function of the CT domain shifts from elongation to cyclization, catalysing the attack of the terminal hydroxyl on the thioester and thus releasing the mature cyclopeptide. To date, the exact mechanism of this function switching in CT domains remains enigmatic.192
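The relationship between the number of iterative cycles and the size of the released macrocycle can be made concrete with a short sketch. It assumes only that every α-hydroxy acid or α-amino acid residue contributes three backbone atoms (O or N, Cα, and the carbonyl carbon) to the ring; the class name `COD` and the generic residue labels used for enniatin and bassianolide are illustrative placeholders rather than nomenclature from the cited work.

```python
from dataclasses import dataclass
from typing import Tuple

# Each alpha-hydroxy or alpha-amino acid residue places three atoms in the ring:
# the heteroatom (O or N), the alpha-carbon and the carbonyl carbon.
RING_ATOMS_PER_ALPHA_RESIDUE = 3


@dataclass(frozen=True)
class COD:
    """Cyclo-oligomeric depsipeptide built from a repeating dipeptidol unit."""
    name: str
    unit: Tuple[str, str]   # (hydroxy acid, N-methyl amino acid)
    repeats: int            # iterative cycles before the CT domain cyclizes

    def linear_sequence(self) -> Tuple[str, ...]:
        # Chain held on the synthetase just before macrocyclization.
        return self.unit * self.repeats

    def ring_size(self) -> int:
        return RING_ATOMS_PER_ALPHA_RESIDUE * len(self.linear_sequence())


EXAMPLES = [
    COD("enniatin", ("D-hydroxy acid", "N-Me-L-amino acid"), repeats=3),
    COD("beauvericin", ("D-Hiv", "N-Me-L-Phe"), repeats=3),
    COD("bassianolide", ("D-hydroxy acid", "N-Me-L-amino acid"), repeats=4),
]

if __name__ == "__main__":
    for cod in EXAMPLES:
        n_res = len(cod.linear_sequence())
        print(f"{cod.name}: {n_res} residues, {cod.ring_size()}-membered ring")
```

In this picture the chain-length control exerted by the CT domain amounts to fixing `repeats`: three cycles give an 18-membered hexadepsipeptide and four cycles give a 24-membered octadepsipeptide.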
Recently, Steiniger et al. investigated the combinability of these two types of synthetases (iterative NRPSs and linear NRPSs) and compared the exchange unit variants of four hybrid modules. They demonstrated that the selectivity of the C domain influences assembly line functionality. These experiments suggested that the PCP-C and C-A domain linker regions significantly influence the iterative peptide assembly process and production titres. Furthermore, they suggested that swapping parts of the CT domain can be used to uncover functional aspects of macrocyclization and ring size control in these domains.377 Based on their hybrid synthetase assays, they proposed a two-face XU concept for fungal NRPS engineering that emphasises the importance of C domain specificity and employs C-A-PCP and A-PCP-C combinatorial engineering methods; from a bioengineering perspective, this concept is also potentially transferable to other NRPS systems.
PfSYN (350 kDa) is the nonribosomal peptide synthetase that produces PF1022 in an iterative manner.378 PfSYN accepts L-leucine, D-lactate, D-phenyllactate, S-adenosyl-L-methionine (AdoMet) and ATP as substrates to assemble this cyclooctadepsipeptide scaffold.378,379 The Süssmuth group performed a comprehensive investigation into the substrate tolerance of PfSYN, which was found to accept aromatic, heteroaromatic and aliphatic residues. In contrast to the α-hydroxy acid activating domain of enniatin synthetase (ESYN), this assembly line can activate hydroxy acid derivatives containing para-halogenated phenyl rings and propargyl sidechains.379
Many enniatins have been identified, demonstrating that these cyclodepsipeptide synthetases possess innate flexibility in their substrate specificities and thus can incorporate a wide range of amino and hydroxy acids.370,380 As seen with enniatin biosynthesis, the beauvericin depsipeptide synthetase (BbBEAS, 351 kDa) adopts a similar mechanism,376 although the beauvericin synthetase selectively accepts N-methyl-L-phenylalanine and other aliphatic hydrophobic amino acids as substrates.
In the synthesis of these cyclopeptides the main challenge is the cyclization of the linear tetradepsipeptide precursors. Following solution phase or solid phase peptide synthesis, linear tetradepsipeptide 75 is subjected to Mitsunobu esterification using [DIAD (2 equiv.), PPh3 (3 equiv.), benzene (5 mM)]. These salt-free conditions lead to the isolation of bassianolide, with the yield of tetradepsipeptide 75 macrocyclization increased to 31% through the addition of NaBF4 (Scheme 48).365,381
Verticilide is a 24-membered cyclo-oligomeric depsipeptide consisting of α-hydroxy acid and α-amino acid monomers. It was first isolated from the fungus Verticillium sp. FKI-1033 in 2006 by Ōmura et al. Whilst verticilide has no antibiotic activity, it is a candidate for pesticide development.382 In addition, verticilide and its derivatives have been shown to function as inhibitors of insect RyR, ACAT2 and ryanodine receptors.369,383,384 In 2006, Ōmura et al. reported the solution synthesis of verticilide based on a Boc/benzyl ester protecting group strategy. After deprotection of the N- and C-termini and esterification under high dilution, verticilide was generated in a yield of 66% in 13 steps. In this synthesis, PyBroP-mediated N-Me macro-lactamization was employed to construct the acyclic oligomer (Scheme 49).382
In 2012, Hu et al. reported an optimized flow-synthesis method for the cyclic hexadepsipeptide enniatin B. In this synthesis, the linear hexadepsipeptide 76 was first constructed by the coupling of amine salts, with EDCI and Ghosez's reagent (1-chloro-N,N,2-trimethyl-1-propenylamine) used for amide and ester bond formation. In contrast with the reported verticilide synthesis, the macrocyclization step of enniatin B was mediated by Ghosez's reagent, providing an overall yield of enniatin B of 36% (Scheme 49).385,386
Cyclodipeptide synthases (CDPSs) provide one of the main biosynthetic routes to DKPs. CDPSs use intracellular aminoacyl-tRNAs (aa-tRNAs) as substrates to catalyze the cyclization of two amino acids.392 Members of the CDPS protein family are usually composed of 200 to 300 amino acid residues. In 2009, the AlbC protein derived from Streptomyces noursei was shown to catalyse the synthesis of cyclo(L-Phe-L-Leu), the skeleton of albonoursin, becoming the first CDPS protein whose function had been characterised.393 Many symmetrical cyclodipeptides have been reported to be synthesized by CDPS enzymes, such as those leading to mycocyclosin (Rv2775),394 pulcherrimin (YvmC),395 and cyclo(L-Trp-L-Trp) (Amir_4627) (Scheme 50).396
Following DKP synthesis, tailoring enzymes encoded in the DKP gene cluster catalyse modifications of the DKPs to generate diverse structures from the basic cyclic dipeptide skeleton. The tailoring enzymes associated with CDPS pathways that have been reported to date include cytochrome P450 monooxygenases (e.g. in mycocyclosin, pulcherrimin and guanitrypmycin),394,395,397 oxidases (in bicyclomycin)398 and methyltransferases (in drimentines),399 amongst others.400 Since there are a significant number of reviews concerning CDPS biosynthesis, we will not elaborate on this pathway further here.392,393,401,402
Beyond the actions of CDPS enzymes, DKPs can also be synthesized via NRPS pathways. Gliovirin, pretrichodermamides, FA2097, aspergillazines, peniciadametizines, acetylaranotin and aspirochlorine are examples of a class of natural products originally derived from the cyclodipeptide cyclo-L-Phe-L-Phe scaffold, which is assembled by an NRPS pathway. The biosynthesis of pretrichodermamide A was recently reported, with this fungal diketopiperazine natural product featuring an α,β-disulfide bridge. Here, the DKP cyclodipeptide backbone of pretrichodermamide A is assembled by the NRPS TdaA (PCP-C-A-PCP-C domain architecture).403,404 The NRPSs AtaP from the acetylaranotin pathway, AclP from the aspirochlorine pathway, and Glv21 from the gliovirin pathway all contain a similar domain architecture (PCP-C-A-PCP-C). These NRPSs are responsible for the assembly of the dipeptide, with the C-terminal CT domain responsible for dimerization and cyclization (Scheme 50). The catalytic core of this domain resembles that of the CT domain in beauvericin biosynthesis.405–407
Indigoidine is a blue pigment found in different types of bacteria and is a highly effective radical scavenger. Indigoidine has broad medicinal and industrial potential and is biosynthesized through an NRPS pathway,408 with L-glutamine as the precursor. Indigoidine is assembled by the condensation of two molecules of L-glutamine by indigoidine synthetase, the single-module NRPS BpsA. This enzyme comprises two adenylation (A), one oxidation (Ox), one PCP and one TE domain, with the active holo-BpsA demonstrated to be responsible for the dimerization of two L-glutamine residues to form indigoidine (Scheme 51). Pang et al. reported that the BpsA Ox domain catalyses the formation of the key intermediate through oxidation.409
In the cylindrocyclophane (cyl) cluster from Cylindrospermum licheniforme ATCC 29412, the hemolysin-type calcium-binding protein CylK was biochemically characterized and shown to catalyse a cyclodimerization reaction that uses an alkyl chloride precursor via an SN2 mechanism. The halogenase CylC mediates the chlorination, which is essential for the subsequent nucleophilic displacement of the chloride ion to generate the aryl–alkyl linkage. A BLAST search has demonstrated that CylC and CylK homologs are widespread in cyanobacterial genomes, offering a possible route to the discovery of new analogues of this compound class (Scheme 53).417
In terms of cylindrocyclophane synthesis, the monomeric precursor thioacetate mesylate 77 was prepared before undergoing cyclodimerization using treatment with NaOMe in MeOH at ambient temperature to afford the corresponding macrocyclic bis (thioether). Oxidation of this bis(thioether) with H2O2 in the presence of (NH4)6Mo7O24·4H2O furnished the macrocyclic bis (sulfone) 78 in 51% overall yield. Finally, the intermediate 78 was then modified to generate the cylindrocyclophanes (Scheme 54).418
Tubocurarine is a naturally occurring dimeric tetrahydroisoquinoline alkaloid isolated from South American plants such as Chondrodendron tomentosum.422 As a neuromuscular blocking agent, tubocurarine was previously widely used as a muscle relaxant during surgical procedures. However, this molecule has largely been replaced by safer and more effective drugs such as vecuronium, rocuronium and atracurium in modern procedures.423
Tubocurarine possesses a “head-to-tail”, asymmetrical, quaternary ammonium structure with two ether bridges (8-O-12′ and 11-O-7′) linking the coclaurine monomers. It has been proposed that tubocurarine biosynthesis involves a radical coupling of two enantiomeric monocationic tetrahydrobenzylisoquinolines, namely the two enantiomers of N-methyl-coclaurine.424 Subsequent methylation of the amine and hydroxyl substituents is then performed enzymatically through the consumption of S-adenosyl methionine (SAM) (Scheme 55).424
In 2016, Wang et al. completed the synthesis of the dimeric macrocyclic alkaloid melanthioidine. The phenethyltetrahydroisoquinoline 79 was first synthesised enantioselectively as a monomeric precursor, with subsequent cyclodimerization performed using copper-mediated coupling. After optimization, the yield of the macrocyclic dimer O,O-dibenzylmelanthioidine 80 reached 35% when using the combination of the soluble copper salt CuBr·SMe2 (1 equiv.), Cs2CO3 (3 equiv.), pyridine (0.16 M), 150 °C, 48 h to mediate this reaction (Scheme 56).421
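The optimized stoichiometry quoted above can be converted into bench quantities with simple arithmetic. The following is a minimal sketch and not part of the original report: the function name and the example scale are hypothetical, the molar masses are computed from standard atomic weights, and the 0.16 M figure is assumed here to refer to the concentration of monomer 79 in pyridine.

```python
# Illustrative arithmetic only; the helper name and any example scale are hypothetical.
# Equivalents and concentration are the values quoted in the text.

CUBR_SME2_MW = 205.58  # g/mol, CuBr·SMe2 (assumption: from standard atomic weights)
CS2CO3_MW = 325.82     # g/mol, Cs2CO3 (assumption: from standard atomic weights)

CU_EQUIV = 1.0         # equiv. of CuBr·SMe2 relative to monomer
BASE_EQUIV = 3.0       # equiv. of Cs2CO3 relative to monomer
CONCENTRATION_M = 0.16 # mol/L, assumed to be the monomer concentration in pyridine


def cyclodimerization_quantities(monomer_mmol: float) -> dict:
    """Return reagent masses (mg) and pyridine volume (mL) for a given monomer scale."""
    if monomer_mmol <= 0:
        raise ValueError("monomer_mmol must be positive")
    return {
        # mmol x g/mol gives mg
        "CuBr.SMe2_mg": monomer_mmol * CU_EQUIV * CUBR_SME2_MW,
        "Cs2CO3_mg": monomer_mmol * BASE_EQUIV * CS2CO3_MW,
        # mmol / (mmol per mL) gives mL; 0.16 M equals 0.16 mmol/mL
        "pyridine_mL": monomer_mmol / CONCENTRATION_M,
    }
```

Calling `cyclodimerization_quantities(1.0)` would describe a hypothetical 1.0 mmol run; the relatively dilute concentration reflects the usual need to balance intermolecular dimerization against oligomerization.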
A Diels–Alder cycloaddition reaction was recognized to be the key step during the biosynthesis of these natural products. Using griffipavixanthone as an example, retrosynthetic analysis suggested that the cyclodimeric structure would be derived from the prenylated xanthone monomer 82, and the core cyclohexene ring would be formed through [4+2] cycloaddition (Scheme 57).427 Oxygenated styrenes are the key intermediates for this cyclization, which is initiated by the formation of a benzyl cation that then undergoes Friedel–Crafts cyclization after adding to the C–C double bond of another styrene molecule.
In 2017, Porco and co-workers reported the first biomimetic total synthesis of griffipavixanthone (81). The key synthetic step is the formation of a highly strained cyclohexene ring generated through an intermolecular cationic [4+2] cycloaddition reaction. The monomer intermediates vinyl para-quinone methide 83 and an in situ generated isomeric 1,3-butadiene 84 were afforded through isomerization mediated by Lewis or Brønsted acids (Scheme 56).427 In 2018, Schaus, Porco and co-workers achieved the asymmetric synthesis of (+)-griffipavixanthone employing a chiral phosphoric acid-catalyzed cycloaddition.428
Many studies concerning the synthesis of cyanolide A have been reported.430,431 To summarise, two main synthetic routes have been explored, which perform glycosylation either before or after cyclodimerization.
To afford the C2-symmetric diolide, macrocyclization reactions mainly exploit the Shiina lactonization strategy.430–432 The Krische group has reported the most direct total synthesis of cyanolide A, which proceeds in six steps starting from neopentyl glycol. The synthesis uses no protecting groups or chiral auxiliaries, with the key intermediate C2-symmetric diol 85 constructed using an asymmetric iridium-catalysed dehydrogenation/allylation strategy (Scheme 58).433
Carpaine has been shown to possess a variety of biological activities, including antimicrobial, antiparasitic, and anti-inflammatory effects. Carpaine has also been investigated for its potential use in the treatment of various diseases, such as cancer, heart disease, and neurodegenerative disorders.440,441 The effects of carpaine may be related to its macrocyclic dilactone structure, a possible cation chelating structure.442 The biosynthesis of carpaine is not well understood, but is hypothesised to begin with the decarboxylation of the amino acid tyrosine to form tyramine, which is then further metabolized to form the alkaloid. The intermediate products of this pathway are believed to be modified and rearranged through additional enzymatic reactions to form the final carpaine molecule.
Taro et al. first reported the retrosynthetic routes of (+)-azimine and (+)-carpaine, where the N-derivatives of azimic acid and carpamic acid were used as starting substrates. Taking the example of (+)-carpaine synthesis, the precursor N-Cbz-carpamic acid was di-lactonized under Yamaguchi macrocyclization conditions to generate the cyclodimerized N-Cbz-carpaine in 71% yield. Subsequent deprotection led to the isolation of (+)-carpaine (Scheme 60).443
A synthetic route towards glucolipsin A exploited the cyclodimerization of monomer 87, with the key cyclic dimer intermediate 88 obtained through macrodilactonization using the activating agent 2-chloro-1,3-dimethylimidazolinium chloride (89). The cyclodimerized product 88 was obtained in 54% yield under optimized conditions (Scheme 61).446,449
The synthesis of cycloviracin B was also enabled using 2-chloro-1,3-dimethylimidazolinium chloride (89) in the macrodilactonization step.449–451 Interestingly, it was observed that the addition of KH significantly increases the yield of 90 to 71% (Scheme 62), and from this it was postulated that the presence of potassium cations may structure the precursor molecules in a way that favours cyclodimerization.
Here we instead seek to emphasize the chemical synthesis reactions developed to generate cyclodimeric scaffolds. As most of the synthetic strategies for cyclodimerization and macrocyclization have been described in the above sections on individual cyclodimerized natural products, here we will restrict ourselves to the introduction of representative examples of cyclodimerization or cyclo-oligomerization related to NPs or NP-like small molecules. The chemical synthesis of cyclic supramolecular polymers will not be discussed.
Macrocyclic structures with one or more ester connections are known as macrolides. Lactonization often represents an efficient approach for the synthesis of macrolides.461 Porco and co-workers reported a cyclodimerization mediated by distannoxane transesterification catalysts, which facilitate the production of dimeric macrodiolides from various hydroxy ester substrates. They reported that enantioenriched hydroxy ester substrates readily undergo cyclodimerization to give 14- to 22-membered macrodiolides when using distannoxane transesterification catalysts (Scheme 64).462 Studies have shown that this is also an effective strategy to create stereochemically enriched homodimers from a variety of hydroxy ester monomer pairs. The effectiveness of these cyclodimerization reactions has been shown to depend significantly on both the microwave power and the reaction temperature employed.463
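The relationship between monomer length and macrodiolide ring size is a matter of counting: in a head-to-tail cyclodimerization of a hydroxy ester, the backbones of both monomers end up in the ring, so the ring size is twice the number of backbone atoms per monomer. The sketch below illustrates this bookkeeping only; the function names are hypothetical, and the counting convention (backbone atoms counted from the hydroxyl oxygen to the ester carbonyl carbon, inclusive) is an assumption stated here rather than taken from the cited work.

```python
from typing import List


def macrodiolide_ring_size(backbone_atoms: int, n_units: int = 2) -> int:
    """Ring size from head-to-tail cyclo-oligomerization of a hydroxy ester.

    backbone_atoms counts the atoms from the hydroxyl oxygen to the carbonyl
    carbon inclusive (e.g. 7 for a 6-hydroxyhexanoate); n_units is 2 for a dimer.
    """
    if backbone_atoms < 3:
        raise ValueError("a hydroxy ester backbone needs at least O, C and C(=O)")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return backbone_atoms * n_units


def monomer_lengths_for_dimers(min_ring: int, max_ring: int) -> List[int]:
    """Monomer backbone lengths whose cyclodimers fall within a ring-size window."""
    return [ring // 2 for ring in range(min_ring, max_ring + 1) if ring % 2 == 0]
```

Under this convention, the 14- to 22-membered window quoted above corresponds to homodimers of monomers with 7 to 11 backbone atoms; heterodimers of monomers with different lengths would simply sum the two backbone counts.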
Distannoxane catalysts effectively promote the macrolactonization of seco-ester precursors. Collins and co-workers reported a reaction protocol for macrodiolide synthesis based on seco-acids using the Lewis acid Hf(OTf)4 as the catalyst, with the only by-product generated in this reaction being water. After optimization, this method was shown to be suitable for the synthesis of the 22-membered macrodiolide 95, and macrocycles 94 and 96, through cyclodimerization (Scheme 65).464
Zhang et al. reported a facile method for the synthesis of cyclic polyamides through cyclodimerization, with the monomer pentafluorophenol esters prepared in advance. After deprotection, the two active groups (NH2 and CO2C6F5) were exposed at the two termini, which promoted macrolactamization through cyclodimerization of the monomers in an alkaline environment (Scheme 66).465
Miao et al. investigated the autocatalytic polymerization reaction of tripeptides linked by a disulfide bond using (Cys-Xxx-Gly-SEt)2 as the “monomer” starting material. The resulting macrocyclic peptides, containing up to 69 amino acids, displayed autocatalytic kinetics and were formed via native chemical ligation; thiol–thioester exchange “hopping” was likely responsible for this behaviour. This method allows for the straightforward production of thiol-rich macrocyclic oligopeptides, which can be used as a tool for studying functional cyclopeptides (Scheme 67).466
Click chemistry has been a powerful strategy for chemical backbone synthesis and biomimetic applications.468 Many click reactions have proven to be efficient peptide macrocyclization methods. The azide–alkyne cycloaddition methodology is a valuable technique for generating macrocyclic rings.469 Based on the 1,3-dipolar cycloaddition of azides and alkynes, triazole-containing macrocycles can be readily synthesized. The Liskamp group developed a synthetic route to the ABC ring system of vancomycin through ruthenium-based azide–alkyne cycloaddition (RuAAC). This macrocyclization method exhibited high intramolecular selectivity to mimic the bicyclic formation.470
Jagasia et al. reported a copper-catalysed azide–alkyne cycloaddition reaction (CuAAC) that combined the methods of SPPS and click chemistry. This method can also be applied to perform the head-to-tail cyclodimerization of resin-bound oligopeptides. The advantage of this method is that the reaction is independent of peptide sequence (Scheme 69).471,472
The Ghadiri lab has reported a solution-phase synthesis of C2 symmetric cyclic peptide scaffolds through 1,3-dipolar azide–alkyne cycloaddition reactions. This technique utilised a tandem dimerization-macrocyclization approach enabled by facile click reactions of an azido-dipeptide alkyne.473
The azido-alkyne “click reaction” strategy can also be used in the synthesis of cyclodextrin analogues, with the cyclodimerization of azido-alkyne saccharides performed via Cu-catalyzed dipolar cycloaddition of the alkyne and azide functional groups (Scheme 70).474,475 Menand et al. developed an efficient domino Staudinger aza-Wittig reaction to synthesise sugar-aza-crown ethers through the cyclodimerization of C-glycosyl azido aldehydes (Scheme 70).476,477
The exploration of active precursors for efficient cyclodimerization has been a longstanding objective of organic chemists. Recently, vinylethylene carbonates (VECs) were discovered as promising dipole precursors for iridium-catalysed cascade allylation-macrolactonization reactions.478 Wang and co-workers developed a wide range of C2-symmetric chiral macrodiolides based on VEC substrates and isatoic anhydride analogues (Scheme 71). Using palladium-mediated catalysis, VECs were converted into the key zwitterionic π-allyl palladium intermediate. Subsequently, this intermediate was used to perform diverse reactions including cyclodimerization, which also maintains high diastereo- and enantioselectivity in the final products.479
Scheme 71 The iridium-catalysed cascade allylation-macrolactonization based on vinylethylene carbonate precursors.
In organic synthesis, cycloaddition reactions are regarded as one of the most effective strategies for constructing cyclic structures. Cycloaddition dimerization differs from the cyclodimerization reaction because cycloaddition involves the interaction between two distinct reactants, usually a conjugated diene and a dienophile, resulting in the creation of a new cyclic molecule. In contrast to cyclodimerization, a cycloaddition reaction forms a single cyclic product, not a dimer made of identical or similar reactants. Cycloaddition reactions have also been extensively reviewed.490–493
In this review, we have summarised the structural traits, distribution, biological functions, and uses of natural products that feature cyclodimeric/cyclomultimeric scaffolds, along with what is known regarding their biosynthesis and origins. We sought to pay particular attention to the role played by the TE domain of PKS and NRPS machinery in the macrocyclization process. Among these compounds, the most representative examples are gramicidin S and tyrocidine, for which the catalytic mechanisms of the TE domains have been relatively well studied. In addition, we also introduced examples of alternate catalytic cyclization reactions, such as the butelase- and CT-mediated macrocyclization of peptides, as well as the formation of cyclodipeptides mediated by CDPSs and single-module NRPSs. Whilst there are several alternate mechanisms found in nature that perform cyclisation within biosynthetic assembly lines, arguably the most relevant for cyclodimerization is the role of TE domains, given their potential application as biocatalysts. These domains have been widely investigated for their biosynthetic and biocatalytic potential, and whilst we now know significant details concerning the mechanism of cyclisation or dimerization performed by TE domains, far less is known about the aspects of these domains that control the substrate specificity and positioning that give rise to their specific products.
Examples of this phenomenon can be seen in the activity of the TE domains from gramicidin S (GrsB), tyrocidine (TycC) and WS9326A/mohangamide biosynthesis. The GrsB TE efficiently catalyses the dimerization of assembled pentapeptides to form the cyclic decapeptide gramicidin S, with additional potential for chemoenzymatic analogue synthesis. The TycC TE has been widely studied: it cyclises the tyrocidine precursor and a range of modified peptides and is also able to dimerize the gramicidin S pentapeptide, thus emphasizing its versatility. The ability of both TycC TE and GrsB TE to accommodate variations in substrate sequences while maintaining catalytic efficiency suggests a degree of plasticity in their active sites. This plasticity enables these enzymes to recognise and cyclise substrates with similar structural features, even if they are derived from different biosynthetic pathways or are synthetically derived.124,125 An excellent example of this substrate promiscuity is the use of the TycC TE domain to generate libraries of tyrocidine derivatives with modified amino acid residues at position 4 of the peptide.131 Kohli et al. noted that the structure of tyrocidine (an amphipathic β-sheet) can also be used to predict the propensity of the TE domain to accept modified residues at P4 in terms of how these might disrupt the structure of the peptide, thus suggesting that preorganization of the peptide within the TE must be important for cyclisation. This is in addition to the constraints placed on the activity of TE domains (and related PBP-type TE domains) that are seen in the amino acid specificities of such enzymes at the N- and C-termini of the peptide substrates.126,128 The ability of the TycC TE domain to generate the dimeric gramicidin S structure is also highly significant in this regard, although again there are restraints in terms of the amino acids that are accepted at the peptide termini and the structure adopted by the final cyclic peptide in this case.
This raises a more general question concerning the importance of preorganization of the substrate for cyclisation by TE domains—something that has largely been investigated in peptide biosynthesis only—and could well call into question the ability to apply TE domains to the cyclisation of substrates that differ significantly from the natural products produced by these enzymes.
Nonetheless, the impressive tolerance of the PBP-type TE SurE for different ring sizes shows that this limitation can be overcome, at least for some systems.39 The TE domain from WS9326A/mohangamide biosynthesis is also a fascinating example of how we are yet to understand the substrate positioning governed by these domains.142,143 In this case, the results of in vitro cyclisation assays clearly show that the differentiating feature in whether the TE domain catalyses a standard cyclisation (to afford WS9326A) or a pseudo-dimerization (to generate mohangamide) is the structure of the acyl unit present on the peptide monomer. Whilst it appears as though mohangamide is likely an unusual minor product of this pathway due to the loading of an unusual dihydropyridine-containing acyl unit, this raises the question of how this seemingly minor alteration to a single building block can completely alter the specificity of the reaction towards dimerization. This requirement for the presence of two different monomers to afford mohangamide is also tantalizing for the generation of controlled dimers via the use of TE domains, as the dihydropyridine-containing acylated peptide appears unable to undergo cyclisation without the presence of the alternate monomer.140 Taken together, these results clearly show that the field requires far more insights into the binding and dynamics of peptides within TE domains in order to understand and exploit such catalytic behaviour.
The interplay of the substrates with the TE domain reduces the value of isolated crystal structures without bound substrate, and even the pioneering work of Chin and Schmeing demonstrated the resistance of TE domain complexes to crystallographic investigation, at least from the perspective of explaining the noted cyclisation behaviour of the TE domain studied in this case.70 This strongly suggests that a greater application of computational approaches coupled with in-solution biophysical approaches (as have been demonstrated recently for NRPS systems in pioneering work from Mootz and co-workers) is needed to understand these phenomena more completely.496
Despite their challenges, TE domains offer a major advantage over CT domains in their ability to accept soluble SNACs as substrates, which is particularly valuable when exploring the potential for cyclodimerization by such enzymes. Additional examples of TE domains accepting ester substrates provide further support for the value of TE domains as biocatalysts, because the synthesis of such substrates is significantly simplified by the improved stability of ester linkages during solid phase peptide synthesis. An important area of future investigation, particularly in regard to pseudo cyclodimerization, is to understand how acyl moieties can be used to control the specific dimerization of two substrates. This would be a very powerful tool for synthetic biology and would have great potential in generating libraries of larger cyclic peptides from a smaller pool of appropriately protected linear intermediates. These findings collectively underscore the versatility and substrate tolerance of the TE domains observed in NRPS assembly lines in nature, shedding light on their potential applications in peptide engineering and synthesis.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d2cs00909a
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2025