Bruno S. Paulo‡ ab, Michael J. J. Recchia‡ c, Sanghoon Lee c, Claire H. Fergusson c, Sean B. Romanowski ab, Antonio Hernandez ab, Nyssa Krull a, Dennis Y. Liu c, Hannah Cavanagh c, Allyson Bos d, Christopher A. Gray d, Brian T. Murphy ab, Roger G. Linington *c and Alessandra S. Eustaquio *ab
aDepartment of Pharmaceutical Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60607, USA. E-mail: ase@uic.edu
bCenter for Biomolecular Sciences, College of Pharmacy, University of Illinois at Chicago, Chicago, IL 60607, USA
cDepartment of Chemistry, Simon Fraser University, Burnaby, BC V5H 1S6, Canada. E-mail: rliningt@sfu.ca
dDepartment of Biological Sciences, University of New Brunswick, Saint John, New Brunswick E2L 4L5, Canada
First published on 13th September 2024
Burkholderiales bacteria have emerged as a promising source of structurally diverse natural products that are expected to play important ecological and industrial roles. This order ranks in the top three in terms of predicted natural product diversity from available genomes, warranting further genome sequencing efforts. However, a major hurdle in obtaining the predicted products is that biosynthetic genes are often ‘silent’ or poorly expressed. Here we report complementary strain isolation, genomics, metabolomics, and synthetic biology approaches to enable natural product discovery. First, we built a collection of 316 rhizosphere-derived Burkholderiales strains over the course of five years. We then selected 115 strains for sequencing using the mass spectrometry pipeline IDBac to avoid strain redundancy. After predicting and comparing the biosynthetic potential of each strain, a biosynthetic gene cluster that was silent in the native Paraburkholderia megapolitana and Paraburkholderia acidicola producers was cloned and activated by heterologous expression in a Burkholderia sp. host, yielding megapolipeptins A and B. Megapolipeptins are unusual polyketide, nonribosomal peptide, and polyunsaturated fatty acid hybrids that show low structural similarity to known natural products, highlighting the advantage of our Burkholderiales genomics-driven and synthetic biology-enabled pipeline to discover novel natural products.
It has been recently estimated that only 3% of genome-predicted bacterial natural products have been isolated and structurally characterized.2 Obtaining the predicted products remains a bottleneck, in part because many biosynthetic gene clusters (BGCs) are “silent”, that is, they are not expressed at levels sufficient to allow the detection and isolation of the biosynthesized products.6 Approaches for targeted activation of BGCs include engineering of the native producer and heterologous expression in an optimized host strain.7 Heterologous expression has the potential to streamline the discovery process through standardization and automation. However, recent studies8–11 showed that the success rate of heterologous expression is still relatively low, varying from 11% to 32% when using model Escherichia coli and Streptomyces spp. as hosts. The choice of host strain can greatly impact success and product yields.12 For instance, a systematic analysis of host strains revealed a direct relationship between yields and the genetic identity of host and source DNA.13 We recently tested a Burkholderia sp. strain as an alternative host and demonstrated its ability to produce Burkholderiales natural products in titers that are two to three orders of magnitude higher than with E. coli.14
Here we report a pipeline to discover natural products from Burkholderiales that combines a suite of complementary approaches (Fig. 1). First, Burkholderiales bacteria were selectively isolated from environmental samples using methods we previously established.15,16 To select strains for genome sequencing, we performed matrix-assisted laser desorption/ionization time of flight mass spectrometry (MALDI-TOF MS) protein and metabolite analyses on cell material from bacterial colonies. The resulting data were processed using the bioinformatics pipeline IDBac to maximize strain and metabolite diversity while avoiding redundancy.17 Strain selection and sequencing were performed in two cycles. In cycle 1, a set of genomes was sequenced and analyzed for biosynthetic potential, which influenced strain selection and sequencing in cycle 2 (Fig. 1A). The biosynthetic potential of each strain was predicted and compared (Fig. 1B). A BGC that was silent in the native producers was prioritized, cloned, and activated using heterologous expression in a Burkholderia sp. host strain18,19 (Fig. 1C). From this strain megapolipeptins A (1) and B (2) were isolated and structurally characterized (Fig. 1D). Megapolipeptins are unusual polyketide-nonribosomal peptides with varying polyunsaturated fatty acid components. They show low structural similarity to other known bacterial natural products (maximum Tanimoto similarity score of 0.58 (1) and 0.56 (2) compared to all entries in the Natural Product Atlas),1,20 highlighting the advantage of our Burkholderiales genomics-driven and synthetic biology-enabled pipeline to discover novel natural products.
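The Tanimoto similarity score quoted above can be illustrated with a minimal sketch. It assumes that each molecule is represented as a set of "on" fingerprint bits; the fingerprint type used for the Natural Product Atlas comparison is not stated here, and the function name and toy bit sets below are hypothetical.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprint bit sets."""
    union = bits_a | bits_b
    if not union:
        # Assumed convention for two empty fingerprints; not taken from the paper.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Toy fingerprints (hypothetical bit indices, not real molecules).
query = {1, 2, 3}
reference = {2, 3, 4}
print(tanimoto(query, reference))  # 2 shared bits / 4 total bits = 0.5
```

A score of 1 means identical fingerprints and 0 means no shared features, so maxima of 0.58 and 0.56 indicate that no database entry closely resembles either compound.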
Fig. 1 Overview of the approach used in this study. (A) Environmental samples (rhizosphere) were collected from British Columbia, Canada. Burkholderiales strains were then isolated from the rhizosphere of root samples using selective media.16 In the first cycle, 230 isolated strains were analyzed by MALDI-TOF MS/IDBac and 100 strains were selected for genome sequencing based on the analysis of metabolite association networks; the intent of this step was to avoid strain redundancy while maximizing metabolite diversity entering sequencing efforts (ESI Fig. S1†).17 One hundred draft genome sequences were obtained. Biosynthetic gene clusters (BGCs) were predicted using antiSMASH and the biosynthetic potential of strains was compared in terms of BGC numbers and biosynthetic class. Informed by the predicted biosynthetic potential, cycle 2 targeted any newly isolated genera not included in cycle 1 and species determined to be the most ‘talented’ in cycle 1, resulting in 15 additional strains sequenced from 86 analyzed. See ESI Tables S1–S3† for details on the strains sequenced. (B) Phylogenomic and gene cluster family (GCF) analyses were performed on a total of 115 strains to gain insight into GCF distribution and to prioritize BGCs for discovery. (C) A prioritized BGC with clade-specific distribution that was silent in the native strains was cloned and heterologously expressed in Burkholderia sp. FERM BP-3421. (D) Natural product isolation and structure elucidation yielded megapolipeptins A (1) and B (2). |
In the second round of prioritization (Fig. 1A), 86 newly isolated strains were analyzed, and 15 strains were selected that either added phylogenetic diversity (new genus or species not included in cycle 1) or were determined in the first round to be “talented” in terms of BGC number and diversity but were underrepresented in the dataset. Thus, in the second round we included Herbaspirillum, the only newly isolated genus that was not previously represented, and additional Paraburkholderia megapolitana and Paraburkholderia fungorum strains, which met the “talented” but underrepresented criterion (Fig. 2A and S2†). At the same time, we did not include new Caballeronia spp. because of the lower number of BGCs per strain observed in the first round (9.75 BGCs on average), nor did we include new P. sediminicola strains because they were already well represented in the first round.
Fig. 2 Genome library metrics. (A) Genome size by number of BGCs, color coded according to the clades attributed in Fig. 3. R² = 0.18. The top 10% strains most prolific in terms of number of BGCs are highlighted in yellow (P. megapolitana/acidicola) and purple (P. sediminicola/fungorum). (B) Donut charts depicting the total number of either BGCs or (C) GCFs subdivided by biosynthetic class.
In total, 115 draft genomes were obtained using short-read Illumina sequencing and Unicycler assembly (ESI Tables S1–S3†), resulting in 159 contigs on average (range of 33 to 780). Eight representative genomes were also sequenced using long-read Oxford Nanopore technology resulting in four complete genomes with circular replicons and four for which some contigs remained linear (ESI Table S4†).
An average of ∼12 BGCs per strain was predicted, in accordance with previous Burkholderiales studies.22 There was only a weak association between number of BGCs and genome size (R² = 0.18). Instead, the best predictor of biosynthetic capacity appeared to be phylogeny. Notably, the clade containing Paraburkholderia acidicola and Paraburkholderia megapolitana had the highest ratio of number of BGCs to genome size at 2.1 BGCs per Mbp followed by P. azotifigens at 1.6 and P. sediminicola/fungorum at 1.5 (Fig. 2A and ESI Table S5†).
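The two metrics used in this paragraph, BGC density (BGCs per Mbp) and the coefficient of determination between BGC count and genome size, can be sketched as follows. The function names and the example inputs are hypothetical; the per-strain values behind the reported R² of 0.18 are in the ESI and are not reproduced here.

```python
def bgc_density(n_bgcs: int, genome_size_bp: int) -> float:
    """Number of BGCs per megabase pair of genome sequence."""
    return n_bgcs / (genome_size_bp / 1e6)


def r_squared(xs, ys) -> float:
    """Squared Pearson correlation, i.e. R² of a simple linear fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical strain: 21 BGCs in a 10 Mbp genome.
print(bgc_density(21, 10_000_000))
```

Normalizing by genome size separates clades that are BGC-rich for their size from those that simply have larger genomes, which is why density rather than raw count supports the phylogeny argument.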
In terms of biosynthetic class, terpene BGCs were the most abundant (341, 24.6%) and are present in every single strain in the collection. NRPS (226, 16.3%) and RiPP (220, 15.9%) are also well distributed amongst strains. PKS (35, 2.5%) and PKS-NRPS (34, 2.4%) are less abundant classes (Fig. 2B) with highest occurrence in Paraburkholderia megapolitana, Paraburkholderia fungorum, and some Paraburkholderia sediminicola strains (ESI Fig. S2†). In the ‘other’ category (532, 38.3%, ESI Fig. S3†), the main contributions came from homoserine lactones (134, 9.6%), phosphonates (115, 8.3%), arylpolyenes (112, 8.1%), betalactones (77, 5.5%), and redox cofactor (69, 5%) whereas minor groups included phenazines, non-NRPS siderophores, ectoines, butyrolactones, and furan (≤3, ≤0.2%).
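The class percentages above follow directly from the reported counts, as this short check shows. The counts are taken from the text; the variable names are illustrative only.

```python
# BGC counts per biosynthetic class, as reported in the text.
counts = {
    "terpene": 341,
    "NRPS": 226,
    "RiPP": 220,
    "PKS": 35,
    "PKS-NRPS": 34,
    "other": 532,
}

total = sum(counts.values())  # 1388 BGCs in the whole collection
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

for name, pct in shares.items():
    print(f"{name}: {counts[name]} BGCs ({pct}%)")
```

The six classes sum to the 1388 BGCs analyzed in the following section, so the percentages are shares of the complete predicted set.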
As the same BGC may be present in many strains, total BGC count does not reflect natural product diversity. To explore the natural product diversity encoded in the strain library, we analyzed the 1388 BGCs using the biosynthetic gene similarity clustering and prospecting engine BiG-SCAPE23 which generates BGC similarity networks. Using a clustering threshold of 0.4 as previously reported to best represent similar natural products,2 151 gene cluster families (GCFs) were obtained, including 78 networks and 73 singleton BGCs. An analysis of biosynthetic class distribution (Fig. 2C) shows that there is more redundancy in the terpene space than could be determined from the total number of BGCs (341 BGCs [24.6%] grouping into 10 GCFs [6.6%]). In contrast, there is more diversity in the NRPS space (226 BGCs [16.3%] grouping into 49 GCFs [32.4%]). In the ‘other’ category (ESI Fig. S3†), redundancy is most apparent in arylpolyenes (112 BGCs [8.1%] grouping into 4 GCFs [2.6%]) and phosphonates (115 BGCs [8.3%], 4 GCFs [2.6%]).
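How a distance cutoff partitions BGCs into networks and singletons can be sketched with a simple connected-components grouping. This is an illustration only: BiG-SCAPE's own family assignment is more involved, and the function name, the inclusive cutoff comparison, and the toy distances below are assumptions rather than details from this study.

```python
def group_by_cutoff(ids, distances, cutoff=0.4):
    """Group items into families by linking every pair with distance <= cutoff.

    `distances` maps (id_a, id_b) pairs to pairwise distances in [0, 1].
    Returns a sorted list of families, each a sorted list of ids.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the family representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), dist in distances.items():
        if dist <= cutoff:
            parent[find(a)] = find(b)

    families = {}
    for i in ids:
        families.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in families.values())


# Toy example with hypothetical BGC names and distances.
toy_ids = ["bgc_a", "bgc_b", "bgc_c", "bgc_d"]
toy_distances = {("bgc_a", "bgc_b"): 0.2, ("bgc_b", "bgc_c"): 0.35, ("bgc_c", "bgc_d"): 0.9}
toy_families = group_by_cutoff(toy_ids, toy_distances)
networks = [f for f in toy_families if len(f) > 1]
singletons = [f for f in toy_families if len(f) == 1]
print(networks, singletons)
```

In this toy case three BGCs form one network and one remains a singleton, mirroring the split of the 151 GCFs into 78 networks and 73 singletons.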
We next performed a phylogenomic analysis of the 115 strains and observed three monophyletic groups representing three currently described genera, Herbaspirillum (2 strains), Caballeronia (4 strains), and Paraburkholderia (109 strains), with the latter having the largest representation in the collection (Fig. 3A and S4†). Moreover, Paraburkholderia strains also had the highest number of BGCs (Fig. 2A and S2†). To investigate correlations between BGC distribution and phylogeny, we subdivided the Paraburkholderia group into seven monophyletic groups as shown in Fig. 3A and S5.†
Fig. 3 Phylogenomic analysis and BGC distribution. (A) Phylogenomic tree of Burkholderiales strains based on 49 genes within cluster of orthologous groups (ESI Table S6†). The tree was constructed using the neighbor-joining method. Select Paraburkholderia, Herbaspirillum and Caballeronia genomes available in public databases were included in addition to the 115 strains sequenced in this study which are shown with our internal strain numbering scheme. The Paraburkholderia clade was further subdivided into seven monophyletic groups as highlighted. See also ESI Fig. S4 and S5.† (B) BiG-SCAPE BGC Sequence Similarity Network within the 115 Burkholderiales genomes (distance cutoff = 0.4). A total of 1388 BGCs are displayed, color-coded according to the clades in panel A. Node shape indicates BGC class according to BiG-SCAPE classification. Known and orphan BGCs described in the text are highlighted. (C) Example of a widely distributed BGC that is part of the core genome of Paraburkholderia strains. BGCs in this family contain three core genes and varying gene neighborhoods. The core genes are predicted to encode the biosynthesis of 2-aminoethyl phosphonate (2-AEP) from phosphoenolpyruvate (PEP) via phosphonopyruvate (PnPy) and phosphonoacetaldehyde. (D) The clade-specific mgp GCF and BGC investigated in this work from genome #76 (ESI Table S1†). See ESI Table S7† for gene details. |
To explore the prevalence and potential novelty of BGCs from our collection, we included the reference BGCs from the Minimum Information about a Biosynthetic Gene cluster (MIBiG)24 database in the BiG-SCAPE23 analysis (ESI Fig. S6†). The large majority of the BGCs (1366 BGCs, 98.4%) did not associate with MIBiG nodes. The 1.6% that had an MIBiG counterpart included: (1) an NRPS-PKS similar to that encoding antifungal occidiofungin from Burkholderia pyrrocinia (ocf), which matched with two strains of P. megapolitana (RL18-039-BIC-B and RL17-339-BIF-C);25 (2) a BGC matching that of the antifungal lagriamide A (lga) in strain P. acidicola RL17-338-BIF-B,26 which we recently showed to encode a new compound, lagriamide B (lgb BGC);27 (3) the nonribosomal peptide siderophore gramibactin (grb) encoded in 13 strains from the P. graminis and P. strydomiana clades;28 and (4) the β-lactam antibiotic sulfazecin (sul) encoded in P. acidicola RL17-338-BIF-B (ESI Fig. S6†).29 After manual curation, we further identified a glidobactin-like BGC (glb)30 in P. fungorum RL18-167-BIC-A and an ornibactin-like BGC (orb)31 in 21 strains.
Superclusters, which occur when two or more clusters closely co-localize and are treated as one entity, are a common issue with automated BGC prediction. The presence of superclusters in the direct antiSMASH output led to known PKS-NRPS BGCs ocf and lgb forming a network with terp2 (ESI Fig. S6†). We next manually curated PKS-NRPS BGCs that were likely part of superclusters to split the clusters and generate a new network. To facilitate visualization and extraction of networks by biosynthetic class, MIBiG nodes were removed to generate the sequence similarity network displayed in Fig. 3B.
An N-acyl homoserine lactone BGC (ahl) was present in 109 strains. N-acyl homoserine lactones are involved in quorum sensing in bacteria and are known to regulate behaviors such as virulence and biofilm formation.36–39 Finally, phosphonates were also recurrent in our collection being present in 114 strains out of 115. The largest GCF (pho) contained 109 BGCs with three core genes (Fig. 3C). The first gene is common to most phosphonate pathways and is predicted to encode the phosphoenolpyruvate (PEP) mutase PepM, which reversibly converts PEP into phosphonopyruvate (PnPy). The operon contains two other genes predicted to encode a decarboxylase, and a transaminase, catalyzing decarboxylation of PnPy to the aldehyde and reductive amination, respectively, to yield 2-aminoethyl phosphonate (2-AEP). 2-AEP may be attached to structural components such as polysaccharides or lipids.40 The high prevalence of phosphonate BGCs in our collection agrees with prior reports. Based on the presence of pepM homologs, phosphonate biosynthesis was predicted to be encoded in ∼5% of bacterial genomes at large but in 94% of Burkholderia genomes.41
For the accessory BGCs, NRPSs (ESI Fig. S7†) are the most abundant and present in all strains except in Herbaspirillum rhizosphaerae RL21-008-BIB-B (#114, ESI Table S1†). Nearly all NRPS GCFs have a monophyletic distribution (Fig. 3B), except for siderophores gramibactin (grb) and ornibactin (orb) (ESI Fig. S7†). The clade-specific nature of NRPS clusters suggests vertical transmission of specialized functionalities with distinct nonribosomal peptides being produced by phylogenetically distinct strains. RiPPs follow the same tendency of clade-specific distribution with three exceptions (ESI Fig. S8†).
Type I PKS BGCs (ESI Fig. S9†) are less abundant (34/1388, 2.4%). The most prevalent BGC is an orphan, monomodular type I PKS present in 30 genomes, mainly from the P. sediminicola/P. fungorum clade (pks1, Fig. 3B and S9†). The remaining four BGCs fall into one GCF containing two members (pks2) and two singletons. Finally, one PKS belongs to type 3 (t3pks).
PKS-NRPS hybrid BGCs (ESI Fig. S10†) have low abundance as well (34/1388, 2.4%). They are present in P. sediminicola/fungorum, P. azotifigens, and P. megapolitana/acidicola clades. The largest GCF (23 BGCs, pks-nrps1) contains BGCs from P. sediminicola/fungorum and P. azotifigens clades. The highest diversity of PKS-NRPS BGCs comes from the P. megapolitana/acidicola clade, all three strains of which contain two PKS-NRPS BGCs that fall into the known ocf GCF, the known lgb27 singleton, and two orphan GCFs, pks-nrps2 and the mgp BGC studied here.
The mgp BGCs from P. megapolitana RL18-039-BIC-B (genome #76, Table S1†) and P. megapolitana RL17-339-BIF-C (#112) share 96.6% pairwise identity and 77% identity to the BGC found in P. acidicola RL17-338-BIF-B (#103), with the BGC from P. acidicola having a shorter mgpA PKS gene (Fig. 3D). Zheng et al.47 previously identified the BGC with the longer mgpA gene in P. megapolitana DSM 23488. Because the BGC was silent, the authors activated gene expression using promoter replacement in strain DSM 23488. Although mass spectrometry features could be detected, low yields precluded attempts at isolation and structure elucidation of any target molecules. We were likewise unable to detect the products of this BGC in the wild type strains from our collection. Thus, we cloned and expressed the BGC in Burkholderia sp. FERM BP-3421, a host we have been developing as an alternative synthetic biology chassis.18,48 Obtaining the product of the mgp BGC in sufficient quantity for characterization would allow the evaluation of the host's performance with a complex BGC and advance current knowledge of hybrid polyunsaturated fatty acid, polyketide, and nonribosomal peptide systems.
The mgp BGC is located on chromosome 2 of the three genomes (Fig. 4A, #76 shown) and it contains 14 open reading frames (ORFs) spanning 54 kbp (Fig. 3D). The mgp BGC from P. megapolitana RL18-039-BIC-B (#76) was cloned using a CRISPR-Cas9 based methodology.49 The obtained plasmid pBS001 was transferred into a spliceostatin-defective mutant (Δfr9A) of Burkholderia sp. FERM BP-3421,19 and exconjugants were confirmed by PCR (ESI Fig. S11†). Molecular networking was performed to identify molecular features present only in mutants harboring the mgp BGC but absent in strains containing the empty vector pBS003 (ESI Fig. S12–S16†). The analysis identified m/z 984.538 and m/z 958.522 (Fig. 4B) that were pursued for isolation. These features were the ones identified in strain DSM 23488 after promoter replacement, but quantities had not been enough for isolation.47
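The comparison logic, keeping only features seen in the mgp-expressing strain and not in the empty-vector control, can be sketched as a presence/absence filter on m/z values. This is a deliberate simplification: molecular networking itself relies on MS/MS spectral similarity. The function name, the 0.005 m/z tolerance, and all control values are hypothetical; only m/z 958.522 and 984.538 come from the text.

```python
def features_unique_to_sample(sample_mz, control_mz, tol=0.005):
    """Return m/z values in the sample with no control feature within +/- tol."""
    return [
        mz for mz in sample_mz
        if all(abs(mz - ctrl) > tol for ctrl in control_mz)
    ]


# Two reported features plus one hypothetical background ion shared with the control.
mgp_strain = [500.300, 958.522, 984.538]
empty_vector = [300.100, 500.301]
print(features_unique_to_sample(mgp_strain, empty_vector))  # [958.522, 984.538]
```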
The full planar structure could not be unambiguously determined from the NMR data as no correlations were observed between the Mpo moiety and the rest of the molecule. Possible connections included the carbonyl groups at Adhda-21, Hoha-1, or Hoha-7. To resolve this issue the molecule was treated with TMS diazomethane to convert the carboxylic acid groups to their corresponding methyl esters. LC-MS analysis of the methylated product showed two peaks, each with an increase in mass of 42 Da suggesting the addition of C3H6 which was unexpected given the presence of only two carboxylic acid moieties (ESI Fig. S27–29†). The derivatized products (3 and 4) were isolated and analyzed by NMR. The 1H-NMR and gHMBC spectra showed two methoxy signals correlating with Hoha-1 and Hoha-7. Closer inspection of the NMR data revealed that Mpo-1 was no longer present. Instead, the gCOSY spectrum revealed that this moiety had been converted to a 2-isopropyloxirane (Ipo), via a Buchner–Curtius–Schlotterbeck rearrangement between the ketone functional group of the Mpo subunit and TMS-diazomethane (Fig. 5C). The complete planar structures of 3 and 4 were determined using a full suite of 1D and 2D NMR experiments (ESI Fig. S30†). Based on these data, it was determined the Mpo moiety was attached to Adhda-21, completing the planar structure of 1 (Fig. 5A and ESI Table S8†).
HRMS and MS/MS fragmentation data of 2 displayed a protonated cluster ion at [M + H]+ m/z 984.53821 (calcd m/z 984.53872) and neutral losses of adjacent threonine residues (1Thr and 2Thr) suggesting a molecular formula of C47H77N5O17 and a structural analog of 1 with a mass difference of 26 Da (ESI Fig. S31 and S32†). Examination of the 1H-NMR spectrum showed the presence of two additional vinylic methine protons along the hydrocarbon chain (Fig. 5A and S33†). The NMR data of 2 were comparable to 1 showing five identical spin systems: Ahpa, 1Thr, 2Thr, Hoha, and Mpo moieties. Further examination of the 2D NMR spectra revealed the final spin system as 3-amino-5,21-dihydroxydocosa-8,12,16-trienamide (Adhta) (Fig. 5A, ESI Table S9 and Fig. S33–38†). The position of the Mpo moiety was determined in similar fashion to 1 by treatment with TMS diazomethane to generate the 2-isopropyloxirane motif and identification of gHMBC correlations between Mpo-1 and Adhta-23, completing the planar structure of 2 (ESI Fig. S39 and S40†).
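The calculated m/z quoted above can be reproduced from the molecular formula with a short sketch. The monoisotopic masses are standard reference values; the function names are illustrative and this is not the authors' processing software.

```python
# Monoisotopic masses (u) of the most abundant isotopes, and the proton mass.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}
PROTON = 1.00727646688


def protonated_mz(formula: dict) -> float:
    """m/z of the singly charged [M + H]+ ion for an element-count dictionary."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


def ppm_error(observed: float, calculated: float) -> float:
    """Mass error of an observed m/z relative to the calculated value, in ppm."""
    return (observed - calculated) / calculated * 1e6


megapolipeptin_b = {"C": 47, "H": 77, "N": 5, "O": 17}
calc_mz = protonated_mz(megapolipeptin_b)
print(f"{calc_mz:.5f}")  # 984.53872
print(f"{ppm_error(984.53821, calc_mz):.2f} ppm")
```

The observed value lies within about 0.5 ppm of the calculated one, which supports the assigned formula. The 26 Da difference from 1 corresponds to C2H2, consistent with one additional double bond in a chain two carbons longer.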
The configurational analysis of megapolipeptins A and B represents a significant analytical challenge. Megapolipeptin A contains 9 chiral centers and two double bonds, for a total of 2048 possible configurations. Because many of these centers are separated from one another by achiral regions it is not straightforward to directly relate their relative or absolute configurations. Instead, the absolute configurations of centers in each region must be determined independently. The absolute configurations of the threonine amino acid-derived stereocenters (1Thr and 2Thr) in 1 and 2 were examined using Marfey's analysis50 (ESI Fig. S41†) which revealed the presence of L-threonine and L-allo-threonine in both molecules. To determine the positions of each amino acid megapolipeptin B (2) was subjected to partial acid hydrolysis (1 N HCl, 110 °C, 30 minutes). UPLC-MS analysis of the hydrolysate revealed the presence of a product consistent with hydrolysis between the two threonine residues (Fig. 5D and S42†). HPLC purification, full acid hydrolysis, and Marfey's analysis of this product defined the configuration of the 2Thr residue as L-threonine (Fig. 5E), which by extension defined 1Thr as L-allo-threonine.
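The count of 2048 possible configurations follows from treating each stereogenic element as an independent two-state choice. The sketch below makes that assumption explicit; it ignores any symmetry-related degeneracy, and the function name is illustrative.

```python
def count_configurations(chiral_centers: int, double_bonds: int) -> int:
    """Upper bound on stereoisomers: two states (R/S or E/Z) per stereogenic element."""
    return 2 ** (chiral_centers + double_bonds)


# Megapolipeptin A: 9 chiral centers and 2 double bonds, as stated in the text.
print(count_configurations(9, 2))  # 2048
```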
Traditionally, the configurations of disubstituted olefins are determined from the 3JHH coupling constant between the two olefinic signals (15–17 Hz = trans, ∼10 Hz = cis). However, in pseudosymmetrical systems such as the megapolipeptins the olefinic 1H signals are often highly overlapped. Fortunately, both the olefinic carbons (Adhda C10, C11, C14, C15) and the adjacent allylic carbons (Adhda C9, C12, C13, C16) possess diagnostic chemical shifts between cis and trans systems. In both 1 and 2 the olefinic carbons were all in the range 129.7 ± 0.7 ppm, indicative of an all-trans arrangement. This contrasts with cis olefins, where 13C shifts are ∼128.0 ppm.51 Further supporting evidence for the all-trans arrangement was provided by the allylic carbon chemical shifts centered around 32.0 ppm. In cis olefins these values center around ∼27.3 ppm.
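The allylic-carbon criterion described above amounts to asking which reference shift an observed value lies closer to. The sketch below encodes only that rule of thumb: the reference values (32.0 and 27.3 ppm) are those given in the text, while the function name and example shifts are hypothetical, and a real assignment would weigh all olefinic and allylic carbons together.

```python
# Reference 13C shifts (ppm) for allylic carbons, taken from the text.
ALLYLIC_REFERENCE_PPM = {"trans": 32.0, "cis": 27.3}


def assign_olefin_geometry(allylic_shift_ppm: float) -> str:
    """Return the geometry whose reference allylic shift is nearest to the observed one."""
    return min(
        ALLYLIC_REFERENCE_PPM,
        key=lambda geometry: abs(ALLYLIC_REFERENCE_PPM[geometry] - allylic_shift_ppm),
    )


print(assign_olefin_geometry(32.1))  # trans
print(assign_olefin_geometry(27.0))  # cis
```

The allylic carbons are the more discriminating probe, since the cis and trans references differ by nearly 5 ppm compared with under 2 ppm for the olefinic carbons.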
Finally, as will be discussed in the following section on the proposed biosynthesis, the configurations of several centers could be inferred from the biosynthetic gene cluster. Analysis of the module responsible for the installation of the ketide-extended serine (Ahpa) indicated the installation of L-serine, followed by extension and reduction by the associated A-type ketoreductase (KR) to install a hydroxy group with L-orientation at position Ahpa-3. The KR in MgpA responsible for installing the hydroxyl group at C5 of the fatty acid (Adhda-5 (1) and Adhta-5 (2)) is also A type. The hydroxy groups at Adhda-19 (1) or Adhta-21 (2) are predicted to have D-orientation based on the B-type KR within MgpE. The configuration at Adhda-3 was not determined.
Fig. 6 Biosynthetic proposal for mgp BGC from P. megapolitana RL18-039-BIC-B. (A) Megapolipeptin biosynthetic gene cluster from P. megapolitana RL18-039-BIC-B (genome #76). (B) Biosynthetic hypothesis based on gene/domain content and the observed structures. The configuration of chiral centers containing hydroxyl groups was predicted based on KR domain type (ESI Fig. S43†). The KR type is indicated with red A, B letters.
Four proteins have been implicated in polyunsaturated fatty acid (PUFA) biosynthesis in bacteria, PfaA-PfaD, in addition to a phosphopantetheinyl transferase PfaE that may or may not be present in PUFA clusters.53,54 MgpE is homologous to PfaA, which displays a KS-AT-(ACP)n-KR domain organization. MgpF appears to be a variation of PfaBC containing a KS-KS-AT-DH-DH domain organization, and MgpH encodes an ER domain, resembling PfaD. We propose that MgpE, MgpF and MgpH catalyze biosynthesis of the fatty acid portion of megapolipeptins from acetyl-CoA and either 7 or 8 malonyl-CoA units, leading to 16:2(6,10) or 18:3(4,8,12) unsaturated fatty acids, respectively. Based on the predicted B-type KR (ESI Fig. S43†) within MgpE, the hydroxyl group would possess the R configuration. Additionally, a 4-oxoheptanedioic moiety decorates the terminal ω-1 hydroxyl group. Biosynthesis of the potential precursor 1,5-dicarboxy-3-oxopentyl phosphate might be catalyzed by the putative pyruvyl transferase MgpK and thiamine pyrophosphate-dependent lyase MgpL.55 Alternatively, the 4-oxoheptanedioic moiety may derive from lipid peroxidation.56 Either way, the acyl-CoA synthetase MgpN would activate the fatty acid component for loading into MgpA (Fig. 6B).
MgpA is a PKS with a KS-KR-T organization that we propose extends the fatty acid chain with one malonate unit followed by reduction of the β-keto group to a hydroxyl. Based on the predicted A-type KR (ESI Fig. S43†), the hydroxyl group would possess the S configuration. MgpB and MgpC are hybrid PKS-NRPS enzymes containing unusual domain organization. MgpB (KS-AT-T-TA-C-A-T) would catalyze another C2 extension with malonate, followed by reductive amination of the β-carbonyl catalyzed by the transaminase (TA) domain as has been described for MycA in the biosynthesis of mycosubtilin.57 The A domain in this module is predicted by antiSMASH to load a proline; however, seven residues are different from the expected proline code.58,59 We propose this A domain may load α-ketoisovaleric acid according to the structures of megapolipeptins, although the keto-acid code is also not conserved (ESI Table S13 and Fig. S44†).60
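The carbon bookkeeping implied by this proposal can be checked with a short sketch: an acetyl starter contributes two carbons and each malonyl-derived extension adds two more. The function and parameter names are hypothetical; the unit counts are those proposed in the text.

```python
def chain_length(n_malonyl: int, later_c2_extensions: int = 0, starter_carbons: int = 2) -> int:
    """Carbon count of a chain built from one starter unit plus C2 extension units."""
    return starter_carbons + 2 * (n_malonyl + later_c2_extensions)


# Fatty acid portion built by MgpE/MgpF/MgpH: acetyl-CoA plus 7 or 8 malonyl-CoA units.
print(chain_length(7), chain_length(8))  # 16 18

# After one further C2 extension each by MgpA and MgpB.
print(chain_length(7, later_c2_extensions=2), chain_length(8, later_c2_extensions=2))  # 20 22
```

The 22-carbon result for the longer chain agrees with the docosa backbone assigned to the Adhta unit of 2.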
Based on sequence and phylogenetic analyses (ESI Fig. S45 and S46†), four of the six C domains clade with LCL domains, indicating that they process L-amino acids (the one within MgpB, the first and the third within MgpC and the first within MgpD). The first module in MgpC (C-C-A-T-C-A-T-KS-KR-DH) appears to catalyze the iterative addition of two L-threonine units (Fig. 6B), which agrees with the A domain code (ESI Table S13†). As determined by partial hydrolysis of megapolipeptin B followed by Marfey's analysis (Fig. 5D and E and S39†), L-threonine is incorporated first followed by L-allo-threonine. The second module in MgpC would add L-serine, followed by a ketide extension using malonate by an AT-less PKS module. The A-type KR in this domain would then reduce the β-carbonyl to an (S)-hydroxyl group (Fig. 6B and S43†). The DH is predicted to be inactive as the catalytic histidine and aspartate residues are mutated (ESI Fig. S47†). Accordingly, no dehydration is expected, and the hydroxyl group is maintained in the final structure. The second C domain in MgpD (C-A-T-E-C-T-TE) clades with DCL domains in accordance with the presence of an epimerization domain in this module, suggesting that the L-valine selected by the A domain (ESI Table S13†) is epimerized to D-valine (Fig. 6B).
MgpD may be involved in terminal amide biosynthesis and perhaps in providing α-ketoisovaleric acid. Terminal amide biosynthesis has been described for myxothiazol and melithiazol (MtaG/MelG) from myxobacteria.61,62 MtaG/MelG display a C-A-MOX-A-T-TE domain organization where the A domain is split with a monooxygenase (MOX). Condensation with glycine followed by hydroxylation of the α-carbon catalyzed by MOX and dealkylation of the alcohol amide is proposed to yield the terminal amide, while the TE domain releases the α-ketoacid. Analogously, MgpD could add valine, which after hydroxylation could yield both the terminal amide and α-ketoisovaleric acid, the latter to be condensed with the free amine product of MgpB (Fig. 6B). Although MgpD does not contain a MOX domain, MgpG encodes a standalone cupin-family hydroxylase63,64 that we propose may act in trans. Alternatively, MgpG could catalyze α-hydroxylation of the valine unit and dealkylation after product release from the NRPS by the TE domain. The second condensation domain of MgpC clades with starter C domains and may be the one responsible for capping of the free amine with the α-ketoisovaleric acid. All C domains possess the conserved catalytic motif HHxxxDG65,66 except for this second C domain in MgpC, which contains the variation HHxxxDR (ESI Fig. S45†). All KS domains contain the catalytic triad of Cys-His-His essential for decarboxylative condensation and are thus predicted to be active (ESI Fig. S48†).
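Screening a condensation domain for the HHxxxDG motif, where x is any residue, can be sketched with a regular expression. The function name and the toy sequences are hypothetical, and the check looks only for the motif string; it does not replace the alignment-based analysis in the ESI.

```python
import re

# HHxxxDG: two histidines, any three residues, then Asp-Gly.
CANONICAL_MOTIF = re.compile(r"HH.{3}DG")
# Same core with any residue in the last position (e.g. the HHxxxDR variation).
VARIANT_MOTIF = re.compile(r"HH.{3}D.")


def classify_c_domain_motif(sequence: str) -> str:
    """Classify a protein sequence by the presence of the condensation-domain motif."""
    if CANONICAL_MOTIF.search(sequence):
        return "canonical"
    if VARIANT_MOTIF.search(sequence):
        return "variant"
    return "absent"


# Toy sequences (not real MgpC fragments).
print(classify_c_domain_motif("AAHHILVDGWS"))  # canonical
print(classify_c_domain_motif("AAHHILVDRWS"))  # variant
print(classify_c_domain_motif("AAAAAA"))       # absent
```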
Finally, MgpI encodes a thioesterase that could have a proofreading function,67 as is common for PKS-NRPS systems. MgpJ encodes a phosphopantetheinyl transferase that likely serves to activate PUFA, PKS and NRPS carrier proteins. The role of MgpM, a putative metallophosphoesterase, is unclear.
In terms of the biosynthetic capacity of our collection, terpene and phosphonate BGCs are the most conserved (Fig. 3). NRPS and RiPP BGCs are also abundant but tend to show monophyletic distribution; thus, to find new NRPS and RiPP BGCs, taxonomic diversity is important. In contrast, PKS and PKS-NRPS gene cluster families are the rarest in the collection (Fig. 2B and C). The largest diversity of PKS-NRPS BGCs was found in the P. megapolitana/acidicola clade, where each strain contained two such BGCs, with the mgp BGC being conserved in all three strains in this clade. Because we did not detect potential products of the mgp BGC in the wild-type strains, we turned instead to heterologous expression in a Burkholderia sp. strain (Fig. 4), which resulted in the discovery of megapolipeptins A (1) and B (2) at 0.6 and 1.5 mg L−1 isolated yields, respectively (Fig. 5). This work expands recent genome mining efforts in P. megapolitana/acidicola.27,47,68
Megapolipeptins are bolaamphiphilic lipopeptides, that is, they exhibit a hydrophobic center and hydrophilic groups at each end of the molecule, like the recently discovered bolagladins.69,70 We proposed biosynthetic hypotheses based on the structural features of megapolipeptins and the genetic information within the encoding BGC (Fig. 6), which serve as a starting point for future studies aimed at unraveling novel mechanisms of PKS-NRPS-PUFA biosynthesis. Although we chose a gene cluster encoding an unusual enzyme combination to maximize novelty, the isolated megapolipeptins 1 and 2 did not exhibit significant activity in the assays tested. Due to the low similarity of megapolipeptins to known compounds, it is difficult to predict their bioactivity. The top NP Atlas hits (ESI Table S14†) include herbicidal rotihibins (Tanimoto similarity score of 0.58) from Streptomyces scabies and siderophores crochelin (0.57) from Azotobacter chroococcum and megapolibactins (0.56) from Paraburkholderia megapolitana, none of which are bolaamphiphiles. Bolaamphiphile bolagladins showed antibacterial activity but display an even lower Tanimoto similarity score of 0.39. Future studies should expand the range of assays beyond those conducted here, which may uncover as yet unrecognized functions of PKS-NRPS-PUFA products.
In summary, the low structural similarity of megapolipeptins to known natural products supports our Burkholderiales genomics-driven and synthetic biology-enabled pipeline for uncovering novel natural products from silent BGCs.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4sc03594a
‡ These authors contributed equally.
This journal is © The Royal Society of Chemistry 2024