Fabrizio Alberti *ab, Daniel J. Leng b, Ina Wilkening b, Lijiang Song b, Manuela Tosin b and Christophe Corre *ab
a Warwick Integrative Synthetic Biology Centre and School of Life Sciences, University of Warwick, Coventry, CV4 7AL, UK. E-mail: F.Alberti@warwick.ac.uk; C.Corre@warwick.ac.uk
b Department of Chemistry, University of Warwick, Coventry, CV4 7AL, UK
First published on 19th October 2018
In this study, we report the rapid characterisation of a novel microbial natural product resulting from the rational derepression of a silent gene cluster. A conserved set of five regulatory genes was used as a query to search genomic databases and identify atypical biosynthetic gene clusters (BGCs). A 20-kb BGC from the genetically intractable Streptomyces sclerotialus bacterial strain was captured using yeast-based homologous recombination and introduced into validated heterologous hosts. CRISPR/Cas9-mediated genome editing was then employed to rationally inactivate the key transcriptional repressor and trigger production of an unprecedented class of hybrid natural products exemplified by (2-(benzoyloxy)acetyl)-L-proline, named scleric acid. Subsequent rounds of CRISPR/Cas9-mediated gene deletions afforded a selection of biosynthetic gene mutant strains which led to a plausible biosynthetic pathway for scleric acid assembly. Synthetic standards of scleric acid and a key biosynthetic intermediate were also prepared to confirm the chemical structures we proposed. The assembly of scleric acid involves two unique condensation reactions catalysed by a single NRPS module and an ATP-grasp enzyme that link a proline and a benzoyl residue to each end of a rare hydroxyethyl-ACP intermediate, respectively. Scleric acid was shown to exhibit moderate inhibition activity against Mycobacterium tuberculosis, as well as inhibition of the cancer-associated metabolic enzyme nicotinamide N-methyltransferase (NNMT).
Despite the conspicuous number of specialised metabolites isolated from actinomycetes, only a small fraction of the natural products ‘encrypted’ at the DNA level has been exploited to date. Experimental characterisation of the biosynthetic product of a BGC is often laborious and time-consuming particularly due to the uniqueness of every microorganism. Protocols for introducing DNA into bacterial cells are species-dependent and often ineffective. Their optimisation can take years but many culturable micro-organisms remain genetically intractable. This prevents the exploitation of BGCs using many of the previously reported strategies.1 In addition, the biosynthesis of specialised metabolites is often tightly controlled at the transcriptional level. Cluster-associated transcriptional regulators that belong to the TetR-family of transcriptional repressors are particularly numerous.8 Deletions of cluster-specific TetR-like transcriptional repressors have been shown to trigger overproduction of the corresponding specialised metabolites, as previously reported for the antibiotics methylenomycin and coelimycin in Streptomyces coelicolor A3(2) and for the urea-containing gaburedins in Streptomyces venezuelae.9–12
Genetic manipulation of Streptomyces genomes has classically been accomplished using established but often laborious protocols optimised for specific bacterial strains.13 In recent years, however, targeted genome editing has been revolutionised by the advent of clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) systems, which allow generation of clean genomic deletions/insertions.14 Toolkits for editing streptomycete genomes have been developed,15–17 enabling researchers to overcome the issues associated with classical methods of gene disruption, in particular when multiple mutation events are desirable.18 With those classical methods, the limited number of available selectable markers and the potential restoration of the wild-type configuration through single-crossover events were notable limitations.
Here, we report a genome mining strategy based on the identification of a conserved regulatory cassette for selecting and characterising BGCs. A specific BGC was first captured and transferred into a validated Streptomyces heterologous host where CRISPR/Cas9-mediated genome editing was employed to rationally derepress the expression of silent biosynthetic genes (Fig. 1). This approach was applied for the identification, isolation and structural elucidation of a novel structural class of natural products from a silent and cryptic gene cluster found in the soil-dwelling species Streptomyces sclerotialus NRRL ISP-5269, a species of filamentous bacteria first isolated in Poona (India).19
Fig. 1 Overview of the approach used in this study to characterise scleric acid, a novel natural product from a cryptic and silent gene cluster.
In order to find gene clusters that contained regulatory cassettes homologous to the one found in the methylenomycin cluster and to assess how widespread these are, we performed searches with Multigene BLAST21 using as a query the DNA sequences of mmyR, mmfR and mmfLHP, as well as with ClusterTools,22 using as a query the protein sequences of MmyR, MmfR and MmfLHP. Fourteen actinomycete genomes were found that contained orthologues of all five genes coding for MmyR, MmfR and MmfLHP within a 50-kb region (ESI Table S4†). Remarkably, a total of 98 actinomycete genomes were found that contained orthologues of at least mmyR, mmfR and mmfL within a 50-kb region. We have previously shown that the butenolide synthase MmfL alone is sufficient to give production of functional MMF signalling molecules in S. coelicolor A3(2).9 Additionally, a functional regulatory system that controls biosynthesis of coelimycin antibiotics in S. coelicolor A3(2) has been characterised that includes the butenolide synthase ScbA and the two TetR-like transcriptional regulators ScbR and ScbR2,11 which are orthologues of MmfL, MmfR and MmyR respectively. Hence the regulatory systems that include orthologues of mmyR, mmfR and mmfL are also putatively functional, and the biosynthetic products they regulate expression of could also be studied via manipulation of these regulatory cassettes.
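The co-localisation criterion used above (orthologues of all query genes within a 50-kb region) can be illustrated with a minimal sketch. It assumes that homology hits have already been obtained, for example from a BLAST search, and are supplied as (gene name, start coordinate) pairs on a single contig. The function name, the input layout and the example coordinates are hypothetical and are not taken from Multigene BLAST or ClusterTools.

```python
from collections import defaultdict

def has_colocalised_cassette(hits, required, window=50_000):
    """Return True if a hit for every gene in `required` falls within `window` bp.

    hits: iterable of (gene_name, start_position) tuples on one contig
          (hypothetical input layout, not a real tool's output format).
    """
    required = set(required)
    # Keep only hits for the genes of interest, sorted by position.
    relevant = sorted((pos, gene) for gene, pos in hits if gene in required)
    counts = defaultdict(int)  # genes currently inside the sliding window
    left = 0
    for pos, gene in relevant:
        counts[gene] += 1
        # Shrink the window from the left until it spans at most `window` bp.
        while pos - relevant[left][0] > window:
            left_gene = relevant[left][1]
            counts[left_gene] -= 1
            if counts[left_gene] == 0:
                del counts[left_gene]
            left += 1
        if len(counts) == len(required):
            return True
    return False

# Made-up coordinates for illustration only.
clustered = [("mmyR", 1_000), ("mmfR", 5_000), ("mmfL", 12_000),
             ("mmfH", 13_000), ("mmfP", 14_000)]
scattered = [("mmyR", 1_000), ("mmfR", 5_000), ("mmfL", 120_000)]

print(has_colocalised_cassette(clustered, ["mmyR", "mmfR", "mmfL", "mmfH", "mmfP"]))
print(has_colocalised_cassette(scattered, ["mmyR", "mmfR", "mmfL"]))
```

Relaxing `required` from all five genes to mmyR, mmfR and mmfL corresponds to the less stringent search described in the text.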
Among the hits generated, the gene cluster from S. sclerotialus NRRL ISP-5269, hereafter named the scl cluster, was chosen for further study; the nucleotide sequence containing the scl cluster was available under GenBank accession number JOBC01000043.1. In addition to homologues of the five genes used as query, the genetic organisation of the scl gene cluster included two adjacent and divergent operons of biosynthetic genes (Fig. 2a). A combination of AntiSMASH23 and manual BLASTp24 analyses indicated the putative borders of the scl cluster (Fig. 2a and Table 1). The predictive power of these modern bioinformatics tools often permits deduction of the chemical structure(s) of cryptic gene cluster products, particularly when modular systems such as type I modular polyketide synthases (PKS) or non-ribosomal peptide synthetases (NRPS) direct the biosynthesis.25 However, the originality of the scl cluster prevented such predictions and we worked on the assumption that the lack of a bioinformatics prediction made a structurally distinct, and therefore truly novel, natural product more likely. The cluster spanned a region of 19 782 bp and comprised 18 putative genes: 11 biosynthetic genes, 6 genes for regulation and 1 gene coding for a membrane transporter (Fig. 2a and Table 1).
Protein (number of aa) GenBank | Homologue (% identity/% similarity) organism GenBank | Putative function | Proposed role |
---|---|---|---|
SclP (213) WP_037773640.1 | 4-Phosphopantetheinyl transferase (54/61) Streptomyces pristinaespiralis WP_078951206.1 | PPTase | Biosynthesis of the glycolic acid unit |
SclQ4 (386) WP_030624999.1 | QncL (37/49) Streptomyces melanovinaceus AFJ11255.1 (ref. 34) | Lipoyl attachment domain, acyltransferase catalytic domain | |
SclQ3 (338) WP_030625001.1 | QncL (51/68) S. melanovinaceus AFJ11255.1 (ref. 34) | Pyrimidine binding domain, transketolase C-terminal domain | |
SclQ2 (305) WP_078889003.1 | QncN (61/73) S. melanovinaceus AFJ11257.1 (ref. 34) | ThDP binding domain | |
SclQ1 (76) WP_030625009.1 | QncM (32/68) S. melanovinaceus AFJ11256.1 (ref. 34) | Acyl carrier protein (ACP) | |
SclM1 (200) WP_078889004.1 | MmfR (56/73) Streptomyces coelicolor A3(2) WP_011039544.1 (ref. 9 and 10) | TetR-family transcriptional repressor | Regulation of the expression of scleric acid biosynthetic genes |
SclM2 (336) WP_078889005.1 | MmfL (35/47) S. coelicolor A3(2) WP_011039545.1 (ref. 9 and 10) | Signaling molecule biosynthesis (butenolide synthase) | |
SclL (106) WP_051872433.1 | LysR family transcriptional regulator (61/72) S. venezuelae WP_015035381.1 | LysR transcriptional regulator | |
SclM3 (228) WP_078889008.1 | MmfP (46/56) S. coelicolor A3(2) WP_011039547.1 (ref. 9 and 10) | Signaling molecule biosynthesis (hydrolase) | |
SclM4 (196) WP_051872434.1 | MmyR (41/56) S. coelicolor A3(2) WP_011039548.1 (ref. 9 and 10) | TetR-family transcriptional repressor | |
SclM5 (378) WP_051872435.1 | MmfH (49/59) S. coelicolor A3(2) WP_011039546.1 (ref. 9 and 10) | Signaling molecule biosynthesis (oxidoreductase) | |
SclN (1083) WP_078889006.1 | PuwA (32/48) Cylindrospermum alatosporum CCALA 988 AIW82277.1 (ref. 32) | NRPS [C-A-PCP] | Activation of L-proline and condensation with glycolic acid |
SclT (243) WP_030625041.1 | Thioesterase (45/56) Streptomyces sviceus WP_007379259.1 | Thioesterase (TE) | Hydrolytic release of scleric acid from a carrier protein |
SclA (656) WP_051872438.1 | PauY18 (56/67) Streptomyces sp. YN86 AIE54238.1 (ref. 33) | Anthranilate synthase | Biosynthesis of the benzoic acid unit |
SclD (405) WP_030625030.1 | PauY21 (55/65) Streptomyces sp. YN86 AIE54241.1 (ref. 33) | DAHP synthase | |
SclI (220) WP_030625033.1 | PauY19 (57/71) Streptomyces sp. YN86 AIE54239.1 (ref. 33) | Isochorismatase | |
SclG (439) WP_030625035.1 | ATP-grasp domain-containing protein (49/62) Streptomyces sp. NRRL B-5680 WP_051746523.1 | ATP-grasp family enzyme | Condensation reaction of the proline unit with benzoic acid |
SclE (435) | Major Facilitator Superfamily transporter (53/65) Amycolatopsis vancoresmycina WP_003090471.1 | MFS transporter | Export of scleric acid outside the cell |
In order to establish the stereochemistry of the proline residue, scleric acid was hydrolysed and derivatised with Marfey's reagent.30 L- and D-proline were also derivatised using the same procedure and used as standards for HPLC comparison. Approximately 95% of the proline residue of scleric acid purified from S. albus/scl ΔsclM4 was found to correspond to L-proline (ESI Fig. S10†). To confirm the proposed structure of scleric acid, an authentic standard was synthesised (see ESI Fig. S20† for a schematic representation of the synthetic route). A structural analogue, 2-((benzoyl-L-prolyl)oxy)acetic acid, possibly consistent with the initial NMR data obtained, was also synthesised (see ESI Fig. S21† for a schematic representation of the synthetic route and ESI Fig. S11–S15† for NMR spectra). LC-MS analyses and NMR data of both of these compounds unequivocally confirmed the proposed structure for scleric acid (Fig. 3); the analogue gave different NMR spectra and its physico-chemical properties resulted in a different retention time on LC-MS. Moreover, two sets of NMR signals were observed for the purified natural product as well as for the synthetic standard, revealing that scleric acid exists as two different rotamers, trans- and cis-scleric acid (Fig. 3e and ESI Fig. S5†). This is consistent with literature data for synthetic N-benzoyl-L-proline methyl ester, where a 4:1 mixture of the two rotamers was observed.31
The four proteins encoded by genes sclQ1-4 showed high homology to a set of three enzymes – QncN, QncL and QncM – from Streptomyces melanovinaceus. This set of enzymes has been shown to direct the biosynthesis and attachment of a C2-glycolicacyl unit to a non-ribosomal peptide.34 More specifically, SclQ2 was homologous to QncN, a thiamin diphosphate (ThDP) binding domain. SclQ3 was homologous to the first two N-terminal domains of QncL, a pyruvate dehydrogenase/transketolase pyrimidine binding domain and a transketolase C-terminal domain, while SclQ4 was homologous to the last two domains of QncL, a lipoyl attachment domain and an acyltransferase catalytic domain. Lastly, SclQ1 was homologous to the acyl carrier protein (ACP) QncM. Based on the homology of SclQ1-4 with QncN, QncL and QncM, we propose that SclQ1-4 are together responsible for converting a ketose phosphate from primary metabolism (such as xylulose-5-phosphate) into the activated glycolic acid unit found in scleric acid.
The three-gene cassette made of sclA, sclD and sclI showed high homology to genes involved in biosynthesis of a benzoic acid unit in Streptomyces sp. YN86.33 Specifically, SclA showed high similarity to the anthranilate synthase enzyme PauY18, SclD to the DAHP synthase PauY21 and SclI to the isochorismatase PauY19. Overall these three enzymes were hypothesised to be responsible for biosynthesis of the benzoyl group found in scleric acid via chorismate as an intermediate.
SclN was predicted to be an NRPS enzyme consisting of a single minimal elongation module: a putative, atypical condensation domain (C), an adenylation domain (A) and a peptidyl carrier protein (PCP) domain.35 The SclN A-domain was predicted to specifically activate L-proline, which was in accordance with the presence of an L-proline residue in scleric acid.25 The SclN C-domain was proposed to catalyse the amide bond formation between L-proline and glycolic acid.
Other genes putatively involved in biosynthesis and export of scleric acid and present in the scl cluster are: sclT, sclG and sclE. The thioesterase SclT is predicted to release the L-proline–oxyacetic acid intermediate from the PCP-domain of SclN. We propose that the ATP-grasp family enzyme SclG would bind and activate the benzoic acid unit produced by SclADI. That same enzyme would also promote condensation of the benzoyl unit with L-proline–oxyacetic acid, giving scleric acid. This would be exported out of the cell by the putative MFS transporter SclE.
In order to confirm the proposed involvement of the enzymes SclN, SclA and SclQ1-4 in the biosynthesis of the building blocks that make up scleric acid, we constructed gene deletion mutants in strains where the transcriptional repressor gene sclM4 had also been inactivated (S. albus/scl ΔsclM4 background). Plasmids pCm2-sclN, pCm2-sclA and pCm2-sclQ1-4 were assembled and used to generate double mutant strains S. albus/scl ΔsclM4 ΔsclN, S. albus/scl ΔsclM4 ΔsclA and S. albus/scl ΔsclM4 ΔsclQ1-4. Deletions were confirmed by PCR screening (ESI Fig. S3†). UHPLC-HRMS analysis revealed that production of scleric acid was abolished in S. albus/scl ΔsclM4 ΔsclN and S. albus/scl ΔsclM4 ΔsclA (Fig. 3a), confirming the essential role of SclN and SclA. Residual scleric acid production was detected from S. albus/scl ΔsclM4 ΔsclQ1-4; this could be explained by the fact that glycolic acid is known to be produced by Streptomyces species, in particular for the biosynthesis of N-glycolylmuramic acid.36 Addition of 5 mM glycolic acid to the culture medium of S. albus/scl ΔsclM4 ΔsclQ1-4 also resulted in scleric acid being produced at a level similar to that observed with S. albus/scl ΔsclM4 (ESI Fig. S16†).
The identification of key precursors in scleric acid biosynthesis was also exploited to further increase the titres of scleric acid produced by S. albus/scl ΔsclM4. Enriching the culture medium with 5 mM L-proline, 5 mM benzoic acid or 5 mM glycolic acid significantly increased levels of scleric acid observed upon UHPLC-HRMS analysis of the ethyl acetate extracts compared to those observed with S. albus/scl ΔsclM4 grown on the standard supplemented minimal medium (ESI Fig. S17†). The strategy of manipulating the pathway-specific transcriptional regulatory system also makes scleric acid production not reliant on a complex culture medium. Importantly the utilisation of supplemented minimal media does significantly facilitate the isolation of the natural product of interest.
In support of this proposed pathway, we used UHPLC-HRMS to investigate accumulation of the L-proline–oxyacetic acid intermediate in the ethyl acetate extracts of the scleric acid producing strain, as well as of the double mutant strains. A compound with a retention time of 3.0 minutes on a C18 reverse phase HPLC column and an m/z value of 174.0763 [M(C7H12NO4) + H]+ (calculated m/z of 174.0761) was detected in the scleric acid producing strain S. albus/scl ΔsclM4, as well as in the double mutants S. albus/scl ΔsclM4 ΔsclQ1-4 and S. albus/scl ΔsclM4 ΔsclA (ESI Fig. S18†). Consistent with the predicted function of the L-proline-activating NRPS SclN, strain S. albus/scl ΔsclM4 ΔsclN did not show any accumulation of L-proline–oxyacetic acid. In order to further confirm the identity of this intermediate, which was detected in the crude extracts in amounts not sufficient for HPLC purification and subsequent NMR characterisation, a synthetic standard was prepared (see ESI Fig. S22† for a schematic representation of the synthetic route) and run alongside the crude extracts on UHPLC-HRMS. The standard showed the same retention time and mass spectrum as the natural L-proline–oxyacetic acid intermediate (ESI Fig. S18†). Moreover, we grew S. albus/scl ΔsclM4 ΔsclN, which is unable to produce scleric acid, in the presence of 5 mM L-proline–oxyacetic acid. Feeding the intermediate to the mutant strain restored production of scleric acid, as shown by UHPLC-HRMS analysis of its acidified ethyl acetate extracts (ESI Fig. S19†). This provides additional evidence that L-proline–oxyacetic acid is a true intermediate in scleric acid biosynthesis. It also suggests that the L-proline–oxyacetic acid is released from the C-domain of SclN prior to SclG catalysing its condensation with the benzoyl group, in accordance with the order of reactions proposed in Fig. 4.
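The agreement between the observed and calculated m/z values quoted above can be checked with a short calculation. The sketch below assumes that the neutral intermediate has the formula C7H11NO4 (inferred from the protonated ion formula C7H12NO4 given in the text) and uses standard monoisotopic masses; the function names are illustrative only.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}
PROTON = 1.00727647  # mass of a proton (u)

def mz_protonated(formula):
    """m/z of the singly charged [M+H]+ ion for a formula given as {element: count}."""
    neutral = sum(MONOISOTOPIC[element] * n for element, n in formula.items())
    return neutral + PROTON

def ppm_error(observed, calculated):
    """Mass error in parts per million."""
    return (observed - calculated) / calculated * 1e6

# Assumed neutral formula of L-proline-oxyacetic acid (see lead-in).
intermediate = {"C": 7, "H": 11, "N": 1, "O": 4}
calc = mz_protonated(intermediate)

print(f"calculated m/z = {calc:.4f}")
print(f"error vs observed 174.0763 = {ppm_error(174.0763, calc):.1f} ppm")
```

Under these assumptions the calculation reproduces the calculated value of 174.0761 quoted in the text, and the observed value deviates from it by roughly 1 ppm.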
Scleric acid was then tested for a broader range of pharmaceutically relevant bioactivities through the Eli Lilly Open Innovation Drug Discovery (OIDD) Program. In a single-point (20 μM) primary assay, scleric acid showed moderate antibacterial activity against Mycobacterium tuberculosis (H37Rv), inhibiting the growth of this strain by 32%.
Scleric acid also showed inhibitory activity on the cancer-associated metabolic enzyme nicotinamide N-methyltransferase (NNMT), the overexpression of which is known to contribute to tumorigenesis.37 NNMT catalyses the transfer of a methyl group from S-adenosyl-L-methionine (SAM) to nicotinamide, generating S-adenosyl-L-homocysteine (SAH) and 1-methylnicotinamide (MNAN).37 In a concentration–response curve assay, scleric acid showed IC50 values of 178.0 μM (NNMT MNAN) and 186.6 μM (NNMT SAH) (ESI Fig. S4†).
The widespread presence of orthologues of the methylenomycin regulatory genes among actinomycete genomes (ESI Table S4†) revealed that the approach described herein might be very promising for the discovery and characterisation of novel natural products, and therefore, of novel biocatalysts. Comparative genomics analyses also indicated that there is no apparent correlation between the presence of the regulatory cassette we targeted and the type of natural products that they regulate production of – both in relation to biosynthesis and bioactivity – methylenomycin,9 gaburedins12 and scleric acid being examples of natural products characterised so far.
In conclusion, beyond the discovery of this specialised metabolite, we strongly believe that targeting conserved pathway-specific regulatory elements, as opposed to mining BGCs encoding defined enzymatic machineries (i.e. PKS, NRPS), will lead to the identification and characterisation of microbial natural products assembled by truly novel types of biocatalysts.
S. cerevisiae VL6-48N was used for TAR cloning and grown on yeast extract peptone dextrose (YPD) broth (5 g L−1 yeast extract, 10 g L−1 peptone, 2% w/v glucose) or YPD agar (same as YPD, with 15 g L−1 agar). Purification of genomic DNA from S. sclerotialus was performed from a 100 mL liquid culture by phenol-chloroform extraction.13 The scl gene cluster was captured using TAR cloning.26 Assembly of plasmid pCAP03-scl was performed following the procedure described by Moore and colleagues; pCAP03 was a gift from Bradley Moore (Addgene plasmid # 69862) (see ESI Table S2† for a list of plasmids used in this study).27 For this purpose, String DNA fragments (Thermo Fisher Scientific) were ordered to include 60-bp hooks homologous to either side of the scl cluster (ESI Table S2†) and introduced into pCAP03 via Gibson Assembly (New England Biolabs). The identity of the captured cluster was confirmed by PCR amplification and restriction digestion (ESI Fig. S1†). Insertion of the scl gene cluster in the genome of the heterologous hosts S. albus and S. coelicolor was accomplished via intergeneric tri-parental conjugation following the protocol described by Moore and colleagues.27 CRISPR/Cas9-based engineering of S. albus strains was performed using plasmids pCm2-sclM4, pCm2-sclN, pCm2-sclA and pCm2-sclQ1-4. Golden Gate Assembly was first performed to insert the specific sgRNAs into the backbone pCm2 plasmid, then Gibson Assembly was used to include 800-bp homologous recombination arms, all following the procedure described by Zhao and colleagues; pCRISPomyces-2 was a gift from Huimin Zhao (Addgene plasmid # 61737).15 Clearance of temperature-sensitive plasmids based on pCm2 was achieved by culturing the mutant strains on SFM agar medium non-selectively at 39 °C.
UHPLC-HRMS analyses were carried out with 20 μL of prepared extracts injected through a reverse phase column (Zorbax Eclipse Plus C18, size 2.1 × 100 mm, particle size 1.8 μm) connected to a Dionex 3000RS UHPLC coupled to Bruker Ultra High Resolution (UHR) Q-TOF MS MaXis II mass spectrometer with an electrospray source. Sodium formate (10 mM) was used for internal calibration and a m/z scan range of 50–1500 was used with a gradient elution from 95:5 solvent A/solvent B to 0:100 solvent A/solvent B over 10 minutes. Solvents A and B were water (0.1% HCOOH) and acetonitrile (0.1% HCOOH), respectively.
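The gradient described above (95:5 to 0:100 solvent A/solvent B over 10 minutes) can be expressed as a simple interpolation. This is only an illustrative sketch: it assumes a strictly linear ramp with no hold or re-equilibration steps, which the text does not specify, and the function name is hypothetical.

```python
def percent_b(t_min, start_b=5.0, end_b=100.0, duration_min=10.0):
    """Percentage of solvent B at time t_min for a linear gradient (assumed shape)."""
    # Clamp time to the gradient interval; behaviour outside it is an assumption.
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min

# Programmed composition at a few time points.
for t in (0, 3, 5, 10):
    b = percent_b(t)
    print(f"t = {t:>2} min: {100 - b:.1f}% A / {b:.1f}% B")
```

The values are the programmed composition at the pump; the composition reaching the column at a given time additionally depends on the system dwell volume, which is not considered here.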
Pre-purification of crude extract containing scleric acid was performed using flash chromatography. A column was loaded with C18-reversed phase silica gel, preconditioned with one volume of methanol, activated with one volume of solvent B (0.045% v/v trifluoroacetic acid in acetonitrile), and equilibrated with two volumes of solvent A (0.045% v/v trifluoroacetic acid in water). Crude extract was loaded onto the column. Compounds were then eluted with five different consecutive solvent systems: two volumes of 20:80 solvent B/solvent A, two volumes of 40:60 solvent B/solvent A, two volumes of 50:50 solvent B/solvent A, two volumes of 60:40 solvent B/solvent A and two volumes of 80:20 solvent B/solvent A. Fractions were collected throughout the elution steps, evaporated under reduced pressure and dissolved in 500 μL of 50:50 (v/v) HPLC grade methanol/water for UHPLC-HRMS analysis. Fractions containing scleric acid were combined and used for HPLC purification.
Reverse-phase HPLC was performed using a Zorbax XBD-C18 column (212 × 150 mm, particle size 5 μm) connected to an Agilent 1200 HPLC equipped with a binary pump and DAD detector. Solvent A: 0.1% TFA in water, solvent B: 0.1% TFA in acetonitrile, 5% B to 95% B in 45 min. Retention time compound 1: 29.7 min, retention time compound 2 (scleric acid): 34.4 min. Gradient elution was used (solvent A: water with 0.1% HCOOH, solvent B: methanol) with a flow rate of 10 mL min−1. Fractions were collected by time or absorbance at 210 nm using an automated fraction collector. The collected fractions containing scleric acid were pooled, methanol was removed under reduced pressure and scleric acid was re-extracted from the remaining water (2 × 50 mL ethyl acetate). The ethyl acetate was removed under reduced pressure and the sample re-dissolved in deuterated methanol for NMR analysis.
Footnote
† Electronic supplementary information (ESI) available: Supplementary methods and results; Tables S1–S6; Fig. S1–S22. See DOI: 10.1039/c8sc03814g
This journal is © The Royal Society of Chemistry 2019