Leping Sun†, Xingyun Ma†, Binliang Zhang, Yanjia Qin, Jiezhao Ma, Yuhui Du and Tingjian Chen*
MOE International Joint Research Laboratory on Synthetic Biology and Medicines, School of Biology and Biological Engineering, South China University of Technology, 510006, Guangzhou, China. E-mail: chentj@scut.edu.cn
First published on 9th August 2022
Nucleic acids have been extensively modified in different moieties to expand the scope of genetic materials in the past few decades. While the development of unnatural base pairs (UBPs) has expanded the genetic information capacity of nucleic acids, the production of synthetic alternatives of DNA and RNA has increased the types of genetic information carriers and introduced novel properties and functionalities into nucleic acids. Moreover, the efforts of tailoring DNA polymerases (DNAPs) and RNA polymerases (RNAPs) to be efficient unnatural nucleic acid polymerases have enabled broad application of these unnatural nucleic acids, ranging from production of stable aptamers to evolution of novel catalysts. The introduction of unnatural nucleic acids into living organisms has also started expanding the central dogma in vivo. In this article, we first summarize the development of unnatural nucleic acids with modifications or alterations in different moieties. The strategies for engineering DNAPs and RNAPs are then extensively reviewed, followed by summarization of predominant polymerase mutants with good activities for synthesizing, reverse transcribing, or even amplifying unnatural nucleic acids. Some recent application examples of unnatural nucleic acids with their polymerases are then introduced. At the end, the approaches of introducing UBPs and synthetic genetic polymers into living organisms for the creation of semi-synthetic organisms are reviewed and discussed.
Fig. 2 Chemical structures of the representative UBPs. (A) Hydrogen-bonded UBPs. (B) Non-hydrogen-bonded hydrophobic UBPs.
Method | Advantages | Disadvantages | Application examples in polymerase evolution | Ref. |
---|---|---|---|---|
Error-prone PCR | Simple and easy to implement; Universality; No requirement for structural information of the target protein | Base bias of mutagenesis; Lack of continuous mutations; Small actual sequence sampling space | Klentaq M1; Klentaq M2; SFM4-3; SFM4-6; SFM4-9; Taq T8; Taq H15; Taq M1; Taq M4; Tth SΔTthCs12RsEx pol mutants; Phi29 DNAP Mut | 71 and 76–81 |
DNA shuffling with fragmentation by DNase I | Simple and easy to implement; No requirement for structural information of the target protein | Requirement for high sequence homology of the parental proteins; Hard to control fragmentation; Low recombination frequency | — | 82 |
Family shuffling | Parental proteins can be from different species; Larger functional sequence space can be sampled; No requirement for structural information of the target protein | Requirement for high sequence homology of the parental proteins; Low recombination frequency | Taq/Tth/Tfl 5D4; Bst LF/Klentaq v5.9 and v7.16 | 83, 93 and 95 |
Nucleotide exchange and excision technology (NExT) | Good controllability of the DNA fragment sizes during fragmentation of the parental sequences | Restricted digestion sites during fragmentation of the parental sequences | — | 85 |
Staggered extension process (StEP) | Simple and easy to implement | Requirement for high sequence homology of the parental proteins | SFM4-3; SFM4-6; SFM4-9; Taq T8; Taq H15; Taq M1; Taq M4 | 77–79 and 86 |
Synthetic shuffling/assembly of designed oligonucleotides (ADO) | Highly combinatorial DNA library; Libraries without limits to the length and number of the parental sequences; Increased recombination resolution | Requirement for carefully designed oligonucleotides | Pfu DNAP E10 | 80, 88, 89 and 94 |
Random chimeragenesis on transient templates (RACHITT) | Lower sequence homology of the parental proteins is required; Higher recombination frequency | Requirement for synthesized templates | KlenTaq Mut_ADL; KlenTaq Mut_RT | 90 and 96 |
Incremental truncation for the creation of hybrid enzymes (ITCHY) | No requirement for sequence homology | Numbers of parental sequences and fragments for recombination are limited | — | 91 |
Site-saturated mutagenesis (SM) | Simple and easy to implement; All possibilities of substituting amino acids at the mutation sites can be sampled | Requirement for the structural information of the target protein | SFM4-3; SFM4-6; SFM4-9; KlenTaq Mut_ADL; KlenTaq Mut_RT; 9°N-YRI; 9°N-NVA; KF I709E E710G; T7 RNAP RGVG, E593G V685A; SFM19 | 69, 77, 88, 96, 97, 103, 106, 107 and 163 |
Combinatorial active-site saturation test (CAST) | Potential synergistic conformational effects can be taken into account | Requirement for the structural information of the target protein | — | 99 |
Iterative saturation mutation (ISM) | Small but focused high-quality mutant library for each round of evolution | Requirement for the structural information of the target protein | — | 100 |
Sequence saturation mutagenesis (SeSaM) | Consecutive point mutations; Controllable mutational bias; Controllable fragment distribution of the DNA library | Only one randomized site in each mutant | — | 101 |

—: no application example in polymerase evolution yet.
Error-prone PCR introduces random mutations into the genes of target proteins, and is one of the most frequently employed methods for creating protein libraries.71 In a typical error-prone PCR reaction, random mutagenesis is carried out simply by increasing the mutation rate of the gene during PCR amplification, which is achieved by using polymerases of low fidelity, unbalancing the concentrations of the dNTPs, using analogs of some of the dNTPs, increasing the number of PCR cycles, raising the concentration of magnesium ions, and adding manganese ions.71–73 It is very important to control the mutation rate of the target gene in an error-prone PCR experiment: the library size is restricted by limited transformation efficiency and screening or selection throughput, so a library can cover only a small portion of all possible mutants, and an excessively high mutation rate usually leads to a rapid loss of protein activity during directed evolution.71,74 Error-prone PCR is especially useful for protein library construction when the structural information of the target protein is not available, or not sufficient to predict which residues are crucial for the desired activity and should be randomized or mutated to specific amino acids. Also, beneficial mutations that are far away from the active site of a protein are frequently revealed from a completely randomized library.75 Error-prone PCR has already been successfully used for constructing libraries of many DNAPs, including Tth DNAP, Klentaq DNAP, Stoffel fragment (SF) of Taq DNAP, full-length Taq DNAP and phi29 DNAP.76–81
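The trade-off between mutation rate and library coverage described above can be illustrated with a minimal sketch. It assumes that substitutions are independent and evenly distributed, so that the number of mutations per gene copy follows a Poisson distribution; this ignores the base bias of real error-prone PCR. The gene length and error rate are hypothetical values chosen for illustration, and the function names are not taken from any published tool.

```python
import math


def mutations_per_gene(gene_length_bp, error_rate_per_bp, max_count=5):
    """Poisson probabilities of finding 0..max_count mutations in one gene copy."""
    mean = gene_length_bp * error_rate_per_bp  # expected mutations per gene
    return [math.exp(-mean) * mean ** k / math.factorial(k)
            for k in range(max_count + 1)]


def nucleotide_variants(gene_length_bp, n_mutations):
    """Number of distinct genes carrying exactly n_mutations base substitutions."""
    # Choose the mutated positions, then one of 3 alternative bases at each.
    return math.comb(gene_length_bp, n_mutations) * 3 ** n_mutations


if __name__ == "__main__":
    gene_length = 1600  # hypothetical gene length in bp (assumption)
    error_rate = 2e-3   # hypothetical substitutions per bp over the whole PCR (assumption)
    for k, p in enumerate(mutations_per_gene(gene_length, error_rate)):
        print(f"{k} mutations: fraction of clones = {p:.3f}")
    for n in (1, 2, 3):
        print(f"{n} substitutions: {nucleotide_variants(gene_length, n):.3e} possible genes")
```

Under these assumptions the number of possible variants grows combinatorially with the number of substitutions, which is why a library limited by transformation efficiency samples only a small part of the sequence space once several mutations per gene are allowed.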
The DNA shuffling technique mimics natural hybridization or recombination processes for rapid molecular breeding of proteins by recombining the genes of homologous proteins in vitro.82 In traditional DNA shuffling experiments, the genes of two or more homologous proteins, or mutants of the same protein, are first fragmented by DNase I, and then reassembled into recombined full-length genes by PCR to generate the libraries.82 Desired protein mutants, in which beneficial mutations have accumulated and deleterious mutations have been reduced, are then screened or selected from the DNA shuffling libraries. DNA shuffling of a set of homologous genes from different species is called family shuffling.83 The application of family shuffling to genes with relatively low homology may result in less efficient recombination, which could be improved by using restriction endonucleases, instead of DNase I, for DNA fragmentation.84 In another method called nucleotide exchange and excision technology (NExT), DNA fragmentation was achieved by dosing deoxyuridine triphosphate (dUTP) into the PCR reaction of the target DNA, excising uracil bases in the PCR product with uracil-DNA-glycosylase, and then cleaving the DNA with piperidine at the positions where uracil bases were excised.85 In this method, the size distribution of the DNA fragments could be easily controlled by the concentration of dosed dUTP. Rather than recombining the target genes by DNA fragmentation and reassembly, Arnold and co-workers developed another strategy called the staggered extension process (StEP) for DNA shuffling.86 In this strategy, the target genes were mixed and subjected to PCR with a shortened extension time in each cycle, so that the growing strands switched templates frequently before reaching the full length of the genes.
In another method for generating recombination libraries, random-priming recombination (RPR), random-priming synthesis is used to generate short gene fragments containing low levels of point mutations to be assembled.87 Synthetic shuffling, in which degenerate oligonucleotides encoding all the variations in the parental genes are used to assemble the mutants, has been demonstrated to be an effective library creation method for evolving highly chimeric enzymes.87,88 In another study, Reetz and co-workers created recombination libraries of proteins by the assembly of designed oligonucleotides (ADO), in which the oligonucleotides for assembly were designed based on sequence information to control the overlapping process and increase the recombination frequency.89 The random chimeragenesis on transient templates (RACHITT) method was developed for creating DNA shuffling libraries with unprecedentedly high recombination frequency.90 In this method, fragments of homologous genes were first annealed onto a transient DNA template, and regions not hybridizing with the template were digested by the nuclease activities of DNAPs. After gap filling, ligation of the nicks, and destruction of the template, the chimeric library was PCR amplified, cloned, and subjected to screening or selection. Methods for creating homology-independent recombination libraries have also been developed.
For example, Benkovic and co-workers developed a method called incremental truncation for the creation of hybrid enzymes (ITCHY), in which the parental genes with low homology were first incrementally truncated with exonuclease III, and the gene fragments were then fused to generate the hybrid library.91 These artificial gene recombination strategies have already been applied successfully in the evolution of polymerase mutants, including the generation of Taq/Tth/Tfl DNAP variant 5D4, Pfu DNAP variant E10, Bst LF/Klentaq DNAP variants v5.9 and v7.16, Stoffel fragment variants SFM4-3, SFM4-6, and SFM4-9, Taq DNAP variants T8, H15, M1 and M4, and Klentaq DNAP variants Mut_ADL and Mut_RT.77–79,92–96
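The template-switching idea behind StEP can be pictured with a toy simulation. This is a sketch under strong simplifying assumptions, not a model of the real reaction: the parental genes are taken to be pre-aligned strings of equal length, every shortened extension step adds a random short stretch, and the template for each step is chosen uniformly at random. The function name and the `max_extension` parameter are hypothetical.

```python
import random


def step_recombine(parents, max_extension=10, rng=None):
    """Build one chimeric sequence by repeated short extensions on random templates."""
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned and of equal length")
    product = []
    while len(product) < length:
        template = rng.choice(parents)           # template switch (assumed uniform)
        stretch = rng.randint(1, max_extension)  # bases added in this short cycle
        start = len(product)
        product.extend(template[start:start + stretch])
    return "".join(product)


if __name__ == "__main__":
    # Two hypothetical parental genes, written so that crossovers are visible.
    parent_a = "A" * 40
    parent_b = "C" * 40
    print(step_recombine([parent_a, parent_b], max_extension=8, rng=random.Random(1)))
```

In this picture a shorter extension per cycle produces more crossovers per gene, which mirrors the role of the shortened extension time described above.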
For proteins with more information on structure and structure–activity relationship (SAR) available, semi-rational approaches may be applied for library design and creation to decrease the size of the library to be screened. Site-saturated mutagenesis is broadly used for creating protein libraries in which one or multiple specific amino acid residues that are closely related to desired properties of the parental protein, such as activity, thermal stability, and substrate specificity, are randomized based on structural analysis.97 Oligonucleotides containing randomized degenerate codons, which help further decrease the library size, are used to introduce random mutations into target amino acid residues via overlapping PCR reactions. Recently, Chaput and co-workers demonstrated that the identification of key residues to be mutated could be greatly facilitated by computational analysis of homologous polymerase mutants.98 When there is a synergistic effect of mutations at multiple residues of the parental protein, it is helpful to carry out site-saturation mutagenesis on these residues simultaneously to increase the probability of obtaining protein mutants with desired properties. In the combinatorial active-site saturation test (CAST), protein libraries are generated by simultaneous randomization of groups of two amino acid residues spatially close to each other around the active site, which allows the screening for combinations of side chains on these residues with an optimal synergistic conformational effect.99 To reduce the effort for screening protein libraries with multiple amino acid residues or focused regions to be randomized, the iterative saturation mutation (ISM) method has been developed.100 In this method, rationally chosen sites crucial for the desired properties, each of which consisted of one, two, or three residues, were subjected to iterative cycles of site-saturation mutagenesis and screening.
In each cycle, only one site was randomized and screened, which greatly reduced the library size and the labor of screening. A sequence saturation mutagenesis (SeSaM) method was developed to create protein libraries with mutants containing random mutations at every single nucleotide position of the target sequence.101 In this method, the target sequence was first segmented into fragments of different lengths, and the fragments were then 3′ tailed with a universal nucleobase using terminal transferase, and elongated to full-length genes. During subsequent PCR amplification of the elongation product, the universal bases were replaced by random standard nucleotides. Some of these semi-rational strategies have been successfully used to obtain polymerases with improved unnatural activities, including variants of Tgo DNAP, KOD DNAP, Deep Vent DNAP, 9°N DNAP, Stoffel fragment of Taq DNAP, full-length Taq DNAP, Klentaq DNAP, Klenow fragment (KF) of E. coli DNAP and T7 RNAP.77,96,98,102–107
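How strongly the number of simultaneously randomized residues affects the screening effort can be estimated with a short calculation. The sketch assumes NNK degenerate codons (N = A/C/G/T, K = G/T, hence 32 codons per residue) and equal representation of all codon combinations. It uses the common oversampling estimate T = −V ln(1 − F) for the number of clones T needed so that any given one of V variants is present with probability F. The iterative case is simplified to a single pathway in which each site contains one residue and is visited once; the function names are hypothetical.

```python
import math

NNK_CODONS = 4 * 4 * 2  # N = A/C/G/T, K = G/T, giving 32 codons per randomized residue


def library_size(n_residues, codons_per_residue=NNK_CODONS):
    """Number of codon combinations when n_residues are randomized together."""
    return codons_per_residue ** n_residues


def clones_to_screen(n_variants, coverage=0.95):
    """Clones needed so that a given variant is sampled with probability `coverage`."""
    return math.ceil(-n_variants * math.log(1.0 - coverage))


if __name__ == "__main__":
    sites = 3  # hypothetical number of single-residue sites
    simultaneous = clones_to_screen(library_size(sites))
    iterative = sites * clones_to_screen(library_size(1))
    print(f"simultaneous randomization of {sites} residues: {simultaneous} clones")
    print(f"one iterative pathway over {sites} sites: {iterative} clones")
```

The comparison is only indicative, since an iterative pathway explores a subset of the combinations that simultaneous randomization covers.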
Method | Advantages | Disadvantages | Application examples | Ref. |
---|---|---|---|---|
Multi-well plate screening | Simple; Direct identification of single active mutants | Time consuming; Limited screening throughput | SFM4-3; SFM4-6; SFM4-9; Tgo RT-TKK; Tgo RT-C8; Tgo Pol6G12; Tgo PolC7; Tgo PolD4K; Tgo RT521K; Tgo RT521; KF I709E E710G; Taq AA40 | 77, 102, 104, 106 and 115 |
CSR | High throughput; Simple | Target polymerase needs to replicate the full-length of its own gene; High temperature is usually needed to break the emulsified cells | Taq T8; Taq H15; Tth SΔTthCs12RsEx pol mutants; Phi29 DNAP Mut; Bst v5.9; Bst v7.16; KOD RTX; KOD RTX-Ome v6 | 78, 80, 81, 95, 116 and 167 |
spCSR | High throughput; Target polymerase only needs to replicate a part of its own gene; Reduced adaptive burden; Tunable selection stringency; Improved selection sensitivity and versatility | High temperature is usually needed to break the emulsified cells | Pfu DNAP E10; Taq AA40 | 94 and 115 |
CST | High throughput; Allows the selection for activities towards difficult nucleoside triphosphate substrates and under challenging conditions | High temperature is usually needed to break the emulsified cells; Plasmid DNA has to be used as the extension template for the tagging primer | Tgo Pol6G12; Tgo PolC7; Tgo PolD4K; Tgo RT521K; Tgo RT521 | 104 |
CPR | High throughput; Expanded scope of proteins to be evolved; Mitigated effect on host fitness | High temperature is usually needed to break the emulsified cells; Challenging to design genetic circuits | T7 RNAP CGG-R7-8; T7 RNAP CGG-R12-KIRV | 117 |
CBL | High throughput; Suitable for evolving various reverse transcription activities | High temperature is usually needed to break the emulsified cells; Experiment complexity | Tgo RT-TKK; Tgo RT-C8 | 102 |
Phage display | High throughput; Kinds of the nucleic acid template, primer and nucleoside triphosphates for selection can all be well controlled; Adjustable selection stringency; Rapid reproduction of phage | The target polymerase needs to be actively displayed on phage | SFM4-3; SFM4-6; SFM4-9; SFR1; SFR2; SFR3; Phi29 DNAP | 77, 125 and 129 |
PACE | High throughput; Rapid reproduction of phage; Continuous evolution; Minimal researcher intervention; Rapid evolutionary cycle | Experiment complexity; Expensive facilities; Challenging to design genetic circuits | T7 RNAP A6-36.4 | 131 |
Cell surface display | High throughput; Expanded scope of polymerases to be displayed for selection | The target polymerase needs to be actively displayed on cell surface | KF I709E E710G | 106 |
Multi-well plate methods for screening polymerase variants were developed by immobilizing a primer/template complex on the bottom surface of the wells, and extending the primer with certain nucleoside triphosphate substrates using the cell lysate of one polymerase mutant in each well.77,104 Successful primer extension led to the incorporation of fluorescent, biotinylated, or digoxigenin (DIG)-labelled nucleotides or the annealing of the extension product with labelled oligonucleotides, which could then be detected by reading the fluorescence or by binding with a DIG antibody or streptavidin-coupled enzyme and assaying the activity of this enzyme. Although single clones of active polymerases can be directly identified with these methods, the throughput is limited, which makes these methods more useful for screening pre-enriched or small focused polymerase libraries. For example, variants of Stoffel fragment, Taq DNAP and Tgo DNAP have been identified with these methods from focused libraries or libraries pre-enriched with other high-throughput selection methods,77,102,104,115 which will be introduced below.
Emulsion or microfluidic system-based compartmentalization technology has been extensively used to develop novel methods for polymerase evolution (Fig. 3). For example, Holliger and co-workers developed a compartmentalized self-replication (CSR) method, in which a water-in-oil emulsion system was employed to confine each polymerase mutant and its gene in a single emulsion compartment, so that the gene was PCR amplified by the protein it encoded, which led to rapid enrichment of polymerase mutants with good activities78 (Fig. 3A). Using this system, they successfully evolved mutants of Taq DNAP with enhanced thermostability or resistance to the inhibitor heparin. Later, they developed a modified version of CSR, short-patch compartmentalized self-replication (spCSR).115 In this method, only a short region of the polymerase gene was diversified and amplified during the evolution, which reduced the requirements for catalytic activity and processivity of polymerases in the early stage of evolution, and thus made this method suitable for the evolution of challenging activities. A variant of Taq DNAP, AA40, which possessed replication, transcription and reverse transcription activities, as well as an expanded substrate spectrum for 2′-modified nucleoside triphosphates, was successfully evolved with this method. Ellington and co-workers developed another modified version of CSR, high-temperature isothermal compartmentalized self-replication (HTI-CSR), in which the self-replication of the polymerase gene was realized via rolling circle amplification (RCA) instead of PCR.95 This method was successfully used to evolve a thermostable strand-displacing polymerase mutant from a shuffled library of Bst LF and Klentaq DNAP.
They also developed another modified CSR method, reverse transcription-compartmentalized self-replication (RT-CSR), to evolve reverse transcription activity of a DNAP.116 In the design of this method, to realize self-replication of the polymerase gene, the polymerase mutant had to reverse transcribe several RNA nucleotides in a flanking primer, which partially annealed to the polymerase gene, to produce a full-length template that could be PCR amplified with outer primers. A high-fidelity thermostable reverse transcriptase, which they called reverse transcription xenopolymerase (RTX), was then evolved from KOD DNAP with this method. To expand the CSR method for the evolution of more proteins, a compartmentalized partnered replication (CPR) method was developed by the same group117 (Fig. 3B). In the CPR method, the activity of a partner protein that needed to be evolved was linked to the expression of Taq DNAP, which in turn PCR amplified the gene of the partner protein. This method was successfully applied to the evolution of several proteins, including T7 RNAP mutants for the recognition of orthogonal promoters.118
In CSR or the derivatives of CSR introduced above, the full length or a short region of the polymerase or the partner protein gene needs to be replicated by the polymerase to fulfill the evolution process. However, in some cases, especially when the desired activities are too exotic or challenging to evolve, replication of a gene or part of it during the evolution is unrealistic or hard to correlate with the desired activities. To address this problem, several other compartmentalization-based strategies for polymerase evolution have been developed. For example, compartmentalized self-tagging (CST) was developed to evolve polymerases for the synthesis of xenobiotic nucleic acids (XNAs)104,119 (Fig. 3C). In this method, the selection of active polymerase mutants did not rely on self-replication in the compartment, but relied on the extension of a short biotinylated primer with unnatural nucleoside triphosphates using the plasmid harboring the polymerase gene as a template. Successful extension of the primer resulted in tight binding of the primer and the plasmid, and thus enabled streptavidin bead separation of the active mutants. With this method, TgoT DNAP mutants have been evolved for efficient synthesis and reverse transcription of various XNAs. Recently, Holliger and co-workers developed a compartmentalized bead labelling (CBL) method for the evolution of RNA and XNA reverse transcriptases from a DNAP mutant102 (Fig. 3D). This method employed streptavidin-coated beads to co-display two kinds of oligonucleotides, one of which was responsible for the capture of the plasmid harboring the polymerase gene, while the other served as the primer for the reverse transcription of an XNA/RNA template.
When a polymerase mutant successfully reverse transcribed the XNA/RNA template in a compartment, the reverse transcription product would later trigger a hybridization chain reaction (HCR), resulting in intensive fluorescent labelling of the bead, which then allowed fluorescence-activated sorting of the beads carrying plasmids of the active polymerase mutants. Polymerase mutants efficient for the reverse transcription of 2′-OMe-RNA, HNA, D-altritol nucleic acid (AtNA), 2′-methoxyethyl-RNA (2′-MOE-RNA), and P-α-S-phosphorothioate 2′-MOE-RNA (PS 2′-MOE-RNA) were obtained using this method.
In recent years, the rapidly developing microfluidic technology has also been employed in the design of compartmentalization-based methods for polymerase evolution. In these methods, the generation of the compartments was more controllable, and the process of sorting for the active polymerase mutants could also be directly integrated into the system. For example, Chaput and co-workers developed microfluidic-based protein evolution methods, such as droplet-based optical polymerase sorting (DrOPS) and fluorescence-activated droplet sorting (FADS)-based methods, and used them for evolving polymerases with expanded function.103,114 In these methods, polymerase mutants were encapsulated in water-in-oil-in-water or water-in-oil droplets generated by microfluidics. Polymerase-catalyzed primer extension led to the removal of a fluorescent quencher DNA annealed to the fluorophore-labelled template by strand displacement. The generated fluorescence was then used as an optical signal for the sorting of active polymerase droplets.
Phage display technology was initially developed for the evolution of small peptides or proteins, including antibodies, with high affinity towards the targets, and later proved to be a powerful tool for developing methods of polymerase evolution3,111,120–124 (Fig. 3E and F). For example, Romesberg and co-workers developed a phage-display-based method for polymerase evolution, in which a polymerase mutant was displayed on one of the p3 proteins of an M13 phage particle, while the primer/template substrate was attached to other p3 proteins.77,125 The substrate attachment was accomplished either by the coupling of an acidic peptide displayed on a p3 protein with a basic peptide conjugated to the primer, or by click reaction of an unnatural amino acid p-azidophenylalanine (pAzF) displayed on a p3 protein and a cycloalkyne conjugated to the primer. When the primer was extended with unnatural nucleoside triphosphates by the polymerase mutant displayed on the same phage, biotinylated-UTPs were incorporated to the end of the extension product, which allowed subsequent streptavidin bead separation of the active polymerase mutants. Using this method, mutants of SF of Taq DNAP that efficiently synthesize and amplify various 2′-modified nucleic acids have been obtained.77,126–128 Other strategies have also been used to attach the primer/template onto the phage. For example, Delespaul and co-workers co-displayed phi29 DNAP and a modified haloalkane dehalogenase, HaloTag, on M13 phage, which allowed the attachment of a DNA substrate coupled with a haloalkane ligand.129
Besides phage-display-based methods, bacteriophages have also been used to develop other methods for directed protein evolution. For example, Liu and co-workers developed a phage-assisted continuous evolution (PACE) strategy, in which the activity of a protein to be evolved was coupled to the propagation of a bacteriophage, and used it to rapidly evolve a variety of proteins with different traits.130–135 Variants of T7 RNAP with altered promoter specificity were successfully evolved with this method by coupling M13 phage propagation with T7 RNAP-mediated transcription of the gene encoding the phage p3 protein.131
Cell surface display technology has been used for the evolution of numerous proteins for either enhanced affinities for certain targets or increased catalytic activities.112,136–138 Recently, the application of an E. coli cell display system for polymerase evolution was also demonstrated by Schwaneberg and co-workers.106 The Klenow fragment (KF) of E. coli DNAP was displayed on the outer membrane of E. coli cells by fusing with autotransporter proteins, and the polymerase mutant-displaying cells were directly used for screening. The activity of each polymerase mutant was checked in multi-well plates by monitoring the fluorescence of a dye that binds the double-stranded primer-extension product. With this method, a KF mutant with enhanced activity towards 2′-O-methyl nucleoside triphosphates (2′-OMe-NTPs) was successfully evolved.
Polymerase | Mutation sites | Activities | Ref. |
---|---|---|---|
DNAP I from E. coli | | Incorporation of K–X | 147 |
KF of DNAP I from E. coli | | Incorporation of MMO2–5SICS, NaM–5SICS, NaM–TPT3, s–z, Q–Pa, Dss–Pa and Ds–Pa | 27, 28, 31, 36, 43, 155 and 156 |
Taq | | Incorporation of NaM–5SICS, NaM–TPT3, and Z–P | 29, 31 and 148 |
TiTaq | | Incorporation of isoG–isoC | 146 |
Taq DNAP mutant | M444V, P527A, D551E, E832V | Incorporation of Z–P | 105 |
Taq DNAP mutant | N580S, L628V, E832V | Incorporation of Z–P | 105 |
Taq/Tth/Tfl 5D4 | V62I, Y78H, T88S, P114Q, P264S, E303V, G389V, E424G, E432G, E602G, A608V, I614M, M761T, M775T | Incorporation of 5NI and 5NIC | 93 |
Taq M1 | G84A, D144G, K314R, E520G, F598L, A608V, E742G | PCR with 7-deaza-dGTP, FITC-12-dATP, Biotin-16-dUTP and αS-dNTPs | 79 |
SF P2 | F598I, I614F, Q489H | Incorporation of PICS–PICS | 150 |
OneTaq DNAP | | Incorporation of NaM–5SICS, NaM–TPT3 | 31 and 237 |
Deep Vent DNAP | | Incorporation of NaM–5SICS, MMO2–5SICS, Ds–Px, Dss–Pn and Dss–Px | 29, 157 and 158 |
Vent DNAP | | Incorporation of Ds–Pa | 36 |
Phusion DNAP | | Incorporation of NaM–5SICS | 29 |
Pfu E10 | V93Q, D141A, E143A, V337I, E399D, N400D, R407I, Y546H | Incorporation of Cy3- or Cy5-modified dCTP | 94 |
KOD Dash DNAP | | Incorporation of dNamTPs | 141 |
KOD DNAP mutant | D141A, E143A, A485L | Incorporation of dNamTPs | 142 |
T7 RNAP | | Incorporation of m1Ψ triphosphate; Transcription of MMO2–5SICS, NaM–5SICS, NaM–TPT3, PTMO–TPT3, CNMO–TPT3, x–y, s–y, s–z, s–Pa, Ds–Pa, Dss–Pa and Ds–Px | 30, 34, 36, 39, 40, 145, 154–156, 160 and 238 |
T7 RNAP F | Y639F | Transcription of Ds–Pa and Ds-modified Pa | 159 |
T7 RNAP F-M5 | Y639F, S430P, N433T, S633P, F849I, F880Y | Transcription of Ds–Pa and Ds-modified Pa | 159 |
T7 RNAP FA-M5 | Y639F, H784A, S430P, N433T, S633P, F849I, F880Y | Transcription of Ds–Pa and Ds-modified Pa | 159 |
T7 RNAP VRS-M5 | G542V, H772R, H784S, S430P, N433T, S633P, F849I, F880Y | Transcription of Ds–Pa and Ds-modified Pa; Transcription of 2′-F-C/U modified RNA containing modified Pa | 159 |
T7 RNAP FAL | Y639F, H784A, P266L | Transcription of Z–P and S–B | 149 |
RNAP II from S. cerevisiae | | Transcription of NaM–TPT3 | 151 |
AMV reverse transcriptase | | Reverse transcription of NaM–TPT3 and Q–Pa | 43, 153 and 154 |
MMLV reverse transcriptase | | Reverse transcription of NaM–TPT3 | 153 |
SuperScript II reverse transcriptase | | Reverse transcription of NaM–TPT3 | 153 |
SuperScript III reverse transcriptase | | Reverse transcription of NaM–TPT3 | 154 |
SuperScript IV reverse transcriptase | | Reverse transcription of NaM–TPT3 | 153 and 154 |
Taq Volcano2G | | Reverse transcription of NaM–TPT3 | 153 |
SFM4-3 | I614E, E615G, V518A, N583S, D655N, E681K, E742Q, M747R | Synthesis or amplification of 2′-OMe, 2′-F, 2′-Az, 2′-Cl, 2′-Am-modified DNA/RNA and ANA | 77 and 127 |
SFM4-6 | I614E, E615G, D655N, L657M, E681K, E742N, M747R | Synthesis of 2′-F-DNA, 2′-OMe-RNA | 77 |
SFM4-9 | I614E, E615G, N415Y, V518A, D655N, L657M, E681V, E742N, M747R | Reverse transcription of 2′-F-DNA, 2′-OMe-RNA | 77 |
Bst DNAP | | Reverse transcription of FANA and TNA | 177–179 |
Deep Vent DNAP | | Synthesis of HNA, ANA and FANA | 177 |
Deep Vent-RI | D141A, E143A, A485R, E664I | Synthesis of TNA | 98 |
Tgo DNAP | | Synthesis of HNA, FANA and ANA; Incorporation of C8-alkyne-FANA UTP into FANA | 177 and 181 |
Tgo-RI | D141A, E143A, A485R, E664I | Synthesis of TNA | 98 |
Tgo Pol6G12 | TgoT: V589A, E609K, I610M, K659Q, E664Q, Q665P, R668K, D669Q, K671H, K674R, T676R, A681S, L704P, E730G | Synthesis of HNA and FANA | 104 and 177 |
Tgo-6G12-I521L | Pol6G12: I521L | Synthesis of HNA and FANA | 177 |
Tgo RT521 | TgoT: E429G, I521L, K726R | Synthesis of TNA; Reverse transcription of HNA, ANA, FANA, TNA and tPhoNA | 69 and 104 |
Tgo RT521K | RT521: A385V, F445L, E664K | Reverse transcription of LNA and CeNA | 104 |
Tgo RT-TKK | RT521K: I114T, S383K, N735K | Reverse transcription of 2′-OMe-RNA, AtNA | 102 |
Tgo RT-C8 | RT-TKK: F493V, Y496N, Y497L, Y499A, A500Q, K501H | Reverse transcription of 2′-OMe, 2′-MOE, PS 2′-MOE-RNA, HNA, and AtNA | 102 |
Tgo PolC7 | TgoT: E654Q, E658Q, K659Q, V661A, E664Q, Q665P, D669A, K671Q, T676K, R709K | Synthesis of CeNA and LNA | 104 |
Tgo PolD4K | TgoT: L403P, P657T, E658Q, K659H, Y663H, E664K, D669A, K671N, T676I | Synthesis of FANA, ANA, TNA, HNA, PMT and RNA | 104 and 177 |
Tgo QGLK | V93Q, D141A, E143A, Y409G, A485L, E664K | Synthesis of RNA, FANA, ANA and HNA | 177 |
Tgo EPFLH | V93Q, D141A, E143A, H147E, L403P, L408F, A485L, I521L, E664H | Synthesis of PMT, ANA, TNA, RNA, FANA and tPhoNA | 69 and 177 |
KOD DNAP | | Synthesis of FANA | 177 |
KOD Dash DNAP | | PCR with 4′-Thiol-dTTP and 4′-Thiol-dCTP | 48 |
KOD DGLNK | N210D, Y409G, A485L, D614N, E664K | Synthesis of 2′-OMe-RNA and LNA | 176 |
KOD DLK | N210D, A485L, E664K | Reverse transcription of LNA and 2′-OMe-RNA | 176 |
Kod RI | D141A, E143A, A485R, E664I | Synthesis of TNA | 98 |
Kod RS | D141A, E143A, A485R, N491S | Synthesis of TNA | 174 |
Kod QS | D141A, E143A, L489Q, N491S | Synthesis of TNA | 174 |
Kod RSGA | D141A, E143A, A485R, N491S, R606G, T723A | Synthesis of FANA, ANA, HNA, TNA, C5-modified TNA, RNA and PMT | 68, 175 and 177 |
KOD RTX | F38L, R97M, K118I, M137L, R381H, Y384H, V389I, K466R, Y493L, T514I, I521L, F587L, E664K, G711V, N735K, W768R | Reverse transcription of RNA and 2′-OMe-RNA | 116 |
KOD RTX-Ome v6 | RTX: A40V, E251K, S340P, G350V, V353L, H381R, H384Y, K468N, I488L, G498A, K664R | Reverse transcription of 2′-OMe-RNA | 167 |
KOD RT521K | V93E, D141A, E143A, A485L, I521L, E664K | Reverse transcription of tPhoNA | 69 |
9°N DNAP | | Synthesis of FANA, ANA, HNA and TNA | 177 |
9°N-Therminator | D141A, E143A, A485L | Synthesis of TNA | 172 and 173 |
9°N-YRI | D141A, E143A, A485R, E664I | Synthesis of TNA | 103 |
9°N-NVA | D141A, E143A, Y409N, D432G, A485V, V636A, E664A | Synthesis of TNA | 103 |
Phi29 DNAP mutant | D12A | Synthesis of HNA, FANA and 2′-F-DNA | 180 |
Tgo PGV2 | RT521L: D455P, K487G, R606V, R613V | Synthesis of phNA | 64 |
DNAP from E. coli | | Polymerization of the Sp diastereomers of nucleoside 5′-(1-thiotriphosphates) | 63 |
RNAP from E. coli | | Polymerization of the Sp diastereomers of nucleoside 5′-(1-thiotriphosphates) | 63 |
T7 RNAP | | Transcription of RNA from 4′-thiol-modified DNA; transcription of 4′-thiol-modified RNA from DNA; polymerization of the Sp diastereomers of nucleoside 5′-(1-thiotriphosphates) | 46, 63 and 194 |
T7 RNAP mutant | Y639F | Transcription of 2′-F, 2′-Am and 2′-F-EdU-modified RNA | 48, 49, 67, 168 and 169 |
T7 RNAP mutant | Y639F, H784A | Transcription of 2′-OMe and 2′-Az-modified RNA | 170 |
T7 RNAP RGVG, E593G, V685A | Y639V, H784G, E593G, V685A | Transcription of 2′-OMe-modified RNA | 107 |
T7 RNAP RGVG-M5 | RGVG: S430P, N433T, S633P, F849I, F880Y | Transcription of 2′-OMe-RNA | 171 |
T7 RNAP RGVG-M6 | RGVG: P266L, S430P, N433T, S633P, F849I, F880Y | Transcription of 2′-OMe-RNA | 171 |
RNAPs from mammalian cells | | Transcription of RNA from 4′-thiol-modified DNA | 48 and 49 |
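The mutation lists in the table use the standard single-letter notation: wild-type residue, position, substituted residue. As a minimal sketch of how such lists can be compared across polymerase scaffolds, the snippet below parses a few entries copied from the table and reports the substitutions they share. The function names `parse_mutations` and `shared_mutations` and the `MUTANTS` dictionary are illustrative assumptions and are not taken from any cited work.

```python
import re

# Mutation lists copied from the table rows above.
MUTANTS = {
    "Deep Vent-RI": "D141A, E143A, A485R, E664I",
    "Tgo-RI": "D141A, E143A, A485R, E664I",
    "Kod RI": "D141A, E143A, A485R, E664I",
    "Kod RS": "D141A, E143A, A485R, N491S",
    "Kod QS": "D141A, E143A, L489Q, N491S",
}

# One substitution: wild-type residue, 1-based position, new residue.
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def parse_mutations(cell):
    """Turn 'D141A, E143A' into [('D', 141, 'A'), ('E', 143, 'A')]."""
    parsed = []
    for token in cell.split(","):
        token = token.strip()
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"not a single-residue substitution: {token!r}")
        parsed.append((match.group(1), int(match.group(2)), match.group(3)))
    return parsed


def shared_mutations(names):
    """Return the substitutions common to all named mutants, sorted by position."""
    sets = [set(parse_mutations(MUTANTS[name])) for name in names]
    return sorted(set.intersection(*sets), key=lambda m: m[1])


if __name__ == "__main__":
    print(shared_mutations(["Deep Vent-RI", "Tgo-RI", "Kod RI"]))
    print(shared_mutations(["Kod RS", "Kod QS"]))
```

The sketch handles only plain substitutions; entries that reference a parent mutant (for example "TgoT: ..." or "RTX: ...") would need the prefix stripped and the parent's own list merged in first.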
Although common modifications of nucleobases, especially those at the C5 position of pyrimidines and the C7 position of deazapurines, are well tolerated by natural polymerases and broadly used in the labelling and functionalization of DNA and RNA,139,140 engineering of polymerases can further increase the enzymatic incorporation efficiency of nucleotides with base modifications, and even achieve efficient PCR amplification of DNA extensively modified on nucleobases. For example, using the CSR method, Holliger and co-workers evolved a mutant of Taq DNAP, M1, that had an expanded substrate spectrum and could perform efficient PCR amplification of DNAs with 7-deaza-dGTP, FITC-12-dATP, Biotin-16-dUTP or αS-dNTPs replacing the corresponding dNTP(s).79 In another study, they applied the spCSR method to the evolution of a family B DNAP, Pfu, and successfully obtained a mutant, E10, which could PCR amplify a DNA fragment up to 1 kb with dCTP completely substituted by Cy3- or Cy5-modified dCTP.94 Fujita and co-workers reported the enzymatic synthesis of DNA containing high-density amphiphilic functionalities attached to the nucleobases with 7-substituted 7-deazapurine nucleoside triphosphates, dGamTP and dAamTP, and 5-substituted pyrimidine nucleoside triphosphates, dUamTP and dCamTP, using KOD Dash DNAP (KOD XL DNAP).141 Efficient PCR amplification of a 500-bp DNA fragment with a mixture of these nucleobase-modified nucleoside triphosphates and natural dNTPs using the same polymerase was also demonstrated.
Later, Hoshino and co-workers demonstrated that a mutant of KOD DNAP, KOD exo−/A485L, could synthesize longer DNA products containing nucleobases modified with these amphiphilic functionalities faithfully and more efficiently.142 T7 RNAP has also been proven to be efficient for incorporating nucleoside triphosphates with various modified nucleobases, including N1-methylpseudouridine (m1Ψ) triphosphate, which allows the in vitro transcription of mRNA vaccines with modified bases against various diseases, such as COVID-19.143–145
UBPs for the expansion of the genetic alphabet have been developed and optimized for good recognition by natural polymerases, which is crucial for their in vitro and in vivo applications, and in some cases, the replication or transcription efficiency of the UBPs was increased by engineering the DNAPs or RNAPs employed. Replication and transcription of hydrogen-bonding-based UBPs developed by Benner's group have been demonstrated with various DNAPs and RNAPs or their mutants. For example, DNA containing the isoG–isoC pair was successfully PCR amplified with a truncated mutant of Taq DNAP, TiTaq.146 In vitro replication of the K–X pair was also carried out with DNAP I from E. coli.147 Although Taq DNAP can replicate the Z–P pair,148 directed evolution of Taq DNAP was carried out with the CSR method to enhance the replication efficiency of this pair, using oligonucleotides containing multiple P nucleobases as the primers for self-replication.105 The evolved Taq DNAP mutants Taq (N580S, L628V, and E832V) and Taq (M444V, P527A, D551E, and E832V) paused much less when incorporating dZTP opposite P nucleobases in a template. T7 RNAP mutant FAL has been shown to efficiently transcribe DNA containing both the Z–P and S–B pairs (Hachimoji DNA), resulting in the production of RNA containing P, Z, B and S nucleobases (Hachimoji RNA).148,149 The in vitro replication of some of the representative hydrophobic UBPs developed by Romesberg's group, such as MMO2–5SICS, NaM–5SICS and NaM–TPT3, has been shown to be efficient with various family A or B DNAPs, including Klenow fragment of E. coli DNAP I, Taq DNAP, Deep Vent DNAP, Phusion DNAP, and OneTaq DNAP (a mixture of Deep Vent DNAP and Taq DNAP).27–29,31 Directed evolution of polymerases has also proven effective in increasing their replication performance for hydrophobic UBPs.
For example, a mutant of SF of Taq DNAP, P2, which could synthesize DNA containing the PICS self-pair more efficiently than wild-type SF, was successfully obtained by directed evolution using the phage-display-based selection method.150 In another approach, with the CSR method, Holliger and co-workers evolved a Taq/Tth/Tfl DNAP mutant, 5D4, for the ability to form and extend other self-pairs of hydrophobic nucleobase analogs, including 5NI and 5NIC.93 The transcription of some of these UBPs has been demonstrated with several well-studied RNAPs, including T7 RNAP and eukaryotic RNAP II.30,151 Recent in vivo experiments suggested that E. coli RNAP could also transcribe DNA containing some of these UBPs.152 Recently, reverse transcription of RNA containing TPT3 or NaM has been investigated with several reverse transcriptases or DNAP mutants, including avian myeloblastosis virus (AMV) reverse transcriptase, Moloney murine leukemia virus (MMLV) reverse transcriptase, SuperScript II reverse transcriptase, SuperScript III reverse transcriptase, SuperScript IV reverse transcriptase, and an engineered Taq DNAP with reverse transcription activity, Volcano2G (V2G).153,154 The UBP reverse transcription efficiencies of the different reverse transcriptases were found to differ markedly. For UBPs developed by Hirao's group, it has been shown that UBPs Q–Pa, s–z, Ds–Pa and Dss–Pa could be recognized by the Klenow fragment of E. coli DNAP I,36,43,155,156 and remarkably, Ds–Px, Dss–Pn and Dss–Px pairs could be PCR amplified by Deep Vent DNAP efficiently and faithfully.157,158 PCR amplification of DNA containing the Ds–Pa pair has also been carried out with Vent DNAP.36 It has been shown that the x–y, s–y, s–z, s–Pa, Dss–Pa, Ds–Pa, and Ds–Px pairs could be transcribed by T7 RNAP or its mutant VRS-M5,36,39,40,155,156,159–162 and the Q–Pa pair could be reverse transcribed by AMV reverse transcriptase.43
Efficient synthesis of sugar-modified nucleic acids with polymerases is usually more challenging, and thus much effort has been devoted to engineering polymerases to achieve this goal. Using the phage-display-based method for polymerase evolution, SF of Taq DNAP has been evolved to efficiently synthesize, reverse transcribe, and even amplify nucleic acids with various 2′-modifications, including 2′-OMe, 2′-F, 2′-Cl, 2′-Az, 2′-Am and 2′-arabino-modifications.77,127,128,163–166 Among the evolved SF mutants, SFM4-3 and SFM4-6 demonstrated good activity for the synthesis of 2′-modified nucleic acids, SFM4-9 was more efficient for the reverse transcription of 2′-modified nucleic acids, and SFM4-3 could PCR amplify partially 2′-modified nucleic acids. Recently, Ellington and co-workers employed the RT-CSR method to further evolve a previously evolved mutant of KOD DNAP, RTX, which could reverse transcribe RNA faithfully,116 and obtained mutant RTX-Ome v6 that could reverse transcribe 2′-OMe-RNA efficiently.167 Mutants of T7 RNAP have been extensively investigated for their ability to incorporate 2′-modified nucleotides.159 T7 RNAP mutant Y639F has been found to use various 2′-substituted-NTPs, including dNTPs, 2′-F-dNTPs, and 2′-Am-dNTPs, as substrates during transcription,168,169 and the mutant with one more mutation, T7 RNAP (Y639F, H784A), displayed higher activity towards NTPs with bulkier 2′-substitutions, including 2′-OMe and 2′-Az.170 Later, Ellington and co-workers carried out directed evolution of T7 RNAP for enhanced activity towards 2′-modified NTPs by randomizing residues R425, G542, Y639 and H784.107 Active mutants were selected using the autogene selection method, in which the activity of T7 RNAP was coupled with the transcription of an antibiotic resistance gene. The activity of each selected active mutant towards 2′-modified NTPs was then checked.
Evolved mutants ‘RGFA’, (‘RGVG’, E593G, and V685A), ‘RGFH’ and ‘RGLH’ showed good activity when 2′-OMe UTP was used as a substrate, and mutant (‘RGVG’, E593G, and V685A) showed the best activity when more kinds of 2′-OMe-NTPs were used as substrates. Further engineering of mutant (‘RGVG’, E593G, and V685A) by introducing additional reported mutations responsible for increased activity and thermostability in other T7 RNAP mutants led to the generation of mutants RGVG-M5 and RGVG-M6, which could synthesize 2′-OMe-modified RNA much more efficiently.171 T7 RNAP mutant VRS-M5 has also been demonstrated to efficiently transcribe RNA containing the modified unnatural base Pa from a DNA template containing UBP Ds–Px, allowing the production of functional RNA molecules with both 2′-modification and an expanded genetic alphabet.159
Polymerases for the efficient synthesis of nucleic acids in which the entire pentose is replaced with unnatural sugars have also been developed. For example, a mutant of replicative family B DNAP from Thermococcus gorgonarius, TgoT, has been evolved for the efficient synthesis and reverse transcription of various XNAs with the CST method.104 Among the evolved TgoT mutants, Pol6G12 showed good activity for the synthesis of HNA. PolC7 showed good activities for the syntheses of CeNA and LNA. PolD4K showed good activities for the syntheses of ANA and FANA. RT521 showed good activities for the synthesis of TNA and reverse transcription of HNA, ANA, FANA and TNA. RT521K showed good activities for the reverse transcription of CeNA and LNA. Recently, mutant RT521K was further evolved with the CBL RT selection method and reverse transcription activity screening into reverse transcriptases for 2′-OMe-RNA, HNA, AtNA, 2′-MOE-RNA and PS 2′-MOE RNA with varied efficiencies.102 Natural 9°N, Deep Vent, and Vent DNAPs were shown to synthesize a short stretch of TNA from a DNA template with tNTPs, and several mutants of 9°N DNAP, A485L (Therminator), Y409V, and the Y409V, A485L double mutant, demonstrated enhanced activity to extend a primer with tNTPs.172 Among these mutants, Therminator had the highest activity for TNA synthesis, and has been used for the construction of a TNA selection system.173 Using the DrOPS strategy, Chaput and co-workers carried out directed evolution of a mutant of 9°N DNAP, 9n-GLK (Y409G, A485L and E664K), and obtained mutant 9n-YRI harboring mutations A485R and E664I and mutant 9n-NVA harboring mutations Y409N, D432G, A485V, V636A and E664A.103 Both mutants could efficiently synthesize TNA in the absence of manganese, thus increasing the fidelity of TNA synthesis.
By sampling mutations A485R and E664I in other homologous polymerase scaffolds, efficient TNA polymerases, Kod-RI, Tgo-RI and DV-RI, which are mutants of KOD, Tgo and Deep Vent DNAPs harboring mutations A485R and E664I, have been identified.98 Combining the microfluidic screening method and deep mutational scanning, two other mutants of KOD DNAP with enhanced TNA synthesis activity, Kod-RS and Kod-QS, both of which harbored two epistatic mutations, have been identified.174 Mutant Kod-RS also demonstrated inverted substrate specificity towards tNTPs and dNTPs, compared with wild-type KOD DNAP. Further screening of Kod-RS variants with mutations in tiles 6 and 8 of the thumb subdomain led to the discovery of mutant Kod-RSGA, which demonstrated enhanced activity, high fidelity, and low template sequence bias for TNA synthesis.175 KOD DNAP has also been engineered for efficient synthesis of other unnatural nucleic acids. Obika and co-workers developed KOD DNAP mutants KOD DGLNK and KOD DLK, which could efficiently synthesize LNA or 2′-OMe-RNA from DNA templates and reverse transcribe LNA or 2′-OMe-RNA to DNA, respectively.176 Recently, Chaput and co-workers systematically compared the activities of some natural and evolved polymerases for the synthesis and reverse transcription of different XNAs.177 Natural 9°N, Deep Vent, Tgo and KOD DNAPs showed the ability to synthesize full-length FANA and limited activity for the syntheses of other XNAs. Laboratory-evolved polymerases, including Tgo-QGLK, Tgo-6G12, TgoD4K, Tgo-6G12-I521L, Tgo-EPFLH, and Kod-RSGA, demonstrated varied activities for the syntheses of RNA, FANA, ANA, HNA, TNA, and 3′-2′ phosphonomethyl-threosyl nucleic acid (PMT). Full-length products of different XNAs could be produced by different polymerase mutants.
In another study, Tgo-EPFLH was demonstrated to be a tPhoNA synthase, while Tgo RT521 and KOD RT521K could efficiently reverse transcribe tPhoNA into DNA.69 Bst DNAP displayed good activities for the reverse transcription of FANA and TNA, but much lower activity for the reverse transcription of ANA.177–179 Other than the extensively explored polymerases described above, some other natural or mutated polymerases have also been investigated for their activities towards unnatural substrates. For example, production of HNA, FANA, and 2′-F-DNA with phi29 DNAP mutant D12A has been reported.180
Polymerases can be evolved to be efficient for the synthesis of nucleic acids with bulky modifications on the phosphate moiety as well. For example, Holliger and co-workers further engineered a Tgo DNAP mutant, RT521L, which was previously evolved to be a reverse transcriptase for several XNAs, for efficient synthesis of phNA.64 After screening of a site-saturation mutagenesis library, evolution with the CST method, and reverse introduction of a single point mutation, they successfully obtained a mutant, PGV2, with enhanced activity for the synthesis of fully modified phNAs.
Engineered polymerase mutants have also found application in the synthesis of nucleic acids containing combined modifications on different moieties. For example, KOD DNAP mutant Kod-RSGA, which was evolved for efficient TNA synthesis, also demonstrated good activities against tUTP containing various C5-modifications, and was used to synthesize TNA containing functionalized nucleobases.68 In another example, T7 RNAP mutant Y639F was shown to be efficient for the transcription of RNA with 2′-deoxy-2′-fluoro (5-ethynyl) uridine triphosphate and other natural NTPs, and used for the evolution of 2′-F-modified RNA-scaffolded carbohydrate clusters.67,168 Very recently, Niu and co-workers synthesized C8-alkyne-FANA UTP, and demonstrated its enzymatic incorporation into FANA by Tgo DNAP.181 This work further enriched the XNA toolbox with components containing clickable handles.
Besides proteinaceous polymerases, Z RNA polymerase ribozyme, which is an RNA replicase generated via in vitro evolution, has also been investigated for its activity of incorporating unnatural nucleoside triphosphates.182 It was found that this ribozyme was able to incorporate different sugar or base-modified nucleoside triphosphates with varied efficiencies, as well as efficiently replicate UBP isoG–isoC under appropriate conditions.
Modifications on nucleobases can help expand the chemical diversity of and add novel functionalities to aptamers. For example, SELEX experiments have been carried out with DNA containing hydrophobic groups or amino-acid-like modifications attached to uracil or both uracil and cytosine nucleobases, resulting in the generation of protein-targeting high-affinity aptamers, which were called slow off-rate modified aptamers (SOMAmers).183,186 Due to its good acceptance of base-modified triphosphates, KOD (exo−) DNAP has been employed to generate SELEX libraries with 5-modified dC and dU.183 Application of DNA containing UBPs in SELEX significantly expands the sequence diversity of the pools to be selected, and incorporates the properties of the unnatural nucleobases into the selected aptamers, which has proven effective in increasing the probability of obtaining aptamers with higher affinities. Hirao and co-workers carried out SELEX with DNA containing Ds, and successfully obtained high-affinity aptamers for vascular endothelial cell growth factor-165 (VEGF-165) and interferon-γ (IFN-γ), with Kd values in the subnanomolar to subpicomolar range.6 Later, DNA aptamers containing Ds or both Ds and Pa with high affinity towards von Willebrand factor A1-domain (vWF) or dengue non-structural protein 1 (DEN-NS1) serotypes were reported.187,188 The hydrogen-bonded UBP Z–P developed by Benner's group has also been extensively applied in the SELEX of aptamers with expanded genetic information for various targets, including different cell lines and proteins.189–192
Development of aptamers with unnatural sugar backbones has drawn even more attention, since modification of the sugar backbone can lead to a dramatic improvement of the overall properties of the aptamers, such as obtaining good chemical or biological stabilities, which are properties that natural DNA and RNA aptamers lack the most for practical applications. By employing evolved SF mutants to transcribe, reverse transcribe, or amplify 2′-modified DNAs, Romesberg and co-workers selected fully 2′-OMe-modified or partially 2′-F-modified aptamers against human neutrophil elastase (HNE), which displayed good biological stability and retained high affinity in a high concentration of salt.126,193 Recently, they reported the selection of HNE and factor IXa aptamers with large hydrophobic groups attached to the 2′-position of the sugar backbone by producing 2′-Az-DNA with SF mutant SFM4-3 and coupling alkyne modified molecules to the 2′-azido group via click chemistry.128 It was found that these 2′-hydrophobic groups significantly increased not only the binding affinity, but also the serum stability of the selected aptamers. 
With the assistance of T7 RNAP, Matsuda and co-workers successfully selected 4′-thiol-modified RNA aptamers against human α-thrombin, which have not only high binding affinity, but also superior stability toward RNase A.46,194,195 Holliger and co-workers demonstrated the application of Tgo DNAP mutants that they evolved in SELEX experiments for HNA aptamers against different targets, including hen egg lysozyme (HEL) and HIV trans-activating response RNA (TAR).104 Later, using one mutant of Tgo DNAP, D4K, to transcribe and reverse transcribe FANA, DeStefano and co-workers selected FANA aptamers against HIV-1 reverse transcriptase, HIV-1 integrase, and very recently the receptor binding domain of the SARS-CoV-2 S protein.196–198 TNA aptamers against various targets, including small molecules and proteins, were selected either with a DNA display strategy, in which the polymerase-synthesized TNA was attached to the template DNA annealed with its complementary strand during selection, or through cycles of the transcription–selection–reverse transcription–amplification process.199–201 Different polymerases, including Therminator DNAP and a mutant of KOD DNAP, Kod-RI, have been used for the synthesis/transcription of TNA, and Bst DNAP has been used for the reverse transcription of TNA in these studies. Recently, using TNA polymerase in combination with nucleobase-modified tNTPs, a stable TNA aptamer with functionalized nucleobases has also been selected.202,203 Mirror-image DNA has drawn broad interest in recent years, since it possesses good resistance to nucleases while retaining properties and functions similar to those of DNA.204 In a very recent study, Zhu and co-workers carried out a SELEX experiment with a chemically synthesized mirror-image DNAP D-Dpo4-5m, and successfully obtained biostable L-DNA aptamers against human thrombin.205
Nucleobase modification can be used to attach functional groups, including amino acid-like side chains, to nucleic acid catalysts, and thus confer novel activities, such as protein enzyme-like activities, to these catalysts. For example, recently, Perrin and co-workers used dCTP and dUTP modified with arginine and lysine-like side chains for the selection of DNAzymes that could cleave RNA in a divalent metal cation-independent manner.206
Development of nucleic acid catalysts with unnatural sugar backbones not only expands the scope of macromolecular biocatalysts out of DNA, RNA and protein, but also has great potential to provide practically valuable catalysts with superior biostability. Using evolved TgoT DNAP mutants, Holliger and co-workers successfully selected ANA, FANA, HNA, and CeNA enzymes (XNAzymes) that could cleave or ligate RNA substrates, as well as a FANA enzyme with XNA–XNA ligase activity.7 Later, Chaput and co-workers evolved a general RNA-cleaving FANA enzyme with both strong catalytic activity and good nuclease-resistance, which could be further engineered to target different RNA sequences.178 They also reported the introduction of XNA modifications, including FANA and TNA nucleotides, into an existing DNAzyme scaffold for the construction of a novel enzyme, X10–23, with enhanced biological stability and good catalytic activity, and demonstrated the application of X10–23 in gene knockdown and pathogen detection.207–209 Recently, selection of TNA enzymes with RNA cleavage or ligation activity has been reported by Yu and co-workers.210,211
Introduction of nucleobases modified with functional groups into nucleic acid materials immediately enables the coupling of various molecules onto these materials and thus expands the functionalities of these materials. For example, Brown and co-workers employed RCA with base-modified dUTP and dCTP to construct modified DNA nanoflowers, to which various cargos, including fluorophores and functional peptides, could be densely attached, and demonstrated their potential use in diagnostics and therapeutics.8 UBPs can be employed to increase the number of possible DNA or RNA sequences used for the assembly of nucleic acid nanostructures, and also to make these nanostructures uninvadable to natural DNAs or RNAs. For example, Tan and co-workers recently used DNA sequences containing unnatural bases Z and P to construct an aptamer-nanotrain assembly, and demonstrated its application in drug delivery.212
Modifications on the sugar-phosphate backbones of nucleic acid frameworks are valuable for augmentation of nucleic-acid-based materials with enhanced thermal, chemical and biological stabilities. Taylor and co-workers demonstrated the assembly of different nanostructures, including tetrahedron and octahedron, with various XNAs, including 2′-F-DNA, FANA, HNA or CeNA.213 Recently, Li and co-workers constructed FANA-based double crossover nanotiles with increased thermal and biological stability, and demonstrated their potential in cellular delivery of small molecules under physiological conditions.214 Other than improving the properties of nucleic acid materials, modification of the sugar-phosphate backbone can also be used for functionalizing the frameworks of, and even providing new strategies for the construction of, nucleic acid materials. For example, Chen and Romesberg used 2′-Az-DNA produced by SFM4-3 polymerase for the construction of a novel DNA hydrogel, in which the 2′-Az group was coupled with ssDNA primers for PCR crosslinking of the 2′-Az-DNA scaffolds.127
Expanding the genetic alphabet with UBPs in vivo is much more challenging, and key issues that need to be addressed include the availability of unnatural nucleoside triphosphates in the cells, recognition of UBPs by endogenous replication, transcription, and translation machineries, and stability of UBPs in the cells during cell growth and propagation. In 2014, Romesberg and co-workers reported the first SSO with an expanded genetic alphabet, in which an initial information plasmid was constructed with UBP NaM–TPT3, and then replicated with dNaMTP and d5SICSTP.11 Nucleoside triphosphate transporter from Phaeodactylum tricornutum (PtNTT2)217 was employed to import dNaMTP and d5SICSTP into the cytoplasm of E. coli cells, allowing the in vivo replication of NaM–5SICS. Later, by using the chemically optimized UBP NaM–TPT3 instead of NaM–5SICS for in vivo replication, engineering the PtNTT2 transporter, and introducing the CRISPR/Cas system to eliminate DNA sequences that had lost the UBP, the SSO was optimized for robust growth, constitutive unnatural nucleoside triphosphate uptake, and much better UBP retention.9 In 2017, in vivo transcription and translation of the UBP to incorporate non-canonical amino acids (ncAAs) into proteins was accomplished with the SSO.10
Since the successful development of SSOs for the storage and retrieval of increased genetic information, considerable effort has been devoted to further exploring and optimizing the SSOs. For example, exploration of the contributions of different endogenous polymerases to UBP replication and the effects of cellular DNA repair mechanisms on UBP retention led to replisome reprogramming of the SSO for increased UBP retention, and subsequently allowed the incorporation of the UBP into the chromosome of the SSO.218 Other than chassis cells for SSO construction, UBPs and unnatural triphosphates can also be continuously optimized for higher efficiencies of triphosphate uptake, in vivo replication, transcription, and translation. Early efforts to construct SSOs used UBPs and unnatural triphosphates that had been screened and optimized based on in vitro SAR analysis, and thus might be less than optimal for in vivo performance. The successful construction of SSOs enabled in vivo SAR analysis of UBPs, which led to the identification of more optimal UBPs and unnatural triphosphates for in vivo applications, exemplified by the combination of UBP CNMO–TPT3 and triphosphates NaMTP and TAT1TP, the use of which gave a high yield of a protein with high-fidelity incorporation of an ncAA.34,35
Expansion of the genetic alphabet with UBPs greatly increases the number of genetic codons, allowing the incorporation of many more kinds of amino acids into a protein at the same time. However, translation efficiencies of different unnatural base-containing codons can be dramatically different, and selective use of these codons for incorporating ncAAs into proteins is thus important for good protein yields, as well as high translation fidelity. Romesberg and co-workers have systematically analyzed unnatural codons, and identified the nine most promising ones for efficient incorporation of ncAAs.216 Using three mutually orthogonal codons from this set, they successfully constructed an SSO with 67 codons, which includes 64 conventional codons and 3 new codons with unnatural bases. SSOs with additional sense codons containing unnatural bases have immediate application in producing novel protein products, including proteins site-specifically conjugated with other molecules for therapeutic use.23 For example, employing an SSO, human cytokine IL-2 variants, in which a modifiable unnatural amino acid was incorporated by decoding an unnatural base-containing codon, were produced, site-specifically modified with PEG polymers, and screened for altered receptor binding specificities and improved pharmacological properties.215
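The codon arithmetic behind this expansion can be sketched as follows. The snippet enumerates all triplets over a four-letter and a six-letter alphabet; the letters `X` and `Y` are hypothetical one-character placeholders for the two nucleotides of an unnatural base pair, and the helper name `codons` is likewise an assumption made for illustration. It counts what is combinatorially possible, not which codons are decoded efficiently in a cell.

```python
from itertools import product

NATURAL = "ACGU"
# X and Y are placeholder symbols for the two unnatural nucleotides
# (an illustrative assumption, not notation from the cited work).
EXPANDED = NATURAL + "XY"


def codons(alphabet):
    """Return every possible triplet over the given alphabet."""
    return ["".join(triplet) for triplet in product(alphabet, repeat=3)]


natural_codons = codons(NATURAL)
expanded_codons = codons(EXPANDED)
# Triplets that contain at least one unnatural nucleotide.
unnatural_codons = [c for c in expanded_codons if "X" in c or "Y" in c]

if __name__ == "__main__":
    print(len(natural_codons))    # 4**3 conventional codons
    print(len(expanded_codons))   # 6**3 triplets with one extra base pair
    print(len(unnatural_codons))  # difference between the two
    # The SSO described in the text uses the conventional set plus 3 new codons.
    print(len(natural_codons) + 3)
```

Only a small subset of the additional triplets is used in practice, which is consistent with the text's point that codons must be chosen selectively for yield and fidelity.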
Expansion of the genetic alphabet in eukaryotes is also attractive, since it will not only allow the incorporation of various ncAAs into proteins that can only be well produced by eukaryotic cells, but also enable the development of molecular tools, including nucleic acid sequences containing unnatural nucleotide derivatives or proteins containing functional ncAAs, for regulating cellular functions or even behaviors of entire organisms. As an initial effort towards constructing eukaryotic SSOs with an expanded genetic alphabet, Romesberg and co-workers carried out translation experiments with unnatural codon–anticodon pairs containing NaM and TPT3 in HEK293 and CHO cells.219 The results suggested that the eukaryotic ribosome could decode unnatural codons, and appeared more tolerant of different unnatural codons than prokaryotic ribosomes. Recently, Bornewasser et al. demonstrated the application of functionalized TPT3 for the labeling and visualization of mRNA in living cells.220
Development of methods for the sequencing of UBP-containing DNAs will significantly facilitate the ever-increasing efforts on expanding the genetic alphabet and constructing SSOs. The Benner and Hirao groups have each developed sequencing methods for their respective UBPs, in which the UBPs were first converted into different natural base pairs under different conditions and sequenced, and subsequent alignment and analysis of the resulting sequences revealed the positions of the UBPs.221,222 Hirao's group also developed a method for UBP sequencing, termed Sanger gap sequencing, in which the sequencing processivity was increased and modified Px analogs were used to generate clear gap patterns in the sequencing spectrum, which indicated the UBP positions.223 Recently, Romesberg and co-workers reported the application of nanopore sequencing for the thorough analysis of DNA containing UBP NaM–TPT3.25
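The conversion-based approach described above can be illustrated with a minimal sketch. It assumes two already aligned, equal-length reads of the same molecule obtained under two conversion conditions, that the UBP is converted to a different natural base in each condition, and that the reads are free of sequencing errors. The function name `candidate_ubp_positions` and the example reads are hypothetical and are not taken from the cited methods.

```python
def candidate_ubp_positions(read_a, read_b):
    """Return 0-based positions where two aligned reads disagree.

    Under the stated assumptions, natural positions read identically in
    both conditions, so a disagreement marks a candidate UBP site.
    """
    if len(read_a) != len(read_b):
        raise ValueError("reads must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(read_a, read_b)) if a != b]


if __name__ == "__main__":
    # Made-up reads: the unnatural position reads as C in one condition
    # and as T in the other; every natural position is unchanged.
    condition_1 = "ACGTCAGG"
    condition_2 = "ACGTTAGG"
    print(candidate_ubp_positions(condition_1, condition_2))
```

A real analysis would work on many reads per condition and call a site from the consensus, since single-read disagreements can also arise from ordinary sequencing or polymerase errors.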
To further expand the central dogma, more genetic polymers with novel modifications or combinations of modifications can be designed and synthesized, and efficient polymerases for them also need to be discovered or engineered, with the development and employment of novel polymerase evolution strategies, as well as the assistance of computational tools, including novel machine-learning methods.228,229 For existing unnatural nucleic acids, after years of effort on engineering their polymerases, the transcription of short stretches from a DNA template and their reverse transcription back into DNA are already relatively efficient, and sufficient for various in vitro applications, including SELEX for aptamers and XNAzymes. However, to achieve direct replication and even efficient amplification of the unnatural nucleic acids, the polymerases have to be engineered to synthesize a strand of unnatural nucleic acid from an unnatural nucleic acid template. Although efficient amplification of partially sugar-modified short unnatural nucleic acids has been demonstrated with evolved DNAPs,77,127 further engineering of these polymerases is still needed to achieve efficient replication and amplification of fully sugar-modified long unnatural nucleic acids. This is a prerequisite for actually using these unnatural nucleic acids as full-function and augmented alternatives to DNA for the storage and transmission of genetic information, and will lead to more efficient use of these unnatural nucleic acids, for example, SELEX of unnatural aptamers with fewer steps.
In addition, engineering polymerases for the efficient transcription of long, fully sugar-modified unnatural nucleic acids of different types will enable the full use of these unnatural nucleic acids as RNA alternatives with altered properties and expanded functions, not only for the production of larger biocatalysts or assembled nanomaterials, but also for the transmission of genetic information from the original carrier, such as DNA, to the function performer, such as a protein or another genetic polymer. Moreover, in order to translate proteins from an unnatural nucleic acid, efforts have to be made to engineer the translational machinery to accommodate this unnatural nucleic acid, as well as to efficiently decode its genetic information with tRNAs or even unnatural tRNA alternatives, the efficient charging of which with amino acids may in turn require extensive engineering of aminoacyl-tRNA synthetases (aaRSs).230,231 In an ideal world, all unnatural nucleic acids would have efficient polymerases to replicate them and to transmit genetic information from any one of them to another, which would expand the central dogma to higher dimensions (Fig. 6).
Expanding the central dogma in vivo likewise requires efficient polymerases for the synthesis or replication of unnatural nucleic acids in living cells. Moreover, for in vivo use, there is an immediate need to further engineer unnatural nucleic acid polymerases for good substrate specificity, since all of these polymerases were derived from natural DNAPs or RNAPs and may still possess good activities toward dNTPs or NTPs, which are abundant in living cells and will interfere with the synthesis of unnatural nucleic acids from unnatural nucleoside triphosphates. To make the mutant polymerases function better in vivo, their optimal working temperatures and ionic strengths may also need to be engineered to match the internal environment of the hosts. Efficient pathways for supplying cellular polymerases with various unnatural nucleoside triphosphates, either by direct import from the medium or by step-by-step synthesis via metabolic pathways, also need to be further exploited and optimized as an immediate priority. For example, kinases for the phosphorylation of nucleosides, nucleoside monophosphates, and nucleoside diphosphates can be engineered for higher activities toward the unnatural substrates, and then employed to produce unnatural nucleoside triphosphates in vivo, as well as to regenerate unnatural triphosphates that have been dephosphorylated by endogenous phosphatases.
Initial efforts to engineer the phosphorylation pathways for the production of unnatural nucleoside triphosphates have already been made by several groups, including Benner's group and Romesberg's group.232–235 Long-term efforts for expanding the central dogma in vivo may include the construction of replicable XNA plasmids or chromosomes, the establishment of in vivo XNA transcription systems, and the engineering of host cells to balance energy consumption between the pathways for the production of unnatural genetic polymers and natural metabolic pathways, as well as to achieve even distribution of unnatural genetic polymers into daughter cells. Orthogonality between unnatural and natural genetic systems is also important to avoid interfering with the replication and function of the endogenous genomes of the hosts,236 and can potentially be achieved by building the unnatural genetic systems from replication or transcription systems with orthogonal replication origins or promoters and corresponding polymerases with good substrate specificity. With all these efforts, organisms with not only an expanded genetic alphabet, but also an increased number of fully functional genetic polymers may be developed to further expand the central dogma in vivo, and may find unprecedentedly broad application in biotechnology and biomedicine in the future (Fig. 7).
Footnote
† These authors contributed equally.
This journal is © The Royal Society of Chemistry 2022