Yipeng Yin‡a, Reed Arneson‡b, Yinan Yuan*b and Shiyue Fang*a
aDepartment of Chemistry, and Health Research Institute, Michigan Technological University, Houghton, Michigan 49931, USA. E-mail: shifang@mtu.edu
bCollege of Forest Resources and Environmental Science, Michigan Technological University, Houghton, Michigan 49931, USA. E-mail: yinyuan@mtu.edu
First published on 18th December 2024
The longest oligos that can be chemically synthesized are considered to be 200-mers. Here, we report direct synthesis of an 800-mer green fluorescent protein gene and a 1728-mer Φ29 DNA polymerase gene on an automated synthesizer. Key innovations that enabled this breakthrough include conducting the synthesis on a smooth surface rather than within the pores of traditional supports, and the use of the powerful catching-by-polymerization (CBP) method for isolating the full-length oligos from a complex mixture. Conducting synthesis on a smooth surface not only eliminated the steric hindrance that would otherwise prevent long oligo assembly, but also, surprisingly, drastically reduced synthesis errors. Compared with the benchmark PCR assembly gene synthesis method, the direct long oligo synthesis method has the advantages of a higher probability of success, fewer sequence restrictions, and the ability to synthesize long oligos containing difficult elements such as unusually stable higher-order structures, long repeats, and site-specific modifications. The method is expected to open doors for various projects in areas such as synthetic biology, gene editing, and protein engineering.
The most notable achievement in the area of de novo long oligo synthesis in recent years is the development of the template-independent enzymatic oligo synthesis (TiEOS) technologies, primarily utilizing engineered terminal deoxynucleotidyl transferases (TdT).7,11–15 While these technologies hold great promise, they are not without shortcomings. For example, the large enzyme-to-nucleotide mass ratio is not atom-economical, which may be one of the reasons for the high cost of the methods if the enzyme is not used in a catalytic quantity or recycled. The higher-order structures of long oligos may reduce synthesis efficiency.12 The coupling time may be lengthy, and the coupling yield may not meet the expectations for typical enzymatic reactions.12 In addition, enzymatic methods typically lack a capping step, increasing the likelihood of deletion errors.13 The TdT enzyme exhibits inherent nucleotide biases, leading to lower coupling efficiency for certain nucleotides, a problem that may be difficult to overcome through enzyme engineering.12 Finally, the method may be difficult to adapt for synthesizing long oligos with site-specific modifications.
In contrast to the resources invested in developing enzymatic methods for long oligo synthesis, little effort has been dedicated to advancing chemical methods over the past decade, even though many of the aforementioned shortcomings of enzymatic methods may be addressable using chemical approaches. Since 2010, our research team has been making efforts to develop a method called catching-by-polymerization (CBP) for synthetic oligo purification (Scheme 1).16 The method involves tagging the full-length oligo with a polymerizable tagging phosphoramidite (PTP) and incorporating the conjugate into a polyacrylamide gel. Because failure oligos are capped during automated synthesis, they are not tagged, and therefore are not incorporated into the gel. Oligo purification can thus be achieved by washing away the failure oligos, followed by cleaving the full-length oligo from the gel. Recognizing the power of the CBP method, we attempted to use it to isolate the extremely low percentage (but sufficient quantities) of full-length oligos from the complex mixture generated from the thousands of reactions required for long oligo synthesis.17,18 Most recently, using the CBP method, we succeeded in purification of 400-mers. Sanger sequencing confirmed the sequences.10 Here, with additional innovations involving the use of glass wool and glass beads as a solid support for long oligo synthesis, we report direct chemical synthesis of the 800-mer green fluorescent protein (GFP) gene and the 1728-mer Φ29 DNA polymerase gene, and their isolation with CBP and characterization with Sanger sequencing.
With the potential loading problem in mind, we thought that glass wool would partially solve the problem. Therefore, we calculated the loading of glass wool, and made a comparison with that of glass beads (Table 1).
Entry | Items | Glass woola | Glass beads | CPG 2K Å tested | CPG 2K Å | Wang resin |
---|---|---|---|---|---|---|
1 | Densityb | 2.2 g ml−1 | 2.2 g ml−1 | | | |
2 | Diameterc | 8 μm | 58 μm | | | |
3 | Loading formulad | 10.6 ÷ (d × r) μmol g−1 | 15.9 ÷ (d × r) μmol g−1 | | | |
4 | Loading | 1.208 μmol g−1e | 0.249 μmol g−1e | 5.405 μmol g−1f | 20–30 μmol g−1g | 0.3–2.5 mmol g−1g |
5 | Relative loading | 4.8 | 1 | 22 | ∼100h | ∼10000i |
6 | Measured loadingj | 0.981 μmol g−1 | 0.256 μmol g−1 | 5.359 μmol g−1 | | |
7 | 800-mer synthesizedk | 3.7 nmol g−1 | 0.034 nmol g−1 | | | |
8 | 800-mer yield | 3.7/981 = 0.38% | 0.034/256 = 0.013% | | | |
9 | 1728-mer synthesizedk | | 0.041 nmol g−1 | | | |
10 | 1728-mer yield | | 0.041/256 = 0.016% | | | |

a The length of glass wool can be 1 cm or longer. b Density is that of solid glass. c To achieve close to resistance-free flow of liquid, the diameter of glass beads needs to be ∼50 μm or larger. d The effect of the length of glass wool on loading is minimal and is omitted in the formula. The units for d and r in the formulae are g ml−1 and μm, respectively. Details for deriving the formulae are in the ESI. e Calculated value assuming 3.2 molecules per nm2. f Given by the manufacturer of the CPG tested. g Values from the literature.20,21 h 25 μmol g−1 is used for the calculation. i 2.5 mmol g−1 is used for the calculation. j Measured by trityl assay using the literature procedure.20 k Oligo obtained per gram of solid support after CBP purification as determined with a Qubit 4 Fluorometer.
Glass wool with a diameter of ∼8 μm is commercially available and inexpensive. We tested its resistance to liquid flow, and found that it is virtually resistance-free, which is required for solid phase synthesis. Assuming a length of 1 cm, density of 2.2 g ml−1, and 3.2 molecules per nm2 (Table 1),22 the loading is 1208 nmol g−1 (see the ESI† for calculations). For glass beads, to allow for close to resistance-free liquid flow, ideally their diameter is ∼50 μm or larger. Assuming a diameter of 58 μm and a density of 2.2 g ml−1, the loading is calculated to be 249 nmol g−1. Therefore, the loading of glass wool is ∼4.8 times that of glass beads (entry 5).
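The loading estimates above follow from simple geometry: surface area per gram multiplied by the assumed site density. The following Python sketch reproduces them, assuming a long cylinder for a glass wool fiber (end faces ignored, as in footnote d of Table 1) and a sphere for a glass bead. The function names are illustrative and are not taken from the ESI.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def site_density_umol_per_m2(molecules_per_nm2=3.2):
    """Convert a surface site density from molecules/nm^2 to umol/m^2."""
    molecules_per_m2 = molecules_per_nm2 * 1e18  # 1 m^2 = 1e18 nm^2
    return molecules_per_m2 / AVOGADRO * 1e6


def loading_umol_per_g(diameter_um, density_g_per_ml, shape, molecules_per_nm2=3.2):
    """Estimated loading (umol/g) of a smooth, non-porous glass support.

    Surface area per unit volume is 2/r for a long cylinder and 3/r for a
    sphere. With r in um and density d in g/ml, the area per gram in m^2/g
    is k / (d * r), where k is 2 or 3.
    """
    k = {"cylinder": 2.0, "sphere": 3.0}[shape]
    radius_um = diameter_um / 2.0
    area_m2_per_g = k / (density_g_per_ml * radius_um)
    return area_m2_per_g * site_density_umol_per_m2(molecules_per_nm2)


wool = loading_umol_per_g(8, 2.2, "cylinder")   # glass wool, 8 um diameter
beads = loading_umol_per_g(58, 2.2, "sphere")   # glass beads, 58 um diameter
print(f"glass wool:  {wool:.3f} umol/g")
print(f"glass beads: {beads:.3f} umol/g")
print(f"ratio:       {wool / beads:.1f}")
```

The prefactors 2 × 5.31 ≈ 10.6 and 3 × 5.31 ≈ 15.9 correspond to the loading formulae in entry 3 of Table 1. The results agree with the tabulated loadings to within rounding of the constants.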
For comparison, the loading of commercial CPG with a 2000 Å pore diameter is typically 20–30 μmol g−1,20 which is ∼100 times higher than that of glass beads (entry 5). The loading of the Wang resin (widely used for peptide synthesis) is 0.3–2.5 mmol g−1,21 which is close to 10000 times higher. However, for long oligo synthesis, we reasoned that low loading is less of an issue. For most biological applications, as little as 1 pmol oligo is sufficient.23,24 With 100 mg of glass wool, a quantity that can be used directly under typical small-scale oligo synthesis conditions, and an assumed average stepwise yield of 99.7%, which corresponds to an overall yield of 0.25% for a 2000-mer synthesis, the quantity of full-length oligo is ∼296 pmol, far more than 1 pmol. The low percentage yield, however, is a serious problem because no conventional method can purify or concentrate the full-length oligo. For example, HPLC would not be able to resolve the full-length oligo from failure ones. Gel electrophoresis would not be able to resolve it either, and even if it could be engineered to do so, the full-length oligo would be invisible on the gel because of its low percentage. Solid phase extraction methods25–28 may not be suitable for the task either because the high entropy barrier for reactions between large molecules and reactive sites on a solid surface would make the extraction inefficient, and it may be difficult for the large molecules to enter the pores of the solid support in the first place. Using CBP, the low percentage problem can be overcome. With these considerations, we went ahead and synthesized long oligos on glass wool using the 800-mer GFP gene as an example.
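The estimate of the full-length quantity above is compound-yield arithmetic. A minimal Python sketch follows, assuming one coupling per nucleotide and the calculated glass wool loading of 1.208 μmol g−1 from Table 1. The function names are illustrative.

```python
def overall_yield(stepwise_yield, length_nt):
    """Fraction of chains that reach full length, assuming one coupling per nucleotide."""
    return stepwise_yield ** length_nt


def full_length_pmol(support_mg, loading_umol_per_g, stepwise_yield, length_nt):
    """Expected full-length oligo (pmol) from a given mass of support."""
    # mg * umol/g = nmol; multiply by 1000 to get pmol
    starting_pmol = support_mg * loading_umol_per_g * 1e3
    return starting_pmol * overall_yield(stepwise_yield, length_nt)


y = overall_yield(0.997, 2000)
q = full_length_pmol(100, 1.208, 0.997, 2000)
print(f"overall yield: {y:.2%}")
print(f"full-length oligo: {q:.0f} pmol")
```

With these inputs the overall yield is about 0.25% and the full-length quantity is close to the ∼296 pmol quoted above. Because the yield depends exponentially on the stepwise value, a small change in that value shifts the result considerably.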
Scheme 2 Functionalization of glass wool and glass beads. Conditions: (a) 2 (1% PhMe), rt, 20 min; then, supernatant removed, glass wool 100 °C, 4 h. (b) NH4OH (30%), 55 °C, 2 h. (c) On a DNA synthesizer, standard coupling, oxidation and deblocking conditions with modifications; see the ESI† for details.
The last nucleotide was introduced with PTP on the MerMade 6 synthesizer, which also tagged the full-length sequences with a methacrylamide group. Details are given in the ESI.† For deprotection and cleavage, the glass wool was first treated with 10% DBU in ACN, which removed the 2-cyanoethyl groups. Treating with concentrated NH4OH under typical oligo deprotection conditions gave a mixture of 5′-tagged full-length oligo and un-tagged failure sequences as well as other impurities (Scheme 1). CBP purification was then carried out by co-polymerizing the tagged full-length oligo into a polyacrylamide gel. The failure oligos and many other types of impurities were removed by washing. This gave only the full-length oligo on the polymer. The full-length oligo was then cleaved from the gel using 80% AcOH. After removing the acid, the oligo may be precipitated with nBuOH from an NH4OH solution. This is important for avoiding oligo damage by residual acid if the oligo needs to be stored before use. Otherwise, precipitation may be omitted. The quantity of the oligo was determined to be 27.4 μg (111 pmol) for the synthesis involving 30 mg glass wool. The overall yield for the entire 800-mer synthesis and purification was 0.38% (entry 8, Table 1).
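The mass-to-mole conversion and the overall yield reported above can be checked with a short calculation. The Python sketch below assumes an average mass of about 308.5 g mol−1 per nucleotide for single-stranded DNA. This is an approximation that reproduces the reported 27.4 μg to 111 pmol conversion; the exact molar mass used by the authors is not stated here. The function names are illustrative.

```python
AVG_NT_MASS_G_PER_MOL = 308.5  # assumed average mass per ssDNA nucleotide (approximation)


def ng_to_pmol(mass_ng, length_nt, nt_mass=AVG_NT_MASS_G_PER_MOL):
    """Convert a mass of single-stranded oligo (ng) to an amount in pmol."""
    molar_mass = length_nt * nt_mass   # g/mol
    return mass_ng / molar_mass * 1e3  # ng / (g/mol) = nmol; x1000 -> pmol


def isolated_yield(product_pmol, support_mg, measured_loading_umol_per_g):
    """Isolated full-length oligo as a fraction of the initial sites on the support."""
    starting_pmol = support_mg * measured_loading_umol_per_g * 1e3  # mg * umol/g = nmol
    return product_pmol / starting_pmol


# 800-mer on glass wool: 27.4 ug from 30 mg support, measured loading 0.981 umol/g
amount = ng_to_pmol(27_400, 800)
fraction = isolated_yield(amount, 30, 0.981)
print(f"{amount:.0f} pmol, overall yield {fraction:.2%}")
```

Dividing the same amount by the support mass gives about 3.7 nmol g−1, matching entry 7 of Table 1.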
Entry | Oligo sample | 800-mer from glass wool | 800-mer from glass beads | 1728-mer from glass beads | 1st 1000 nt of the 1728-mer | Literature error ratesa |
---|---|---|---|---|---|---|
1 | Total colonies sequenced | 48 | 47 | 16 | 16 | |
2 | Colonies with the correct sequence | 41 | 45 | 7 | 14 | |
3 | Rate of the correct sequence | 85% | 96% | 44% | 88% | |
4 | G-to-A substitution/error rateb | 0 | 0 | 0 | 0 | 0.11%c |
5 | G-to-T substitution/error rate | 0 | 0 | 3/0.0109% | 1/0.0063% | 0.03% |
6 | C-to-T substitution/error rate | 1/0.0026% | 0 | 0 | 0 | 0.02% |
7 | T-to-C substitution/error rate | 1/0.0026% | 0 | 1/0.0036% | 0 | 0.01% |
8 | A-to-G substitution/error rate | 0 | 0 | 1/0.0036% | 0 | 0.01% |
9 | A-to-T substitution/error rate | 1/0.0026% | 0 | 0 | 0 | <0.01% |
10 | Single nt deletion/error rate | 4/0.0104% | 1/0.0027% | 3/0.0109% | 1/0.0063% | 0.4%d |
11 | Block deletion/error rate | One 10 nt deletion/0.0026% | One 2 nt deletion/0.0027% | Two 2 nt deletions/0.0072% | 0 | No data |
12 | Single nt insertion/error rate | 0 | 0 | 2/0.0072% | 0 | 0.00–0.01%e |
13 | Total error ratef | 0.0208% | 0.0054% | 0.0434% | 0.0126% | 0.58% |

a Data were from sequencing the 20th to 48th nucleotide region of chemically synthesized 85-mers. Oligo synthesis conditions: activation, 1H-tetrazole in ACN; capping, Ac2O in THF, 10% 1-methylimidazole in 10% pyridine/THF; oxidation, 0.02 M I2 in THF/pyridine/H2O; deblocking, 3% TCA in DCM. For more details, see ref. 31. b The error rates were calculated by dividing the number of errors by the total number of nucleotides subjected to sequencing. For example, for the 1728-mer synthesized on glass beads, a total of three G-to-T substitutions were found in the data of sequencing 16 colonies; the error rate is 3 ÷ (1728 × 16) = 0.0109%. c When DCI was used as the activator, the error rate was lower.31 d 0.1% for each nucleotide. e dA 0.005%, dC 0.003%, dG 0.008%, T 0.002%. f The total error rate is the sum of individual error rates. It does not represent the probability for a specific nucleotide position in a sequence to have substitution, deletion, addition and other errors.
Oligo synthesis on glass beads was conducted under the same conditions as those used with glass wool as the support. The scale was 12.8 nmol, for which 50 mg glass beads were used. Deprotection and cleavage as well as CBP purification were also the same except that only 20 mg glass beads (theoretically 5.12 nmol oligo) were used. The quantity of the oligo obtained was determined to be 168 ng (0.68 pmol) for the synthesis involving 20 mg glass beads. The overall yield for the entire 800-mer synthesis and purification was 0.013% (entry 8, Table 1), which is lower than the 0.38% for glass wool. The reason is unclear, but the lower yield may be attributable to loss of material during deprotection and purification, probably because smaller quantities of oligos are more difficult to handle.
The CBP purified 800-mer was also subjected to PCR, cloning and Sanger sequencing. The image of the gel for electrophoresis analysis of the PCR product is shown in Fig. 1. Even though the quantity of oligos was much lower, the band corresponding to the 800-mer is clear. The image of the gel for analysis of colony PCR products is shown in Fig. 2D–F. As can be seen, all colonies selected for the analysis had the 800-mer sequence. Plasmids of 47 colonies were subjected to Sanger sequencing. The data are provided in the ESI.† The results are summarized in Table 2. Among the 47 colonies sequenced, 45 contained the correct sequence, a rate of 96% (entries 1–3). The errors in the incorrect sequences include only one single nucleotide deletion and one 2 nt deletion (entries 10 and 11). The rates for both errors were 0.0027%. The sum of the error rates was 0.0054% (entry 13).
The image of the gel for electrophoresis analysis of the PCR product of the CBP purified 1728-mer is shown in Fig. 1. As can be seen, the expected band can be clearly observed. Colony PCR was first conducted on 16 colonies using primers targeting only a portion of the 1728-mer (see the ESI†). All colonies were found to have the gene (Fig. 2G). Plasmids of the 16 colonies were subjected to Sanger sequencing. Sequencing data are provided in the ESI,† and the results are summarized in Table 2. Among the 16 colonies sequenced, 7 contained the correct sequence, which corresponds to a success rate of 44% (entries 1–3). The errors in the incorrect sequences include five substitutions, three single nucleotide deletions, two 2 nt deletions and two single nucleotide insertions (entries 5, 7, 8 and 10–12). The sum of the error rates was 0.0434% (entry 13). We also performed gel electrophoresis on colony PCR products of additional colonies using primers covering the entire 1728-mer. Among 32 colonies, only one did not show the expected band (Fig. 2H and I).
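The error rates in Table 2 are obtained by dividing the number of error events by the total number of nucleotides sequenced (oligo length × number of colonies), as described in footnote b of that table. The Python sketch below applies this to the 1728-mer counts, with each block deletion counted as one event, as in the table. The function and variable names are illustrative.

```python
def error_rate_percent(n_errors, oligo_length_nt, n_colonies):
    """Error events per sequenced nucleotide, expressed as a percentage."""
    return 100.0 * n_errors / (oligo_length_nt * n_colonies)


# Error-event counts for the 1728-mer (16 colonies sequenced), read from Table 2.
counts_1728 = {
    "G-to-T substitution": 3,
    "T-to-C substitution": 1,
    "A-to-G substitution": 1,
    "single nt deletion": 3,
    "2 nt block deletion": 2,
    "single nt insertion": 2,
}

rates = {name: error_rate_percent(n, 1728, 16) for name, n in counts_1728.items()}
total = sum(rates.values())

for name, rate in rates.items():
    print(f"{name}: {rate:.4f}%")
print(f"total: {total:.4f}%")
```

The same function with a length of 800 and 48 or 47 colonies gives the per-error rates listed for the two 800-mer syntheses.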
Compared with results in the literature using CPG as the support,31 error rates in the present work were drastically lower (Table 2). For example, among the most frequent substitution errors, which include G-to-A, G-to-T, C-to-T, T-to-C, and A-to-G (entries 4–8),31 the G-to-A substitution, which has the highest rate in the literature data, was completely eliminated in the present work (entry 4). For all other errors, the rates were also lowered. The sum of the rates of substantial errors for literature syntheses is 0.58%, while that for the present syntheses is at most 0.0434%, which is more than 10 times lower (entry 13). It is noted that the error rates for the present work were from 800-mer and 1728-mer syntheses, while the numbers from the literature were from syntheses of oligos shorter than 100-mer. It is known that error rates increase as oligos grow longer. The increased accuracy of the syntheses on a smooth surface compared with those within the pores of CPG may be attributed to faster reaction kinetics in the case of the former. The assumption that reactions on a smooth surface have better kinetics is consistent with discussions in a 1987 patent by Benner.32
As mentioned earlier, we successfully synthesized 401-mer and 399-mer oligos on CPG.10 Compared with that work, the present results are also much better. For the prior study, the bands corresponding to the full-length oligos after PCR amplification of CBP purified oligos were weak (see Fig. 2 in ref. 10) while the bands for the present work are strong (Fig. 1). The gel images of colony PCR results also provided evidence of superiority of the present work. For the prior study, according to gel images, plasmids from 26 out of 64 colonies could be readily estimated not to contain the full-length sequence of the target oligo (see Fig. 3 in ref. 10). For the present work, plasmids from 143 out of 144 colonies that were subjected to the analysis could be estimated to contain the expected sequence. For the prior study, six plasmids that were estimated to contain the full-length sequence based on gel analysis were subjected to Sanger sequencing. Two sequences were correct. Later, we intentionally sequenced 14 additional plasmids that were estimated to contain only a portion of the desired sequence.33 These sequences were found to contain one or more blocks of deleted nucleotides. The deleted blocks ranged from 8 to over 100 nucleotides. For the 20 sequenced sequences, besides the block deletions, other errors include 11 single nucleotide deletions, and three G-to-A and one T-to-C substitutions. The sequencing data are in the ESI.† Comparing these data with the present ones, it is evident that the major problem for synthesizing long oligos on porous supports is block and single nucleotide deletions. In addition, G-to-A substitution is much more likely to occur with porous supports. The comparison indicates that for long oligo synthesis, supports with a smooth surface should be used.
With the long oligo synthesis results that far exceed the expectations of many researchers including us, one may wonder how this is possible considering the many widely recognized side reactions of oligo synthesis. For example, the acetic anhydride capping efficiency is estimated to be ∼90%.31 Assuming a coupling efficiency of 99%, the deletion rate in our products would be ∼0.1%, far higher than the 0.002–0.01% range we observed (entry 10, Table 2). The total detritylation time under acidic conditions for the 1728-mer synthesis is ∼47 hours. Common intuition would suggest high levels of depurination in the products. While answers to these questions are hard to obtain, for the former, it is possible that conducting the synthesis on a smooth surface not only improved the yield of coupling, but also drastically improved the yield of capping. For the latter, intermittent exposure of oligos to acid might have less of an effect on depurination than constant acid exposure. It is also possible that depurination may be more likely in the pores than on a smooth surface under the same acidic conditions. Furthermore, it is possible that depurination occurred but was not detected by our analysis method. In that case, the depurinated oligos would have been cleaved under the basic conditions of oligo deprotection and cleavage, the 3′-fragments would have been washed away during CBP, and the 5′-fragments would have had to compete with the excess primer for PCR amplification.
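The ∼0.1% deletion estimate in the paragraph above is the product of two per-cycle probabilities: a chain fails to couple, and that failed chain then escapes capping and can be extended in later cycles. A minimal Python sketch follows, assuming the two events are independent. The function name is illustrative.

```python
def uncapped_failure_fraction(coupling_efficiency, capping_efficiency):
    """Per-cycle fraction of chains that fail to couple AND escape capping."""
    return (1.0 - coupling_efficiency) * (1.0 - capping_efficiency)


# 99% coupling efficiency and ~90% capping efficiency, as assumed in the text
expected = uncapped_failure_fraction(0.99, 0.90)
print(f"expected single nt deletion rate per position: {expected:.2%}")
```

Under these assumptions the expected rate is about 0.1% per position, roughly one to two orders of magnitude above the observed single nucleotide deletion rates in entry 10 of Table 2.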
It is noted that although the current paper is presented in the context of synthesis of genes, which are double-stranded (ds) DNAs, the long oligo synthesis method can also be used to obtain single-stranded (ss) long oligos or to obtain oligos with site-specific modifications. In these contexts, PCR, cloning and Sanger sequencing should only be carried out using a small portion (e.g. about 10 pmol) of the synthesized and CBP purified oligos for the purpose of characterization because PCR and cloning would convert ss oligos to ds oligos, and eliminate the site-specific modifications. The remaining portion (more than 50 pmol) can then be used for the intended applications assuming that the sequence error rates are low, and the errors can be tolerated by the applications. If error-free ss oligos with or without site-specific modifications are required, sequences containing errors could be removed by patching error sites with short oligos and then using immobilized MutS to remove the sequences with errors.34,35 The short patching oligos in the remaining error-free ss oligos can then be easily removed using size-exclusion filtration, solid phase reversible immobilization (SPRI) bead extraction,36 or other methods.
Footnotes
† Electronic supplementary information (ESI) available: Experimental details, glass wool and glass bead loading calculation, and sequencing results for oligos synthesized on glass wool, glass beads and CPG. See DOI: https://doi.org/10.1039/d4sc06958g
‡ Equal contributors.
This journal is © The Royal Society of Chemistry 2025