Filip T. Szczypiński and Christopher A. Hunter*
Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK. E-mail: herchelsmith.orgchem@ch.cam.ac.uk
First published on 11th January 2019
Competition from intramolecular folding is a major challenge in the design of synthetic oligomers that form intermolecular duplexes in a sequence-selective manner. One strategy is to use very rigid backbones that prevent folding, but this design can prejudice duplex formation if the geometry is not exactly right. The alternative approach found in nucleic acids is to use bases (or recognition units) that have different dimensions. A long-short base-pairing scheme makes folding geometrically difficult and is compatible with the flexible backbones that are required to guarantee duplex formation. A monomer building block equipped with a long hydrogen bond donor (phenol, D) recognition unit and a monomer building block equipped with a short hydrogen bond acceptor (phosphine oxide, A) recognition unit were prepared with differentially protected alcohol and carboxylic acid groups. These compounds were used to synthesise the homo and hetero-sequence 2-mers AA, DD and AD. 19F and 31P NMR experiments were used to characterize the assembly properties of these compounds in toluene solution. AA and DD form a stable doubly-hydrogen-bonded duplex with an effective molarity of 20 mM for formation of the second intramolecular hydrogen bond. AD forms a duplex of similar stability. There is no evidence of intramolecular folding in the monomeric state of this compound, which shows that the long-short base-pairing scheme is effective. The ester coupling chemistry used here is an attractive method for the synthesis of long oligomers, and the properties of the 2-mers indicate that this molecular architecture should give longer mixed sequence oligomers that show high fidelity sequence-selective duplex formation.
It is clear that duplex formation is not restricted to the precise molecular structure found in DNA and RNA. A range of nucleic acid analogues have been prepared in which the phosphate diester,9–11 the bases,7,12–15 and the sugar have been replaced,16–22 and all of these oligomers form stable duplexes. Synthetic oligomers that bear no relation to nucleic acids have also been shown to form duplexes through various non-covalent interactions: metal–ligand coordination,23,24 salt bridges,25,26 aromatic interactions,27 and hydrogen bonding.28–30 By using two different complementary recognition sites as the equivalent of the nucleic acid bases, it is also possible to encode sequence information into synthetic oligomers, and sequence-selective duplex formation has been demonstrated for short sequences.26,31
We have been using a single hydrogen bond between a hydrogen bond donor (e.g. phenol, D) and a hydrogen bond acceptor (e.g. phosphine oxide, A) as the base-pairing interaction for duplex formation. This two letter alphabet allows information to be encoded in an oligomer as the sequence of A and D recognition sites. Provided the backbone does not contain any polar functional groups that could compete with the base-pairing interactions, the use of a single hydrogen bond as the base-pair removes any possibility of mismatches, because A cannot interact with A and D cannot interact with D. A number of different backbone architectures have been characterized, and the nature of the backbone was found to play a crucial role in the assembly properties of these oligomers.
The different possible self-assembly channels are illustrated in Fig. 1. The key requirement for duplex formation is that the equilibrium constant for propagation of the intramolecular hydrogen bonds that zip up the duplex, K EMp, is greater than one (K is the association constant for formation of an intermolecular hydrogen bond, and EMp is the effective molarity for propagation of intramolecular hydrogen bonds in the duplex).32,33 One of the competing assembly channels is formation of multiple intermolecular interactions that lead to higher order networks, but this process can be avoided by operating at a concentration, c, which is lower than the value of EMi, the effective molarity for formation of the first intramolecular hydrogen bond that initiates duplex formation. The other major competing assembly channel is the formation of an intramolecular hydrogen bond within an oligomer, which leads to folding. The probability of this process is determined by the equilibrium constant K EMf, where EMf is the effective molarity for folding.
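The three criteria described above can be evaluated together in a few lines. The following is a minimal sketch rather than the analysis used in this work: the function name `assembly_channels`, the two-state estimate of the folded fraction, and all of the input values are hypothetical and serve only to illustrate how K EMp, K EMf and the concentration c are compared.

```python
def assembly_channels(K, EM_p, EM_i, EM_f, c):
    """Evaluate the criteria that govern the competing assembly channels.

    K    : association constant for one intermolecular H-bond (M^-1)
    EM_p : effective molarity for duplex propagation (M)
    EM_i : effective molarity for duplex initiation (M)
    EM_f : effective molarity for intramolecular folding (M)
    c    : operating concentration (M)
    """
    K_EMp = K * EM_p
    K_EMf = K * EM_f
    return {
        "K_EMp": K_EMp,
        # Duplex zips up only if propagation is favourable
        "duplex_propagates": K_EMp > 1,
        # Higher order networks are avoided below EM_i
        "networks_avoided": c < EM_i,
        "K_EMf": K_EMf,
        # Assumed two-state estimate for one adjacent A/D pair
        "folded_fraction": K_EMf / (1 + K_EMf),
    }


# Hypothetical illustrative inputs (not measured values)
result = assembly_channels(K=1000.0, EM_p=0.02, EM_i=0.02, EM_f=0.0, c=0.001)
print(result)
```

Setting `EM_f` to zero represents the limiting case of a backbone that cannot fold; any non-zero value lowers the population of the open monomer available for duplex formation.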
The values of the three effective molarity parameters depend on the conformational properties of the backbone. For the very flexible backbone shown in Fig. 2(a), the values of EMi and EMp are 10 mM to 30 mM, and the duplex channel dominates for length complementary homo-oligomers.34 For the very rigid backbone shown in Fig. 2(b), similar results were obtained with EMi and EMp values of 40 mM to 70 mM.35 Geometry is critical for more rigid backbones. The backbone shown in Fig. 2(b) has a well-defined geometry, which places the recognition groups in the correct orientation for duplex formation. However, for backbones of intermediate rigidity, where the conformational properties are more difficult to predict, mixed results were obtained. The backbone shown in Fig. 2(c) formed duplexes with EMi = EMp = 10 mM,36 but the backbones shown in Fig. 2(d) and (e) did not lead to extended duplexes. For these two systems, EMi was similar to the values found for the other backbones (10 mM to 20 mM), but the geometry was not compatible with duplex propagation, and EMp was too low to measure.37
Fig. 2 Backbones (a)–(e) of the previously reported synthetic information molecules.34–36,38
The results obtained for homo-oligomers suggest that highly flexible backbones should provide a reliable platform for the design of duplex-forming oligomers. Conformational flexibility ensures that the backbone will always be able to adapt to a geometry compatible with base-pair formation in an extended duplex. More rigid backbones are difficult to design with the degree of accuracy required to guarantee the geometric complementarity needed for formation of an extended duplex.37 The values of effective molarity measured for the very flexible backbone and the very rigid backbone shown in Fig. 2 are similar, so it appears that effective molarities associated with duplex formation are not adversely affected by conformational flexibility. Very flexible backbones are easily accessed, so this approach would make backbone design straightforward.
However, the effective molarity for intramolecular folding, EMf, also depends on the conformational properties of the backbone. As shown in Fig. 3(a), a long flexible backbone promotes 1,2-folding between A and D recognition units that are adjacent in sequence. The value of EMf for this system is about 10 mM, which is comparable to the values of effective molarity for zipping up the duplex, so the folding channel will dominate for mixed sequence oligomers of this architecture.39 Of course, longer mixed sequence oligomers will always be able to fold, no matter what backbone is used, and indeed sequence-encoded folding of single-stranded RNA is key to the biological properties.2 Folded nucleic acid structures involve looped out bases, so if a single-stranded nucleic acid is annealed with a sequence-complementary strand, duplex formation will dominate, because additional base-pairing interactions are made in the duplex. However, Fig. 1 shows that if 1,2-folding is possible, the number of base-pairs formed in the folding and duplex channels can be identical, so the folding channel will dominate. Minimising 1,2-folding is therefore critical to the design of recognition-encoded oligomers that form sequence-selective duplexes with high fidelity.
Fig. 3 (a) Intramolecular 1,2-folding in information molecules with flexible backbones. (b) Duplex formation in information molecules with rigid backbones.
One strategy for avoiding 1,2-folding is to reduce the value of EMf by increasing the rigidity of the backbone. As shown in Fig. 3(b), the very rigid backbone that we studied previously does not fold, so duplex formation is the dominant assembly channel for mixed sequence oligomers of this architecture. However, it would be preferable to work with more flexible backbones to guarantee duplex formation, as explained above. Here, we explore an alternative strategy for preventing 1,2-folding in oligomers with a very flexible backbone. If two short bases are attached to a long flexible backbone, 1,2-folding is favoured (Fig. 4(a)). Fig. 4(b) illustrates how folding can be prevented by attaching the two short bases to a rigid backbone. Fig. 4(c) shows how changing the dimensions of the bases can be used to prevent folding. By making one of the bases longer than the other, the probability of finding a backbone conformation compatible with folding is significantly reduced, and the duplex assembly channel should dominate. Fig. 4(d) shows the corresponding molecular design that we validate in this paper. It is worth noting that this short-long base-pairing scheme has similar geometrical properties to the purine–pyrimidine base-pairing system found in nucleic acids.
The backbone proposed in Fig. 4(d) uses ester linkages as the coupling chemistry for the synthesis of oligomers. Esters are sufficiently weak hydrogen bond acceptors (β ≈ 5.5) not to compete significantly with the phosphine oxide recognition units (β ≈ 10.5).40 Ester coupling is sufficiently high-yielding to be used for the synthesis of polymers, and iterative coupling could be automated in a peptide synthesiser.41–43 Orthogonal protecting groups have been developed for the preparation of oligoesters with sequences of different building blocks.44–51 Here, we describe synthesis of the required monomer building blocks, demonstrate their use in the synthesis of different 2-mer sequences, and show that the long-short base-pairing scheme successfully prevents 1,2-folding for this oligomer architecture.
For the ester coupling reactions, the potentially reactive phenol moiety in 9 was first protected as the acetyl ester 11 (Scheme 2). The benzyl and TBDPS protecting groups in 10 and 11 were removed orthogonally to give the four precursors 12–15 required for ester coupling reactions. Treatment with hydrogen gas over palladium on charcoal gave the monoprotected carboxylic acids 12 and 14. Alternatively, reaction with tetra-n-butylammonium fluoride buffered with acetic acid gave the monoprotected alcohols 13 and 15. These monoprotected hydroxyacid monomers were used to synthesise three different 2-mer sequences by EDC coupling with a catalytic amount of 4-(N,N-dimethylamino)pyridine (Scheme 3). Coupling 14 with 15 gave AA directly. AD and DD were obtained with the phenol groups protected as acetate esters, and these groups were removed quantitatively by stirring in a solution of ammonium acetate in water and methanol.
Complex | log(K/M−1) | 19F NMR δ free/ppm | 19F NMR δ bound/ppm | 19F NMR Δδ/ppm | 31P NMR δ free/ppm | 31P NMR δ bound/ppm | 31P NMR Δδ/ppm |
---|---|---|---|---|---|---|---|
A·D | 3.6 ± 0.1 | −61.2 | −61.6 | −0.4 | 34.2 | 41.0 | 6.8 |
AA·DD | 5.8 ± 0.1 | −61.1 | −61.5 | −0.4 | 34.3 | 39.3 | 5.0 |
AD·AD | 5.2 ± 0.1 | −61.1 | −61.5 | −0.4 | 35.8 | 40.9 | 5.1 |
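The tabulated quantities are related by simple arithmetic: the association constant follows from K = 10^(log K), and each complexation-induced shift is Δδ = δ bound − δ free. The short sketch below transcribes the table values and recomputes these quantities; the names `table1` and `summarise` are introduced here for illustration only.

```python
# Values transcribed from the table above:
# (log K, 19F delta free, 19F delta bound, 31P delta free, 31P delta bound)
table1 = {
    "A·D": (3.6, -61.2, -61.6, 34.2, 41.0),
    "AA·DD": (5.8, -61.1, -61.5, 34.3, 39.3),
    "AD·AD": (5.2, -61.1, -61.5, 35.8, 40.9),
}


def summarise(row):
    """Return K (M^-1) and the complexation-induced shifts (ppm)."""
    log_k, f_free, f_bound, p_free, p_bound = row
    return {
        "K": 10 ** log_k,
        "dd_19F": round(f_bound - f_free, 1),
        "dd_31P": round(p_bound - p_free, 1),
    }


summary = {name: summarise(row) for name, row in table1.items()}
for name, s in summary.items():
    print(f"{name}: K = {s['K']:.3g} M^-1, "
          f"19F shift = {s['dd_19F']} ppm, 31P shift = {s['dd_31P']} ppm")
```

Because the log K values are quoted to one decimal place, association constants derived in this way carry an uncertainty of roughly 25%.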
Fig. 5 Hydrogen bonded duplexes formed by (a) the AA and DD 2-mers, and (b) the self-complementary AD 2-mer.
A schematic representation of the equilibria involved in duplex assembly is shown in Fig. 6. For AA·DD, formation of the first intermolecular hydrogen bond gives an open complex, and formation of the second intramolecular hydrogen bond gives the closed duplex. Assuming that all of the hydrogen bonds in the systems described here are of similar strength, it is possible to describe the association constant for formation of the closed c-AA·DD duplex in terms of the association constant for formation of a single intermolecular hydrogen bond KA·D and the effective molarity for the intramolecular interaction EMi. The backbone in these systems has a direction, because the hydroxyl and acid ends are different, so parallel and anti-parallel orientations of the duplex are possible. As the end groups are spatially separated from the recognition sites, we assume that the two possible c-AA·DD duplexes have similar stability. Therefore, the open complex o-AA·DD has four equally populated states and the closed duplex c-AA·DD has two.
It is possible to express the association constants for duplex formation in terms of KA·D and EMi:
$$K_{\mathrm{AA\cdot DD}} = 2\,K_{\mathrm{A\cdot D}}^{2}\,\mathrm{EM_i} \tag{1}$$
Hence the effective molarity for duplex formation can be determined as:
$$\mathrm{EM_i} = \frac{K_{\mathrm{AA\cdot DD}}}{2\,K_{\mathrm{A\cdot D}}^{2}} \tag{2}$$
The association constants in Table 1 were used to calculate EMi for this system as 19 ± 3 mM, which is consistent with values of supramolecular effective molarities we have measured for other hydrogen bonded duplexes.31,34–36,38,39 The equilibrium constant for closing the duplex is given by ½KA·DEMi (two closed states versus four open states) and is 40 for this system, which implies that the duplex is almost fully closed and only 2% of the species populate the partially-bound open state o-AD·AD.
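The numbers quoted above can be reproduced from the rounded log K values in Table 1. The sketch below assumes that the closed duplex dominates the bound state, so that the AA·DD association constant is the product of a statistical factor of two (the two degenerate closed duplexes), the square of the single hydrogen bond association constant, and EMi; the variable names are ours.

```python
# Rounded values from Table 1 (log K/M^-1)
K_AD = 10 ** 3.6     # single H-bond complex A·D
K_AADD = 10 ** 5.8   # AA·DD duplex

# Assumed statistical factor: two degenerate closed duplexes
STAT = 2

# Effective molarity for the intramolecular H-bond (M)
EM_i = K_AADD / (STAT * K_AD ** 2)

# Closing constant: two closed states versus four open states
K_close = 0.5 * K_AD * EM_i

# Fraction of bound species left in the partially bound open state
open_fraction = 1 / (1 + K_close)

print(f"EM_i = {EM_i * 1e3:.1f} mM")
print(f"K_close = {K_close:.0f}")
print(f"open fraction = {open_fraction:.1%}")
```

With the rounded inputs, the result is an effective molarity close to 20 mM and a closing constant close to 40, in agreement with the values stated in the text within the quoted uncertainty.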
For the closed hetero-2-mer duplex c-AD·AD, there is no degeneracy associated with the backbone directionality, because the anti-parallel orientation is determined by the sequence. However, there is the possibility of intramolecular 1,2-folding in the monomeric state, which is governed by the corresponding effective molarity EMf. Hence, the observed dimerisation constant KAD·AD depends on the concentrations of the folded (ADfolded) and open (ADopen) species that are populated in the monomeric state:
$$[\mathrm{AD}] = [\mathrm{AD_{open}}] + [\mathrm{AD_{folded}}] = [\mathrm{AD_{open}}]\,(1 + K_{\mathrm{A\cdot D}}\,\mathrm{EM_f}) \tag{3}$$
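$$K_{\mathrm{AD\cdot AD}} = \frac{[\mathrm{AD\cdot AD}]}{[\mathrm{AD}]^{2}} = \frac{K_{\mathrm{A\cdot D}}^{2}\,\mathrm{EM_i}}{2\,(1 + K_{\mathrm{A\cdot D}}\,\mathrm{EM_f})^{2}} \tag{4}$$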
Assuming that the effective molarity for duplex formation, EMi, is the same for AA·DD and AD·AD, it is possible to combine eqn (2) and (4) to determine (KA·DEMf + 1), which is the factor that describes the fraction of monomeric AD that exists in the folded state:
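$$K_{\mathrm{A\cdot D}}\,\mathrm{EM_f} + 1 = \sqrt{\frac{K_{\mathrm{AA\cdot DD}}}{4\,K_{\mathrm{AD\cdot AD}}}} \tag{5}$$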
Substituting the values from Table 1 into eqn (5) gives a value of 1.0 for (KA·DEMf + 1), which is consistent with the NMR chemical shift data. These results indicate that virtually all monomeric AD exists in the open state and that 1,2-folding does not compete with duplex formation in this system.
If the two arrangements of the c-AA·DD duplex were not degenerate, the statistical factor in eqn (1) would be equal to one, giving (KA·DEMf + 1) ≈ 1.4. This value would require that 30% of monomeric AD exists in the folded state, which is not consistent with the NMR chemical shift data. This suggests that the assumption that the parallel and antiparallel backbone arrangements are equally populated in the c-AA·DD duplex is reasonable.
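The sensitivity of the folding factor to the statistical factor can be checked numerically. The sketch below is an illustration under stated assumptions: the AA·DD association constant is taken as `stat` times the square of the single hydrogen bond constant times EMi, the AD·AD dimerisation constant carries a factor of one half for self-association, and EMi is the same for both duplexes. The function names are hypothetical.

```python
import math

# Rounded values from Table 1 (log K/M^-1)
K_AADD = 10 ** 5.8   # AA·DD duplex
K_ADAD = 10 ** 5.2   # AD·AD duplex


def fold_factor(stat):
    """Return (K_AD * EM_f + 1) for a given statistical factor.

    Assumes K_AADD = stat * K^2 * EM_i and
    K_ADAD = K^2 * EM_i / (2 * (1 + K * EM_f)^2).
    """
    return math.sqrt(K_AADD / (2 * stat * K_ADAD))


def folded_fraction(factor):
    """Fraction of monomeric AD that is folded, clamped at zero."""
    return max(0.0, 1 - 1 / factor)


degenerate = fold_factor(2)       # two equally populated closed duplexes
non_degenerate = fold_factor(1)   # a single closed duplex

print(f"stat = 2: factor = {degenerate:.2f}, "
      f"folded = {folded_fraction(degenerate):.0%}")
print(f"stat = 1: factor = {non_degenerate:.2f}, "
      f"folded = {folded_fraction(non_degenerate):.0%}")
```

The clamp in `folded_fraction` handles the case where rounding of the log K values pushes the factor marginally below one, which has no physical meaning.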
(6) |
(7) |
The two donor binding sites were assumed to be independent and identical, hence K1K2 = (KA·D)² could be fixed in the least squares regression analysis. The association constant for the DD·A complex was determined to be K1 = (15000 ± 2000) M−1, which is four times greater than the single hydrogen bond association constant KA·D and suggests additional stabilisation due to a hydrogen bond between the second phenol and the phosphine oxide. We can represent the equilibria leading to the doubly bonded complex as in Fig. 8.
Noting that both 1:1 complexes give rise to the observed association constant, K1 can be expressed as:
$$K_{1} = 2\,K_{\mathrm{A\cdot D}} + 2\,K_{\mathrm{A\cdot D}}\,K' \tag{8}$$
The association constant for the formation of the second hydrogen bond is, therefore:
$$K' = \frac{K_{1}}{2\,K_{\mathrm{A\cdot D}}} - 1 \tag{9}$$
Using eqn (9) and the measured value for K1, the association constant for the interaction of the second phenol donor with the acceptor is K′ = 1.0 ± 0.2, which means that the doubly bonded complex represents 50% of the 1:1 complex. The ratio of KA·DEMi to K′ describes the competition between a correctly recognised duplex and a doubly hydrogen-bonded mismatched complex. This ratio is 80 for this system, so sequence selectivity should be achieved for longer information oligoesters with a fidelity of 99%. For comparison, the previously reported sequence-containing information oligomer shows K′ = 1.6 and KA·DEMi = 9.9, and hence exhibits a sequence fidelity of 86%.31,39 While the value of K′ for the system described here is comparable with that reported earlier, the exceptionally strong hydrogen-bonding interaction between the recognition units should lead to superior performance in the formation of closed duplexes with high sequence fidelity.
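The mismatch analysis can be reproduced with a short calculation. In the sketch below, `k_prime` is a rearrangement of eqn (8), and `fidelity` is an assumed two-state estimate in which fidelity is the ratio divided by one plus the ratio; this definition is our assumption, adopted because it returns the quoted values of 99% and 86%. The function names are introduced here for illustration.

```python
def k_prime(K1, K_AD):
    """Rearranged eqn (8): K1 = 2 * K_AD * (1 + K')."""
    return K1 / (2 * K_AD) - 1


def fidelity(K_EM, Kp):
    """Assumed two-state estimate: matched / (matched + mismatched)."""
    ratio = K_EM / Kp
    return ratio / (1 + ratio)


# K' from the measured K1 and the rounded log K for A·D in Table 1
Kp = k_prime(15000, 10 ** 3.6)

# Ratio of 80 quoted for this system (K' = 1.0)
this_work = fidelity(80.0, 1.0)

# Previously reported oligomer: K' = 1.6, K_AD * EM_i = 9.9
previous = fidelity(9.9, 1.6)

print(f"K' = {Kp:.2f}")
print(f"fidelity (this work) = {this_work:.0%}")
print(f"fidelity (previous)  = {previous:.0%}")
```

With the rounded log K value, K′ evaluates to about 0.9, which lies within the quoted uncertainty of 1.0 ± 0.2.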
Footnote
† Electronic supplementary information (ESI) available: Detailed experimental procedures with spectroscopic characterization data, 19F NMR titration spectra, binding isotherms, limiting chemical shifts for free and bound states. See DOI: 10.1039/c8sc04896g
This journal is © The Royal Society of Chemistry 2019