Shuanghong Yan ‡ac, Xintong Li ‡*d, Panke Zhang abc, Yuqin Wang ac, Hong-Yuan Chen abc, Shuo Huang *abc and Hanyang Yu *d
a State Key Laboratory of Analytical Chemistry for Life Sciences, Nanjing University, 210023, Nanjing, China
b Collaborative Innovation Centre of Chemistry for Life Sciences, Nanjing University, 210023, Nanjing, China
c School of Chemistry and Chemical Engineering, Nanjing University, 210023, Nanjing, China. E-mail: shuo.huang@nju.edu.cn
d Department of Biomedical Engineering, College of Engineering and Applied Sciences, Nanjing University, 210023, Nanjing, China. E-mail: hanyangyu@nju.edu.cn
First published on 23rd January 2019
2′-deoxy-2′-fluoroarabinonucleic acid (FANA), which is one type of xeno-nucleic acid (XNA), has been intensively studied in molecular medicine and synthetic biology because of its superior gene-silencing and catalytic activities. Although urgently required, FANA cannot be directly sequenced by any existing platform. Nanopore sequencing, which identifies a single molecule analyte directly from its physical and chemical properties, shows promise for direct XNA sequencing. As a proof of concept, different FANA homopolymers show well-distinguished pore blockage signals in a Mycobacterium smegmatis porin A (MspA) nanopore. By ligating FANA with a DNA drive-strand, direct FANA sequencing has been demonstrated using phi29 DNA polymerase by Nanopore-Induced Phase Shift Sequencing (NIPSS). When bound with an FANA template, the phi29 DNA polymerase shows unexpected reverse transcriptase activity when monitored in a single molecule assay. Following further investigations into the ensemble, phi29 DNA polymerase is shown to be a previously unknown reverse transcriptase for FANA that operates at room temperature, and is potentially ideal for nanopore sequencing. These results represent the first direct sequencing of a sugar-modified XNA and suggest that phi29 DNA polymerase could act as a promising enzyme for sustained sequencing of a wide variety of XNAs.
2′-deoxy-2′-fluoroarabinonucleic acid (FANA) is an RNA analogue in which the ribose ring has been replaced by a 2′-fluoroarabinose moiety (Fig. 1A).7 It has been demonstrated that FANA exhibits potent gene-silencing capability and a longer serum half-life.8,9 Recent advances in polymerase engineering have enabled in vitro evolution of functional FANA molecules with specific ligand-binding and catalytic activities.10,11 Although engineered polymerases such as D4K and RT521 can catalyze efficient information transfer between FANA and DNA,12 the fidelity of such enzyme-mediated polymerization reactions has not been thoroughly examined owing to the lack of direct FANA sequencing technologies. Though FANA might in principle be sequenced by hybridization using an Affymetrix chip,13–15 no related work has been reported to the best of our knowledge. Consequently, the development of a direct sequencing methodology for FANA, preferably also applicable to other XNAs, is of great significance for the growth of the XNA field.
Nanopore sequencing, which recognizes DNA base identities16 and the corresponding epigenetic modifications17,18 from their unique physical or chemical properties,16,19,20 is an emerging single-molecule technology that is promising for direct XNA sequencing. In a typical nanopore measurement, the nanopore is the only path that permits the flow of ionic current between two chambers containing electrolytic solutions. With an applied potential, charged analytes are electrically driven through the pore, and the analytical information can be recognized and distinguished from the current blockades. The MinION™ sequencer (Oxford Nanopore Technologies, UK), which offers the advantages of speed, label-free detection and long read length, can sequence genomic DNA at an affordable cost in a palm-sized device.21 However, though urgently needed, direct sequencing of xeno-nucleic acids has not been demonstrated to date.
For FANA to be directly sequenced by a nanopore, the pore blockage signals should show a detectable sequence dependence. As a proof of concept, a static pore blockage experiment assisted by a streptavidin stopper was performed with an engineered Mycobacterium smegmatis porin A (MspA) nanopore.22 Clearly resolvable signals are observed from homopolymer FANA (polyU, polyC and polyA, ESI Table S1†). As demonstrated with DNA sequencing using nanopores,19 the selection of a ratcheting enzyme (a helicase or a polymerase) that drives the strand to move along the pore restriction is critical for direct FANA sequencing. However, as a xeno-nucleic acid, FANA does not possess a large archive of compatible enzymes that work at room temperature with a high processivity like phi29 DNA polymerase (DNAP).23,24 Here, processivity means an enzyme's ability to catalyze “consecutive reactions without releasing its substrate”. As a compromise, the height of the octameric MspA nanopore25 could be utilized to perform “Nanopore Induced Phase-Shift Sequencing (NIPSS)”, a method first defined in this paper, which is a universal nanopore sequencing method for a variety of biomacromolecules provided a DNA drive-strand can be chemically attached.
According to single molecule kinetics results observed from direct FANA sequencing using NIPSS and ensemble assays investigated by gel electrophoresis, the phi29 DNAP has been surprisingly shown to perform FANA templated DNA synthesis. Since FANA is an RNA analogue, the phi29 DNAP should thus be categorized as its reverse transcriptase. Although its efficiency is reduced against the opposing electric field during nanopore sequencing, phi29 DNAP, which is a highly processive enzyme that works at room temperature, is a promising candidate for protein engineering aimed at sustained nanopore sequencing of FANA. Significant efforts have similarly been made to realize direct RNA sequencing using nanopores.26
To the best of our knowledge, this is the first report of single molecule FANA identification and direct sequencing using a nanopore sensor. Although preliminary, the reported “NIPSS” method provides a universal means to sequence a variety of xeno-nucleic acids or even other biomacromolecules in a nanopore sequencing scheme, which could unambiguously discriminate between DNA, RNA or other XNA nucleotides even within a chimeric strand. The nanopore sequencing assay demonstrated in this paper could also be adapted to the screening of other XNA compatible motor proteins for sustained single molecule sequencing. The demonstrated results also add phi29 DNAP to the family of FANA reverse transcriptases as a promising model for more optimized and sustained FANA sequencing using nanopores.
To evaluate potential FANA-pore restriction interactions28 and to probe whether FANA, which has an altered sugar backbone compared to DNA or RNA, could produce clear sequence specific blockage currents (Ib), three FANA homopolymers (FANA polyU, polyC and polyA) were designed (ESI Table S1,† Supporting Methods) and synthesized for use in the static pore blockage assay. FANA polyG was excluded to avoid complications from G-quadruplexes. Considering the high cost of the FANA monomer, the three FANA homopolymers were designed in a chimeric DNA–FANA form (ESI Table S1†), where the nucleotides that are to be recognized by the pore restriction are still composed of FANA. When FANA, which is negatively charged, is electrophoretically driven into the pore at +180 mV, the ionic current drops instantaneously from the open pore level (Io) to the blockage level (Ib). This assay, which produces statistics for Ib from a programmed cycling voltage protocol, shows that the value of Ib from different types of homopolymers follows the order Ib,polyU < Ib,polyA < Ib,polyC, as demonstrated by their corresponding representative traces (Fig. 1C). Different types of homopolymer samples were added sequentially to the cis side during a continuous measurement with the same pore, and the identity of each Ib value was judged by the order in which the corresponding peak appeared in the statistics (ESI Fig. S1†). To minimize signal drift from inevitable issues such as pore to pore variations or water evaporation during data acquisition, the value of Ib/Io is taken as a normalized percentage blockage amplitude. Histograms of the Ib/Io of FANA homopolymers and the corresponding Gaussian fittings are shown in Fig. 1D, which demonstrate the signal dispersion and peak separation. To form a statistical conclusion and to evaluate the reproducibility of the assay, three independent measurements were performed for each assay with different FANA homopolymers (ESI Table S2†).
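The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the current values are hypothetical, and a sample mean and standard deviation stand in for the Gaussian fitting used for Fig. 1D.

```python
from statistics import mean, stdev


def normalized_blockade(i_blocked, i_open):
    """Return Ib/Io, the blockage current as a fraction of the open pore current."""
    if i_open <= 0:
        raise ValueError("open pore current must be positive")
    return i_blocked / i_open


def summarize(events):
    """events: list of (Ib, Io) pairs in pA. Returns (mean, sample std) of Ib/Io."""
    ratios = [normalized_blockade(ib, io) for ib, io in events]
    return mean(ratios), stdev(ratios)


# Hypothetical currents in pA for illustration only, not measured data.
poly_u_events = [(30.0, 300.0), (33.0, 300.0), (31.5, 310.0)]
mu, sigma = summarize(poly_u_events)
```

Because both currents scale together when the open pore level drifts, the ratio is insensitive to a common multiplicative drift, which is the motivation given in the text for using Ib/Io instead of Ib alone.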
As demonstrated, different FANA homopolymers can be readily distinguished by an MspA nanopore. According to previously published results, examination of DNA homopolymers shows a different order of Ib/Io (Ib,polyT < Ib,polyC < Ib,polyA) in contrast to the results from FANA (Ib,polyU < Ib,polyA < Ib,polyC), in which the order of Ib/Io from polyA and polyC is reversed. This order switch between polyA and polyC simply originates from the altered sugar backbone of FANA, which is a minor physical property variation sensed by the sharply restricting nanopore.
Since FANA polyU and polyA/C generate up to 6%/9% Ib/Io values (ESI Table S2†), different sequence combinations in FANA are expected to produce detectable current transitions in a nanopore sequencing assay, in which a strand of FANA translocates through the restriction of MspA in single-nucleotide steps.
Though limited, the selection of FANA compatible enzymes includes D4K, Deep Vent and RT521K polymerase,12,35 which may be adapted as motor enzymes in a nanopore sequencing assay. However, they are all designed to work at a higher temperature and they exhibit minimal FANA reverse transcriptase activity below 25 °C, at which routine nanopore sequencing is carried out. Although directed evolution might yield engineered FANA reverse transcriptases that are more temperature-tolerant, laborious efforts are required.
As a compromise, direct FANA sequencing could be carried out by NIPSS as a proof of concept. Because MspA is a conically shaped nanopore with a finite height (∼10 nm), its restriction site, which reads the nucleic acid identity, is always at a fixed distance ahead of the reaction site of the motor protein. As has been reported,20 this fixed distance is equivalent in length to 14–15 nucleotides, which generates a fixed phase shift between electrochemical DNA reading and enzymatic DNA ratcheting. In principle, a limited length of any biomacromolecule could be sequenced by NIPSS as long as this chain-shaped polymer can be chemically tethered to a fragment of ssDNA, which is defined as the “drive-strand” during NIPSS.
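The phase-shift idea can be illustrated with a small Python sketch. It assumes a shift of 14 nucleotides (the text gives 14–15), counts template positions from 0 in the direction of enzyme travel, and ignores any spacer between the two segments; the function names and the 54 nt and 30 nt lengths are chosen only for illustration.

```python
PHASE_SHIFT_NT = 14  # assumed value; the text reports 14-15 nucleotides


def segment_at_restriction(enzyme_pos, drive_len, analyte_len, shift=PHASE_SHIFT_NT):
    """Name the segment sitting in the pore restriction while the enzyme
    reaction site is at template position enzyme_pos (0-based)."""
    read_pos = enzyme_pos + shift  # the restriction reads a fixed distance ahead
    if read_pos < drive_len:
        return "drive-strand"
    if read_pos < drive_len + analyte_len:
        return "analyte"
    return "beyond strand"


def max_analyte_read(drive_len, analyte_len, shift=PHASE_SHIFT_NT):
    """Analyte nucleotides read if the enzyme never leaves the drive-strand."""
    last_read = (drive_len - 1) + shift
    return max(0, min(analyte_len, last_read - drive_len + 1))


# Illustrative lengths: a 54 nt drive-strand followed by a 30 nt analyte.
print(segment_at_restriction(39, 54, 30))  # still reading the drive-strand
print(segment_at_restriction(40, 54, 30))  # the restriction has reached the analyte
print(max_analyte_read(54, 30))            # read length limited by the phase shift
```

Under these assumptions the readable length equals the phase shift whenever the analyte is longer than the shift, which is why the read-length of NIPSS is bounded by the pore geometry and not by the enzyme.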
Phi29 DNAP, which is widely used for isothermal gene amplification24 or nanopore sequencing,16 is a highly processive (>70 kilobase) enzyme with a strong strand displacement capacity that operates at room temperature. It is thus employed as the motor protein for phase-shift sequencing of FANA. To demonstrate its feasibility, a chimeric DNA–FANA template, which is composed of an FANA strand embedded within a DNA template, is custom synthesized and named FANAx (Fig. 2A, ESI Table S1, Fig. S3†). An abasic spacer, which is known to produce an abnormally large blockage current, is placed between the FANA and DNA as a signal marker to identify the beginning of sequencing signals from FANA. During nanopore sequencing, any plateau-shaped signal transition that appears after the signal from the abasic spacer represents successful direct FANA sequence reading.
Fig. 2 Nanopore sequencing of chimeric DNA–FANA with an abasic spacer (FANAx). (A) The FANAx template annealed with a primer (x: abasic site). The 54 nt DNA acts as the drive-strand for a primer extension. The abasic nucleotide (x) acts as a marker separating DNA and FANA. (B) Schematic diagram of nanopore sequencing of chimeric DNA (grey)-FANA (cyan) with an abasic spacer (red). During sequencing, the nucleic acid strand is directionally (red arrow) driven by a phi29 DNA polymerase (DNAP, green) via the primer extension. The recorded signal corresponds to the (i) DNA, (ii) abasic nucleotide and (iii) FANA sequence passing through the restriction site of MspA. Enzymatic ratcheting halts when the abasic site reaches the binding pocket of phi29 DNAP (ESI Fig. S4†). (C) A representative nanopore sequencing trace for FANAx. The sequencing trace corresponds to DNA (i, black), the abasic site (ii, red) and FANA (iii, blue) within the pore restriction. The step with an arrow is from the TGTT blockage. (D) Overlay of multiple time-normalized events using a level detection algorithm. (E) The phi29 DNAP-mediated FANAx reverse transcription assay analyzed by denaturing PAGE. The primer extension stops at the abasic site (54 nt). M1 and M2: DNA markers.
Experimentally, direct FANA sequencing by NIPSS was carried out with an electrolyte buffer composed of 0.3 M KCl, 10 mM MgCl2, 10 mM (NH4)2SO4, 4 mM DTT and 10 mM HEPES at pH 7.5. After single pore insertion, the thermally annealed nanopore sequencing library (Supporting Methods) is added to the cis at a final concentration of 5 nM. The sequencing library (ESI Fig. S4†) is composed of three parts: the DNA–FANA chimera, primer and blocker (ESI Table S1†). The blocker protects the DNA–FANA chimera template from enzymatic extension in the solution. During continuous recordings at +180 mV, electrophoretic unzipping of the blocker strand triggers the initiation of the nanopore sequencing signal for FANAx when the phi29 DNAP based primer extension starts. During primer extension driven by the polymerase reaction from the phi29 DNAP, the nanopore reads DNA (Fig. 2B i), the abasic spacer (Fig. 2B ii) and FANA (Fig. 2B iii) sequentially.
From the published literature,16 the MspA reading of AGAA and TGTT (5′→3′ convention, if not otherwise stated) shows the highest and the lowest blockage level, respectively. As a model analyte with clearly distinguishable sequencing patterns, the FANAx is designed to possess a sequence repeat of AGAATGTT in the DNA part and AGAAUGUU in the FANA part (ESI Table S1†). The pore restriction simultaneously reads four nucleotides, and a sequential nanopore reading from TGTT to AGAA with a single nucleotide progression results in five plateaus linked by amplitude transitions between “TGTT, ATGT, AATG, GAAT and AGAA”, given that the nanopore reads in the 3′→5′ direction. The nanopore sequencing signal from the DNA part is thus expected to possess a triangular shape that starts with the reading of TGTT. The DNA part of the nanopore sequencing signal is colored black in Fig. 2C, where 1.5 periods of the triangular shape are recorded. Immediately after the DNA signal, nanopore reading of the abasic spacer results in an abnormally high blockage level, higher than that from AGAA, which is colored red in the trace. Any step-shaped trace that appears after the blockage signal from the abasic spacer originates from direct sequencing of FANA.
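The five plateaus listed above follow directly from sliding a four-nucleotide window along the repeat from its 3′ end. A minimal Python sketch, with a hypothetical helper name, reproduces the order for both the DNA and the FANA repeat:

```python
def kmers_3_to_5(seq_5_to_3, k=4):
    """Return the k-mers (each written 5'->3') in the order a window sees them
    when it moves from the 3' end toward the 5' end one nucleotide at a time."""
    n = len(seq_5_to_3)
    return [seq_5_to_3[i:i + k] for i in range(n - k, -1, -1)]


dna_levels = kmers_3_to_5("AGAATGTT")
fana_levels = kmers_3_to_5("AGAAUGUU")
print(dna_levels)   # ['TGTT', 'ATGT', 'AATG', 'GAAT', 'AGAA']
print(fana_levels)  # ['UGUU', 'AUGU', 'AAUG', 'GAAU', 'AGAA']
```

The sketch only enumerates which four nucleotides occupy the restriction at each step; the current amplitude of each plateau still has to come from measurement.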
The FANA part of the signal, which is blue in the trace, shows a similar sequencing pattern in reference to its DNA counterpart. It can be seen that the blockage amplitude of AGAA and UGUU from FANA appears at about the same height as that of AGAA and TGTT from the DNA. However, only 4 plateau transitions are detected when reading UGUU to AGAA within the FANA part of the strand, which indicates that signal amplitude degeneracy exists among AUGU, AAUG and GAAU. To demonstrate the repeatability of the sequencing signal, 29 NIPSS events from FANAx were normalized and overlapped in Fig. 2D. DNA, abasic marker and FANA sequencing signals can be clearly recognized from their characteristic amplitudes (ESI Table S3†).
From single molecule NIPSS events, it can also be seen that the sequencing signal of FANAx always halts when the abasic site reaches the binding pocket of the phi29 DNAP as a result of the primer extension failure caused by the abasic site (ESI Fig. S5†). This phenomenon is also verified in the ensemble by a reverse transcription assay, which is reported by gel electrophoresis (Fig. 2E).
Although the read-length is currently limited to 14–15 bases, the first direct FANA sequencing has been successfully demonstrated by NIPSS with a designed chimeric DNA–FANA strand. Similar to nanopore sequencing of DNA, nanopore sequencing of FANA showed clear sequence-specific pore blockage signals with distinguishable amplitude transitions assisted by the phi29 DNAP based primer extension. As demonstrated by FANAx, the maximum amplitude difference within the sequence currently being read could be more than 20 pA (ESI Table S3†).
To further investigate pore blockage amplitude variations between reading DNA and FANA during NIPSS, a chimeric DNA–FANA strand with a random FANA sequence was designed, synthesized and named FANA30 (Fig. 3A, ESI Table S1†). A DNA reference strand, which is composed of DNA nucleotides with an identical sequence to that of FANA30, was synthesized and named DNA30. Here, the uridine in FANA30 is replaced by thymidine in DNA30 (Fig. 3A, ESI Table S1†).
Fig. 3 Discrimination between DNA and FANA via nanopore sequencing. (A) Diagram of the FANA30 template annealed with a primer. (B) Mean current levels of DNA30 (ESI Table S3†) extracted from multiple events (N = 22). Error bars (red) represent the corresponding standard deviations from different events. The step with an arrow is from the TGTT blockage. (C) Mean current levels of FANA30 extracted from multiple events (N = 22). Sequencing steps from DNA (black) and FANA (blue) are marked, respectively. Error bars (red) represent the corresponding standard deviations from different events (ESI Table S3†). (D) Mean current level differences between FANA30 and DNA30. (E) The phi29 DNAP-mediated FANA30 reverse transcription assay analyzed by denaturing PAGE. The primer extension yields a detectable full-length product (86 nt) although a significant amount of the extension products stop at the DNA–FANA junction (46 nt).
Sequence-specific mean current levels and the corresponding error bars, extracted from N = 22 NIPSS events each for DNA30 and FANA30, are presented in Fig. 3B and C, respectively. Nanopore sequencing of FANA30 first generates five sequencing levels that originate from reading the DNA part of the strand. These five levels are marked in black in the event statistics of the signal and overlap almost completely with those from DNA30. The remaining signals, which are marked in blue, are from the phase-shift sequencing of the FANA (Fig. 3C). By subtracting the mean current signals of DNA30 from those of FANA30, the difference in signal amplitude between these two assays can be systematically evaluated (Fig. 3D). From these results, the first five sequencing levels show negligible differences between DNA30 and FANA30, as expected given that all these levels are from identical DNA nucleotides. The remaining signals show significant variations between DNA30 and FANA30, owing either to the structural difference between the 2′-fluoroarabinose and deoxyribose sugars or to the difference between the uridine and thymidine bases. As shown in Fig. 3D, signal amplitude variations between DNA30 and FANA30 can reach up to 20 pA although the chemical structure variations involve only a few atoms (ESI Table S3†).
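The level-by-level comparison behind Fig. 3B–D amounts to per-level statistics over events followed by a subtraction. The Python sketch below assumes each event has already been reduced to an aligned list of level currents; the function names and all numbers are hypothetical and are not data from this work.

```python
from statistics import mean, stdev


def level_stats(events):
    """events: list of events, each a list of level currents (pA) of equal
    length. Returns one (mean, sample std) pair per level."""
    return [(mean(level), stdev(level)) for level in zip(*events)]


def mean_differences(stats_a, stats_b):
    """Per-level difference of means, stats_a minus stats_b."""
    return [mean_a - mean_b for (mean_a, _), (mean_b, _) in zip(stats_a, stats_b)]


# Hypothetical aligned level currents in pA (two events, three levels each).
fana_events = [[50.0, 62.0, 80.0], [52.0, 60.0, 84.0]]
dna_events = [[51.0, 61.0, 65.0], [51.0, 61.0, 63.0]]

differences = mean_differences(level_stats(fana_events), level_stats(dna_events))
print(differences)  # [0.0, 0.0, 18.0]
```

In this toy input the first two levels play the role of the shared DNA part (no difference) and the last level plays the role of a position where the two strands differ chemically.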
Different from the NIPSS results of FANAx, where the abasic site prohibits further replication of FANA by the phi29 DNAP (ESI Fig. S5†), phi29 DNAP-based primer extension along the FANA30 in a NIPSS assay normally results in back and forth movement of the phi29 enzyme around the DNA–FANA interface (ESI Fig. S6†), where no abasic site remains. This single molecule phenomenon implies that the phi29 DNAP, which is a highly processive enzyme, attempts to replicate beyond the DNA–FANA interface as an FANA reverse transcriptase.
To further test this hypothesis, a phi29 DNAP-mediated reverse transcription assay for FANA30 was performed (ESI Fig. S7†) and analyzed by denaturing polyacrylamide gel electrophoresis (Fig. 3E). It was observed that a full-length reverse transcription product of 86 nucleotides formed, indicating that phi29 DNAP is capable of catalyzing FANA-templated DNA synthesis. Consistent with nanopore sequencing results, most primer extensions stop at the DNA–FANA junction, generating a 46 nt truncated product and suggesting that the FANA reverse transcriptase activity of phi29 DNAP is significantly reduced compared to its DNA polymerase activity.
Two chimeric DNA–FANA oligomers have been tested so far using NIPSS and confirm the new concept of direct FANA sequencing by the NIPSS strategy. As demonstrated with DNA, the restriction site of MspA simultaneously accommodates four DNA bases. This result indicates that by forming a look-up table composed of all 4⁴ = 256 FANA sequence combinations, unknown FANA sequences could be deduced directly from the acquired nanopore sequencing data. To form an independent archive of FANA sequencing signals in the form of a look-up table, nanopore sequencing for FANA with a long read length is needed for efficient data acquisition. The unexpected reverse transcriptase activity from phi29 DNAP suggests that this DNA polymerase, with its high processivity and low working temperature, may also be compatible with sustained direct FANA sequencing beyond the DNA–FANA sequence interface.
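The size of such a look-up table, and the simplest way it might be queried, can be sketched in Python. The nearest-level lookup and the two reference currents below are assumptions made for illustration; a practical decoder would also need to handle levels that are degenerate in amplitude and the overlap between consecutive four-nucleotide windows.

```python
from itertools import product


def all_kmers(alphabet="ACGU", k=4):
    """Enumerate every k-mer over the alphabet (4**4 = 256 for k = 4)."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


def nearest_kmer(current, table):
    """Return the k-mer whose reference current in table (dict of k-mer to pA)
    is closest to the measured current."""
    return min(table, key=lambda kmer: abs(table[kmer] - current))


kmers = all_kmers()
print(len(kmers))  # 256

# Hypothetical reference currents in pA, not measured values.
toy_table = {"AGAA": 90.0, "UGUU": 40.0}
print(nearest_kmer(85.0, toy_table))  # AGAA
```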
In an ensemble reverse transcription assay investigated by denaturing polyacrylamide gel electrophoresis, primer extension for FANA42 mainly stops at the DNA–FANA junction although a noticeable amount of full length product (96 nucleotides) is detectable (Fig. 4B).
In the nanopore sequencing assay for FANA42, the DNA sequencing signal appears first and is followed by the NIPSS reading of the FANA (ESI Fig. S8†). Means and standard deviations of nanopore sequencing signals from FANA42 were extracted from N = 20 independent NIPSS events to generate statistics (Fig. 4C, ESI Fig. S9†). Within the statistics, a triangular shaped sequencing signal could be extracted and used as a reference with which the relative position of phi29 DNAP on the FANA42 can be detected (Fig. 4C). Within each cycle of FANA sequencing, a statistically normalized signal pattern with asymmetry could be extracted (image inset of Fig. 4C). Although back and forth movement of the enzyme is occasionally observed, the asymmetry of the sequencing signal serves to determine the position of the enzyme unambiguously. For simplicity, the pore blockage levels within each cycle of FANA sequencing data are labeled with the letters a to g. Here, a1 means level “a” within the “1st” cycle.
The signal of interest starts from reading UGUU (a1), which is the 4 nucleotide combination that shows the lowest pore blockage level (a1, a2, a3, etc.) when read by the nanopore restriction (Fig. 2), whereas AGAA reports the highest pore blockage amplitude (d1, d2, d3, etc.). This triangular shaped signal pattern from the nanopore sequencing assay can determine precisely the location of the enzyme in reference to the FANA template with ångström-level spatial resolution, similar to that of the nanopore tweezers (SPRNT) approach31 (ESI Table S3†).
According to the phase-shift sequencing data from FANAx, level f2 is reached but the enzyme is unable to proceed further because of the abasic stopper within the binding pocket of the phi29 DNAP (ESI Fig. S5†). From nanopore sequencing of FANA42, it can be seen that the enzyme can proceed further to level a3, which is 2 nucleotides ahead of the DNA–FANA interface, and this is confirmed by measurements with FANAx (ESI Fig. S5†). Unlike the results for FANAx, where the primer extension halts at level f2, nanopore sequencing of FANA42 shows frequent amplitude transitions within multiple levels (d2–a3). This single molecule phenomenon indicates that the phi29 DNAP attempts to proceed beyond the DNA–FANA interface but encounters difficulties (ESI Fig. S9†). Unlike the spontaneous back and forth movement of the enzyme, which normally jumps between two sequencing levels, the observed amplitude transitions within multiple levels (Fig. 4D) are probably due to a dynamic balance between the reverse transcriptase and exonuclease activities, both of which are possessed by the phi29 DNAP. As shown in Fig. 4D, the phi29 DNAP moves backward from a3 to a2, which is a reverse movement of the enzyme with 7 nt steps, and this is immediately followed by a forward movement of the enzyme from a2 to g2. We can exclude the possibility that this backward movement of the enzyme is a result of electrophoretic unzipping, which is irreversible and would not be followed by any subsequent stepwise forward movement of the DNAP. Given the pre-designed FANA42 sequence, which generates an asymmetric and periodic signal pattern during NIPSS, it is clear that this single molecule experiences enzymatic exonuclease/reverse transcriptase activity around the DNA–FANA interface.
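The level labels can be converted into enzyme displacements with a short Python sketch. It assumes that each labeled level (a to g, seven per cycle) corresponds to one single-nucleotide step, which is what the 7 nt distance quoted for a3 to a2 implies; the function names are hypothetical.

```python
LEVELS = "abcdefg"  # seven labeled levels per cycle, as in the text


def level_index(label):
    """Convert a label such as 'a3' (level 'a', 3rd cycle) to a 0-based step index."""
    letter, cycle = label[0], int(label[1:])
    return (cycle - 1) * len(LEVELS) + LEVELS.index(letter)


def steps(source, target):
    """Signed number of steps from source to target (negative means backward)."""
    return level_index(target) - level_index(source)


print(steps("a3", "a2"))  # -7, the backward excursion described for Fig. 4D
print(steps("a2", "g2"))  # 6, the forward movement that follows
print(steps("f2", "a3"))  # 2, from the FANAx halt level to level a3
```

Under this assumption, the signed step count makes it straightforward to tell a long backward excursion followed by forward stepping apart from a jump between two adjacent levels.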
Although the reverse transcriptase activity of phi29 DNAP is verified by ensemble assays (Fig. 3E and 4B), the DNA–FANA interface still appears to be a reverse transcription barrier for the phi29 DNAP according to the single molecule assay assisted by nanopore sequencing. It is suspected that the opposing electrophoretic force during nanopore measurements may be the origin of this reduced enzymatic efficiency, considering that both FANA30 and FANA42 show clear reverse transcription products in the ensemble (Fig. 3E and 4B).
Still, in a practical nanopore sequencing assay of FANA by NIPSS as demonstrated in this paper, the FANA oligomer to be sequenced could be first enzymatically ligated with the DNA drive-strand on its 3′ end (ESI Fig. S3†). Although limited in read-length, phase shift sequencing of FANA is capable of decoding the sequence of the first 14–15 nucleotides on the 3′ end of the FANA analyte. Since short FANA polymers have been demonstrated to exhibit potent functional properties such as gene-silencing and catalytic activities,8,11 this read-length by NIPSS may be suitable for these FANA molecules, which have short lengths in nature. By introducing redundant materials above the pore, a NIPSS read-length of 30–40 bases should be technically feasible, which matches the read-length of early next generation sequencing platforms.36 In combination with a high-throughput nanopore array37 and bioinformatics tools for sequence decoding (ESI Fig. S10†), long FANA sequences may be re-assembled by fragmented sequence read-outs.
Footnotes
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c8sc05228j
‡ These authors contributed equally to this work.
This journal is © The Royal Society of Chemistry 2019