Robert K. Neely*a, Peter Dedecker a, Jun-ichi Hotta a, Giedrė Urbanavičiūtė b, Saulius Klimašauskas b and Johan Hofkens a
a Departement Chemie, Katholieke Universiteit Leuven, Celestijnenlaan 200F, B-3001, Heverlee, Belgium. E-mail: Robert.neely@chem.kuleuven.be; Fax: +0032 (0)16 327990; Tel: +0032 (0)16 327399
b Institute of Biotechnology, V.A. Graiciuno 8, LT-02241, Vilnius, Lithuania
First published on 11th August 2010
We present a new method for single-molecule optical DNA mapping using an exceptionally dense, yet sequence-specific coverage of DNA with a fluorescent probe. The method employs a DNA methyltransferase enzyme to direct the DNA labelling, followed by molecular combing of the DNA onto a polymer-coated surface and subsequent sub-diffraction limit localization of the fluorophores. The result is a ‘DNA fluorocode’; a simple description of the DNA sequence, with a maximum achievable resolution of less than 20 bases, which can be read and analyzed like a barcode. We demonstrate the generation of a fluorocode for genomic DNA from the lambda bacteriophage using a DNA methyltransferase, M.HhaI, to direct fluorescent labels to four-base sequences reading 5′-GCGC-3′. A consensus fluorocode that allows the study of the DNA sequence at the level of an individual labelling site can be generated from a handful of molecules.
Such repeats of the DNA sequence are surprisingly common. Known as copy number variations (CNVs), they are measured relative to a reference genome4, are greater than 1 kilobase in length5 and can reach lengths of several megabases. In a study of the genomes of 270 individuals, copy number variable regions were found to cover a total of 360 megabases, or approximately 12% of the human genome.5 They have been implicated in a variety of genetic disorders including schizophrenia6 and congenital heart defects.7 Repeats can be detected using third-generation sequencing methods1 but these techniques represent a rather labour- and material-intensive route to studying CNVs. Further, given the variable number of copies that may be present and the hugely variable length of these repeats, the suitability of parallel sequencing methods for studying copy number variations is debatable.
Optical mapping of DNA is a complementary technique to DNA sequencing and in principle it provides a simple and intuitive route to visualize the sequence of a DNA molecule, typically on the scale of kilo- to mega-bases.8 Such mapping is critical to validate the assembly of short DNA sequence reads, particularly in complex and repetitive genomes.9 Optical mapping utilizes molecular combing10 in order to linearly align large DNA molecules on a surface, allowing for their subsequent imaging and the linear positioning of, for example, restriction enzyme sites along the DNA. Indeed, optical mapping using restriction enzymes has been pioneered by the Schwartz lab11,12 and the technique has been critical in validating the final versions of many genomes.13,14 Typically, it utilizes restriction enzymes that recognize 6- or 8-base sequences, giving a cleavage site on average every ∼4 kilobases or ∼65 kilobases, respectively (though these figures vary significantly depending on the genome).
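The quoted cleavage-site spacings follow from simple counting. The sketch below is illustrative only and assumes a random sequence in which all four bases are equally likely and independent, which real genomes only approximate; the function name is hypothetical.

```python
def expected_site_spacing(site_length):
    """Mean distance (in bases) between occurrences of one specific
    recognition sequence of the given length, assuming independent,
    equiprobable bases (an idealisation, not a property of real genomes)."""
    return 4 ** site_length


for n in (4, 6, 8):
    print(n, "base site: one site every", expected_site_spacing(n), "bases")
```

Under this assumption a 6-base site recurs every 4096 bases and an 8-base site every 65536 bases, matching the ∼4 kilobase and ∼65 kilobase figures above, while a 4-base site recurs every 256 bases.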
‘DNA barcodes’ offer an alternative strategy to optical restriction mapping that also yields a genomic-scale map of the DNA sequence. These methods use sequence-specific fluorescent labelling of DNA and have the potential to be combined with sub-diffraction limit imaging techniques to significantly improve on the resolution that results from restriction mapping. Yet no study to date has been able to successfully combine both the sequence-specificity of restriction mapping and sub-diffraction limit positioning of fluorescent probes. Gad et al.15 have reported DNA barcodes for the BRCA1 and BRCA2 genes, variations in which are known to increase susceptibility to breast cancer. Using fluorescent antibodies, the detection of a large deletion (∼24 kb) in the BRCA1 gene at the single molecule level is readily achieved. The sub-diffraction-limit positioning of fluorophores on DNA has previously been achieved by Qu et al.,16 who used 7-base-long bis-PNA molecules that bind to DNA. However, the binding of the bis-PNA molecules was found to be rather non-specific. Sequence-specific fluorescent labelling of DNA is achievable using ‘nick-translation’.17 In combination with molecular combing, nick-translation has been used to produce DNA barcodes using standard optical microscopy.18 DNA nicking enzymes produce single-strand breaks in DNA with high sequence specificity (typically at sequences 6 bases in length). The breaks can be labelled using a DNA polymerase enzyme and a fluorescently labelled nucleotide. Furthermore, using this approach DNA molecules have been mapped as they are driven through ‘nanoslits’ by an electric potential.19 In such a high-throughput format, fluorophore positions were determined with a standard deviation of around 3.5 kb.
We report a significant advance on the current state-of-the-art in optical DNA mapping by using a DNA methyltransferase to label the DNA at sequences reading 5′-GCGC-3′. The unique and reproducible pattern produced by this labelling, in combination with the high labelling density and sub-diffraction-limit localization of the fluorophores, enables identification of elements of the DNA at the level of single genes.
Fluorescent labelling using mTAG is a simple two-step procedure. The first step is a DNA methyltransferase-catalyzed covalent attachment of a linear side chain with a terminal amino group to the DNA. This reaction occurs upon incubation of the DNA with a DNA methyltransferase and a synthetically prepared, modified methyltransferase cofactor (see Supplementary Fig. 1†).22 To direct the fluorescent labelling of genomic DNA from the lambda bacteriophage, we employed an engineered version of the HhaI DNA methyltransferase enzyme (M.HhaI),23 which recognizes the four-base sequence 5′-GCGC-3′ and modifies the inner cytosine at the C5-position. DNA methyltransferases, which typically work with these modified cofactors as wild-type enzymes or sterically engineered variants,20,21 offer a broad range of recognition site specificities.24 Hence, sequence coverage can be tailored to suit the DNA molecule and problem of interest. The resulting ‘derivatized DNA’ can be fluorescently labelled by incubation with a standard, commercially available amine-reactive fluorophore (succinimidyl ester). For this, we used the highly photostable dye, Atto647N.
There are a total of 215 target sites for HhaI on the 48.5 kilobases of the lambda phage genome, which have a distinctive distribution along the molecule, as indicated in Fig. 1. 149 HhaI sites lie between base 1 and 22500, a ∼5000 base gap defines the central region of the lambda DNA molecule and a less densely labelled region, from 27500 bases to the end of the molecule, contains the remaining 66 HhaI sites. Fig. 1 simulates the appearance under the microscope of an ideal lambda DNA molecule, one that is uniformly stretched and labelled at every HhaI site.
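A reference map of target sites of this kind can be built by scanning a sequence for the recognition motif. The following is a minimal sketch rather than the procedure used here: the function name and the toy sequence are hypothetical, and whether overlapping occurrences (as in GCGCGC) should be counted separately is an assumption. Because 5′-GCGC-3′ is palindromic, scanning one strand is sufficient.

```python
def find_sites(sequence, site="GCGC"):
    """Return the 1-based start position of every occurrence of `site`,
    including overlapping occurrences."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)              # convert to 1-based numbering
        start = sequence.find(site, start + 1)   # step by one to allow overlaps
    return positions


# Toy sequence (not lambda DNA): two overlapping sites and one isolated site.
print(find_sites("ATGCGCGCTTAGCGCA"))
```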
Fig. 1 Generated image for a simulated lambda phage DNA. Each fluorophore position is displayed with a (Gaussian) point spread function that has a full-width half maximum (FWHM) of 305 nm, the expected size of a diffraction-limited spot for a single molecule emitting at 700 nm. The molecule is shown with a step between base pairs of 3.4 Å and has a length of 16.5 μm.
Fig. 2 DNA combing using an evaporating droplet. Stills taken from a movie of DNA combing (see Supplementary Video 1†). Exposure time is 1 s and each frame is 41.5 μm by 41.5 μm in size. DNA molecules that are adsorbed to the surface in the early frames of the movie are swept away by the receding edge of the droplet. Deposition occurs at the air–water interface, which is clearly seen in the movie because of the bright but blurred fluorescence intensity from several DNA molecules that are rapidly diffusing there. DNA molecules are combed and stretched to around 1.6× their crystallographic length.
In the context of the densely labelled DNA molecule, sub-diffraction-limit localization of a fluorophore necessitates the isolation and identification of the emission from individual fluorophores on the DNA. One established approach to enable this is the dSTORM30–32 technique, which utilizes on/off switching in organic fluorophores to ensure that single emitters can be readily isolated and their positions accurately determined. Whilst our labelling approach allows the use of this technique in principle, in practice we found that the DNA immediately dissociated from the surface upon addition of a solution (used to enable the on/off switching in dSTORM experiments) to the sample. Hence, we used an approach which utilizes the single-step photobleaching of individual fluorophores as a means to identify and localize them16,33 (Supplementary Figure S2†). This approach enables the use of a wide range of fluorophores for these experiments and does not require the use of an imaging buffer. Movies of the photobleaching of the labels on single DNA molecules were recorded, typically using a relatively long exposure time (i.e. 0.3 s) and low excitation power in order to minimize the effect of fluorophore blinking on our analysis. Fig. 3 shows the result of one such analysis. Following localization of each of the fluorophores on a DNA molecule, a line is projected along the molecule and the distance of each fluorophore along this line is determined, as shown in Fig. 3C. 20 individual DNA molecules were analyzed in this way. Molecules were selected for analysis where the labelling was sufficient that it was clear that the DNA molecule was approximately full length and where the DNA-strand was not obviously composed of more than one molecule. The number of localized fluorophores on a single DNA molecule was found to vary between 64 and 109 with a mean of 87 fluorophores.
Fig. 3 Positioning of the fluorophores. A) Image showing a single lambda DNA molecule. The average intensity image of the DNA taken from the movie is overlaid with the calculated fluorophore positions (red spots). B) Enlarged region of the DNA molecule in A. One pixel is 81 nm (∼150 bases). C) Positions from the DNA molecule in B projected onto a line.
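The projection step illustrated in Fig. 3C can be sketched as an orthogonal projection of each localized position onto a straight line. This is an illustration under stated assumptions: the line is taken to run between two user-supplied end points (how the line is chosen is not specified here), coordinates are in nanometres, and the function name is hypothetical.

```python
import math


def project_onto_line(points, start, end):
    """Return, for each (x, y) point, its distance along the straight line
    running from `start` to `end` (orthogonal projection)."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("start and end must be distinct points")
    ux, uy = dx / length, dy / length  # unit vector along the molecule
    return [(x - start[0]) * ux + (y - start[1]) * uy for x, y in points]


# Two emitters near a line from (0, 0) to (300, 400); the second sits
# 10 nm off the line but still projects to 100 nm along it.
print(project_onto_line([(30.0, 40.0), (52.0, 86.0)], (0.0, 0.0), (300.0, 400.0)))
```

Distances obtained in this way are in nanometres and still have to be converted to bases using the stretching of the individual molecule.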
By convolving each of the fitted points with a (Gaussian) point-spread function (PSF) with a full-width half maximum of 305 nm we can directly compare the fit to the raw experimental data. Fig. 4 shows the generated fit for one such molecule, along with the first frame from the movie and an image based on the average intensity of the emission over the entire movie.
Fig. 4 A comparison of the fluorocode to the raw data. A) Image taken from the first frame from the recorded photobleaching movie. B) An average image from all of the frames of the movie and (C) The DNA fluorocode, where each localized fluorophore is shown with a PSF with a FWHM of 305 nm.
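Rendering a fluorocode from a list of localized positions amounts to summing one Gaussian per emitter. The sketch below does this in one dimension along the DNA axis. The 305 nm FWHM and the 81 nm pixel size are taken from the figure captions; the unit peak height per emitter, the sampling at pixel centres and the function names are assumptions made for illustration.

```python
import math


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full-width at half maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def render_profile(positions_nm, fwhm_nm=305.0, pixel_nm=81.0, length_nm=None):
    """Sum one unit-height Gaussian per emitter, sampled at pixel centres."""
    sigma = fwhm_to_sigma(fwhm_nm)
    if length_nm is None:
        length_nm = max(positions_nm) + 3.0 * fwhm_nm  # leave room for the last spot
    n_pixels = int(math.ceil(length_nm / pixel_nm))
    profile = []
    for i in range(n_pixels):
        x = (i + 0.5) * pixel_nm  # centre of pixel i
        profile.append(sum(math.exp(-0.5 * ((x - p) / sigma) ** 2)
                           for p in positions_nm))
    return profile


# Two emitters 200 nm apart merge into a single diffraction-limited spot.
profile = render_profile([850.5, 1050.5], length_nm=2000.0)
print([round(v, 2) for v in profile])
```

Passing a smaller `fwhm_nm`, for example the 42 nm used in Fig. 5, gives the sharper barcode-like representation.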
The high experimental resolution and sequence-specific labelling reveal heterogeneity in the stretching of the DNA molecules and deviations in the path described by the DNA molecules on the PMMA surface (for example, as shown in Fig. 3). This has important consequences for our measurements, since we ultimately want to know to which base a given fluorophore is attached. In fact, the error in determining the labelling site on the DNA can be significantly greater than the error in fitting its absolute position in the field of view. In order to estimate the error in our measurements along the DNA molecule we measured the observed gap between the fluorophores at the centre of the 20 DNA molecules we measured. Here, we find a standard deviation in the measurement of this ∼5000 base gap of 190 bases; approximately a 4% standard deviation in the distance measurement. This level of precision is unprecedented in any optical mapping study and, as we will show, allows the unambiguous alignment of single DNA molecules to a reference sequence.
In order to translate localized positions into labelling sites on the DNA molecule the experimental data is compared to a reference map of the known HhaI sites on lambda DNA34 (referred to as the “HhaI map”, henceforth). This is achieved by comparison of the intensity profiles of the two fluorocodes (experiment and HhaI map) and uses a simple convolution of the two profiles, stretching and shifting them relative to one another in order to maximize their overlap. We aligned 20 lambda DNA molecules in this way, with the result shown in Fig. 5A. The determined stretching factors (detailed in Supplementary Table 1†) vary between 1.50 and 1.67 with an average value of 1.62 implying an average step between base pairs of 0.55 nm for the combed DNA. Direct comparison between the experimental data and the HhaI map allows some quantification of the quality of the fluorocode as an optical map. For each DNA molecule, experimentally determined labelling sites were matched with the closest site (within 200 bases) on the HhaI map. On average 74 of the 87 (85%) fluorophores on a single DNA molecule were matched with a mean standard deviation of 71 bases (39 nm) between the fitted positions and their closest match on the HhaI map. By comparison, optical restriction mapping typically results in one cut to the DNA every 20 kilobases12 (though fragments as small as 700 bases can be characterized) and so one might expect to observe just three or four cut-sites on the lambda DNA molecule.11
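The stretch-and-shift alignment and the subsequent site matching can be sketched as follows. This is not the profile-convolution procedure described above, whose implementation details are not given; it swaps in a brute-force grid search that scores each trial by a sum of Gaussian weights on the distance from every fluorophore to its nearest reference site. The function names, the 100-base scoring width and the toy data are hypothetical. Only the 0.34 nm step (Fig. 1) and the 200-base matching window come from the text.

```python
import bisect
import math

RISE_NM = 0.34  # crystallographic step between base pairs (3.4 Å, Fig. 1)


def nearest_site(position, reference):
    """Return the reference site closest to `position` (`reference` sorted, non-empty)."""
    i = bisect.bisect_left(reference, position)
    candidates = reference[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda site: abs(site - position))


def align(observed_nm, reference, stretches, shifts, width=100.0):
    """Grid search over stretch factor and shift (in bases).

    Each trial converts distances in nm to bases and scores the overlap as a
    sum of Gaussian weights of the distance to the nearest reference site.
    Returns (score, stretch, shift) for the best-scoring trial.
    """
    best = None
    for stretch in stretches:
        nm_per_base = RISE_NM * stretch
        for shift in shifts:
            score = 0.0
            for x in observed_nm:
                base = x / nm_per_base + shift
                d = base - nearest_site(base, reference)
                score += math.exp(-0.5 * (d / width) ** 2)
            if best is None or score > best[0]:
                best = (score, stretch, shift)
    return best


def match_sites(observed_bases, reference, tolerance=200.0):
    """Pair each aligned position with its closest reference site within `tolerance`."""
    pairs = []
    for position in observed_bases:
        site = nearest_site(position, reference)
        if abs(site - position) <= tolerance:
            pairs.append((position, site))
    return pairs


# Toy data: four reference sites imaged at a stretch of 1.6, offset by 500 bases.
reference = [1000, 1500, 4000, 9000]
observed_nm = [(site - 500) * RISE_NM * 1.6 for site in reference]
score, stretch, shift = align(observed_nm, reference,
                              stretches=[1.50, 1.55, 1.60, 1.65, 1.70],
                              shifts=range(0, 1001, 100))
print(stretch, shift)
```

On the toy data the search recovers the stretch of 1.6 and the shift of 500 bases that were built in. With the average stretching factor of 1.62 reported above, the same conversion gives 0.34 nm × 1.62 ≈ 0.55 nm per base pair. Note that this simple matcher allows several fluorophores to be paired with the same reference site.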
Fig. 5 A) Automatically generated alignments of fluorocodes recorded for twenty lambda DNA molecules. Positions have been determined and all localized fluorophores are displayed with a 42 nm PSF. Each molecule is stretched 5-fold perpendicular to the DNA axis in order to enable simple inspection and intuitive alignment of the fluorocode. B) Top: The consensus fluorocode derived from the experimental data where more than three counts are required in a given 33-base bin before that bin is added to the consensus. Middle: The consensus fluorocode derived from the experimental data where more than two counts are required in a given 33-base bin before that bin is added to the consensus. Bottom: The fluorocode derived from the reference ‘HhaI map’ to which all of the experimental data is aligned.
Notably, however, 15% of the localized fluorophores cannot be matched to the HhaI map. This is likely due in part to non-specific association of free dye molecules or short fragments of labelled DNA with the longer DNA molecules that were the focus of our experiments. Inhomogeneities on the PMMA surface and variable stretching (including breaks) of the DNA are also likely to contribute to the count of unassigned fluorophores.
Approximately one third of the available sites on the DNA are labelled and matched to a known HhaI target site. Previous work has shown the efficiency of DNA modification by the DNA methyltransferase/mTAG cofactor to be near complete.21 Hence, the efficiency of the coupling of the fluorophore to the modified DNA is relatively low. We cannot reliably count all of the emitters on a molecule, since some are bleached before imaging begins, but we estimate a labelling efficiency of 50–60%, which is in line with the manufacturer's expectations for the amine-succinimidyl ester coupling. We see no dependence of the labelling efficiency on HhaI site density (Supplementary Figure S3†), though we do note a surprising lack of labelled sites below the 5000th base pair of the DNA. We attribute this to breakage of the DNA molecules during the labelling and combing processes. The apparent bias toward molecules only missing a small fragment at one end of the DNA likely results from our selection of only the longest DNA molecules (missing short fragments from their ends) for analysis. We found that alignments of short fragments of DNA, containing relatively few fluorophores, to the HhaI map were not reliable.
Fig. 6 Histogram (red) showing number of localized fluorophores falling into bins of 33 bases in width along the DNA molecule. The positions of the HhaI sites on the DNA are shown (black tick marks) as are the sites where the counts in a bin exceed two (dark blue tick marks) and three (light blue tick marks). These positions are used to produce the consensus fluorocode, shown in Fig. 5B.
The consensus fluorocode using a >2 count threshold contains 248 localized fluorophores. We can associate 163 (66%) of these positions with HhaI sites on the lambda molecule with a standard deviation between the experimentally derived and reference positions of 59 bases (32 nm) (see Supplementary Table 2 and Figure S4†). Raising the consensus threshold to >3 counts gives a fluorocode containing 120 fluorophore positions, 109 (91%) of which can be associated to known HhaI sites on the DNA with a standard deviation of 62 bases between the experimentally derived and expected positions of the fluorophores (see Supplementary Figure S5†). Taking into consideration the fact that the sites below 5000 base pairs along the DNA are significantly underrepresented in the data, 86% of the 189 sites between 4158 base pairs and the end of the lambda molecule are assigned in the fluorocode with the threshold of two counts. This is an average of approximately one label every 270 bases for this section of the DNA. Hence, by combining the data from just twenty single molecules, sub-genetic resolution of the optical map can be achieved. The close match of the consensus to the HhaI map demonstrates the validity of the experimental approach and confirms that the DNA can be mapped with a precision of less than 50 nm, at a density of greater than one label every 300 bases. Inhomogeneous stretching of the DNA, non-specific association of free dye molecules and surface effects, which can be significant at the single molecule level, are negated by the consensus fluorocode.
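The construction of a consensus from aligned single-molecule data can be sketched as a pooled histogram with a count threshold. The 33-base bin width and the "more than two" and "more than three" thresholds follow the text and Fig. 6; the function name, the choice to report bin centres and the toy data are assumptions made for illustration.

```python
from collections import Counter


def consensus_sites(molecules, bin_width=33, threshold=2):
    """Pool aligned fluorophore positions (in bases) from all molecules into
    bins of `bin_width` bases and return the centres of the bins whose count
    exceeds `threshold`."""
    counts = Counter()
    for positions in molecules:
        for position in positions:
            counts[int(position // bin_width)] += 1
    return sorted((index + 0.5) * bin_width
                  for index, count in counts.items() if count > threshold)


# Four toy molecules: a site near base 110 is seen four times, one near
# base 505 three times, and a stray emitter near base 900 only once.
molecules = [[100, 500], [110, 900], [105, 510], [120, 505]]
print(consensus_sites(molecules, threshold=2))
print(consensus_sites(molecules, threshold=3))
```

Raising the threshold removes the isolated emitter first and the less frequently observed site next, mirroring the trade-off reported above between the number of consensus positions and the fraction that match known sites.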
Fig. 7 Internally referenced consensus fluorocode aligned at the right-hand end (top) and the left-hand end (bottom) to the fluorocode of the HhaI map (middle). All emitters are shown with a 50 nm PSF.
We have shown that we can significantly improve sequence coverage by combining data from several DNA molecules to generate a consensus fluorocode. Indeed, 76% of the target sites for HhaI are described in our consensus fluorocode (Fig. 5B), constructed from 20 DNA molecules. If we consider the lack of experimental data describing the ends of the DNA molecules then, in fact, we see 86% of the sites (163 of 189) between the HhaI site at 4158 bps and the end of the lambda molecule assigned in the consensus fluorocode. On average this equates to one fluorophore every 272 bases. The standard deviation in the position of the fluorophores assigned to each of these sites is just 59 bases. Such labelling density and experimental precision enable the construction of an optical map of genomic material with unrivalled detail and the unambiguous study of DNA motifs on the scale of the single gene.
A fundamental advantage of both optical restriction mapping and the fluorocode over other methods of optical mapping is that neither requires a priori targeting of specific DNA sequences (as in PCR- or antibody-based labelling approaches). This enables an holistic approach to genome analysis and, in theory, makes mapping the genome possible in a single experiment and without any prior knowledge of the DNA sequence. Indeed, as we show in Fig. 7, the fluorocode enables the study of the DNA sequence in the complete absence of a reference map, permitting entirely independent detection of repeat sequences of DNA, such as copy number variations.
Using a fluorescent labelling approach to map genomic DNA has distinct advantages over optical mapping using restriction enzymes. We have shown that these include the use of a far higher density of targeted (labelled) sites on the DNA and improved precision in determining the location of these sites. Yet there are significant advances still to be made using the fluorocoding approach. For example, multi-colour labelling of the DNA using two or more methyltransferases to direct the labelling will create a colour fluorocode that allows a high degree of confidence in the analysis and interpretation of the fluorocode. Mapping of DNA methylation status would also be possible using a two-colour approach and one enzyme, such as HhaI, whose activity is blocked by CpG methylation. Multi-colour labelling also enables an optical readout of DNA sequence by flowing a DNA molecule through a nanoslit, such as those designed by Jo et al.19 In all, the fluorocode offers a novel and versatile route to optically map genomic DNA in unprecedented detail.
Footnote
† Electronic supplementary information (ESI) available: Chemical structure of the mTAG cofactor, a movie showing DNA combing from an evaporating droplet and detailed results from the single molecule and consensus alignments. See DOI: 10.1039/c0sc00277a
This journal is © The Royal Society of Chemistry 2010