Cécile Mingard a, Junzhou Wu ab, Maureen McKeague *cd and Shana J. Sturla *a
aDepartment of Health Sciences and Technology, ETH Zürich, Schmelzbergstrasse 9, 8092 Zürich, Switzerland. E-mail: sturlas@ethz.ch
bDepartment of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, 02139, Massachusetts, USA
cDepartment of Pharmacology and Therapeutics, McGill University, 3655 Prom. Sir William Osler, Montreal, Quebec H3G 1Y6, Canada. E-mail: maureen.mckeague@mcgill.ca
dDepartment of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, Quebec H3A 0B8, Canada
First published on 24th September 2020
Cellular DNA is constantly chemically altered by exogenous and endogenous agents. As all processes of life depend on the transmission of the genetic information, multiple biological processes exist to ensure genome integrity. Chemically damaged DNA has been linked to cancer and aging; it is therefore of great interest to map DNA damage formation and repair to elucidate the distribution of damage on a genome-wide scale. While the low abundance and inability to enzymatically amplify DNA damage are obstacles to genome-wide sequencing, new developments in the last few years have enabled high-resolution mapping of damaged bases. Recently, a number of DNA damage sequencing library construction strategies coupled to new data analysis pipelines allowed the mapping of specific DNA damage formation and repair at high and single nucleotide resolution. Strikingly, these advancements revealed that the distribution of DNA damage is heavily influenced by chromatin states and the binding of transcription factors. In the last seven years, these novel approaches have revealed new genomic maps of DNA damage distribution in a variety of organisms as generated by diverse chemical and physical DNA insults: oxidative stress, chemotherapeutic drugs, environmental pollutants, and sun exposure. Preferred sequences for damage formation and repair have been elucidated, thus making it possible to identify persistent weak spots in the genome as locations predicted to be vulnerable to mutation. As such, sequencing DNA damage will have an immense impact on our ability to elucidate mechanisms of disease initiation, and to evaluate and predict the efficacy of chemotherapeutic drugs.
Fig. 1 Overview of endogenous processes and exogenous exposures leading to DNA damage discussed in this review.
Translesion DNA synthesis (TLS), involving specialized DNA polymerases that can bypass DNA damage, counters the cytotoxic effects of DNA replication stalling and acts in concert with DNA repair functions. In cancer therapy with DNA-binding agents, which target and stall replication, this process can contribute to drug resistance. In normal cells, TLS may be protective, but even if cytotoxicity may be avoided in the short term, DNA damage bypass can be highly mutagenic and contribute to cancer and other adverse outcomes in the long run. The biological and toxicological consequences of DNA damage, repair, and bypass depend fundamentally on not only their structure and abundance, but also their distribution in the genome, including the interplay of chemical modification and higher chromatin structures in gene expression and mutagenesis.
High-throughput sequencing has recently enabled the whole genome sequencing of numerous cancer genomes. From these large datasets, mutational signatures that describe characteristic imprints left by mutational processes, including DNA damage and repair, have been deciphered in cancer genomes.5 Because the cellular impacts of DNA damage are also the basis of the most common cancer therapy drugs, understanding the genomic distribution of DNA modification induced by anticancer drugs is a potential strategy to improve the safety and efficacy of cancer therapy. While there are many techniques to study outcomes of DNA damage (i.e. mutation, cytotoxicity), there is a lag in methods available to map how DNA is initially modified, therefore limiting the ability to predict adverse or therapeutic outcomes on the basis of early measurable markers.
Defining the relationship between the genome-wide distribution of chemical forms of DNA damage and adverse or therapeutic biological outcomes remains a major challenge. Early models of DNA damage and mutagenesis were built around a simple direct relationship between damage formation and the acquisition of a mutation, but there is a complex interplay between genetic and epigenetic landscapes factoring into cancer evolution and progression. Indeed, cancer is driven by natural selection enabled by the evolution of mutations conferring a growth advantage.6 However, within the large mutational landscape, only some mutations are driver mutations that confer a selective growth advantage, whereas many of the other mutations are passenger mutations acquired by a cell with driver mutations.7,8
There is controversy concerning whether most mutations in cancer genomes arise from DNA replication errors or other intrinsic events (the bad luck hypothesis), which are hard to prevent,9 or from extrinsic factors that, on the contrary, could be avoided.10 Indeed, mutational signatures are at the core of extensive ongoing work to uncover the etiology of individual cancers, but strategies for tracking analogous precursor DNA damage signatures in the genome lag behind gene sequencing and epigenetics because of their inherent chemical complexity and variation.11
There are many well-established strategies for DNA damage quantification integrated over the whole genome as well as strategies for identifying the sequence specific locations of damage in isolated genes, but typically not both. For example, mass spectrometry12 and 32P-postlabelling13 allow high-sensitivity quantification of total DNA damage in biological samples, but do not provide sequence or location information. In contrast, ligation-mediated polymerase chain reaction (LM-PCR) is based on the principle that DNA polymerases cannot synthesize DNA past certain types of damage. Thus, LM-PCR can indicate the exact sequence and position of DNA damage on the basis of PCR termination sites; however, this method is not damage specific, meaning the chemical nature of the damage may be unknown. These strategies have various advantages, but do not allow one to relate the chemistry of damage formation with biological changes in particular genes or in the genome.
In addition to the conceptual challenge of examining the complex relationship between genome sequences, structure, regulation and potential patterns of DNA damage processes, there are two major technical challenges towards obtaining the necessary robust damage sequence data to evaluate these relationships. The first is that DNA damage events are rare on a genome-wide scale. Typically 0.1–100 endogenous DNA damage events occur per 10⁶ nucleotides,14 and those arising from discrete interactions with particular chemicals can be of even lower magnitude, such as one DNA adduct per 10¹¹ nucleotides.15 The second challenge is that chemical damage is not typically read by DNA polymerases, and they may either stall or insert an incorrect base or combination of incorrect bases opposite the altered site. As a result, the chemical identity of DNA damage is generally lost in the process of standard DNA sequencing. Nonetheless, there are several very recent examples discussed in this review of exciting and innovative approaches to address these technical challenges and yield the first insights on DNA damage distribution at the genome-wide level.
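To put these frequencies in perspective, the short calculation below converts them into expected lesion counts per cell. It is an illustrative sketch only: the genome size (about 3.1 × 10⁹ base pairs per haploid human genome, hence about 1.24 × 10¹⁰ nucleotides in a diploid cell) is an assumption not stated in this review, and the function name is hypothetical.

```python
# Assumed, not from the review: approximate haploid human genome size in base pairs.
HAPLOID_BP = 3.1e9
# Two strands per duplex and two genome copies per diploid cell.
DIPLOID_NUCLEOTIDES = HAPLOID_BP * 2 * 2


def expected_lesions(lesions, per_nucleotides, total_nucleotides=DIPLOID_NUCLEOTIDES):
    """Scale a frequency of `lesions` per `per_nucleotides` to a whole genome."""
    return lesions / per_nucleotides * total_nucleotides


low = expected_lesions(0.1, 1e6)    # lower bound of the endogenous range
high = expected_lesions(100, 1e6)   # upper bound of the endogenous range
rare = expected_lesions(1, 1e11)    # one adduct per 10^11 nucleotides

print(f"endogenous damage: {low:,.0f} to {high:,.0f} lesions per diploid genome")
print(f"rare adduct: {rare:.3f} lesions per diploid genome")
```

Under these assumptions, an adduct present at one per 10¹¹ nucleotides would be found, on average, in roughly one of every eight cells, which illustrates why damage enrichment is usually required before sequencing.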
We provide here a comprehensive review of the progress in sequence-specific mapping of DNA damage. Emerging methods described in this review have addressed long-standing obstacles facing damage sequencing by including a combination of damage enrichment, damage specific recognition, and functional marking of the damage position with a sequencing-compatible adaptor (Fig. 2). A few reviews highlight specific DNA damage sequencing methods; however, no reviews exist that cover all classes of DNA damage and discuss the importance of the biological findings.16–20 For each major class of DNA damage, we first provide an overview of the occurrence and biological relevance. Next, we describe each of the novel strategies that have enabled successful DNA damage sequencing of these specific DNA damage classes (Table 1). Finally, we compare the opportunities and challenges for each of these methods, focusing on the early glimpses of biological insight enabled by each unique method. The rapid improvement and adoption of these approaches is expected to spur advances in the study and prevention of aging, cancer, and disease related to genomic instability.
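DNA damage product | Source of DNA damage | Mapping methods | Source of DNA (a) | Ref. |
---|---|---|---|---|
8-OxodG | Oxidative stress | OxiDIP-seq | Mouse, human | 55 and 56 |
8-OxodG | Oxidative stress | enTRAP-seq | Mouse | 57 |
8-OxodG | Oxidative stress | OG-seq | Mouse | 58 |
8-OxodG | Oxidative stress | Click-code seq | Yeast | 60 |
8-OxodG | Oxidative stress | AP-seq | Human | 59 |
Platinated crosslinks | Cisplatin, oxaliplatin | (HS)-damage-seq | Human | 82, 89 and 98 |
Platinated crosslinks | Cisplatin, oxaliplatin | XR-seq | Human, mouse (in vivo) | 82 and 89–91 |
Platinated crosslinks | Cisplatin, oxaliplatin | Cisplatin-seq | Human | 83 |
CPDs | UV light | DDIP-seq | Human | 106 |
CPDs | UV light | HS-damage-seq | Human | 98 |
CPDs | UV light | (t)XR-seq | Human | 84, 98 and 110 |
CPDs | UV light | CPD-seq | Yeast | 87 and 108 |
CPDs | UV light | Excision-seq | Yeast | 107 |
6-4PPs | UV light | HS-damage-seq | Human | 98 |
6-4PPs | UV light | XR-seq | Human | 98 |
6-4PPs | UV light | Excision-seq | Yeast | 107 |
BPDE-dG | Benzo[a]pyrene | tXR-seq | Human | 110 |
Abasic sites | Product of DNA damage, spontaneous depurination | AP-seq | Human | 59 |
Abasic sites | Product of DNA damage, spontaneous depurination | snAP-seq | Human, parasite (in vivo) | 72 |
Abasic sites | Product of DNA damage, spontaneous depurination | Nick-seq | Bacteria | 183 |
Single-strand breaks | Product of oxidative damage, failure in DNA repair, topoisomerase activity, disintegration of sugar | SSB-Seq | Human | 187 |
Single-strand breaks | Product of oxidative damage, failure in DNA repair, topoisomerase activity, disintegration of sugar | SSingLE | Human | 188 |
Single-strand breaks | Product of oxidative damage, failure in DNA repair, topoisomerase activity, disintegration of sugar | GLOE-Seq | Human, yeast | 189 |
Double-strand breaks | From SSB or fail in DNA repair | BLESS | Mouse (in vivo) | 194 |
Double-strand breaks | From SSB or fail in DNA repair | Break-seq | Yeast | 195 |
Double-strand breaks | From SSB or fail in DNA repair | DSBcapture | Human | 196 |
Double-strand breaks | From SSB or fail in DNA repair | END-seq | Mouse (in vivo) | 197 |
Double-strand breaks | From SSB or fail in DNA repair | GUIDE-seq | Human | 198 |
Double-strand breaks | From SSB or fail in DNA repair | BLISS | Human, mouse (in vivo) | 199 |
Double-strand breaks | From SSB or fail in DNA repair | DSB-Seq | Human | 187 |
Double-strand breaks | From SSB or fail in DNA repair | qDSB-Seq | Human, mouse (in vivo) | 200 |
Ribonucleotides | Enzymatic insertion | Ribose-seq | Yeast | 214 |
Ribonucleotides | Enzymatic insertion | HydEN-seq | Yeast | 215 and 223 |
Ribonucleotides | Enzymatic insertion | Pu-seq | Yeast | 216 and 224 |
Ribonucleotides | Enzymatic insertion | emRiboSeq | Yeast | 217 and 218 |
Uracil | Enzymatic insertion, cytosine deamination | Excision-seq | Yeast, bacteria | 107 |
Uracil | Enzymatic insertion, cytosine deamination | dU-seq | Human | 220 |
Uracil | Enzymatic insertion, cytosine deamination | UPD-seq | Bacteria | 221 |
Uracil | Enzymatic insertion, cytosine deamination | U-DNA-seq | Human | 222 |

(a) When not specified the source of DNA is from immortalized cultured cell lines.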
Fig. 2 Summary of sequencing strategies used to map DNA damage formation or repair at the genome-wide level.
Enzymes | Function | Human gene | S. cerevisiae gene | E. coli gene |
---|---|---|---|---|
8-Oxo-dGTP diphosphohydrolase | Degrade 8-oxo-dGTP to the monophosphate | NUDT1 (previous: MTH1) | PCD1 | mutT |
8-Oxoguanine DNA glycosylase | Excise 8-oxodG when opposite dC | OGG1 | OGG1 | mutM |
Adenine DNA glycosylase | Excise dA when opposite 8-oxodG | MUTYH | not present | mutY |
Aside from pro-mutagenic effects, 8-oxodG is also a source of toxicity when transcribed. 8-OxodG can significantly arrest transcription by direct structural interference of transcription components or the repair intermediate of 8-oxodG/OGG1.35 Furthermore, when 8-oxodG is located on the transcribed DNA strand, other consequences like erroneous bypass of the lesion by the transcribing RNA polymerase may occur. Such transcriptional mutagenesis often results in a specific C → A mutation in the RNA transcript and aberrant protein production,36 which may play a role in protein aggregation and the pathogenesis of neurodegenerative diseases, such as Alzheimer's and Parkinson's disease.37
When 8-oxodG is located at promoter regions, OGG1 is recruited and enhances the binding of several TFs, including hypoxia-inducible factor 1α (HIF-1α),38 signal transducer and activator of transcription 1 (STAT1),39 and nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB).40,41 Reduction in OGG1 expression in rat pulmonary arterial endothelial cells strongly reduced the binding of the TF HIF-1α to the vascular endothelial growth factor gene (VEGF) promoter and reduced VEGF expression.38 OGG1 both coactivates STAT1 and induces the transcriptional activation of pro-inflammatory mediators after lipopolysaccharide (LPS) stimulation.39 In addition, the binding of OGG1 to 8-oxodG in promoter regions enhanced NF-κB/RelA binding to cis-regulatory elements and facilitated the rapid recruitment of specificity protein 1 (Sp1), transcription initiation factor II-D and phosphorylated-RNA polymerase II (Pol II), resulting in prompt gene expression upon oxidative exposure.40,41 Thus, interactions between 8-oxodG, OGG1 and relevant TFs lead to the expression of oxidative stress-induced genes.
In addition to the interactions between OGG1 and TFs, 8-oxodG in gene promoter regions regulates transcription via G-quadruplex (G4) folding. Indeed, potential G-quadruplex sequences are widely distributed in the human genome, with high enrichment in gene promoters.42,43 The formation of 8-oxodG in G-rich sequences can either impede G4–protein interactions or stall repair proteins at G4 structures, which further recruit TFs. For example, the VEGF promoter contains three G-rich Sp1 binding sites, which are critical for regulating mRNA synthesis.44,45 When 8-oxodG accumulates due to hypoxia, Sp1 binding decreases in these G-rich elements, resulting in the up-regulation of VEGF transcription.38,46 These observations suggest that G4 formation activates transcription when 8-oxodG is present. Recently, Burrows et al. reported that plasmids containing 8-oxodG in G4 promoter regions produced more target protein than the same plasmid without 8-oxodG.47 The data suggest that 8-oxodG in G-rich regions of the VEGF promoter was removed by OGG1, generating abasic sites (AP sites) and destabilizing the duplex structure. This loss of stability led to the formation of a new G4 structure with an abasic-site-containing loop, which facilitated the binding and stalling of APE1 at the AP site, further stimulating TF binding and activating transcription.47–50
The emerging role of 8-oxodG as a transcriptional regulator highlights its biological and health relevance beyond classic toxicity aspects of DNA damage. However, genome-wide associations of 8-oxodG with gene expression and further with pathological processes are not understood due to the lack of precise location information of 8-oxodG in the genome. Thus, extensive efforts have been made to locate 8-oxodG with several recent high-throughput sequencing strategies providing advanced tools to understand how 8-oxodG is distributed and can modulate gene expression on a genome-wide level.
The first genome-wide map of 8-oxodG was constructed nearly 14 years ago using an antibody enrichment strategy, resulting in a map of 8-oxodG in human metaphase chromosomes at a 1000 kb resolution, revealing its heterogeneous distribution in the genome.51 Specifically, immunofluorescence revealed that 8-oxodG was unevenly distributed and located primarily within regions with a high frequency of recombination and single nucleotide polymorphisms (SNPs) in cultured human lymphocytes.51 However, the relatively low resolution of optical microscopy limited the resolution of the 8-oxodG map. By Sanger sequencing, a map of 8-oxodG at a 100 base pair resolution was achieved in mouse renal cortical samples,52 allowing for 8-oxodG analysis at the gene-level. These data suggested that 8-oxodG is preferentially enriched in highly expressed genes, presenting the first clue for the potential impact of 8-oxodG on gene expression.52 However, due to the limited throughput of Sanger sequencing, the resulting map only revealed several hundred 8-oxodG sites in the mouse genome. More recently, two microarray analyses allowed for a higher throughput genome-wide mapping of 8-oxodG in kidney tissues from rats and mice (244000 probes for rat genome and 720000 probes for mouse genome).53 Both studies revealed that 8-oxodG was preferentially located in gene deserts, devoid of protein-coding genes, and correlated with lamina-associated domains.53,54
In the last 5 years, next-generation sequencing technologies have been combined with affinity enrichment strategies to achieve genome-wide high-throughput mapping of 8-oxodG. As one example, OxiDIP-seq used an 8-oxodG antibody for immunoprecipitation followed by high-throughput sequencing in human non-tumorigenic epithelial breast cells and mouse embryonic fibroblasts.55,56 The sequencing revealed that 8-oxodG sites accumulated in the transcribed regions of long genes and at DNA replication origins, overlapping with γH2AX ChIP-seq signals and double-strand breaks. Furthermore, a strong reduction of 8-oxodG was observed within promoter regions with high GC content in quiescent (G0) cells without DNA replication. As another example, an OGG1 K249Q mutant lacking glycosylase activity was used to trap a stable complex of OGG1 with the sequences containing 8-oxodG (enTRAP-seq).57 Following affinity precipitation and sequencing, enTRAP-seq revealed enrichment of 8-oxodG in transcriptionally active chromatin regions and regulatory elements such as promoters, 5′UTRs, and CpG islands in the mouse embryonic fibroblast genome. While 8-oxodG-specific binding proteins are useful tools for 8-oxodG enrichment and sequencing, further studies comparing the binding specificity of antibody clones and glycosylase mutants will help to understand apparently conflicting results.
Besides protein-based enrichment, two chemical enrichment methods have also been developed for high-throughput sequencing of 8-oxodG. The first approach was based on the selective oxidation of 8-oxodG to form an electrophilic intermediate that can be specifically recognized and labelled with amine-terminated biotin for affinity enrichment (OG-seq).58 In mouse embryonic fibroblast cells, 8-oxodG levels were elevated in promoters, 5′-UTRs, 3′-UTRs and G4 structures in comparison with the baseline random distribution throughout the genome. The second chemical-based enrichment approach was based on the reaction between an AP site released from 8-oxodG and an aldehyde reactive probe (AP-seq), enabling both the specific recognition and enrichment of 8-oxodG sequences.59 In HepG2 cells, a reduction of 8-oxodG was found in functional elements such as promoters, exons, TF binding sites, and termination sites in a seemingly GC content-dependent manner. However, AP-seq has also been used to sequence other aldehyde-containing nucleotides, as discussed in the abasic site section. Depending on the biological questions addressed, a potential drawback of these protein- and chemical-based enrichment methods is the lack of nucleotide resolution, which prevents determination of sequence-specific 8-oxodG occurrence and distribution at the level of detail needed, for example, to relate damage to mutational signatures.
A nucleotide resolution map of 8-oxodG is of interest to better understand sequence context effects of 8-oxodG formation and repair, and origins of mutational signatures. Recently, we reported a nucleotide resolution sequencing method, click-code-seq, to map 8-oxodG.60 In this approach, 8-oxodG sites are specifically recognized and removed by an 8-oxodG glycosylase, generating a gap with a free 3′-hydroxyl at the damage site. Next, a synthetic O-3′-propargyl modified nucleotide (prop-dGTP) is incorporated into the resulting gap by DNA polymerase, giving rise to a 3′-alkynyl-modified end. The 3′-alkynyl DNA is then ligated to a 5′-azido-modified code sequence via a copper(I)-catalyzed click reaction, resulting in triazole-linked DNA that can be amplified by DNA polymerases.61 Via this process, 8-oxodG sites are stably labelled with a code sequence that serves as a tag for affinity enrichment, an adaptor for PCR amplification, and a sequencing-compatible marker of the damage locations.60
Using click-code-seq, a single-base resolution whole genome map of DNA oxidation was obtained for S. cerevisiae.60 On a genome level, the first G in a 5′-GG-3′ dimer was more frequently oxidized than in other contexts. By analyzing 8-oxodG within discrete genomic features, especially transcription start sites (TSS), transcription terminator sites, DNase I hypersensitive sites, and autonomously replicating sequences, less 8-oxodG could be observed relative to the average coverage over the entire genome. On the other hand, telomeres, nucleosomes, and positions of low RNA Pol II occupancy had higher 8-oxodG frequency. Meanwhile, nucleosomes with post-translational modifications that accelerate nucleosome unwrapping had less 8-oxodG compared to nucleosomes without these modifications. These data suggest that chromatin accessibility may shape 8-oxodG distribution, with an accumulation of 8-oxodG in regions of reduced chromatin accessibility where repair proteins cannot penetrate.
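The sequence-context analysis described above can be sketched as a simple tally of the base immediately 3′ of each mapped lesion. The snippet is a minimal illustration and not the click-code-seq analysis pipeline: the toy reference, the site coordinates, and the function name are hypothetical, and sites are assumed to be 0-based positions on the forward strand.

```python
from collections import Counter


def three_prime_context(reference, damage_sites):
    """Count the base 3' of each damaged G (forward strand, 0-based sites)."""
    counts = Counter()
    for pos in damage_sites:
        if reference[pos] != "G":
            continue  # ignore calls that do not fall on a guanine
        if pos + 1 < len(reference):
            counts[reference[pos + 1]] += 1
    return counts


# Hypothetical toy data, for illustration only.
reference = "ATGGCAGGTTGACG"
damage_sites = [2, 6, 10, 3]
context = three_prime_context(reference, damage_sites)
print(context)
```

A real analysis would also need to handle lesions on the reverse strand and to normalise each count by the background frequency of the corresponding dinucleotide in the genome, since an excess of GG contexts is only meaningful relative to how often GG occurs.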
All of the genome-wide sequencing methods for 8-oxodG reported to date rely on damaged sample enrichment (Fig. 3). However, 8-oxodG can also be sequenced directly at nucleotide resolution without enrichment using third generation sequencing technologies, such as single-molecule real-time sequencing62 and nanopore sequencing.63–65
Fig. 3 Chemical reactions involved in strategies for genome-wide mapping for 8-oxodG or subsequently formed AP sites.
Finally, a number of methods have the potential to detect 8-oxodG at nucleotide resolution, but were designed for one gene or one position, such as DNA hybridization probes containing a non-natural nucleoside specific for 8-oxodG,66 LM-PCR,67 third base pair based amplification,68 BER-mediated deletion mutation69 and Hoogsteen base pairing-mediated PCR-sequencing.70 These technologies are faster and cheaper than whole genome sequencing and may be used as diagnostic tools to detect 8-oxodG hotspots within the genome.
From the 10 currently available genomic maps of 8-oxodG in biological contexts ranging from yeast and rodents to cultured human cells, and with resolutions varying from thousands of kb to a single nucleotide, a consistently emerging observation is that there is a non-uniform genomic distribution of 8-oxodG. In particular, DNA oxidation depends on the heterogeneous structure of a chromosome, consisting of protein-bound regions, open regulatory regions, and actively transcribed genes. However, it is too early to make strong biological conclusions from these data due to the differing species, conditions, library preparation protocols, and processing. Further methodological improvements are needed to understand, eliminate, or correct for embedded biases, as well as to control for artefactual DNA oxidation during sample preparation, a notorious problem in DNA oxidation analysis. Potential biases may arise from the binding specificity of different antibody clones/glycosylases, reaction selectivity of chemical probes, adaptor ligation, and PCR amplification.71–73 Meanwhile, artefactual 8-oxodG may arise from genomic DNA extraction and DNA shearing, leading to false positive reads during sequencing.74 Additionally, further work is anticipated to improve data reliability and the sensitivity of 8-oxodG sequencing methods. In the future, systematic sequencing studies of DNA oxidation are expected with a complement of robust methods to reveal a genomic basis of cellular oxidative stress responses.
Following enrichment, both methods then take advantage of the fact that cisplatin–DNA damage stalls polymerases to mark the specific location of damage during PCR (Fig. 4). Specifically, a biotinylated primer is used in damage-seq for amplification of the enriched DNA from human lymphocytes with the high-fidelity Q5 polymerase. Q5 polymerase stalls upon encountering Pt-DNA damage such that DNA synthesis termination sites mark the site of the damage. Next, the differently sized DNA fragments generated from the biotinylated primer are purified using streptavidin beads. Finally, a second adapter is ligated, and the resulting DNA library is sequenced by next-generation sequencing. Alignment of the sequencing reads with a human reference genome then allows for the identification of damage sites. Cisplatin-seq follows a protocol very similar to that of damage-seq for marking the location of DNA damage. Damage-seq and cisplatin-seq methods reported the first genome-wide maps of cisplatin damage distribution in the human genome at single nucleotide resolution.
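Because synthesis terminates at the lesion, the 3′ end of each aligned extension product indicates where the polymerase stopped, and the damage lies just beyond it on the template strand. A minimal sketch of this coordinate logic follows. The one-nucleotide offset, the 0-based half-open interval convention, and the function name are assumptions made for illustration; they are not taken from the damage-seq or cisplatin-seq pipelines and would need to be checked against the specific library protocol.

```python
def infer_damage_site(start, end, strand):
    """Infer the template position just beyond the 3' end of an extension product.

    The product is assumed to align to [start, end) in 0-based, half-open
    coordinates. Returns (position, damaged_strand).
    """
    if strand == "+":
        # Synthesis ran towards higher coordinates; its last base is end - 1.
        return end, "-"
    if strand == "-":
        # Synthesis ran towards lower coordinates; its last base is start.
        return start - 1, "+"
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical alignments, for illustration only.
print(infer_damage_site(100, 150, "+"))
print(infer_damage_site(100, 150, "-"))
```

For an intrastrand crosslink that joins two adjacent bases, the lesion would span the inferred position and its neighbour, so downstream sequence-context analysis would typically examine the dinucleotide rather than a single base.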
In addition to formation of DNA damage, DNA repair is expected to have a major role in shaping DNA damage distribution in the genome. Therefore, Sancar and co-workers examined damage distribution with damage-seq, but also used another method called XR-seq to map NER repair events in order to relate damage formation and repair patterns on a genome-wide level.82 The XR-seq method was previously described to map UV damage, but was adapted to map NER repair of cisplatin (Fig. 4).84 Therefore, the methodological details of XR-seq will be discussed below in the UV section. XR-seq requires reversal of the DNA damage so that strands containing cisplatin damage can be properly amplified. Reversal was achieved using sodium cyanide, which can remove platinated DNA damage. While XR-seq was already a known technique, the strength of this study resided in the establishment of the damage-seq method, which involved mapping any cisplatin damage present in genomic DNA and not only repair events.82
An important strength of the damage-seq study82 was the coupling of damage formation data derived from damage-seq with damage-specific NER events derived from XR-seq. The coupling of these types of data permitted several key findings. First, sequence context analysis revealed a preference for cisplatin damage formation at G–G dinucleotides downstream of A, but in damage repair data the preference switched to a T upstream and a G downstream, meaning that the first was more prevalently formed but the latter was more resistant to repair. These sequence context findings are in conflict with previous biochemical studies testing DNA damage recognition for NER where the preference was for an A both upstream and downstream.85 Therefore, further studies are needed to determine whether the differences are due to the cellular context or due to biases introduced in the library preparation. Second, comparing the damage distribution maps over time as well as the XR-seq maps indicated that NER repair is the main driver in shaping the distribution of platinum-DNA damage. In particular, overall damage formation was fairly uniform in genomic regions with the exception of only a slightly higher damage abundance at the TSS and a slightly lower one at the TES. Interestingly, less damage was found on the transcribed strand (TS), which is consistent with TC-NER being a key repair process dictating the distribution of damage.86
Damage-seq and XR-seq were compared to nucleosome occupancy data for the same lymphocyte GM12878 cell line annotated in the ENCODE database. This analysis suggested how chromatin folding may impact damage distribution. A 5% reduction in damage formation was observed within the nucleosome center, whereas repair was substantially inhibited due to the inaccessibility of the nucleosome center. As such, the overall damage load was higher in the nucleosome center, aligned with observations made for UV-induced photodimers.87
There are several important unique findings worth noting from the cisplatin-seq study.83 The combined cisplatin damage-seq, XR-seq study mainly focused on chromosome 17 which contains TP53, whereas the cisplatin-seq study provided a more detailed investigation of all chromosomes and mitochondrial DNA. Interestingly, in comparison to the fairly uniform distribution of cisplatin damage load observed using damage-seq, results from cisplatin-seq differed up to 3-fold amongst chromosomes. In addition, mitochondrial DNA (mtDNA) carried the largest amount of cisplatin damage, likely because NER does not take place in the mitochondria.88 This finding was supported in later studies in mice (described below).89 Furthermore, short-duration cisplatin exposure led to less damage on the mtDNA light strand, which carries more genes than the heavy strand, suggesting protection or repair proficiency especially for the mtDNA light strand by an as yet undefined mechanism. Unlike damage-seq, which benefited from the previously annotated nucleosome occupancy, the cisplatin-seq study additionally performed ChIP-seq on HeLa cells to determine the influence of chromatin states on damage distribution. Here, an increase in cisplatin damage was observed to coincide with nucleosome signals, suggesting that there is preferred crosslinking of cisplatin on nucleosomes.
These last results83 contradict the damage-seq study82 and may be due to the lack of repair data. Specifically, the higher cisplatin damage load could be the result of a lack of repair rather than a preference for damage formation. Subsequent studies mapping NER events in mice support this possibility, having demonstrated a very rapid peak of transcription-coupled NER (TC-NER) activity 2 hours after cisplatin exposure.90 Given that cisplatin-seq was performed 3–24 h after cell exposure, it is likely that TC-NER had already taken place. Finally, cisplatin-seq data were compared with ChIP-seq data, indicating that the occupancy of DNA binding proteins Pol II, EZH2, and CTCF coincides with cisplatin damage. The conclusion of this comparison is that there is an increase in DNA damage formation at sites where NER accessibility is reduced. As repair seems to play a major role in cisplatin crosslink distribution, further efforts should characterize the influence of genomic architecture on repair accessibility. Additionally, because XR-seq data represent a snapshot of repair at a given moment, these sequencing techniques should be applied in a time-recovery course to investigate how damage distribution changes over time. Finally, cisplatin doses investigated were substantially higher than therapeutic levels; therefore, future studies should address relevant doses.
Damage-seq and XR-seq have been applied to investigate the effect of cisplatin chronochemotherapy on genome-wide cisplatin damage distribution and repair across different organs in mice, the first time DNA damage mapping has been performed in vivo.89–91 In addition to addressing a basis for cisplatin resistance, a second motivation to map cisplatin damage concerns potential off-target effects, for example, in chronochemotherapy, to find the optimal time of the day at which the drugs will most efficiently kill cancer cells while reducing toxicity in other organs. Thus, NER of cisplatin–DNA damage was characterized using XR-seq and applied to analyze mouse liver and kidney due to the common side effects of nephrotoxicity and hepatotoxicity.91 Damage in TSs was repaired up to 10-fold more efficiently than in non-transcribed strands (NTSs), an observation potentially explained by TC-NER being more active than global genome NER (GG-NER). Indeed, TC-NER is active all the time because it depends on transcription, whereas a peak was observed at a specific time within the circadian rhythm (here it was Zeitgeber time ZT08) for GG-NER. This study showed when each gene strand will be repaired, giving the first circadian DNA damage repair map in mice, and is now being extended by the same group to obtain individual circadian maps in different human tumor cell lines.
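The strand comparison reported above amounts to assigning each repair read to the transcribed or non-transcribed strand of its gene and taking the ratio of the two counts. The sketch below shows this bookkeeping with hypothetical read counts. It assumes each read is reported on the strand that carried the damage, and all names are illustrative rather than taken from the XR-seq pipeline.

```python
def classify_strand(gene_strand, read_strand):
    """Return 'TS' for reads on the transcribed (template) strand, else 'NTS'.

    The transcribed strand is the template, i.e. the strand opposite to the
    annotated (coding) strand of the gene.
    """
    return "NTS" if read_strand == gene_strand else "TS"


def ts_nts_ratio(gene_strand, read_strands):
    """Ratio of TS to NTS repair reads for one gene."""
    counts = {"TS": 0, "NTS": 0}
    for read_strand in read_strands:
        counts[classify_strand(gene_strand, read_strand)] += 1
    if counts["NTS"] == 0:
        raise ValueError("no NTS reads; ratio undefined")
    return counts["TS"] / counts["NTS"]


# Hypothetical toy counts for a gene annotated on the '+' strand.
toy_reads = ["-"] * 20 + ["+"] * 2
print(ts_nts_ratio("+", toy_reads))
```

In practice the counts would also be normalised for sequencing depth and, ideally, for the amount of damage initially present on each strand, so that the ratio reflects repair efficiency rather than damage formation.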
In a second study, this time mimicking clinical dosing (70 days), the same approach combining damage-seq and XR-seq was used to characterize damage maps in mouse liver.90 Results indicated that up to 5 weeks were needed to completely remove platinated DNA crosslinks from the mouse genome. Again, 90% of TSs were repaired after only 2 days, whereas damage persisted for NTSs, which might have a detrimental effect on healthy cells by causing replication fork arrest, leading to cell death. Therefore, TC-NER should be considered the dominant form of cisplatin damage repair following drug administration, and could be an important pathway for additional targeted therapeutic strategies.
Finally, the most recent study of the genome-wide distribution of cisplatin–DNA damage coupled XR-seq, damage-seq, and RNA-seq data for mouse kidney, liver, lung, and spleen.89 The study revealed that the rate of NER on the TS and NTS of active genes is positively correlated with gene expression. Specifically, repair in the TS and NTS increases with gene expression and plateaus in the TS among highly expressed genes. The data further suggest that cellular transcription stimulates the repair of damage in the NTS because transcription is associated with an open chromatin conformation and increased accessibility to repair machinery. Interestingly, the spleen carried the least cisplatin damage, which could be explained by the downregulation of genes thought to be associated with cisplatin transport (atp7B and Steap3). Finally, consistent with cell-based results, patterns of damage distribution appear to be driven mainly by repair activity.
The pathological role of DNA photodimerization was compellingly revealed through the characterization of human genetic deficiencies in XP proteins, which contribute to the rare human disease xeroderma pigmentosum. These proteins were later understood to be part of the NER machinery, which effectively removes these dimers from cells and protects against mutagenicity.97 Similar to cisplatin crosslinks, UV-induced DNA damage is removed by NER, and therefore UV damage repair can also be investigated using mapping methods specific to NER events.98 If cellular repair is overwhelmed, there are deleterious implications for the cell, including cell death95 due to the stalling of replicative polymerases during DNA synthesis.99 Finally, as specialized TLS polymerases bypass UV lesions,96,100,101 there is a characteristic mutation signature comprised mainly of C → T and CC → TT mutations102,103 that corresponds to signatures found in skin cancer.5 To test whether damage distribution is predictive of mutations, more insight is needed into the genome-wide location of UV damage and into the identity of regions that are recalcitrant to NER.
Of the four strategies reported to date addressing the genome-wide mapping of UV damage at single-nucleotide resolution, excision-seq was the first to map UV damage at the genome-wide level.107 Genomic yeast DNA was selectively digested by the UV damage endonuclease (UVDE) enzyme, which cleaves upstream of CPDs and 6-4PPs (Fig. 5). This enzymatic digestion releases short damaged dsDNA fragments that need to be repaired before amplification. Next, specificity for either CPD or 6-4PP mapping was achieved by repairing the fragments with specific photolyases. Specifically, Vibrio cholerae CPD photolyase or X. laevis 6-4 photolyase was used to repair the pyrimidine dimers into monomeric pyrimidines, thus allowing for end-repair of the damage of interest and enabling adapter ligation for NGS library preparation. DNA fragment ends read in the sequencing data thus correspond to the location of previous UV damage.
With a very similar approach to excision-seq, CPD-seq also achieved precise genome-wide mapping of CPD damage, while also adding a new dimension to previous information on the distribution of photodimers in the yeast genome by integrating insight on the impact of repair and chromatin structure.87 Genomic yeast DNA treated with UV was fragmented, end-repaired, dA-tailed, and ligated to adapters prior to digestion by T4 endonuclease V that specifically cleaves downstream CPD damage generating single-strand DNA breaks with AP sites at the 3′ end. Next, the APE1 enzyme was used to remove the AP sites, releasing 3′-OHs at the end of the short ssDNA breaks to allow ligation and sequencing (Fig. 5).108 CPD-seq specifically permitted mapping of UV CPD damage at single nucleotide resolution in yeast.
Application of excision-seq and CPD-seq to map CPD photodimers in UV-exposed yeast, despite the use of UV doses 100 times higher for excision-seq, revealed similar sequence-associated preferences for photodimerization. As expected, CPD dimers primarily occurred between two Ts. The next most prevalent CPD sequence pairings were T–C, C–T, and C–C. Excision-seq additionally mapped 6-4PPs, indicating that T–C is the most abundant, followed by T–T, C–C, and C–T. These results confirm older chromatography data.109 A key benefit of the sequencing approach is the ability to examine the sequence context surrounding the dimer positions. Specifically, excision-seq data indicated a preference for an A downstream of 6-4PPs in yeast, and this same preference was also observed in later experiments in human cells.107 As the downstream A preference was not observed in the CPD dataset, it was concluded that the UVDE enzymes did not introduce this sequence bias; however, it is possible that this is an artifact related to the sequence preference of X. laevis 6-4 photolyase. While both studies revealed similar sequence preferences, excision-seq revealed a uniform distribution of UV damage (CPD and 6-4PPs) in the yeast genome, whereas CPD-seq indicated that CPD damage distribution was not uniform. These differences may be a result of either the methodology or repair.
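The dinucleotide preferences summarized above come from tallying the reference bases at each mapped dimer position. A minimal sketch of such a tally follows; it assumes that each mapped site is given as the leftmost reference coordinate (0-based) of the two-base dimer together with the damaged strand, which is an illustrative convention rather than the output format of excision-seq or CPD-seq. The name `dimer_frequencies` is hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def dimer_frequencies(genome, sites):
    """Return {dinucleotide: fraction} over all mapped dimer sites.

    genome: dict chrom -> reference sequence (plus strand, 5'->3').
    sites:  iterable of (chrom, start, strand); start is the 0-based
            leftmost reference coordinate of the dimer (assumed layout).
    """
    counts = Counter()
    for chrom, start, strand in sites:
        pair = genome[chrom][start:start + 2].upper()
        if len(pair) < 2:
            continue  # site runs off the end of the reference
        if strand == "-":
            # Report the dimer 5'->3' on the damaged (minus) strand.
            pair = pair.translate(_COMPLEMENT)[::-1]
        counts[pair] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}

if __name__ == "__main__":
    genome = {"chrI": "ATTCGGAA"}
    sites = [("chrI", 1, "+"), ("chrI", 2, "+"),
             ("chrI", 4, "-"), ("chrI", 6, "-")]
    print(dimer_frequencies(genome, sites))
```

Comparing these fractions with the genome-wide dinucleotide composition would be needed before interpreting them as a damage preference, since TT is also common in AT-rich genomes.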
One strength of the CPD-seq study is that NER repair, chromatin structure, and their influence on CPD damage distribution were measured. Notably, UV damage formation and repair were reduced at strong nucleosome positions. Furthermore, NER was inhibited at translational positions near the strongly positioned nucleosome dyad and CPD formation within the nucleosome was lower at inward rotational DNA and higher in outward rotational settings. The interpretation of these data is that inward-rotated DNA is protected from UV damage because of DNA bending and flexibility imposed by the nucleosome structure (i.e. due to the principle that two pyrimidines need to be close and correctly aligned to form a dimer). Interestingly, cells might use the inward setting to protect A-T rich regions, which are more prone to be damaged by UV. Another finding was that there was significantly less CPD at TF-binding sites, suggesting TFs may act as guardians of important DNA sequences. While more studies are required, the CPD-seq study suggests UV-induced damage distribution is strongly influenced by nucleosomes and TF-binding sites.
Both excision-seq and CPD-seq are effective at mapping UV damage; however, the use of digestion enzymes brings potential liabilities with regard to damage specificity and the potential for introducing artifacts during library preparation. For instance, UVDE enzymes can cleave after other bulky DNA adducts, or there could be certain sequence contexts in which the excision/repair enzymes are more efficient. In both studies, results were presented from experiments involving high levels of UV exposure (10000 J m−2 for excision-seq and 125 J m−2 for CPD-seq) with no indication of a dose–response relationship or threshold for effective mapping. Additionally, a common limitation is that these methods may not be entirely specific and cannot be generally extended to every bulky DNA adduct because damage-specific digestion/repair enzymes are required. Finally, these observations contrast with subsequent findings in the human genome, highlighting that DNA damage distribution might be unique to certain species, or even to certain cancers.
The distribution of UV damage in the genome has also been characterized using more generally applicable methods for bulky adducts, specifically XR-seq and HS-damage-seq (Fig. 5).84,98 Rather than measuring damage itself, XR-seq uses NER's unique characteristic of releasing excised 30-mer damaged fragments during repair. These small fragments are isolated from genomic DNA based on their low molecular weight and their capture by immunoprecipitation with a specific NER protein (TFIIHα). Fragments that are pulled down can then be subjected to a second damage-specific immunoprecipitation (in this case for CPD or 6-4PPs). Finally, either CPD or 6-4PP photolyases are used to repair the damaged fragments to allow PCR amplification and sequencing. XR-seq was applied to map UV damage repair in human cells. However, given that XR-seq captures a map of NER of bulky adducts, it is a general approach and has also been used to map cisplatin and BPDE damage as well as NER events in various model organisms including bacteria, plant, yeast, mouse, and human.110–115
HS-damage-seq can also be considered a general approach for mapping bulky DNA damage. In fact, HS-damage-seq is based on the previously published damage-seq method (used to map cisplatin and oxaliplatin damage82) but includes an extra antibody enrichment for UV damage. As such, HS-damage-seq provides the increase in sensitivity needed to map more physiologically relevant exposure conditions and only requires 1 μg of input DNA. Briefly, HS-damage-seq requires initial immunoprecipitation of the damage of interest. This enrichment process is followed by a primer extension using the pulled-down fragments as templates and a high-fidelity polymerase to perform DNA synthesis. The high-fidelity polymerase, as in cisplatin-seq and damage-seq described above, stalls at the site of the bulky damage, leading to a termination site and production of a shorter DNA fragment attached to biotin. The resulting synthesized strands are purified by a biotin–streptavidin system and then undergo a subtractive hybridization step to further remove undamaged strands prior to amplification. As such, more than 95% of the reads generated through sequencing are specific to the damage of interest, also increasing sensitivity. To benchmark HS-damage-seq, a new cisplatin map was generated using only 1 μg of input DNA. Importantly, this new map compared favorably with the original cisplatin map generated using damage-seq,98 confirming that the new HS-damage-seq method enabled more sensitive mapping of damage. HS-damage-seq and XR-seq are excellent methods to understand UV damage distribution dynamics and can be further combined to understand the basis of damage-induced cancer in human cells.
First, global genome repair of CPD was slightly higher around the TSSs of genes. While the reason is not clear, it may be due to the higher levels of CPD formation in this area. However, it is more likely that the higher CPD repair in the TSS regions is a direct consequence of the higher levels of the TF TFIIHα binding, which ultimately initiates global NER. Additionally, the correlation between 16 different TF-binding sites and damage occurrence was investigated. However, no general pattern relevant to all TF-binding sites emerged. Rather, the relevance of TF-binding sites in damage formation is dependent on the TF and type of damage. As one example, the BHLHE40 TF position correlates with higher 6-4PPs but lower cisplatin damage load.
This study also confirmed that CPD damage is mainly repaired by TC-NER, while 6-4PPs are mainly repaired by GG-NER.98 However, CPD distribution was the same regardless of chromatin states whereas the occurrence of 6-4PPs varied by chromatin state. In particular, 6-4PPs were most abundant in poised and active promoters as well as in repetitive regions, but had a lower frequency in heterochromatin. Therefore, similar to the findings of 8-oxodG and cisplatin, UV damage distribution appears to be the result of fairly uniform damage formation, but heterogeneous repair. In summary, the combination of XR-seq and HS-damage-seq helps to understand the importance of damage formation and damage repair in providing the overall map of damage and may be useful to more closely link damage patterns with mutagenesis and mutational signatures.
In the case of applying XR-seq to map BaP damage, there is one major challenge: no enzyme is available to reverse BPDE-dG damage. Therefore, XR-seq was modified by incorporating a translesion DNA synthesis step during PCR amplification, using pol κ for N2-BPDE-dG and pol η for CPD, permitting bypass of the damage. This variation on XR-seq, termed tXR-seq, expands its capacity to map damage repaired by NER. Nonetheless, there is still a limitation related to the availability of an error-free TLS polymerase or combination of polymerases that bypass the damage of interest. After validating the tXR-seq method by comparing the previous XR-seq CPD data with new tXR-seq CPD data, tXR-seq was applied to provide the first human repair events map of BPDE-dG DNA damage associated with a wide range of exposure in humans.
Damage formation data, for instance from a damage-seq experiment, would be essential to discriminate between these two processes. Furthermore, the sequence coverage in this experiment was insufficient to make any claims regarding the repair of relevant genes involved in cancer development, such as TP53 hotspots. However, in comparing the repair of BPDE-dG to previous repair data (i.e. UV damage), the authors observed that the rate of NER of BPDE-dG damage was intermediate between the fast repair of 6-4PPs and the slower repair of CPDs. Furthermore, the data revealed that BPDE-dG damage was only slightly more prevalent on the NTS, suggesting a minor role of TC-NER. This contradicts mutational signature data, specifically signature 4 associated with tobacco exposure, in which G → T mutations exhibit strong transcriptional strand bias.130 This difference could be explained by a tissue type difference, given that lymphocytes, which were used in the tXR-seq study, are not the most biologically relevant model for tobacco exposure. Regardless, the novel method tXR-seq is an improvement on the previous XR-seq, which could not have been applied to map BPDE-dG DNA damage. The main limitation of tXR-seq is the requirement to identify suitable TLS polymerases, which are known to bypass BPDE-dG DNA damage more efficiently in certain sequence contexts, therefore introducing a small bias during the one-round PCR amplification. Future studies on benzo[a]pyrene damage distribution should aim to treat cells with meaningful doses of BaP or BPDE, use higher sequencing depths and improved bioinformatic and statistical analysis, and couple repair data to new damage formation datasets.
The primary repair of O6-MedG is through direct reversion by AGT. Biochemical assays,137 crystal structure data,138 cellular studies,139 and transgenic mouse studies140 all confirm that AGT repairs O6-MedG and other O6-alkylguanine DNA damage, and that its expression greatly reduces the incidence of mutations caused by exposure to methylating agents. On the other hand, in the case of TMZ therapy, AGT overexpression renders TMZ less effective. To mitigate this problem, the methylation status of the AGT gene promoter, MGMT, is used as a diagnostic strategy for stratifying treatment regimes, and its epigenetic silencing in tumor cells is associated with glioma sensitivity to TMZ.141,142 AGT removes alkyl groups located on the O6-position of guanine in one direct transfer step, regenerating the undamaged guanine residue. The current model of AGT-mediated repair involves cooperative binding of the AGT protein to the minor groove of DNA. AGT then scans the genomic DNA and flips the O6-MedG residue into its active site, permitting the transfer of the alkyl group and releasing the dealkylated DNA.143 Several biochemical studies have revealed that sequence context, including the opposing base, impacts the repair of O6-MedG by AGT.137,144–146 In contrast, recent work from Essigmann and coworkers showed no specific mutational patterns arising from AGT repair in mouse cells treated with the alkylating agent N-methyl-N-nitrosourea (MNU).147 However, there is little work on AGT accessibility to different chromatin states in mammalian cells. Thus, developing a mapping method for methylation damage formation and repair would help fill this gap.148
O6-MedG is highly mutagenic as a result of mispairing during DNA replication.149 Replicative polymerases and TLS polymerases including Pol η, Pol κ and Pol ζ can bypass O6-MedG, causing misincorporation of T with up to a 10-fold misinsertion.150–153 The O6-MedG:T mismatch results in a G → A transition mutation upon the second round of synthesis. In addition to being mutagenic, O6-MedG is cytotoxic via a unique indirect process resulting from the recognition of the O6-MedG:T mispairing by the mismatch repair pathway.154–156 This mispairing gives rise to mutational spectra mainly composed of G → A transitions.157 In particular, signature 11 is observed in malignant melanomas and glioblastoma multiforme treated with the alkylating agent temozolomide.5 Likewise, in vitro immortalized primary murine embryonic fibroblasts treated with the methylating agent N-methyl-N′-nitro-N-nitrosoguanidine (MNNG) showed signature 11.158 A key defining feature of signature 11 is the prevalence of these G → A transition mutations,159 which suggests that signature 11 is relevant to O6-MedG damage derived from exposure to methylating agents. However, there are no methods to map O6-MedG at single-nucleotide resolution. Their development would allow for the extrapolation of a damage spectrum that could be compared directly with mutational signatures, highlighting the contribution of specific DNA damage to the final mutational load.
A nanoparticle-based-hybridization strategy permitted sensitive detection of O6-MedG in a sequence-specific manner.169 In this approach, elongated hydrophobic nucleobase analogues were designed to base pair to O6-MedG. As such, short oligonucleotides that contained the nucleobase analogue formed a more stable duplex with a complementary sequence that contained O6-MedG vs. G. To develop a biosensor from this system, the nucleobase-containing oligonucleotides were conjugated to gold nanoparticles. In this way, the gold nanoparticles served as a quantitative dose-responsive optical readout, where dispersed nanoparticles displayed a bright red color and aggregated nanoparticles resulted in a measurable blue color. In particular, the abundance of O6-MedG located at a mutational hot spot could be quantified within the human KRAS gene in mixtures with competing unmodified DNA. While many challenges remain before this approach can be implemented for biological studies, the approach is suited for massively parallel sequence probes for analysis of multiple genes, but not whole genome analysis. Moreover, high levels of input DNA are needed,171 thus, additional enrichment of samples would be needed. In spite of these hurdles, this proof-of-concept suggests hybridization-based assays that incorporate nucleotide modifications as a strategy for detecting DNA damage within defined genomic contexts.
A second chemistry-oriented approach to the selective amplification of O6-MedG in defined sequence contexts, and a potential basis for mapping DNA alkylation damage, involves the use of artificial nucleotides incorporated opposite the damage site by a DNA polymerase. In this way, DNA adducts can be marked by a non-natural nucleotide rather than by inserting a mismatched T, which causes a loss of the damage identity. Furthermore, if a polymerase can efficiently incorporate the artificial nucleotide, exponential amplification of the marked damage may also be possible, thus enabling the detection of extremely low abundance DNA damage. Polymerase-mediated incorporation of synthetic nucleotides opposite a DNA adduct has been reported for AP sites,172 8-oxodG,173 cis-platinated guanine,174 as well as for O6-alkyldG.170,175 The detection of damaged bases was achieved thanks to the development of new artificial nucleotides that were designed and tested for base pairing and that have been reviewed.176 Recently, we have used this strategy to quantify and localize the related mutagenic O6-alkyl-G adduct O6-carboxymethyldG by amplification with artificial nucleotides.177
Three methods were recently developed to sequence AP sites: AP-seq,59 snAP-seq72 (Fig. 7) and Nick-seq,183 with the latter two having single-nucleotide resolution. AP-seq and snAP-seq rely on the reactivity of the aldehyde group exposed in the acyclic form of the 2′-deoxyribose ring to tag the AP site with biotin. Following tagging, the DNA is enriched using streptavidin, recovered, and prepared for NGS, where AP sites are called and thus mapped to the genome. Since the epigenetic base modifications 5-fC and 5-formyluracil (5-fU) contain reactive aldehydes, sometimes occurring at a higher abundance than AP sites, snAP-seq includes an additional step involving alkaline cleavage in which only the AP sites (i.e. not the formylated bases) lead to DNA strand scission. This selection increases the specificity of the capture.184 As such, the DNA fragments containing AP sites are released from the biotin pull-down and are recovered for NGS, whereas DNA fragments containing 5-fC and 5-fU remain immobilized.72
More recently, Nick-seq was reported to use endonuclease IV (Endo IV), an AP endonuclease, to create new strand breaks at AP sites after blocking pre-existing breaks. Nick-seq is a versatile method that could generally be used for any DNA damage that can be converted into a single-strand break. The strand breaks originating from AP sites were captured at the 3′- and 5′-ends using two complementary strategies: nick translation with α-thio-dNTPs for 5′-end sequencing and terminal transferase tailing for 3′-end sequencing. The co-location of AP sites at both ends increases the sensitivity and specificity of the resulting map. As such, these AP-specific sequencing methods provide efficient approaches for further exploring the genome-wide biological impacts of AP sites.
AP sites have so far been mapped in a parasite (Leishmania m.), bacteria and human cell lines (HepG2 and HeLa). In the snAP-seq study, AP site distribution was investigated at single-nucleotide resolution in human cells competent or not for BER (APE1). No specific locations where AP sites might accumulate were identified, so it was concluded that AP site accumulation is stochastic in a cell population. On the other hand, when data were binned to characterize genomic regions more broadly, APE1 was identified as especially proficient in repairing AP sites in regulatory and genic regions. This latter observation highlights the paradox that, at single-nucleotide resolution, it might be harder to make claims because of the heterogeneity between cells, and that more informative conclusions may often be derived from lower-resolution analysis.
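The contrast between single-nucleotide and binned analysis can be made concrete with a short sketch. It assumes damage sites are available as (chromosome, position) tuples pooled from a cell population; the function names and the fixed-width binning are illustrative assumptions, not the analysis used in the snAP-seq study.

```python
from collections import Counter

def recurrent_positions(sites, min_count=2):
    """Positions hit at least min_count times at single-nucleotide level."""
    per_base = Counter(sites)
    return {site for site, n in per_base.items() if n >= min_count}

def bin_sites(sites, bin_size):
    """Count sites per fixed-width bin, keyed by (chrom, bin index)."""
    binned = Counter()
    for chrom, pos in sites:
        binned[(chrom, pos // bin_size)] += 1
    return binned

if __name__ == "__main__":
    sites = [("chr1", 5), ("chr1", 17), ("chr1", 42),
             ("chr1", 950), ("chr1", 1003)]
    print(recurrent_positions(sites))   # no position recurs
    print(bin_sites(sites, 100))        # the first bin stands out
```

In this toy input no individual base is hit twice, yet three of the five sites fall in the same 100-bp bin, which mirrors how regional trends can emerge only after aggregation.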
In the SSB-Seq study, SSBs were tagged with digoxigenin-labeled nucleotides during a nick translation step with DNA polymerase I, followed by immunoprecipitation with an anti-digoxigenin antibody.187 Results suggested that SSBs induced by topoisomerase II were primarily located in the promoter regions of genes in human cell lines.
The SSiNGLe method involves tagging the 3′-OH terminus of a DNA strand, which represents the position of an SSB, by adding a polydA tail with a terminal transferase. Helicos Single Molecule Sequencing (SMS) and Illumina platforms (ILM) were used to map SSBs in genomic DNA. Extracted DNA is first fragmented by MNase, leaving 3′-phosphate ends that are not recognized by the terminal transferase. The polydA tail is then either captured on a flow cell harboring chains of dT oligonucleotides for SMS or amplified with oligo-dT primers before adapter ligation for Illumina sequencing. The distribution of SSBs was characterized in a variety of human and mouse cell lines and called the breakome. The patterns of distribution of SSBs changed across cell types as well as within the same cell type in response to anti-cancer drugs. Moreover, the breakome of peripheral blood mononuclear cells from patients correlated with age, which points to a close association between SSBs and aging-related disease states.
Similarly, GLOE-Seq takes advantage of the presence of the 3′-OH terminus but introduces a new strategy to eliminate the polydA tail and the possible repetitive sequence limitation inherent to the Illumina platform. Thus, a biotinylated adaptor was ligated to the 3′-OH terminus of the SSB, followed by a biotin pull-down. The strength of this method is its applicability to double-strand breaks, Okazaki fragments and any type of DNA damage that can be converted into a nick or a gap with a free 3′-OH terminus. SSBs were mainly located on the leading strand due to polymerase ε activity. The authors suggest that, since polymerase ε incorporates ribonucleotides, the main cause of SSBs is repair intermediates of ribonucleotide misincorporation.
First developed in 2013, the BLESS (breaks labeling and enrichment on streptavidin and sequencing) method involves the ligation of DSBs to a biotinylated linker. Following streptavidin enrichment, an additional linker is added which allows PCR amplification and sequencing of DNA fragments containing the DSBs.194 Follow-up methods to this basic ligation and enrichment strategy include Break-seq,195 DSBCapture,196 END-seq,197 GUIDE-seq,198 BLISS,199 DSB-Seq,187 and qDSB-Seq.200 Notable improvements on the original method include use for mapping DSBs in mice in vivo (i.e. END-seq) and increased sensitivity (i.e. DSBCapture,196 END-seq,197 and BLISS199). For example, the authors impressively demonstrated that END-seq is sensitive enough to detect a single DSB within a sample of 10000 cells.
The most recently reported method, qDSB-Seq, uses site-specific endonucleases to create a DSB in a controlled manner, using this material for quantitative normalization of the subsequent analysis. As such, the key innovation of qDSB-Seq is combining nucleotide-resolution mapping for localization and simultaneous quantification of DSBs. The coupling of damage localization with direct quantification is a very new concept which might play a central role when investigating dose–response relationships for DNA damaging agents.
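The idea of using enzyme-induced breaks as an internal standard can be sketched as a simple proportionality. The function below is an illustrative assumption about how such a normalization might be written, not the published qDSB-Seq calculation: it supposes that the fraction of cells cut at each enzyme site (`cut_frequency`) has been measured independently and that read counts scale linearly with break number. All names are hypothetical.

```python
def dsbs_per_cell(reads_studied, reads_at_cut_sites, cut_frequency, n_cut_sites):
    """Estimate absolute DSBs per cell from spike-in (induced) breaks.

    reads_studied:      reads assigned to the breaks of interest
    reads_at_cut_sites: reads assigned to the enzyme-induced breaks
    cut_frequency:      measured fraction of cells cut at each enzyme site
    n_cut_sites:        number of enzyme recognition sites per genome
    """
    if reads_at_cut_sites <= 0:
        raise ValueError("no reads at induced sites; cannot normalize")
    # Induced breaks per cell act as the calibration standard.
    induced_per_cell = cut_frequency * n_cut_sites
    # Assumes reads are proportional to breaks for both classes of DSB.
    return reads_studied / reads_at_cut_sites * induced_per_cell

if __name__ == "__main__":
    # 10 sites each cut in 20% of cells -> 2 induced DSBs per cell;
    # 5x more reads at studied breaks -> about 10 DSBs per cell.
    print(dsbs_per_cell(5000, 1000, 0.2, 10))
```

Under these assumptions, the same calibration could be applied across doses of a damaging agent to place dose–response data on an absolute per-cell scale.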
Strategies to map ribonucleotides at single-nucleotide resolution are based on specific alkaline cleavage or enzymatic cleavage at ribonucleotides, exploiting the highly reactive 2′-hydroxyl groups (Fig. 8). One strategy is to use Arabidopsis thaliana tRNA ligase (AtRNL) to ligate the 2′-phosphate termini of DNA derived from alkaline cleavage to the 5′-phosphate terminus of the same DNA strand to produce an ssDNA circle containing the embedded ribonucleotide (Ribose-seq).214 Sequencing results using Ribose-seq in yeast show higher rNMP incorporation on the newly synthesized leading strand than on the lagging strand. This is consistent with results showing that the leading-strand DNA Pol ε shows lower fidelity to ribonucleotides than the lagging-strand Pol δ.214 Other sequencing methods include HydEn-seq,215 which involves ligation of an adaptor directly to the free 5′-OH end, and Pu-seq,216 which uses random hexamer primer extension to synthesize a flush end adjacent to the initial ribose. Another strategy, emRiboSeq,217,218 uses RNase H2 to recognize and cleave the ribonucleotide site, generating 3′-OH and 5′-phosphate groups. All four methods have been applied to budding yeast, including strains with mutant replication polymerases that introduce excess ribonucleotides into DNA.
To date, all applications of ribonucleotide sequencing have focused on leveraging the distribution of ribonucleotides as markers of replication enzymology. Error-prone synthesis by Pol α is retained in yeast and incorporates ribonucleotides into 1.5% of the mature genome. The different methods used all obtained similar results concerning replication polymerases, and it appears that these methods are well suited to address broader biological or toxicological questions. For example, there are several open questions regarding the impact of ribonucleotide misincorporation and the influence of chromatin state, transcriptional activity and local sequence context on ribonucleotide distribution. Furthermore, future studies may be useful for understanding the genomic connections between embedded ribonucleotides and diseases related to RNase H2 mutation.
Excision-seq was first used to map uracil in genomic DNA.107 Specifically, cleavage at the site of the uracil, using UDG and T4 endonuclease IV coupled to Excision-Seq, generated a single-nucleotide resolution map of uracil distribution. The authors observed that distribution of uracil in the genome is tightly correlated with replication timing and hypothesized that it arises from changes in nucleotide pool composition during replication. This observation could be important to relate uracil distribution to mutational signatures where a replication timing factor is also observed.
Surprisingly, since its report in 2014, excision-seq has not been used further; instead, lower-resolution methods were developed, such as dU-seq,220 UPD-seq221 and U-DNA-seq.222 Alternatively, a gap-ligation approach developed by Burrows et al. could in theory be used to map any lesions repaired by BER but has not yet been used for whole genomes.69
dU-seq uses UDG to remove uracil bases and replace them with biotinylated nucleotides to yield pull-down fragments for sequencing. The authors found that uracil content was high at centromeres in the human genome. UPD-seq also uses UDG to remove uracil and then tags the resulting abasic sites with a biotin-containing chemical bearing a disulfide link (ssARP). With the UPD-seq strategy, the so-called uracilome was defined in a bacterial strain with active human apolipoprotein B mRNA-editing enzyme catalytic subunit (APOBEC), allowing for correlation with the mutational footprint left by cytosine deamination enzymes.
Finally, U-DNA-seq involves the use of a uracil sensor (a mutant of the human protein UNG2, one of the BER glycosylases for uracil) to locate uracil damage, followed by immunoprecipitation and purification before sequencing of the uracil-enriched DNA. Furthermore, uracil distribution in human tumor cells upon chemotherapeutic treatment with raltitrexed and 5-fluoro-2′-deoxyuridine moves from heterochromatin regions to euchromatin (active transcriptional regions).
In the future, DNA damage maps are expected to be an important tool in the sequencing arsenal for studying mutagenesis, carcinogenesis, aging processes, and responsiveness to DNA damaging drugs. However, sequencing DNA damage in a reliable and robust manner still requires significant work. Notably, most of the methods discussed here rely on a damage-specific step and therefore cannot be extended to different forms of DNA damage. This specificity is required because damage products are present at such low frequencies that DNA samples need to be enriched for analysis. For 8-oxodG, platinated DNA, UV damage, benzo[a]pyrene DNA damage and uracil, enrichment has been achieved using antibodies. Apart from such antibodies, which could be devised de novo, albeit with significant time and expense, other enrichment strategies such as specific excision/cleavage or chemical conjugation are exclusively applicable to the particular forms of damage that they naturally target. Thus, specific direct excision of the damage, or cleavage next to the damage site to insert a probe/adapter, has also been possible for 8-oxodG and UV damage, in addition to ribonucleotides and abasic sites. Finally, damage-specific chemical conjugation has been used to enrich 8-oxodG and abasic sites.
Enriched samples of damaged DNA fragments, obtained e.g. by antibody pull-down, could be sequenced directly after damage removal, or by using TLS polymerases to bypass the DNA damage. The sequencing resolution of this general approach depends on the basis of fragmentation and the resulting size of the fragments sequenced. In the case of tXR-seq and XR-seq, the fragments were released by NER enzymes, and they were short (20 mer). Thus, a single damaged nucleotide position could be identified because the damage was present exactly in the middle of the resulting oligonucleotides. With OxiDIP-seq and enTRAPseq, however, the resolution depended on the size of the sonicated or digested DNA fragments, which is around several hundred base pairs. A significant advance was introduced by the cisplatin-seq, damage-seq and HS-damage-seq methods in order to obtain single-nucleotide resolution data, namely a high-fidelity polymerase that stalls at the site of DNA damage during a single PCR round. While this high-fidelity polymerase strategy was effective for large adducts that reliably block synthesis, including platinated crosslinks, UV damage and BPDE-dG, such an approach is unlikely to work for smaller DNA damage such as 8-oxodG or O6-MedG. In this case, third-generation sequencing technologies (single-molecule real-time sequencing and nanopore sequencing), which do not require amplification and work on small epigenetic DNA modifications or DNA damage, could potentially be useful.64,225–231
The second strategy, involving specific DNA strand cleavage or enzymatic excision of the damage, is also suited to mapping DNA damage independent of its size. This strategy was based on the recognition of DNA damage by nature's own recognition systems, such as Fpg for 8-oxodG, UVDE for UV damage, T4 endonuclease V for CPD damage, RNase H2 for ribonucleotides, or UDG for uracil. These methods created a nick or gap at the damage location, enabling the introduction of a sequencing adaptor/enrichment probe by ligation, as in excision-seq, CPD-seq, all ribonucleotide-seq methods, dU-seq or UDP-seq, or by click chemistry, as in click-code-seq. One of the largest shortcomings of these strategies was false-positive reads from background gaps or nicks generated, for instance, during DNA sonication. Excision-seq, Pu-seq, and HydEN-seq corrected for this by fragmenting genomic DNA directly with the cleavage treatment, thereby removing the sonication step. CPD-seq, emRiboseq, ribose-seq, click-code-seq and nick-seq corrected for false-positive signals by blocking pre-existing nicks or gaps with an adapter before excision/cleavage, and only then introducing a second adapter/ddNTP for specific amplification. Finally, as in the case of antibodies, enzymatic cleavage enrichment methods are also potentially limited with regard to specificity. Namely, the excision enzymes might not be specific to a single substrate, so the resulting map reflects the substrate scope of the enzyme rather than a single form of DNA damage. Nonetheless, such approaches are valuable for understanding the biological role of an enzyme in a controlled experiment with controlled exposures that generate DNA damage.
A third key strategy for damage sequencing has involved direct chemical conjugation of DNA damage with affinity probes, as in OG-seq for 8-oxodG, and AP-seq and snAP-seq for abasic sites. Damage-enriched DNA could be used directly for sequencing with ∼100 bp resolution (OG-seq, AP-seq) or fragmented at damage sites by alkaline treatment in order to mark the single-nucleotide position (snAP-seq). Chemical conjugation can also pose a problem of specificity, as the probe might react with other aldehyde groups present on DNA, such as those arising from 2-deoxyribose oxidation.
Despite the impressive obstacles that have been overcome with damage sequencing, there remain significant limitations. One of these is the difficulty of determining both the quantity and the location of DNA damage, both of which are critically needed to evaluate dose–response relationships to DNA-damaging agents. So far, only qDSB-Seq has been able to accurately quantify the number of DSBs simultaneously with their location, by using calibrated samples that have a known amount of DSBs. Additionally, when working with large genomes such as the human or mouse genome, billions of reads need to be sequenced to draw robust biological conclusions about specific genes. Finally, bioinformatics pipelines have been developed for each of the strategies, integrating key aspects such as data normalization; however, there are no standardized pipelines comparable to those available for more common sequencing analyses such as variant calling. Despite these aspects that are still under development, damage sequencing has already offered the opportunity to understand the dynamics of damage formation and repair at the genome-wide level in a variety of organisms.
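The idea of calibrating read counts against a sample with a known amount of damage can be sketched in a few lines of Python. This is a hedged illustration of the general principle only, not the published qDSB-Seq procedure: the function name `estimate_breaks_per_cell`, its parameters and the toy numbers are hypothetical, and the sketch assumes that read counts scale linearly with the number of breaks and that calibration and sample reads come from the same library.

```python
def estimate_breaks_per_cell(
    sample_reads: int,
    calibration_reads: int,
    calibration_breaks_per_cell: float,
) -> float:
    """Convert read counts into an absolute break frequency.

    Assumes (for illustration) that reads are proportional to breaks, so the
    reads observed at calibration sites with a known break frequency give a
    reads-to-breaks conversion factor for the rest of the genome.
    """
    if calibration_reads <= 0:
        raise ValueError("calibration_reads must be positive")
    if sample_reads < 0 or calibration_breaks_per_cell < 0:
        raise ValueError("counts and frequencies must be non-negative")
    return sample_reads * calibration_breaks_per_cell / calibration_reads


# Toy numbers (invented for illustration): 2000 reads map to calibration
# sites known to carry 0.5 breaks per cell; 40000 reads map elsewhere.
estimate = estimate_breaks_per_cell(
    sample_reads=40_000,
    calibration_reads=2_000,
    calibration_breaks_per_cell=0.5,
)
```

With these toy inputs the conversion factor is 0.5 breaks per 2000 reads, so the 40 000 remaining reads correspond to an estimate of 10 breaks per cell. Any real application would also need to account for background reads and for enrichment biases, which this sketch ignores.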
Given the difficulties of uncovering discrete chemical reactions on the scale of the genome, it is astounding that there are now several ways to use sequencing methods to address the distribution of chemically induced DNA damage, from the level of local sequence context to the influence of transcription factors and higher-order chromatin structures. Almost all of these data have shown that DNA damage is heterogeneously distributed in the genome, as a result of the combined dynamics of damage formation and repair activity, both of which are influenced to varying extents by sequence context, DNA-protein binding sites and nucleosome positioning. For instance, coupling damage sequencing with nucleosome location data has revealed how the rotational setting of DNA wrapped around histones influences damage formation and repair. Additionally, DNA damage such as 8-oxodG in gene promoters has been found to have a regulatory role in gene expression. As DNA damage distribution may be an early event dictating potentially adverse effects, the anticipated development of diverse DNA damage maps is expected to help understand and better predict the etiologies of mutational landscapes in human cancer genomes and other biological processes driven by genome instability.
This journal is © The Royal Society of Chemistry 2020