Emma N. Welbourne (a), Royce J. Copley (a), Gareth R. Owen (a), Caroline A. Evans (a), Kesler Isoko (ab), Ken Cook (c), Joan Cordiner (a), Zoltán Kis (ad), Peyman Z. Moghadam (b) and Mark J. Dickman (*a)
(a) School of Chemical, Materials and Biological Engineering, University of Sheffield, Sheffield, UK. E-mail: m.dickman@sheffield.ac.uk
(b) Department of Chemical Engineering, University College London, London, UK
(c) ThermoFisher Scientific, Hemel Hempstead, UK
(d) Department of Chemical Engineering, Imperial College London, London, UK
First published on 30th January 2025
mRNA technology has significantly shortened the timeline for developing and delivering a new vaccine from years to months, as demonstrated by the development and approval of two highly efficacious vaccines based on mRNA sequences encoding a modified version of the SARS-CoV-2 spike protein. Analytical methods are required to characterise mRNA therapeutics and underpin manufacturing development. In this study, we have developed and utilised partial RNase digests of mRNA using RNase T1 and RNase U2 in conjunction with an automated, high throughput workflow for the rapid characterisation and direct sequence mapping of mRNA therapeutics. Alongside this, we have developed novel software engineered to optimise and streamline the visualisation and analysis of mRNA sequence maps obtained by LC-MS/MS. We show that mRNA sequence coverage is increased by combining multiple partial RNase T1 digests (44% and 37% individually, 64% combined) or RNase T1 and U2 partial digests (73% and 52% individually, 88% combined). The developed software automates the process of combining digests, making the analysis faster and more accurate. Furthermore, the software provides additional information on sequence coverage by taking into account multiple overlapping oligoribonucleotide fragments, increasing the confidence of the sequence mapping. Finally, the software provides powerful and accessible visualisation by generating spiral plots that summarise, in a single output, the sequence maps from combined multiple partial RNase digests.
Analytical methods are required to characterise mRNA therapeutics and underpin manufacturing development. Validated analytical methods are needed to support the relevant phase of clinical development and regulatory submission requirements, and for ongoing quality control of the approved product. There is currently significant demand for improved analytical methods to characterise RNA therapeutics.
Liquid chromatography interfaced with tandem mass spectrometry (LC-MS/MS) has emerged as a powerful tool for the analysis and characterisation of mRNA vaccines and therapeutics. Recently, a number of alternative workflows based on RNase mass mapping have been developed, in which mRNA manufactured by in vitro transcription (IVT) is digested with site-specific ribonucleases (RNases) to characterise its identity, sequence and chemical modifications.7–11 Digestion of the mRNA has been performed using RNases such as RNase T1 in conjunction with alternative RNase enzymes, including MazF and human RNase 4.7,9,10 The characterisation of large mRNA therapeutics and vaccines using LC-MS/MS is technically challenging and has been hindered by a lack of robust analytical and computational tools. Typical high-frequency RNases, including RNase T1 and RNase A, generate short oligoribonucleotide fragments that do not map uniquely to the mRNA sequence. Conversely, enzymes such as the E. coli interferase MazF generate large fragments that are typically difficult to identify confidently from their MS/MS spectra. Novel mRNA sequence mapping approaches have therefore been developed using partial RNase T1 digests,8 parallel digestions with alternative RNases7 and alternative RNases such as human RNase 4,9 to overcome these limitations and obtain high sequence coverage of the mRNA.
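To make the uniqueness problem concrete, the sketch below performs an in silico complete RNase T1 digest (cleavage 3′ of every G, no missed cleavages) and reports which fragment sequences occur more than once. This is a minimal illustration, not part of the published workflow; the function names are hypothetical, and it checks uniqueness at the sequence level only, whereas MS1 matching is stricter still because isomeric fragments (same base composition) share a mass.

```python
from collections import Counter


def rnase_t1_complete_digest(seq: str) -> list[str]:
    """Split an RNA sequence after every G, as in a complete RNase T1 digest."""
    fragments, start = [], 0
    for i, base in enumerate(seq):
        if base == "G":
            fragments.append(seq[start:i + 1])  # fragment ends with the cleaved G
            start = i + 1
    if start < len(seq):
        fragments.append(seq[start:])  # 3'-terminal fragment that lacks a G
    return fragments


def nonunique_fragments(fragments: list[str]) -> list[str]:
    """Fragment sequences that occur at more than one position (cannot be placed unambiguously)."""
    return sorted(f for f, n in Counter(fragments).items() if n > 1)


frags = rnase_t1_complete_digest("AUGAUGCCAUGAG")
print(frags)                       # ['AUG', 'AUG', 'CCAUG', 'AG']
print(nonunique_fragments(frags))  # ['AUG']
```

Applied to a full-length mRNA, the same check shows how many of the short T1 products cannot be mapped to a single location, which is the motivation for partial digests that retain longer, more distinctive fragments.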
We have previously developed and utilised direct sequence mapping of mRNA using partial RNase T1 digests in conjunction with ion-pair reversed-phase high performance liquid chromatography (IP-RP HPLC) coupled to mass spectrometry.8 Oligoribonucleotide identifications were performed using the automated data analysis software BioPharma Finder (BPF), which identifies oligoribonucleotides from their accurate mass in conjunction with their MS/MS fragmentation spectra and maps the corresponding sequences onto the known RNA sequence. Data analysis reveals that the partial T1 digest generates large numbers of (often overlapping) oligoribonucleotide fragments corresponding to different numbers of missed cleavages. Data visualisation and sequence mapping methods have not previously taken advantage of these multiple overlapping fragments, which provide additional confidence when assigning total sequence coverage in mRNA sequence mapping. Furthermore, the presence of multiple overlapping fragments may provide insight into potential secondary/tertiary structures or reflect the accessibility of the mRNA to the RNase. However, it should be noted that the primary mRNA sequence also influences the pattern and number of fragments generated relative to the predicted theoretical fragments. In addition, previous mapping methods have not provided a way to directly compare or combine complementary digests, whether to develop sequencing workflows, improve sequence coverage or probe potential mRNA structure. Finally, the current manual method for visualising sequence maps is laborious, time-consuming and error-prone.
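The origin of overlapping fragments can be illustrated by enumerating the theoretical products of a partial digest: every stretch between two cleavage sites that skips up to a given number of sites (missed cleavages) is a possible fragment, and each nucleotide is then covered by several of them. The sketch below is a hypothetical illustration using half-open `(start, end)` coordinates and cleavage after G; the `cut_after` parameter and `max_missed` limit are assumptions for illustration. In real data only a subset of these theoretical fragments is identified, which is why the observed overlap depth can reflect sequence and RNase accessibility.

```python
def cleavage_sites(seq: str, cut_after: str = "G") -> list[int]:
    """Positions (0-based, exclusive) after which the enzyme cuts; a cut at the 3' end is ignored."""
    return [i + 1 for i, base in enumerate(seq) if base in cut_after and i + 1 < len(seq)]


def partial_digest(seq: str, max_missed: int = 1, cut_after: str = "G") -> list[tuple[int, int]]:
    """All theoretical fragments with 0..max_missed missed cleavages, as half-open (start, end)."""
    bounds = [0] + cleavage_sites(seq, cut_after) + [len(seq)]
    fragments = []
    for i in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            j = i + 1 + missed
            if j >= len(bounds):
                break
            fragments.append((bounds[i], bounds[j]))
    return fragments


def overlap_depth(length: int, fragments: list[tuple[int, int]]) -> list[int]:
    """Number of fragments covering each nucleotide position."""
    depth = [0] * length
    for start, end in fragments:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


frags = partial_digest("AUGAUGCCAUGAG", max_missed=1)
print(frags)  # [(0, 3), (0, 6), (3, 6), (3, 11), (6, 11), (6, 13), (11, 13)]
print(overlap_depth(13, frags))
```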
Therefore, to optimise and streamline mRNA sequence mapping while exploiting the large numbers of overlapping fragments and complementary RNase digests, we developed novel software visualisation tools that directly use the oligoribonucleotide identifications from multiple LC-MS/MS outputs. In this study, we have utilised partial RNase digests of mRNA using RNase T1 and RNase U2 in conjunction with automated, high throughput workflows for the rapid characterisation and direct sequence mapping of mRNA, with the aim of improving sequence coverage, sequencing confidence and identity testing. The ability to rapidly identify, characterise and sequence map large mRNA therapeutics with high sequence coverage provides important information for identity testing, sequence validation and impurity analysis.
Partial RNase U2 digestion of CSP mRNA was performed using 4 μL of immobilised RNase U2 and 40 μg of mRNA with the total reaction volume made up to 50 μL with SMART Digest buffer. The reaction was incubated at 37 °C for 30 minutes and was terminated by the magnetic removal of the immobilised RNase.
Automated RNase digests were performed using an automated robotic liquid handling system (KingFisher Duo Prime system, Thermo Scientific) controlled by BindIt™ software (version 4.0). A 96-deepwell plate was set up with 50 μL of SMART Digest buffer containing the mRNA sample in row A and RNase T1/U2 immobilised on magnetic beads in 50 μL of SMART Digest buffer in row G. The KingFisher was programmed to transfer the RNase-immobilised magnetic particles to row A to digest the RNA at 37 °C for the allotted time. Sedimentation of the beads was prevented by repeated insertion of the magnetic comb using the mixing speed setting “Fast”. Immediately after incubation, the magnetic beads were collected and removed from the reaction.
A Vanquish binary gradient UHPLC system (Thermo Fisher Scientific) with a DNAPac RP column (2.1 mm I.D., Thermo Fisher Scientific) was used for chromatography. Chromatograms were recorded using UV detection at a wavelength of 260 nm.
The chromatographic analysis of RNase T1 and U2 digests was performed using the following conditions: buffer A, 0.2% triethylamine (TEA) and 50 mM 1,1,1,3,3,3-hexafluoro-2-propanol (HFIP); buffer B, 0.2% TEA, 50 mM HFIP and 20% acetonitrile (ACN). RNA was analysed using a gradient that was held at the starting percentage of buffer B for one minute, followed by a linear increase to the final percentage of buffer B. A flow rate of 0.2 mL min−1 and a column temperature of 60 °C were used.
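Because buffer B itself contains only 20% ACN, the organic content reaching the column is a fraction of the nominal %B. The sketch below computes %B at a given time for the gradient shape described above (a 1 min hold, then a linear ramp) and converts it to %ACN. The starting and final %B and the gradient end time are not stated in this section (they are given in ESI Table ST2), so the values used here are hypothetical placeholders, as is the assumption that %B stays at its final value after the ramp.

```python
def percent_b(t_min: float, start_b: float, end_b: float,
              gradient_end_min: float, hold_min: float = 1.0) -> float:
    """%B at time t: held at start_b for hold_min, then a linear ramp to end_b."""
    if t_min <= hold_min:
        return start_b
    if t_min >= gradient_end_min:
        return end_b  # assumption: %B is held at the final value after the ramp
    fraction = (t_min - hold_min) / (gradient_end_min - hold_min)
    return start_b + fraction * (end_b - start_b)


def percent_acn(pct_b: float, acn_in_b: float = 20.0) -> float:
    """Effective %ACN: buffer A contains no ACN, buffer B contains acn_in_b % ACN."""
    return pct_b * acn_in_b / 100.0


# Hypothetical gradient: 20% -> 60% B, ramp ending at 21 min.
b = percent_b(11.0, start_b=20.0, end_b=60.0, gradient_end_min=21.0)
print(b, percent_acn(b))  # 40.0 8.0
```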
Chromatographically separated mRNA digests were analysed using an Orbitrap Exploris 240 mass spectrometer (Thermo Fisher Scientific). Data were collected by data-dependent acquisition in negative ion mode, with full MS1 scans at a resolution of 120 000 and a normalised automatic gain control (AGC) target of 200%. Selected MS1 ions were fragmented by higher energy collisional dissociation (HCD). MS2 resolution was set at 30 000, with an AGC target of 100%, an isolation window of 3 m/z, a scan range of 150–2000 m/z and normalised stepped collision energies.
Additional details of the LC-MS/MS methods are supplied in ESI Table ST2.†
For data processing and review, the following additional filters were applied: “Identification” = “does not contain nonspecific”, “does not contain nonunique”; “Mod” = “does not contain None”; “Nonunique Seq” = “≤1”; “Δppm” = “≤20”, “≥−20”; “Conf. Score” = “≥90”; “Best ASR” = “≤2.0”; “ID Type” = “contains MS2”; “Mono Mass Exp.” = “>0”. All oligoribonucleotide identifications from BPF are shown in the ESI (Tables ST3–6†).
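For readers reproducing this filtering outside BPF, for example on an exported identification table, the criteria above can be expressed as a single predicate. This is a sketch under assumptions: the `Identification` record and its field names are hypothetical stand-ins for the exported columns, and the text matches are made case-insensitive, which may differ from BPF's own filter behaviour. The `delta_ppm` helper shows the standard mass-error definition that the ±20 ppm window applies to.

```python
from dataclasses import dataclass


@dataclass
class Identification:
    # Hypothetical fields mirroring the BPF columns named in the filter list.
    identification: str
    mod: str
    nonunique_seq: int
    delta_ppm: float
    conf_score: float
    best_asr: float
    id_type: str
    mono_mass_exp: float


def delta_ppm(exp_mass: float, theo_mass: float) -> float:
    """Mass error in parts per million."""
    return (exp_mass - theo_mass) / theo_mass * 1e6


def passes_filters(r: Identification) -> bool:
    """True if a record meets every filter criterion listed in the text."""
    ident = r.identification.lower()
    return (
        "nonspecific" not in ident
        and "nonunique" not in ident
        and "none" not in r.mod.lower()
        and r.nonunique_seq <= 1
        and -20.0 <= r.delta_ppm <= 20.0
        and r.conf_score >= 90.0
        and r.best_asr <= 2.0
        and "ms2" in r.id_type.lower()
        and r.mono_mass_exp > 0
    )


record = Identification("CCAUG", "3'-phosphate", 1, 3.2, 95.0, 1.1, "MS2", 1600.2)
print(passes_filters(record))  # True
```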
One or two BPF sequence maps were loaded into the software in PNG format, along with the total sequence length of the analysed mRNA construct. The information in the BPF images is extracted by image analysis. Each BPF colour image is first read and converted to greyscale using OpenCV. The images are then cropped to remove white margins, focusing on the relevant data sections. The software normalises the resulting greyscale matrix, which is then processed to identify fragment bars and sort them into their respective row positions.
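The preprocessing steps above can be sketched with NumPy alone. In OpenCV the first step would typically be `cv2.imread(path)` followed by `cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)`; the NumPy version below applies the same ITU-R BT.601 luma weights but assumes channels in RGB order. The white threshold of 250 and the function names are hypothetical choices, not the published software's parameters.

```python
import numpy as np


def to_greyscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB image to greyscale with BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(float) @ weights


def crop_white_margins(grey: np.ndarray, white_threshold: float = 250.0) -> np.ndarray:
    """Trim rows and columns that contain only near-white pixels."""
    mask = grey < white_threshold           # True where there is non-white content
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    if rows.size == 0:                      # blank image: nothing to keep
        return grey[:0, :0]
    return grey[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


def normalise(grey: np.ndarray) -> np.ndarray:
    """Scale 8-bit intensities to the range 0-1."""
    return grey / 255.0


img = np.full((10, 10, 3), 255, dtype=np.uint8)
img[3:6, 2:8] = 0                           # a single dark "fragment bar"
print(crop_white_margins(to_greyscale(img)).shape)  # (3, 6)
```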
A search algorithm detects fragment bars by examining pixel intensity patterns. Detected bars are indexed by sequence position rather than pixel location. The colour of each bar is extracted and assigned a confidence value according to predefined colour mappings. Together these values form a comprehensive dataset containing the start, end and length of each fragment and its confidence level. This colour-coded information is recorded automatically, and a cumulative confidence is calculated for each nucleotide in the mRNA sequence.
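A minimal sketch of this bar-detection and scoring logic follows. It scans one pixel row for runs of non-background intensity, converts pixel columns to 1-based nucleotide positions assuming a linear, integer pixels-per-nucleotide scale, assigns a confidence from the colour at the bar's midpoint using the nearest entry in a colour table, and sums confidences per nucleotide. The colour table (`CONFIDENCE_BY_COLOUR`) and its values are hypothetical placeholders; the actual BPF legend colours and the confidence values mapped to them are not given here.

```python
from dataclasses import dataclass

# Hypothetical colour legend (RGB) -> confidence weight; not the real BPF values.
CONFIDENCE_BY_COLOUR = {
    (0, 128, 0): 1.0,     # assumed high-confidence colour
    (255, 165, 0): 0.5,   # assumed medium-confidence colour
    (255, 0, 0): 0.25,    # assumed low-confidence colour
}


@dataclass
class Fragment:
    start: int          # 1-based first nucleotide
    end: int            # 1-based last nucleotide (inclusive)
    confidence: float

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def find_runs(grey_row, background=255):
    """Half-open (start_px, end_px) runs of pixels darker than the background."""
    runs, start = [], None
    for x, value in enumerate(grey_row):
        if value < background and start is None:
            start = x
        elif value >= background and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(grey_row)))
    return runs


def nearest_confidence(rgb):
    """Confidence of the legend colour closest (squared RGB distance) to rgb."""
    best = min(CONFIDENCE_BY_COLOUR, key=lambda c: sum((a - b) ** 2 for a, b in zip(c, rgb)))
    return CONFIDENCE_BY_COLOUR[best]


def fragments_from_row(grey_row, rgb_row, px_per_nt):
    """Detect bars in one row and index them by nucleotide position."""
    fragments = []
    for x0, x1 in find_runs(grey_row):
        colour = rgb_row[(x0 + x1) // 2]    # sample the bar's midpoint colour
        start = x0 // px_per_nt + 1
        end = (x1 - 1) // px_per_nt + 1
        fragments.append(Fragment(start, end, nearest_confidence(colour)))
    return fragments


def cumulative_confidence(fragments, seq_length):
    """Summed confidence per nucleotide; element i is nucleotide i + 1."""
    totals = [0.0] * seq_length
    for f in fragments:
        for n in range(f.start, f.end + 1):
            totals[n - 1] += f.confidence
    return totals
```

Summing confidences rather than recording presence or absence is what lets overlapping fragments raise the score of a region.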
For a single BPF PNG input, the software outputs a linear map and a single spiral plot. For a pair of BPF PNG images, it outputs a linear map alongside three spiral plots: one for each individual image and one for their combination.
Corresponding mRNA sequence maps can be generated from these outputs. Fig. 1B shows the type of sequence map produced using our previous method. To generate a single map, each row of recorded sequences (from BPF) was transferred manually to a spreadsheet with conditional formatting. At each sequence position, the cell is coloured green if at least one oligoribonucleotide (represented by one bar) covers that position, and red if no bar is found. This process is time-consuming and prone to transcription errors. Additionally, and crucially, the recorded data are binary: either bar(s) are found or they are not. This format cannot easily be used in a high throughput fashion, or where multiple data sets need to be combined into a single sequence coverage map.
To address these issues, we have developed software tools that automate the extraction and visualisation of the oligoribonucleotide identifications from BPF software outputs, including the start, length and end of each fragment, to generate linear and spiral sequence maps. Moreover, where additional overlapping fragments are present, we have combined this information with the intensity-based confidence score provided with each oligoribonucleotide identification to generate a visual representation of the overall confidence in the sequence. Where multiple overlapping oligoribonucleotides have been identified within an mRNA sequence, this is reflected in the mRNA sequence map. An example of one of our new linear sequence maps is shown in Fig. 1C. By automating this process, which can take up to an entire working day when performed manually, our tool reduces the analysis time to approximately 4–10 minutes per image (depending on the number of identified oligoribonucleotides), improving the efficiency of RNA sequence mapping.
Fig. 2 Generation of linear mRNA sequence maps using novel visualisation software. (A) mRNA sequence mapping image generated from LC-MS/MS analysis of a partial RNase T1 digest. The highlighted section shows overlapping oligoribonucleotide fragments. The confidence colour coding used by BPF is shown in the legend. (B) Linear mRNA sequence map automatically generated from the image shown in Fig. 1A. The BPF confidence-score colour coding is carried over into our linear map. A colour-coded representation of the overlapping fragments is shown beneath. Here, the intensity of the blue colour represents a combined confidence score that reflects both the number of overlapping fragments and their individual confidences from the original BPF oligoribonucleotide identifications. The section of sequence highlighted in (A) is also highlighted in (B).
When comparing and combining multiple RNase digests of the same mRNA, a different colour is used for each digest, as shown in Fig. 3. A third colour is added to the colour-coded representation to highlight sections of the sequence where oligoribonucleotide fragments are mapped in both digests. Grey is used for nucleotides that are not covered by either digest.
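The per-nucleotide categories used when two digests are combined (first digest only, second digest only, both, or neither, drawn in grey) and the combined coverage percentage can be sketched with set operations. Fragments are given here as hypothetical 1-based inclusive `(start, end)` tuples; the function and label names are illustrative, not those of the published software.

```python
def covered_positions(fragments):
    """Set of 1-based nucleotide positions covered by any (start, end) fragment."""
    return {n for start, end in fragments for n in range(start, end + 1)}


def classify(seq_length, digest_a, digest_b):
    """Label each nucleotide by which digest(s) cover it."""
    a, b = covered_positions(digest_a), covered_positions(digest_b)
    labels = []
    for n in range(1, seq_length + 1):
        if n in a and n in b:
            labels.append("both")     # the third colour in the combined map
        elif n in a:
            labels.append("A only")
        elif n in b:
            labels.append("B only")
        else:
            labels.append("none")     # drawn in grey
    return labels


def coverage_percent(seq_length, *digests):
    """Percentage of nucleotides covered by the union of the given digests."""
    covered = set().union(*(covered_positions(d) for d in digests))
    return 100.0 * len(covered) / seq_length


a, b = [(1, 4)], [(3, 6)]
print(classify(10, a, b))
print(coverage_percent(10, a), coverage_percent(10, b), coverage_percent(10, a, b))  # 40.0 40.0 60.0
```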
A complementary software visualisation tool was created to generate spiral plots that display sequence maps continuously and concisely, making the mRNA sequence maps easier to interpret and present. Fig. 4 shows how the sequence mapping data are visualised. As with the linear maps, colour is used to distinguish each individual LC-MS/MS data set and their combination. In the spiral plots, however, each nucleotide in the mRNA sequence is represented by a dot. The confidences of all fragments that map onto a given nucleotide are summed, across either individual or multiple data sets, and the resulting value determines the size and opacity of the dot. Additionally, the percentage sequence coverage for the combined sequence maps is calculated and displayed automatically.
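One way to lay out such a plot is sketched below: nucleotide i is placed on an Archimedean spiral (the radius grows by a fixed `pitch` per turn), and its summed confidence, scaled to the maximum in the data set, sets the dot size and opacity. The spiral type, points per turn, maximum size and minimum opacity are all assumptions for illustration; the text does not specify how the published tool parameterises its spiral.

```python
import math


def spiral_layout(confidences, points_per_turn=100, r0=1.0, pitch=1.0,
                  max_size=40.0, min_alpha=0.1):
    """Return (x, y, size, alpha) for each nucleotide on an Archimedean spiral.

    confidences[i] is the summed confidence of nucleotide i + 1.
    """
    peak = max(confidences) if confidences and max(confidences) > 0 else 1.0
    points = []
    for i, c in enumerate(confidences):
        theta = 2 * math.pi * i / points_per_turn
        r = r0 + pitch * i / points_per_turn          # radius grows by `pitch` per turn
        scale = c / peak                              # 0 for uncovered, 1 for best covered
        points.append((r * math.cos(theta), r * math.sin(theta),
                       max_size * scale,              # dot size
                       min_alpha + (1 - min_alpha) * scale))  # dot opacity
    return points


pts = spiral_layout([0.0, 1.0, 2.0], points_per_turn=4)
```

The resulting coordinates, sizes and opacities can be passed to any scatter-plot routine; drawing uncovered nucleotides at a small minimum opacity keeps gaps visible rather than hiding them.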
LC-MS/MS analysis of multiple partial RNase T1 digests of CSP mRNA (mRNA encoding the SARS-CoV-2 spike protein) was performed using varied digestion conditions (see Fig. 5). This illustrates one application of the approach: data sets from multiple partial RNase T1 digestions can be rapidly combined into a single mRNA sequence map. As a proof of principle, two partial RNase T1 digests were performed under conditions that under-digested and over-digested a sample of CSP mRNA relative to an optimal partial RNase T1 digest (details in section 2.3). The chromatograms from the LC-MS/MS analysis (Fig. 5A and B) show that the two digests produced markedly different oligoribonucleotide fragments: the over-digested sample produced fragments across a wide range of sizes, giving peaks across the full retention time range, whereas most fragments from the under-digested sample eluted towards the end of the gradient.
The linear sequence maps generated for the pair of partial RNase T1 digests are shown in Fig. 5C. They clearly show the advantage of combining multiple partial RNase digests to enhance mRNA sequence coverage, as the different partial RNase T1 digests yield different unique oligoribonucleotide fragments mapped to the mRNA sequence. The results also demonstrate the increased confidence in coverage gained from the multiple overlapping fragments generated by the complementary digests. Furthermore, the visualisation tool makes it simple to see which RNA fragments were identified in each individual mRNA digest and in the combined data sets.
In Fig. 5D we present the spiral plots produced for the partial RNase T1 digests of CSP mRNA. Although the linear maps carry more fragment-level detail, the spiral plots offer a distinct advantage for data visualisation: they provide a quick overview of the sequence coverage of the individual and combined digests.
Both types of plot show that combining the different partial RNase digests substantially increases the overall sequence coverage and reduces the regions of the mRNA where no oligoribonucleotides were identified. Individually the digests give 44% and 37% sequence coverage, whereas in combination this increases to 64%.
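Because coverage counts positions in the union of two sets of covered nucleotides, the fraction of the sequence covered by both digests can be estimated from the reported percentages by inclusion-exclusion. This is a back-of-the-envelope check, assuming all percentages refer to the same sequence length; since the reported values are rounded, the estimated overlaps are only accurate to about one or two percentage points, and they are derived here rather than reported by the software.

$$
C_{A\cup B} = C_A + C_B - C_{A\cap B}
\quad\Longrightarrow\quad
C_{A\cap B} = C_A + C_B - C_{A\cup B}
$$

$$
\text{T1 + T1: } C_{A\cap B} \approx 44\% + 37\% - 64\% = 17\%,
\qquad
\text{T1 + U2: } C_{A\cap B} \approx 73\% + 52\% - 88\% = 37\%
$$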
The combined partial RNase digests and sequence mapping visualisation tools were further applied to the analysis of chemically modified CSP mRNA. The combined partial RNase digests and sequence mapping of CSP mRNA containing N1-methylpseudouridine are shown in ESI Fig. SF2.† Combining the sequence maps again increased sequence coverage, consistent with the analysis of the unmodified mRNA.
The results of LC-MS/MS analysis of partial RNase T1 and RNase U2 digests of CSP mRNA are shown in ESI Fig. SF3A and B,† respectively. The differences between the chromatograms reflect the different oligoribonucleotide fragments generated by the two RNases. The differences in unique oligoribonucleotide identifications are even more evident in the linear sequence maps shown in Fig. 6A. The combined RNase sequence maps show that both digests map large portions of the mRNA sequence, and that gaps in the sequence coverage of each individual digest are filled in the combined plot. This results in substantially greater sequence coverage, based on unique oligoribonucleotide identifications, than either digest alone (88% combined versus 73% and 52% individually).
In Fig. 6B we show the spiral plots produced for the partial RNase T1 and U2 digests of CSP mRNA. As with the combined T1 digests, the spiral plots and automatic sequence coverage calculations further demonstrate the benefit of combining multiple digests: the T1 and U2 digests gave individual sequence coverages of 73% and 52%, respectively, and a combined sequence coverage of 88%.
To achieve high sequence coverage of the mRNA based only on unique oligoribonucleotide identifications, novel data visualisation methods were developed that combine the sequence maps from multiple samples or from multiple enzymatic digests. The advantages of combining multiple partial RNase digests to enhance mRNA sequence coverage were demonstrated both by combining partial RNase T1 digests performed under different digestion conditions and by combining partial RNase T1 and U2 digests.
Furthermore, the developed software provides additional detail by using the information from multiple overlapping oligoribonucleotide fragments, where these are identified, to increase the confidence of the sequence mapping. The presence of multiple overlapping fragments may also provide insight into potential secondary/tertiary structure of the mRNA, as well as RNase accessibility or behaviour.
Finally, the software provides powerful visualisation tools that generate spiral plots for enhanced characterisation of mRNA sequence maps, presenting combined data from multiple samples and multiple partial RNase digests in a single output.
Footnote
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d5an00033e
This journal is © The Royal Society of Chemistry 2025