Panteleimon G. Takis*ab, Beatriz Jiménez ab, Caroline J. Sands ab, Elena Chekmeneva ab and Matthew R. Lewis ab
aSection of Bioanalytical Chemistry, Division of Systems Medicine, Department of Metabolism, Digestion and Reproduction, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK
bNational Phenome Centre, Department of Metabolism, Digestion and Reproduction, Imperial College London, Hammersmith Campus, IRDB Building, London, W12 0NN, UK. E-mail: p.takis@imperial.ac.uk
First published on 27th May 2020
One-dimensional (1D) proton-nuclear magnetic resonance (1H-NMR) spectroscopy is an established technique for measuring small molecules in a wide variety of complex biological sample types. It is demonstrably reproducible, easily automatable and consequently ideal for routine and large-scale application. However, samples containing proteins, lipids, polysaccharides and other macromolecules produce broad signals which overlap and convolute those from small molecules. NMR experiment types designed to suppress macromolecular signals during acquisition may be additionally performed; however, these approaches add to the overall sample analysis time and cost, especially for large cohort studies, and fail to produce reliably quantitative data. Here, we propose an alternative way of computationally eliminating macromolecular signals, employing the mathematical differentiation of standard 1H-NMR spectra, producing small molecule-enhanced spectra with preserved quantitative capability and increased resolution. Our approach, presented in its simplest form, was implemented in a cheminformatic toolbox and successfully applied to more than 3000 samples of various biological matrices rich or potentially rich in macromolecules, offering an efficient alternative to on-instrument experimentation and facilitating NMR use in routine and large-scale applications.
These issues may be addressed by physically removing the macromolecules, for example, by ultra-centrifugation with filtering,11,12 but the time and cost required for sample processing, the potential for introducing procedural variability, and the negative impact on the integrity of the sample itself all undermine the key strengths of NMR as a high-throughput, intrinsically precise and non-destructive technique.1,2 Instead, the more practical and routinely applied approach is to suppress resonances from macromolecular-derived signals on-instrument. This is accomplished by performing an ancillary “spin-echo” experiment such as the Carr–Purcell–Meiboom–Gill (CPMG)13 pulse sequence, which filters macromolecular signals via transverse relaxation times (T2), generating a 1D spectrum of slowly relaxing signals, mainly belonging to SMs. The approach is sufficiently reproducible, although imperfect in its suppression of broad resonances (Fig. 1A) given the time limit for large cohort studies (e.g. metabolomics), and unsuitable for direct absolute quantification, as the signal integral is modulated by the high variability of T2 values for each proton spin system from each SM.14 It is also time-consuming, contributing substantially to the acquisition time required by standard profiling workflows (ESI Fig. S1†). The approach is therefore costly, especially at the scale required for the routine analysis of samples from epidemiology cohorts, food industry quality control, and other large-scale applications.
Fig. 1 SMolESY analytical reproducibility and performance in various matrices. (A) 1D-NOESY, CPMG and SMolESY spectra of albumin titration (0–225 mM). CPMG spectra exhibit ineffective suppression of albumin signals (light blue boxed areas), whereas SMolESY achieves their complete attenuation. Moreover, SMolESY maintains the fingerprint of the SMs (herein impurities). 1D-NOESY, CPMG and SMolESY spectra of (B) bovine milk and (C) olive oil, focused on the 1H-NMR region of the aliphatic groups of fatty acids and lipids. SMolESY clearly outperforms the routine CPMG spectrum (light blue boxed areas), enhancing the resolution by effectively narrowing the broad NMR signals of the aliphatic chains. In addition, SMolESY allows direct quantification by integration of several SMs, which are easily detected/assigned compared to both 1D-NOESY and CPMG spectra, where spectral deconvolution is needed. (D) Integrals of five 1D-NOESY 1H-NMR signals from cytidine in the artificial mixture of metabolites at 9 concentrations were linearly correlated with the corresponding SMolESY integrals (R2 > 0.985), with the fits passing through the origin (dashed circle), and one-way ANOVA tests (ESI Table S1†) confirmed that all intercepts and slopes coincide (horizontal/vertical error bars show ±1% integration error). Regardless of signal multiplicity (doublets with different J-coupling, multiplets, triplet), SMolESY shows intra-metabolite analytical reproducibility. (E–H) PCA of a urine dataset produces the same results for both 1D-NOESY and SMolESY, capturing similar cumulative variability, whereas the loading plots point to the same variables for group discrimination.
As a more efficient and higher performance alternative, we have developed a novel computationally derived experiment, “SMolESY” (Small Molecule Enhancement SpectroscopY), which reliably increases resolution and depletes macromolecular signals directly from the 1H 1D-NMR spectrum with no intensity modulation. The approach relies on mathematical differentiation, previously used for improving the spectral resolution of various spectroscopic techniques (e.g. near-infrared, electron-spin resonance, and NMR).15–18 By calculating the first partial derivative of the imaginary data of the NMR spectrum (see paragraph Differentiation of imaginary spectral data – basic theory in the Experimental section and ESI Fig. S2†), SMolESY yields a profile of SMs free from large molecule signal baseline interference and sample-to-sample fluctuation. As the approach does not rely on T2 or J-coupling constant modulation,19 the inherent quantitative quality of the conventional 1H-NMR spectrum is preserved. Furthermore, the resolution of SM-derived signals is enhanced by as much as three-fold,20 enabling the annotation of otherwise overlapping signals and further facilitating their quantification. However, it is also commonly understood that derivatives are prone to instability when applied to signals of very low intensity, and therefore the practical effects of a reduced signal-to-noise ratio (s/n) required evaluation. Herein, we demonstrate that despite the lower s/n, for the case of SMs of biologically relevant complex mixtures, the signal's limit of detection (LOD) is not functionally affected. To our knowledge, the application of our approach (even in its simplest form of differentiation, without being combined with any traditional or modern signal denoising filters21) to biofluids or complex matrices of large cohort studies, with a view to suppressing signals of macromolecules across entire spectra in a systematic way, has never been reported or tested.
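The core idea can be illustrated with a minimal sketch on synthetic data. It assumes ideal Lorentzian line shapes, a sign convention in which the dispersion (imaginary) component rises through the peak position, and NumPy's finite-difference `np.gradient` as the numerical derivative; the function name `complex_lorentzian`, the line widths and the areas are hypothetical choices for illustration and are not taken from the SMolESY_platform code.

```python
import numpy as np

def complex_lorentzian(freq, center, fwhm, area):
    """Complex Lorentzian: real part = absorption, imaginary part = dispersion."""
    half = fwhm / 2.0
    height = area / (np.pi * half)  # absorption peak height for the given area
    x = freq - center
    denom = x**2 + half**2
    return height * half**2 / denom + 1j * height * half * x / denom

# Hypothetical frequency axis (Hz) and two synthetic signals
freq = np.linspace(-500.0, 500.0, 20001)
sharp = complex_lorentzian(freq, center=20.0, fwhm=1.0, area=1.0)    # small molecule
broad = complex_lorentzian(freq, center=0.0, fwhm=100.0, area=50.0)  # macromolecule
spectrum = sharp + broad

# First derivative of the imaginary (dispersion) data along the frequency axis
transformed = np.gradient(spectrum.imag, freq)
sharp_only = np.gradient(sharp.imag, freq)
broad_only = np.gradient(broad.imag, freq)

print("broad/sharp peak height, original   :", broad.real.max() / sharp.real.max())
print("broad/sharp peak height, transformed:", broad_only.max() / sharp_only.max())
```

In this toy case the broad component is comparable in height to the sharp one in the original absorption spectrum but becomes a small fraction of it after differentiation. Real spectra may need a sign flip depending on the phase convention of the stored imaginary data.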
Based upon our findings, the SMolESY experiment may be used to functionally replace and additionally improve upon several weaknesses of traditionally used spin-echo experiments, particularly in the NMR-based metabolomics field.
SMolESY was then applied to a third dataset consisting of publicly available 1D-NOESY spectra from normal human urine samples22 (see paragraph Plasma – urine spectra employed for the present study in the Experimental section). Urine's complex SM composition in the virtual absence of macromolecules was used to assess SMolESY's preservation of SM signal information. Principal Component Analysis (PCA) on both the 1D-NOESY and SMolESY urine spectral datasets produced score plots with the same pattern of sample groups and similar cumulative captured variability (85% and 82.7% respectively for the first two components), and loading plots with the same pattern of variable weightings (Fig. 1E–H). 3D score plots from the same analyses are shown in ESI Fig. S4.† The result demonstrates that the multivariate information sets recovered from each spectral type are equivalent, providing support for SMolESY's use in classical metabolomics (pursuit of diagnostic and prognostic chemical patterns) and “fingerprinting” applications. Beyond the intended validation, the use case itself is of potential value, as numerous pathological conditions can significantly increase urinary excretion of macromolecules such as albumin and lipids.23,24 Although metabolically interesting in their own right, such lipid/protein signals in urine samples can also confound any subsequent SM multivariate analyses and quantitation, since these signals would not be attenuated by NMR experiments routinely applied to urine samples or by pre-processing methods, for example, normalization.25 SMolESY can therefore salvage otherwise compromised spectra from specimens in sample sets where macromolecules would not be expected or planned for.
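The comparison of cumulative captured variability can be sketched as follows. This is a generic PCA by singular value decomposition on a synthetic two-group data matrix, assuming mean-centring only; the scaling, software and settings used for the published analysis are not stated in this section, and the function name `pca` and the random data are hypothetical.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA by SVD of the mean-centred data matrix (rows = samples)."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # variance fraction per component
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    return scores, loadings, np.cumsum(explained[:n_components])

# Synthetic stand-in for a spectral matrix: 40 samples x 50 variables, two groups
rng = np.random.default_rng(0)
group = np.repeat([0.0, 1.0], 20)
profile = rng.normal(size=50)
X = np.outer(group, profile) + 0.1 * rng.normal(size=(40, 50))

scores, loadings, cumulative = pca(X)
print("cumulative explained variance (2 PCs):", cumulative[-1])
```

Running the same function on the 1D-NOESY and the transformed matrices, and comparing the cumulative values and the loading patterns, mirrors the kind of check reported in Fig. 1E–H.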
Fig. 2 SMolESY performance in more than 2000 plasma-heparin samples. (A–O) Mean spectrum of 2026 plasma-heparin 1D-NOESY (upper panel), CPMG (middle panel) and SMolESY (bottom panel) spectra, zoomed in ∼0.5 ppm windows from 0.55–8.7 ppm. The mean SMolESY spectrum is colored according to the Pearson coefficients from the SMolESY versus CPMG signal correlation in 2026 spectra (ESI Fig. S1†). The majority of highly resolved SMolESY signals are linearly correlated to the CPMG signals and >99.5% of CPMG features of SMs are maintained, while the broad signals of macromolecules are successfully suppressed, in contrast to CPMG (examples of unsuppressed CPMG broad signals are highlighted by red dashed boxes). It is noted that the broad signal of urea, along with 3–4 broad signals of very low abundance metabolites (i.e. <1.5 times the CPMG noise), is recovered by SMolESY, but these signals are highly suppressed and exhibit low correlation to the CPMG (black dashed boxes in panel (L)).
Both correlation and STOCSY results confirm the efficacy and fidelity of SMolESY, with more 1D-NOESY SM features maintained (>99%) than those visible by CPMG owing to the resolution enhancement. It is noteworthy that the resolution enhancement of SM peaks due to Δν1/2 narrowing is further improved by the complete removal of the broad-signal background.
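A per-variable Pearson correlation between two spectral matrices can be sketched as below. This is not the code supplied in the ESI; it is a minimal NumPy version assuming both matrices share the same samples (rows) and the same variables or bins (columns), with hypothetical names and synthetic data.

```python
import numpy as np

def columnwise_pearson(A, B):
    """Pearson r between matching columns of A and B (rows = samples)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    Ac = A - A.mean(axis=0)
    Bc = B - B.mean(axis=0)
    num = np.sum(Ac * Bc, axis=0)
    den = np.sqrt(np.sum(Ac**2, axis=0) * np.sum(Bc**2, axis=0))
    return num / den  # columns with zero variance yield nan

# Synthetic stand-ins: 100 samples x 5 variables
rng = np.random.default_rng(1)
cpmg_like = rng.normal(size=(100, 5))
smolesy_like = 2.0 * cpmg_like + 0.05 * rng.normal(size=(100, 5))

r = columnwise_pearson(smolesy_like, cpmg_like)
print(r)
```

The resulting vector of coefficients is what would be used to color each point of a mean spectrum, as described for Fig. 2.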
Comparison between SMolESY and CPMG spectral bins indicated a strong linear correlation for all spiked metabolites (R2 > 0.98) (Fig. 4A–C and S7†), even in cases where resonances overlapped with broad macromolecular signals (e.g. Fig. 4B and C). Furthermore, the ease of quantification as well as the immediate deconvolution of the SM signals by SMolESY is exemplified in a randomly selected plasma spectrum, where the signals of more than 20 metabolites are immediately identified and integrated at high resolution and without interference from broad signals or baseline distortions (Fig. 4D–W). The metabolite quantification by straightforward integration of SMolESY features (see paragraph SMolESY signals integration procedure in the Experimental section) was compared to outputs of standard 1D-NOESY peak deconvolution and fitting algorithms (Bruker Biospin, http://www.bruker.com, commercially available IVDr quantitation4 and in-house algorithms). SMolESY-based quantification results for the tested spiked metabolites follow a linear correlation with spiked concentrations, as well as with the measured values from deconvolved/fitted 1D-NOESY data (Fig. 5). In addition, the calculated relative root mean square error (RRMSE) values (ESI Fig. S8†) from the quantification of 12 spiked metabolites (Fig. 5) clearly demonstrate that direct SMolESY signal integration tends to give substantially less error in the absolute quantification than the deconvolution algorithm. For instance, the 1D-NOESY signals employed for the quantification of L-isoleucine (Fig. 4E), L-valine (Fig. 4F) and acetone (Fig. 4L) via deconvolution resonate on top of, or on the foothills of, very broad signals (i.e. they require baseline removal through fitting) and thus, owing to cumulative baseline fitting errors, exhibit higher RRMSE values than when quantified via direct integration of SMolESY signals (ESI Fig. S8†).
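The two figures of merit used here can be sketched in a few lines. The exact RRMSE normalisation used for ESI Fig. S8 is not given in this section, so the version below (root mean square error divided by the mean reference value) is an assumption, as are the function names and the example numbers.

```python
import numpy as np

def rrmse(measured, reference):
    """Relative RMSE: RMSE divided by the mean reference value (assumed definition)."""
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return np.sqrt(np.mean((measured - reference) ** 2)) / np.mean(reference)

def linear_fit_r2(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals**2) / np.sum((y - y.mean()) ** 2)
    return slope, intercept, r2

# Hypothetical spiked concentrations (mM) and measured values
spiked = np.array([0.1, 0.2, 0.4, 0.8, 1.6])
measured = np.array([0.11, 0.19, 0.41, 0.79, 1.62])

print("RRMSE:", rrmse(measured, spiked))
print("slope, intercept, R2:", linear_fit_r2(spiked, measured))
```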
Fig. 5 SMolESY for absolute quantification. Absolute quantification was performed for 11 concentrations of several spiked metabolites: (A) acetone, (B) L-isoleucine, (C) L-glutamine, (D) citric acid, (E) L-valine, (F) lactic acid, (G) acetic acid, (H) L-threonine, (I) formic acid, (J) ethanol, (K) glycerol and (L) L-phenylalanine in a plasma matrix by SMolESY (i.e. by direct integration of the transformed signals of each metabolite) and by 1D-NOESY with deconvolution/fitting algorithms (herein the commercially available IVDr platform from Bruker Biospin),4 and plotted against the spiked concentration values. Linear regression analyses clearly show the applicability of SMolESY for absolute quantification (R2 > 0.97), and all calculated concentrations based upon SMolESY data are in reasonable agreement with the deconvolution results. It should be noted that no calibration was applied to the SMolESY integrals to account for, e.g., differences in T1 relaxation times between 1H spin systems from different chemical groups,28 etc., whereas these refinements are implemented in the IVDr platform of Bruker Biospin. Hence, some slight discrepancies can be observed between SMolESY and the Bi-QUANT-PS™ values due to this refinement. The calculation of absolute concentration values is based upon the ERETIC signal (and its transformation) produced during the acquisition of 1D-NOESY data by Bruker. It is noteworthy that the instant quantification via integration has no computational cost, and that the deconvolution/fitting algorithms are prone to higher errors (see vertical error bars and calculated relative root mean square error (RRMSE) values, ESI Fig. S8†) compared to the integration process (±∼1%) of the already deconvoluted signals in the “clean” baseline of SMolESY spectra. For the IVDr data, plotted error bars are taken from the Δ values produced by the corresponding reports.
To facilitate the implementation of SMolESY in targeted (direct metabolite signal integration), untargeted (profiling/fingerprinting) and quantitative NMR (qNMR) applications, we created a cheminformatic toolbox, “SMolESY_platform”, for producing and processing SMolESY data from raw NMR spectra (ESI Fig. S9,† see paragraph “SMolESY_platform” toolbox details in the Experimental section). It is freely available for download at: https://github.com/pantakis/SMolESY_platform.§
The compromising effect that common macromolecules (proteins, lipids and polysaccharides) exhibit on individual quantitative SM measurements and on the broader SM profile has yet to be adequately addressed. Consequently, modern standard protocols for the analysis of biofluids, cell extracts,29 food30 and other macromolecule-rich complex mixtures rely on a sequence of experiments, each of which is individually flawed in application to the most common of biofluids (e.g. blood products). Whereas 1D-NOESY is ineffective for detecting SM contributions by MVA, CPMG cannot be used for accurate and reliable quantification. Here we demonstrate that the computational transformation of the standard 1D 1H-NMR experiment yields both high fidelity spectral SM profiles and data from which quantitative chemical measurements can be extracted.
Systematic evaluation of SMolESY clearly demonstrates its ability to cleanly suppress macromolecular signals in synthetic test cases (albumin titration), common agricultural products (milk and oil), and human plasma. In all cases, the suppression of macromolecular signals enhanced the SM-derived information, while SMolESY reproduced the SM-derived information captured by the 1D 1H-NMR with high fidelity, confirming that the transformation is not detrimental to SM signals. SMolESY implementation both on a large cohort of more than 3000 individuals' plasma samples and on >100 urine samples showed outstanding reproducibility with virtually no loss of metabolic information. Although the approach does risk decreasing the s/n of very broad signals such as those from highly exchangeable and/or interacting protons of small molecules (e.g. urea), such signals are generally of low fidelity in 1H-NMR analyses unless specific experiments26 or sample preparation procedures31 are employed. This risk can be further mitigated by applying smoothing algorithms, such as traditional or advanced approaches for signal denoising from acoustics, radio astronomy etc.,15,17,20,21 to the SMolESY data acquired by our toolbox. An example of a denoising filter application is provided in ESI Fig. S10.† However, such approaches are not suitable for automation and must be undertaken with care as they may introduce artifactual NMR signals.
Additionally, the method helps recover signals in crowded regions of the spectrum and in areas where macromolecule signals appear. This, combined with the expected enhancement of spectral resolution when calculating signal derivatives, facilitates the chemical assignment of SMs by increasing the analytical specificity. Furthermore, the linear mathematical transformation preserves the quantitative aspects of the data, given the appropriate calibration and a reference signal from reference compounds or produced electronically by the PULCON experiments.32 SMolESY can therefore be used directly for absolute quantification without the need for the complex and computationally expensive deconvolution algorithms typically applied to 1D experiments, and in contrast to spin-echo pulse sequence experiments, which cannot be used for this purpose.
Importantly, these improvements can also be realized post hoc by retrospective application of SMolESY to existing 1D 1H-NMR raw spectra. This could be of major importance for NMR analysis of sample types with low physiological macromolecular content (e.g. urine) for which spin-echo experiments are not routinely acquired, yet which occasionally or in pathological conditions (e.g. albuminuria) can contain macromolecules. Moreover, SMolESY is readily applicable to historical datasets, increasing their value and making them comparable with newly processed datasets. Its application only requires high resolution 1H-NMR data (>65k data points) as input, which is the established norm within modern high quality metabolomics and analytical studies.22,33
The approach may further enable higher throughput sample preparation procedures by obviating the removal of macromolecules from sample types where such preparation is routine practice (e.g. for the NMR study of various food matrices30,34). SMolESY is therefore of major significance in biomedical research, the food industry, environmental sciences and indeed any other application where 1H-NMR is applied to chemically complex samples with abundant macromolecules. The approach is particularly pertinent for large cohort studies, where up to 30% of the acquisition time could be saved compared to the conventional NMR-metabolomics pipeline (ESI Fig. S1†). Since 1H-NMR is emerging as the dominant technique for large scale application to biofluid analysis (e.g. supporting molecular epidemiology and biobanking efforts) and is increasingly used for routine quality control assessment of agricultural products, we believe the time and cost savings provided by SMolESY will support the future application of NMR in these contexts.
A Fourier transformed Lorentzian signal f(x) across a specific frequency region is assumed to be given by:35–37
(1) |
(2) |
From eqn (2), it can be seen that signals with large Δν1/2 (e.g. broad signals of macromolecules) are suppressed almost to zero (Iδ ∼ 0), whereas sharp signals (i.e. small Δν1/2 values) are sharpened, thus enhancing spectral resolution.
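The dependence on line width can be made explicit with a short worked derivation. It assumes the textbook complex Lorentzian with peak position δ, half-width a = Δν1/2/2, absorption peak height h and integrated area A; these symbols and the sign convention of the dispersion component are introduced here for illustration and need not match the notation of eqn (1) and (2).

```latex
% Absorption (real) and dispersion (imaginary) components of a Lorentzian line
\[
R(\nu) = \frac{h\,a^{2}}{(\nu-\delta)^{2}+a^{2}}, \qquad
D(\nu) = \frac{h\,a\,(\nu-\delta)}{(\nu-\delta)^{2}+a^{2}}, \qquad
a = \tfrac{1}{2}\,\Delta\nu_{1/2}, \qquad
h = \frac{A}{\pi a}
\]

% First derivative of the dispersion component
\[
\frac{\mathrm{d}D}{\mathrm{d}\nu}
  = h\,a\,\frac{a^{2}-(\nu-\delta)^{2}}{\bigl[(\nu-\delta)^{2}+a^{2}\bigr]^{2}}
\]

% Height of the transformed signal at the peak position
\[
\left.\frac{\mathrm{d}D}{\mathrm{d}\nu}\right|_{\nu=\delta}
  = \frac{h}{a}
  = \frac{A}{\pi a^{2}}
  = \frac{4A}{\pi\,\Delta\nu_{1/2}^{2}}
\]
```

Under these assumptions the transformed signal is positive between ν = δ − a and ν = δ + a, and for a fixed area its height falls with the square of the line width: a signal 100 times broader than a small-molecule resonance of equal area is reduced by a factor of about 10^4 relative to it.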
The 1st numerical derivative of the real data from an NMR spectrum (after Fourier transform and phase correction) produces an antisymmetric signal (positive on one side and negative on the other) (ESI Fig. S2A†), whereas the 1st derivative of the imaginary data, due to its gradient (namely positive–negative maxima per signal) (ESI Fig. S2B†), produces a positive transformed signal which exhibits the same δ as the real data without applying any symmetrisation algorithms. The transformed signal from the imaginary spectral data exhibits no change in chemical shift compared to the real spectrum (ESI Fig. S2C†) and is immediately employable for any NMR-based metabolomics or analytical study. Furthermore, as differentiation is a linear operation, the amplitude of the transformed signal is directly proportional to the original, theoretically retaining its quantitative nature.15 The same signal (i.e. at the positive side of the baseline) could be produced by the 2nd derivative of the real data of the NMR spectrum multiplied by −1 or by the 2nd power derivative;35 however, the signal-to-noise ratio is decreased (ESI Fig. S2D†) compared to the 1st derivative.
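These symmetry and linearity properties can be checked numerically on an ideal line. The sketch below assumes a noise-free Lorentzian centred at 0 Hz with a 1 Hz line width, the same dispersion sign convention as a rising imaginary component through the peak, and `np.gradient` as the numerical derivative; the helper names `line` and `fwhm` are hypothetical.

```python
import numpy as np

freq = np.linspace(-50.0, 50.0, 10001)  # Hz, symmetric about the peak at 0
half = 0.5                              # half-width, i.e. a 1 Hz line width

def line(height):
    """Return (absorption, dispersion) of a Lorentzian centred at 0 Hz."""
    denom = freq**2 + half**2
    return height * half**2 / denom, height * half * freq / denom

def fwhm(y):
    """Full width at half maximum of the main positive lobe of y."""
    above = freq[y >= y.max() / 2.0]
    return above.max() - above.min()

real, imag = line(1.0)
d_real = np.gradient(real, freq)   # antisymmetric about the peak
d_imag = np.gradient(imag, freq)   # symmetric, positive at the peak

_, imag3 = line(3.0)
d_imag3 = np.gradient(imag3, freq)  # scales linearly with the input amplitude

print("width of absorption line      :", fwhm(real))
print("width of transformed signal   :", fwhm(d_imag))
print("peak position (Hz), transformed:", freq[np.argmax(d_imag)])
```

The derivative of the real data changes sign at the peak, while the derivative of the imaginary data peaks at the same position as the absorption line, is narrower, and triples when the input amplitude triples.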
The 17 metabolites spiked into a human plasma sample, along with their different concentrations, are summarized in ESI Table S2.† Ten different concentrations of each metabolite were each spiked into a new plasma sample, so 17 × 10 = 170 spiked plus 17 non-spiked samples (187 in total) were prepared and their corresponding NMR spectra were acquired.
Solution 1H NMR spectra of all samples were acquired using a Bruker IVDr 600 MHz spectrometer (Bruker BioSpin) operating at 14.1 T and equipped with a 5 mm PATXI H/C/N with 2H-decoupling probe including a z-axis gradient coil, an automatic tuning-matching (ATM) and an automatic refrigerated sample changer (Sample-Jet). Temperature was regulated to 300 ± 0.1 K and 310 ± 0.1 K for urine and plasma samples, respectively.
For each blood sample, three NMR experiments were acquired in automation: a general profile 1H NMR water presaturation experiment using a one-dimensional pulse sequence where the mixing time of the 1D-NOESY experiment is used to introduce a second presaturation time, a spin echo edited experiment using the Carr–Purcell–Meiboom–Gill (CPMG) pulse sequence which filters out signals from fast T2 relaxing protons from molecules with slow rotational correlation times such as proteins and other macromolecules, and a 2D J-resolved experiment. Each experiment had a total acquisition time of approximately four minutes [32 scans were acquired for the 1D-NOESY (98304 data points, spectral width of 18029 Hz) and the 1D-CPMG (73728 data points, spectral width of 12019 Hz) experiments while two scans and 40 planes were acquired for the 2D J-resolved experiment].
For each urine sample two NMR experiments were acquired as previously published in ref. 33. Free induction decays of all 1D-spectra were multiplied by an exponential function equivalent to 0.3 Hz line-broadening before applying Fourier transform. All Fourier transformed spectra were automatically corrected for phase and baseline distortions and referenced to the TSP singlet at 0 ppm.
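The line-broadening step can be sketched as an exponential window applied to the free induction decay before the Fourier transform. The window form exp(−π·LB·t) is the usual convention for an exponential multiplication of LB Hz; the synthetic FID, its relaxation time, the number of points and the function name `apodize_and_transform` are hypothetical, and phase and baseline correction are not included.

```python
import numpy as np

def apodize_and_transform(fid, dwell_time, lb_hz=0.3):
    """Multiply a complex FID by exp(-pi * LB * t), then Fourier transform."""
    t = np.arange(fid.size) * dwell_time
    window = np.exp(-np.pi * lb_hz * t)
    spectrum = np.fft.fftshift(np.fft.fft(fid * window))
    freq = np.fft.fftshift(np.fft.fftfreq(fid.size, d=dwell_time))
    return freq, spectrum

# Synthetic single-resonance FID (assumed values, not acquisition parameters)
spectral_width = 18029.0          # Hz, as quoted for the 1D-NOESY experiment
dwell = 1.0 / spectral_width      # s per complex point
n_points = 32768                  # hypothetical number of complex points
t = np.arange(n_points) * dwell
fid = np.exp(2j * np.pi * 100.0 * t) * np.exp(-t / 0.5)  # 100 Hz offset, T2* = 0.5 s

freq_axis, spec_lb = apodize_and_transform(fid, dwell, lb_hz=0.3)
_, spec_raw = apodize_and_transform(fid, dwell, lb_hz=0.0)

print("peak position (Hz):", freq_axis[np.argmax(spec_lb.real)])
print("peak height with / without LB:", spec_lb.real.max() / spec_raw.real.max())
```

The window adds 0.3 Hz to the natural line width, which lowers the peak height slightly in exchange for a better signal-to-noise ratio; both the real and imaginary parts of `spec_lb` remain available for further processing.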
Urine 1D-NOESY 1H-NMR spectra were taken from a publicly available study (available at Metabolights, accession number: MTBLS694).
It is noted that an integration function is incorporated in the SMolESY_platform toolbox.
Footnotes
† Electronic supplementary information (ESI) available: The experimental scheme of NMR based metabolomics pipeline for biofluids with macromolecular content (e.g. proteins, lipoproteins, lipids etc.) – SMolESY contribution; examples of enhanced spectral resolution by the imaginary NMR spectral part differentiation; validation of SMolESY intra-metabolites signals reproducibility; extra PCA analysis; SMolESY performance in 994 plasma-EDTA samples; mean spectrum of 994 plasma-EDTA samples spectra focusing on the 3.5–4.0 1H-NMR ppm region; SMolESY application and reproducibility validation to spectra binning; SMolESY errors evaluation for absolute quantification; an overview of the SMolESY_platform graphical user interface (GUI) toolbox; statistical analyses results for SMolESY intra-metabolites signals reproducibility tests; computer code for the calculation of the Pearson correlation values; example of SMolESY signals denoising. See DOI: 10.1039/d0sc01421d
‡ NMR spectra (raw data) of several spiked metabolites in real plasma matrices can be freely downloaded from the Metabolights database, under the study identifier: MTBLS715 (after the data curation period).
§ Source code and compiled versions for Windows and MacOS of “SMolESY_platform” are freely available at: https://github.com/pantakis/SMolESY_platform.
This journal is © The Royal Society of Chemistry 2020