Karin Ortmayr ab, Verena Charwat c, Cornelia Kasper c, Stephan Hann b and Gunda Koellensperger *a
a Institute of Analytical Chemistry, University of Vienna, Faculty of Chemistry, Waehringer Strasse 38, 1090 Vienna, Austria. E-mail: gunda.koellensperger@univie.ac.at; Fax: +431 42779523; Tel: +43 664 6027752303
b Department of Chemistry, University of Natural Resources and Life Sciences (BOKU) Vienna, Muthgasse 18, 1190 Vienna, Austria
c Department of Biotechnology, University of Natural Resources and Life Sciences (BOKU) Vienna, Muthgasse 18, 1190 Vienna, Austria
First published on 30th September 2016
The p-value is the most prominent established metric for statistical significance in non-targeted metabolomics. However, its adequacy has repeatedly been the subject of discussion criticizing its uncertainty and its dependence on sample size and statistical power. These issues compromise non-targeted metabolomics in model systems, where studies typically investigate 5–10 samples per group. In this paper we propose a different approach for assessing the relevance of fold change (FC) data, where the FC is treated as a quantitative value and is validated by uncertainty budgeting. For the purpose of large-scale application in non-targeted metabolomics, we present a simplified approach for uncertainty propagation using experimental standard deviations of metabolite intensities as type A-summarized standard uncertainties. The resulting expanded FC uncertainty can be used to derive a minimum relevant FC as a complementary criterion in metabolomics data evaluation. This concept overcomes the need for a uniform p-value cut-off for all metabolites by considering the experimental uncertainty for each metabolite individually. The proposed procedure is part of analytical method validation; however, the concept has not previously been applied to non-targeted metabolomics. A case study on mesenchymal stem cells cultured in normoxia and hypoxia demonstrates the practical value of this approach, in particular for studies with a small sample size. An online two-dimensional LC method coupled to mass spectrometry was crucial in providing both broad metabolome coverage and the excellent experimental precision (<8% CV for peak areas, on average 0.5% CV for retention times) required for sensitive differential analysis of fold changes as low as FC 1.1.
Non-targeted metabolomics entails the comprehensive analysis of the metabolome, and is typically undertaken using a differential approach aiming to elucidate global changes in biological systems in response to a specific perturbation. As such, non-targeted metabolomics is treated as a relative quantification technique, where the fold change (FC) serves as a measure for the relative change in a given metabolite's concentration in the different conditions under investigation. FC values are calculated relative to a given reference sample based on averaged raw signal intensities. While some typical analytical challenges associated with the absolute quantification of cellular metabolites are mitigated by the differential approach, care must be taken in the quantitative interpretation of these intensity ratios. Several pitfalls in the generation and interpretation of non-targeted metabolomics data remain, including aspects of experimental setup and control, analysis of biological samples and data evaluation.
Following measurement, differential MS-based metabolomics addresses quantitative relative fold changes between different experimental groups, using null-hypothesis significance testing (NHST) and the p-value as a metric for judgements about the significance of observations in a study. This (in addition to multivariate approaches13–15) is still the preferred approach within the community, in spite of a long-standing debate over the universal adequacy of NHST and the importance attributed to the p-value.16–20 Indeed, widespread misinterpretations of the p-value have been highlighted by many authors.16,18,20–23 Most recently, the uncertainty of the p-value and its dependence on statistical power have been discussed.23,24 With a low number of biological replicates and limited statistical power, the p-value is fraught with high uncertainty, leading to poor reproducibility of once significant findings, implying that p can be unreliable.23 Moreover, a uniform p-value cut-off for all measured entities may not be ideal. It is exactly this situation which compromises non-targeted metabolomics in model systems, as typical studies include a limited number of biological replicates per group, typically 5 to 10. Moreover, the effects under investigation are in many cases very minor (i.e. low FC).
The magnitude of the observed effect is the focus point of alternative approaches to NHST. The use of effect sizes has been encouraged by many,17,19,22,25 but was met with considerable resistance in the scientific community.16,23,26 However, NHST investigates only whether the observed differences have a trivial cause or reflect an effect caused by the respective treatment. In a recent commentary on this topic, Claridge-Chang and Assam promote the use of point estimates and confidence intervals19 to assess the magnitude and relevance of observed effects. Unlike the p-value (an abstract statistic) these point estimates use the same units and scale as measurement data and are therefore more intuitive and easier to interpret. In the light of the wide sample-to-sample variability of the p-value, the interpretation of effect sizes overcomes several concerns and paves the way for meta-analysis and cumulative knowledge generation.19,25
In this paper we propose a different approach for judging the relevance of FC-based findings. We treat the fold change as a quantitative value, and firstly assess the quantification task of untargeted metabolomics by uncertainty budgeting according to the official Guide to the Expression of Uncertainty in Measurement (GUM).27 Although most comprehensive, stringent uncertainty budgeting is complex and time-consuming when applied on a large scale. We therefore propose the use of the experimental standard deviation of metabolite intensities as type A-summarized standard uncertainties. The Kragten method28 allows relatively simple handling of error propagation calculations even on a large scale, and is ideally suited for automation and integration into non-targeted metabolomics data analysis workflows. The resulting expanded FC uncertainty allows the derivation of a minimum FC value to be considered relevant for each metabolite, given the observed data variation. The concept hence circumvents the selection criterion of a uniform p-value cut-off for all metabolites. The procedure is an integral part of analytical method validation as described in the Eurachem guideline “Fitness for Purpose of Analytical Methods”.29 Method validation strategies in the context of non-targeted metabolomics have been addressed30 alongside a series of initiatives to establish common standards in metabolomics,31–36 but have so far not included uncertainty budgeting for “omics” methods and the concept of FC uncertainty.
The practical value of the proposed approach is demonstrated in a case study on human mesenchymal stem cells cultivated under normoxia and hypoxia. An online two-dimensional LC-MS method, used for the first time in MS-based non-targeted metabolomics, proved invaluable in providing both broad selectivity (i.e. metabolome coverage) and high repeatability precision, which was manifested as low FC uncertainty. In combination with efficient sample-level normalization via total protein content, the method thus meets several important requirements for the generation of high-quality non-targeted metabolomics data sets.
(1) |
The harvesting and metabolite extraction procedure for adherently growing cells was adapted from Dettmer et al., 2011.37 At the time point of sampling, the medium was aspirated and the cell layer washed three times with 1 mL of a PBS solution (4 °C). Finally, the cells were scraped in 2 mL ice-cold methanol (80% v/v methanol, 20% v/v water) with a cell scraper. The methanolic cell extracts were transferred to separate sample tubes and centrifuged at 4 °C and 20 000 g for 5 min. The pellet, containing precipitated cellular protein and cell debris, was used for the determination of total protein content using the 2-D Quant Kit (GE Healthcare, Little Chalfont, UK). For analysis by RPLC, aliquots of 400 μL of the methanolic extracts were dried in a GeneVac EZ-2 vacuum concentrator (GeneVac, Suffolk, UK) and reconstituted in 50 μL of LC-MS grade water prior to injection.
(2) |
The sources of variation (and hence uncertainty) are manifold in metabolomics studies, arising from study design, sample collection and pre-treatment, sample storage, analytical measurement and even data processing (Fig. 1). Many of these critical aspects are assessed during the development and validation of methods suitable for metabolome analysis, where the minimization of their impact is the primary goal. Overall, only methods and procedures that fulfill important robustness criteria are suitable for application in non-targeted metabolomics as described here, and the following considerations are valid under the assumption that only such methods are employed. As such, we assume an ideal measurement according to the following criteria:
1. Chromatographic selectivity: resolution of interfering compounds and isobaric overlaps, separation from matrix compounds.
2. Retention time stability: max. 1% variation of retention times within a measurement series.
3. High-resolution mass spectrometry: >20000 mass resolution (FWHM), <5 ppm mass bias, mass axis stability.
4. Linear dynamic range: sufficient for the observed magnitude of changes in metabolite signals (typically 4–5 orders of magnitude).
State-of-the-art methods in non-targeted metabolomics routinely fulfill these criteria, so that the contribution of many factors shown in Fig. 1 can be omitted in uncertainty considerations based on the experimental conditions assumed here. However, additional factors intrinsic to the experimental layout contribute to uncertainty in the context of non-targeted metabolomics that do not originate from the measurement process. These aspects include the measured peak area (a), the use of an external scalar (s, e.g. the total protein content in the sample) for sample-specific normalization of peak areas, peak integration (i), recovery in metabolite extraction (r), and biological variability (b), and hence have to be considered in the estimation of the uncertainty of a fold change result, i.e. uncertainty budgeting.
Uncertainty budgeting is an essential part of analytical method validation and takes all sources of uncertainty within a given analytical workflow into account and expresses a total combined uncertainty for the final reported result value. Guidance on the evaluation of measurement uncertainty and the steps to be followed is available from the Joint Committee for Guides in Metrology (JCGM) in the Guide to the Expression of Uncertainty in Measurement (GUM).27 The initial steps include the definition of the quantity to be measured and a suitable model equation for its determination. Subsequently, all possible sources of uncertainty are identified and each input quantity is associated with the appropriate standard uncertainty. Finally, error propagation calculations based on the model equation are used to determine the combined uncertainty acting on the quantity to be measured, and the contributions of all input quantities are calculated. Applying an additional coverage factor, k, one can further state an expanded uncertainty with which the result is to be reported. A typical value for k is 2, and approximates a coverage probability of 95% for a single FC value under repeatability conditions.27
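For reference, the propagation and expansion steps described above can be written compactly. The expression below is the general GUM law of propagation for uncorrelated input quantities; the symbols f (model equation), x_i (input quantities), N (number of inputs) and y (result) are generic notation chosen here for illustration and do not appear in the text.

```latex
u_c(y) = \sqrt{\sum_{i=1}^{N} \left( \frac{\partial f}{\partial x_i} \right)^{2} u^{2}(x_i)},
\qquad
U = k \, u_c(y)
```

With k = 2, the reported result takes the form y ± U.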
The standard uncertainties associated with each contributor included in the model equation are in practice derived from repeatability data (type A uncertainty), or reference data and known performance characteristics (type B uncertainty). In the context of metabolomics, appropriate repeatability data that allow the separate evaluation of a single contributor are in many instances difficult to obtain. Here, the standard uncertainties associated with extraction recovery (metabolite-specific), peak integration (manual or automated), total protein content determination and general biological variability are therefore estimated from typical observations and experience (Table 2). The model equation for FC calculation is formulated as follows, with the contributors i, r, and b considered as factors with a value of 1:
(3) |
Abbr. | Input quantity | Input type | Standard uncertainty | Distribution |
---|---|---|---|---|
a | Peak area | Measured data | — | Poisson |
i | Peak integration | Factor | 3% | Triangular |
r | Extraction recovery | Factor | 5% | Normal |
s | External scalar | Measured data | 5% | Normal |
b | Biological variability | Factor | 15% | Normal |
The final fold change value is then calculated as the average across n replicate observations:
(4) |
The Monte-Carlo simulation method was used for the calculation of the total combined uncertainty, as described in Appendix E.3 of the Eurachem guide Quantifying Uncertainty in Analytical Measurement (QUAM).39 This approach allows the propagation of uncertainties with an associated probability density function (PDF, e.g. normal, triangular or rectangular distribution), by random sampling of a single value for each input quantity from its PDF. When repeated for a large number of iterations (10^5–10^6), a set of simulated results for the quantity to be measured is obtained. From the frequency distribution of these results, the mean and standard deviation can be derived as estimates of the quantity to be measured and its total combined uncertainty. Notably, this process only takes random errors into account. Fig. 2 shows the uncertainty budget and simulation results for the amino acid arginine as a model compound and a fold change determined across two differently-treated sample groups with 5 biological replicates each. The mean of the output distribution (Fig. 2a) is an estimate for the fold change, and its standard deviation is the corresponding total combined uncertainty. The expanded uncertainty UFC is then derived using a coverage factor k = 2, as an approximation of the 95% confidence interval given a normal distribution of FC simulation results (Fig. 2a). As such, the FC for arginine can be reported as 1.20 ± 0.26 (21.7% relative expanded uncertainty).
Fig. 2 Uncertainty budgeting using Monte Carlo simulations for the estimation of the total combined uncertainty of a metabolite fold change (model metabolite arginine). The model equation is given by eqn (3) and (4), considering the input quantities given in Table 2. (a) Histogram of Monte Carlo simulation results (100 000 iterations); (b) pie chart with the relative contributions of the individual input quantities to the overall variance in simulation results.
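The Monte Carlo procedure can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: because eqn (3) and (4) are not reproduced here, the model is assumed to be a per-replicate normalized intensity a·i·r·b/s, with the fold change taken as the ratio of the two group means. The peak areas and scalars in the example are hypothetical, the Poisson noise on the peak area is approximated by a normal distribution, and the function names are invented for this sketch. The relative standard uncertainties are those listed in Table 2.

```python
import random
import statistics

# Relative standard uncertainties from Table 2.
U_INTEGRATION = 0.03  # i, triangular
U_RECOVERY = 0.05     # r, normal
U_SCALAR = 0.05       # s, normal
U_BIOLOGY = 0.15      # b, normal


def triangular_factor(rel_u, rng):
    # A symmetric triangular distribution with half-width w has sd w / sqrt(6).
    half_width = rel_u * 6 ** 0.5
    return rng.triangular(1.0 - half_width, 1.0 + half_width, 1.0)


def normalized_intensity(area, scalar, rng):
    # ASSUMED model: a * i * r * b / s, with i, r and b centred on 1.
    a = rng.gauss(area, area ** 0.5)  # normal approximation to Poisson noise
    i = triangular_factor(U_INTEGRATION, rng)
    r = rng.gauss(1.0, U_RECOVERY)
    b = rng.gauss(1.0, U_BIOLOGY)
    s = rng.gauss(scalar, U_SCALAR * scalar)
    return a * i * r * b / s


def simulate_fc(areas_1, scalars_1, areas_2, scalars_2, iterations=20000, seed=1):
    """Return (FC estimate, combined uncertainty, expanded uncertainty, k = 2)."""
    rng = random.Random(seed)
    results = []
    for _ in range(iterations):
        mean_1 = statistics.fmean(
            normalized_intensity(a, s, rng) for a, s in zip(areas_1, scalars_1))
        mean_2 = statistics.fmean(
            normalized_intensity(a, s, rng) for a, s in zip(areas_2, scalars_2))
        results.append(mean_1 / mean_2)
    fc = statistics.fmean(results)
    u_combined = statistics.stdev(results)
    return fc, u_combined, 2.0 * u_combined


# Hypothetical data: 5 replicates per group, nominal fold change 1.2.
fc, u_c, U = simulate_fc([120000.0] * 5, [1.0] * 5, [100000.0] * 5, [1.0] * 5)
print(f"FC = {fc:.2f} +/- {U:.2f} (k = 2)")
```

The standard deviation of the simulated fold changes plays the role of the total combined uncertainty. Recording the result while holding all but one input fixed would give the per-input contributions shown in the pie chart.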
Although uncertainty budgeting as an exhaustive approach provides a comprehensive view of the final quantity result, it also becomes relatively complex and time-consuming when multiple calculations need to be performed. The latter is the case in non-targeted metabolomics, where individual fold change values need to be calculated for several hundred metabolites, which essentially requires automation.
(5) |
Applying this approach to the above-described example of the amino acid arginine, an expanded uncertainty of 0.25 is calculated, and the fold change for arginine can be reported as 1.20 ± 0.25 (20.8% relative expanded uncertainty). The fact that this is in very good agreement with the expanded uncertainty estimated by uncertainty budgeting (see above) demonstrates the validity of the simplified uncertainty propagation approach based on type A-summarized standard uncertainties. Thus, the approach is ideally suited for large-scale implementation in non-targeted metabolomics, as the calculations are easily automated using eqn (5) in a spreadsheet layout and can be readily implemented in standard data analysis pipelines.
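A spreadsheet-style calculation of this kind can be sketched as follows. Since eqn (5) is not reproduced here, the sketch assumes that the fold change is the ratio of two group means and that the within-group standard deviations serve as the standard uncertainties. It compares a Kragten-type finite-difference estimate with the usual first-order expression for a ratio. The function names and the numerical inputs are hypothetical.

```python
import math


def kragten_uncertainty(model, values, uncertainties):
    """Combined standard uncertainty by Kragten-style finite differences."""
    baseline = model(*values)
    squared_sum = 0.0
    for index, u in enumerate(uncertainties):
        shifted = list(values)
        shifted[index] += u  # shift one input by its standard uncertainty
        squared_sum += (model(*shifted) - baseline) ** 2
    return math.sqrt(squared_sum)


def fold_change(mean_1, mean_2):
    return mean_1 / mean_2


def expanded_fc_uncertainty(mean_1, sd_1, mean_2, sd_2, k=2.0):
    """Return (FC, expanded U via Kragten, expanded U via first-order formula)."""
    fc = fold_change(mean_1, mean_2)
    u_kragten = kragten_uncertainty(fold_change, [mean_1, mean_2], [sd_1, sd_2])
    # ASSUMED analytic form: relative uncertainties of a ratio add in quadrature.
    u_analytic = fc * math.sqrt((sd_1 / mean_1) ** 2 + (sd_2 / mean_2) ** 2)
    return fc, k * u_kragten, k * u_analytic


# Hypothetical group means and standard deviations (8% and 9% CV).
fc, U_kragten, U_analytic = expanded_fc_uncertainty(1500.0, 120.0, 1000.0, 90.0)
print(f"FC = {fc:.2f}, U (Kragten) = {U_kragten:.3f}, U (analytic) = {U_analytic:.3f}")
```

The two estimates differ slightly because the finite-difference scheme evaluates the model at a full uncertainty step rather than using the local derivative. The gap shrinks as the relative uncertainties become smaller.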
The actual question at hand in statistical differential analysis in non-targeted metabolomics is to determine whether a given observed effect in the form of an FC value is significantly different from an FC of 1 (indicating no change). To this end, the expanded relative uncertainty UFC can be used to derive a minimum FC value (FCmin) that can be distinguished from FC = 1 within uncertainty. In other words, FCmin represents a minimum factor by which a metabolite's signal intensity must change from one group to another in order to be recognized as a relevant difference, given a certain degree of data variation. This value of FCmin can be determined from the relative expanded uncertainty using the following simple equation, where UFC is the expanded uncertainty relative to the FC:
(6) |
The resulting minimum relevant fold changes rapidly increase with higher observed standard uncertainty of the group-averaged metabolite intensities (Fig. 3), so that the commonly used fold change threshold of 2 only holds true when the within-group variation is below 20%.
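A per-metabolite relevance check can be sketched as below. Eqn (6) is not reproduced here, so the relation FCmin = 1/(1 − UFC) is an assumption of this sketch. It was chosen because it approximately reproduces the values reported in the case study, where a relative expanded uncertainty of ±42% corresponds to an FCmin of 1.72. The function names are hypothetical, and fold changes are taken as positive ratios, with down-regulation expressed as FC < 1.

```python
def minimum_relevant_fc(relative_expanded_uncertainty):
    """ASSUMED relation: FCmin = 1 / (1 - U_FC), with U_FC relative to the FC."""
    if not 0.0 <= relative_expanded_uncertainty < 1.0:
        raise ValueError("relative expanded uncertainty must lie in [0, 1)")
    return 1.0 / (1.0 - relative_expanded_uncertainty)


def is_relevant(observed_fc, relative_expanded_uncertainty):
    """Compare an observed fold change (ratio > 0) with its own FCmin."""
    if observed_fc <= 0.0:
        raise ValueError("fold change must be a positive ratio")
    # Down-regulation is compared via the reciprocal so both directions match.
    magnitude = observed_fc if observed_fc >= 1.0 else 1.0 / observed_fc
    return magnitude > minimum_relevant_fc(relative_expanded_uncertainty)


for u_rel in (0.10, 0.25, 0.42, 0.50):
    print(f"U_FC = {u_rel:.0%} -> FCmin = {minimum_relevant_fc(u_rel):.2f}")
```

Under this assumed relation, an FCmin of 2 corresponds to a relative expanded uncertainty of 50%.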
The information obtained from determining FCmin is somewhat complementary to traditional significance testing such as t-test procedures, where the p-value expresses the probability that an effect with a t statistic of this magnitude or greater could have occurred, assuming that there is no difference in the two sample groups (null-hypothesis significance testing). In contrast to classical statistical differential analysis, our uncertainty propagation approach focuses on the size of the observed effect, and aims to support decision-making regarding its relevance. As such, FCmin is probably most useful as an additional parameter with a message complementary to p-value-based statistical significance decisions. The p-value alone, being an abstract statistical metric with a very specific meaning, does not provide information on whether a fold change of a given magnitude exceeds within-group data variation – an aspect that is especially important given the often high noise level in metabolomics data. Alternative parameters with a better reflection of data reality are therefore needed. To this end, FC uncertainty propagation and FCmin validate effect magnitude with respect to the observed data variation, and serve directly as a basis for determining the relevance of a given observation. Importantly, using FCmin as a threshold value is possible for each compound individually within minimal calculation times, thereby overcoming concerns associated with the use of global metabolome-wide cut-offs. Nevertheless, global FC cut-offs for a given data set can be defined using the average of observed within-group variations and serve as the basis for early decisions about the quality of a data set.
To date, the variability of fold changes in large data sets is estimated using bootstrap resampling.40 In this approach, a large number of samples (typically 10^3) are generated from replicate data points by random sampling with replacement. The fold change is then calculated from each of these bootstrap samples. The resulting population of fold change values can then be used to derive an estimate for the FC from the mean, and its error interval from the 95% confidence interval of the PDF underlying the population of bootstrapped FC values. When a large number of samples is available, this approach indeed provides a good coverage of the apparent variability in the data set. When the number of samples is small, however, this approach is unreliable, as the random sampling with replacement allows repetitions of the same value in the same bootstrap sample, which becomes more likely with decreasing sample size. Furthermore, this procedure samples from a discrete population of observations which, in combination with the repetition problem, leads to an underestimation of FC variability in small data sets. This is demonstrated in a direct comparison of 95% confidence intervals derived by bootstrap resampling and the expanded uncertainty as derived by the herein proposed uncertainty propagation method in Fig. S1 (ESI†). In contrast to bootstrap resampling, uncertainty propagation approximates the sample data by a continuous distribution described by the average metabolite intensities as the group means and the corresponding experimental standard uncertainties as standard deviations. This implies that a PDF is imposed on the sample data (here: normal distribution), which can be difficult to define for such small sample sizes. However, this procedure provides a better reflection of data reality and – at least for small samples – more meaningful error intervals.
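The bootstrap procedure described above can be sketched as follows. This is a minimal illustration with hypothetical intensities. A percentile interval is used as one common way to obtain the 95% interval, which may differ from the procedure in ref. 40, and the function name is invented for this sketch.

```python
import math
import random
import statistics


def bootstrap_fc(group_1, group_2, n_boot=1000, seed=1):
    """Return (mean FC, lower, upper) from a percentile bootstrap."""
    rng = random.Random(seed)
    fcs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping the group size.
        resample_1 = rng.choices(group_1, k=len(group_1))
        resample_2 = rng.choices(group_2, k=len(group_2))
        fcs.append(statistics.fmean(resample_1) / statistics.fmean(resample_2))
    fcs.sort()
    lower = fcs[int(0.025 * n_boot)]
    upper = fcs[int(0.975 * n_boot) - 1]
    return statistics.fmean(fcs), lower, upper


# Hypothetical normalized intensities, 5 replicates per group.
treated = [1180.0, 1250.0, 1310.0, 1150.0, 1220.0]
control = [1010.0, 960.0, 1050.0, 990.0, 1020.0]
fc, lower, upper = bootstrap_fc(treated, control)
print(f"FC = {fc:.2f}, 95% interval [{lower:.2f}, {upper:.2f}]")

# Number of distinct resamples (multisets) of a 5-point group.
print(math.comb(2 * 5 - 1, 5))
```

Every bootstrapped fold change is built from the observed values only, so it can never fall outside the range spanned by min(treated)/max(control) and max(treated)/min(control). With five replicates, each group has only 126 distinct resamples, which illustrates the discreteness discussed above.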
Adipose-derived human mesenchymal stem cells show promise in tissue engineering for regenerative medicine applications,41,42 and extensive research is ongoing with respect to controlling culture conditions as similar as possible to actual physiological conditions. In this proof-of-concept study, adherent stem cell cultures were subjected to either normoxic or hypoxic culture conditions (n = 5, each) for 48 h before sampling, and the cell extracts were analyzed by RP-PGC-TOFMS.38 In a previous study, this online two-dimensional LC method was shown to provide excellent retention time repeatability precision and broad coverage of both polar and non-polar metabolites without the use of ion pairing reagents (Table S2, ESI†). Due to the extended selectivity of this approach, less matrix interference and co-elution can be expected, and high repeatability precision for retention times and peak areas is routinely achieved. The samples were further characterized by several parameters related to cell culture (Table S3, ESI†), all of which reflect the tightly controlled experimental conditions. Overall, all methods in use have been characterized with respect to repeatability precision and fitness for purpose. A QC sample (standard-based, containing 90 intracellular metabolites) was analyzed repeatedly throughout the total measurement time of 16 h, and the observed peak areas were stable, with an average RSD below 10%, hence verifying the stability of the analytical system and appropriate sample storage. After feature finding, alignment and normalization to the total protein content per sample, the resulting positive mode MS data set comprised 248 compounds with a coefficient of variation (CV) below 15% (n = 5). The fold changes observed for these compounds were small, with maximum values of −1.49 for down-regulated, and +1.48 for up-regulated compounds.
The detection of such relatively small changes in metabolite levels is in practice complicated by the often high variability in metabolomics data resulting from many sources of uncertainty (Fig. 1), and careful evaluation is necessary.
Fig. 4 gives an overview of typical data analysis strategies in non-targeted metabolomics, targeting either effect significance or effect relevance, and selected results from the case study. A moderated t-test revealed a significant outcome (pcorr < 0.05) for 31 compounds, although the underlying assumption of normally distributed data could not be verified in a prior test (Shapiro–Wilk test). In contrast, the Mann–Whitney U test, the non-parametric alternative that does not assume normal distribution, found no compounds with a significant difference between the two sample groups after correction for multiple hypothesis testing (Benjamini–Hochberg FDR, pcorr < 0.05). However, the power of statistical tests on this data set may be limited due to the small sample size. The aspect of effect relevance is best viewed from the perspective of the fold change as a point estimate, complemented by an error interval reflecting its uncertainty. The commonly used method of bootstrap resampling revealed comparably small error intervals for the FC values in this data set (±3–±16%). As discussed above, the bootstrap resampling process is of limited power for such small data sets, as it simulates an erroneously narrow population. This becomes apparent in a direct comparison of bootstrap resampling with the herein proposed approach of uncertainty propagation for the compounds in the case study data set (Fig. S1, ESI†). Uncertainty propagation revealed relative expanded uncertainties of between ±6% and ±42%. Accordingly, the compound-specific minimum relevant fold changes FCmin lie between 1.07 and 1.72. This threshold value is exceeded for a total of 16 compounds. Interestingly, 15 of these compounds were also found statistically significant in the moderated t-test with pcorr < 0.05.
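The multiple-testing correction mentioned above can be illustrated with a short sketch of the Benjamini–Hochberg adjustment, together with the smallest p-value an exact two-sided Mann–Whitney U test can return for a given group size when there are no ties. The function names and example p-values are hypothetical, and statistical software may use approximations or tie corrections that give somewhat different values.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value to enforce monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


def smallest_exact_u_test_p(n_1, n_2):
    """Smallest two-sided exact Mann-Whitney p-value (complete separation, no ties)."""
    return 2.0 / math.comb(n_1 + n_2, n_1)


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
p_min = smallest_exact_u_test_p(5, 5)
print(f"smallest exact p for n = 5 per group: {p_min:.4f}")
```

For n = 5 per group, the smallest attainable exact p-value is 2/252 ≈ 0.0079. With 248 tested compounds, the adjusted value p·m/rank falls below 0.05 only if at least 40 compounds reach that minimum. This is one way to see how limited the power of a rank-based test is at this sample size.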
Strikingly, a comparison between FCmin and the observed FC values for the 31 compounds identified to be statistically significant in the moderated t-test reveals that 16 out of the 31 compounds do not exceed the FCmin threshold (Table S4, ESI†). Thus, these should not be considered relevant findings, despite their statistical significance as judged by the p-value. This is not surprising, since the p-value was not designed to provide information on this aspect. Hence, this represents yet another argument on why the p-value alone is not a suitable metric to describe and judge findings in non-targeted metabolomics when sample sizes are small, as is often the case in studies on model systems. However, despite all its shortcomings the p-value can still be considered a very useful parameter when used in combination with valid test procedures (e.g. parametric vs. non-parametric tests). Rather than intending to dismiss statistical significance testing, we therefore suggest that uncertainty propagation is implemented as a complementary tool in data evaluation.
Whether the combined information described above indicates a significant and relevant difference in intracellular metabolic profiles of mesenchymal stem cells, induced specifically by the change in oxygen tension, remains to be elucidated by further experiments that were not within the scope of demonstrating the technical feasibility in this study. Despite this and other examples for well-controlled metabolomics data sets, we emphasize that thorough investigations of non-targeted metabolomics practice are needed to allow the high expectations for the generation of biological knowledge to be met reliably.
The quantification of FC uncertainty is not only helpful for the immediate interpretation of FC data, but also offers the necessary information for relevance decisions. FC uncertainty considerations are also relevant for guiding the choice and improvement of methods for non-targeted metabolomics studies by allowing the definition of a target uncertainty43 that the method needs to comply with in order to allow finding effects of the expected magnitude.
To our knowledge, this is the first time that error propagation and fold change uncertainty in non-targeted metabolomics have been systematically assessed based on available official guidelines. The practical relevance for routine non-targeted metabolomics is demonstrated using the example of mesenchymal stem cells. The observed differences between the sample groups from normoxic and hypoxic culture were minor, and high-precision methods were necessary in order to obtain useful information. While the detection of such small effects is technically feasible, it is ultimately up to the operator to decide whether small effect sizes are considered biologically relevant. However, due to the highly dynamic nature and quick adaptation of cellular metabolism, subtle changes can contain information of high biological relevance and might be overlooked when fold change uncertainty is not considered and optimized.
Footnote
† Electronic supplementary information (ESI) available. See DOI: 10.1039/c6an01342b
This journal is © The Royal Society of Chemistry 2017