Michael Thompson*ᵃ and Roger Woodᵇ
ᵃ School of Biological and Chemical Sciences, Birkbeck University of London, Malet Street, London WC1E 7HX, UK. E-mail: M.Thompson@bbk.ac.uk
ᵇ Food Standards Agency, Aviation House, 125 Kingsway, London WC2B 6NH, UK
First published on 23rd October 2014
This paper examines precision statistics from collaborative trials (interlaboratory method performance studies) reported between 1990 and 2000. The principal focus is on the ‘Horwitz ratio’ (sR/sr), the ratio of the estimated standard deviations of reproducibility (sR) and repeatability (sr) found for individual analytical procedures. A predictable ratio would be a valuable tool in assessing uncertainty and detection limits. While the median ratio observed was close to 2.0, there was significant variation with a strong positive skew, much of which could be attributed to particular types of analyte, test material, and analytical procedure.
The magnitude of the ratio, both within and among trials, is worth characterising to establish whether it can be predicted reasonably accurately. A broadly constant average ratio would be a useful quantity because it would enable analysts to form a rough estimate of sR from sr where no information from a collaborative trial is available (a value of sr can be obtained during single-laboratory validation). In turn, a good estimate of sR is a valuable benchmark that can help analysts to avoid unrealistically small estimates of uncertainty. It is therefore of considerable interest to examine the variability of the Horwitz ratio to see whether it could be used reliably in this context.
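As a rough illustration of how such a ratio might be used, the sketch below gauges sR from a single-laboratory sr and flags a claimed standard uncertainty that falls below that benchmark. The default ratio of 2.0 and the function names `gauge_sR` and `uncertainty_below_benchmark` are hypothetical choices for illustration, not a published procedure; the study finds the ratio to be markedly higher for some analyte types.

```python
def gauge_sR(s_r: float, horwitz_ratio: float = 2.0) -> float:
    """Rough reproducibility SD from a repeatability SD (the ratio is an assumption)."""
    if s_r < 0 or horwitz_ratio < 1.0:
        raise ValueError("s_r must be non-negative and the ratio at least 1")
    return horwitz_ratio * s_r


def uncertainty_below_benchmark(u_claimed: float, s_r: float,
                                horwitz_ratio: float = 2.0) -> bool:
    """True if a claimed standard uncertainty is smaller than the gauged sR."""
    return u_claimed < gauge_sR(s_r, horwitz_ratio)


# Hypothetical single-laboratory validation: sr = 0.5 mg/kg, claimed u = 0.6 mg/kg
print(gauge_sR(0.5))                          # 1.0
print(uncertainty_below_benchmark(0.6, 0.5))  # True
```

A claimed uncertainty below the gauged sR is not necessarily wrong, but it is the kind of unrealistically small estimate the benchmark is meant to expose.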
The value of a Horwitz ratio may also be relevant to describing the detection capability of an analytical procedure. A detection limit is in effect a small multiple k (2 < k < 4) of the standard deviation of results replicated at or close to zero concentration. But what are the appropriate conditions of replication for estimating this standard deviation? The issue is debated, because several conceivably relevant options are available, in particular instrumental, repeatability, and reproducibility conditions.3 As the reproducibility standard deviation provides the best approximation to uncertainty, the most appropriate statistic for characterising detection capability in routine analysis might be a reproducibility standard deviation estimated at zero concentration.
That statistic would be difficult, often impossible, to obtain directly. In principle, however, it could be estimated by extrapolating to zero concentration the standard deviations estimated at higher concentrations. It is therefore of interest to determine whether the Horwitz ratio sR/sr found in an individual collaborative trial is maintained at a constant level down to zero concentration. As repeatability-based detection limits are readily obtained, that information would help analysts to avoid unduly low estimates of detection limit for analytical procedures where no collaborative trial has been conducted. There has been speculation that sR/sr → 1 as c → 0, which would greatly simplify matters if true; that conjecture also needs investigation.
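The reasoning above can be sketched numerically: a repeatability-based detection limit k·sr0 is scaled to a reproducibility basis by multiplying sr0 by the trial's Horwitz ratio. This assumes, as the paragraph asks, that the ratio holds unchanged down to zero concentration; k = 3, the ratio 2.0 and the value of sr0 are illustrative assumptions, and the function names are hypothetical.

```python
def detection_limit(s0: float, k: float = 3.0) -> float:
    """Detection limit as k times the SD of results at or near zero concentration."""
    if not 2.0 < k < 4.0:
        raise ValueError("k is expected to lie between 2 and 4")
    return k * s0


def reproducibility_detection_limit(s_r0: float, horwitz_ratio: float = 2.0,
                                    k: float = 3.0) -> float:
    """Scale a repeatability SD at zero concentration up to a reproducibility basis.

    Assumes the trial's Horwitz ratio is maintained down to zero concentration.
    """
    return detection_limit(horwitz_ratio * s_r0, k)


s_r0 = 0.01  # hypothetical repeatability SD near zero concentration, mg/kg
print(detection_limit(s_r0))                  # repeatability-based limit
print(reproducibility_detection_limit(s_r0))  # reproducibility-based, twice as large
```

If the conjecture sR/sr → 1 as c → 0 were true, the two limits would coincide and the scaling step would be unnecessary.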
(Note: the ratio sR/sr is difficult to determine directly at zero concentration. An authentic test material containing effectively zero concentration of analyte would nearly always be unobtainable. Moreover, organisers of collaborative trials tend to avoid low concentrations of the analyte because of problems in the statistical handling of the results. These problems arise from common data-recording practices, namely (a) recording too few significant figures for an adequate statistical analysis and (b) censoring results falling below zero. Sub-zero results have no corresponding physical realisation, of course, but are important in forming unbiased estimates of location and dispersion.)
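The effect of censoring can be shown with a small simulation, assuming normally distributed results for a blank with true concentration zero. For such a distribution, replacing negative results by zero gives an expected mean of about 0.40σ (instead of zero) and an SD of about 0.58σ (instead of σ). The sample size, seed and σ below are arbitrary choices.

```python
import random
import statistics

rng = random.Random(42)
sigma = 1.0  # hypothetical SD of results for a blank (true concentration zero)
results = [rng.gauss(0.0, sigma) for _ in range(2000)]

# Common reporting practice: sub-zero results recorded as zero (censored at zero)
censored = [max(x, 0.0) for x in results]

for label, data in (("all results", results), ("censored at zero", censored)):
    print(f"{label:18s} mean = {statistics.fmean(data):+.3f}  "
          f"SD = {statistics.stdev(data):.3f}")
```

Both location and dispersion are biased by censoring, which is why uncensored sub-zero results matter for estimating standard deviations near zero concentration.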
• the original Horwitz function: $\sigma_H = 0.02c^{0.8495}$;
• the 1997 study: $s_R = 0.0166c^{0.824}$;
• the present investigation: $s_R = 0.039c^{0.8891}$.
The recent trend shows a significantly higher sR than either the original Horwitz function or the 1997 investigation over that part of the range, a trend visible in Fig. 1.
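To see how the three functions diverge, the sketch below simply codes them and evaluates each as a relative standard deviation at a few mass fractions. The evaluation points are arbitrary, and a fitted function should not be trusted outside the concentration range of the data it was fitted to.

```python
# Coefficients (a, b) of s = a * c**b, with c a dimensionless mass fraction
MODELS = {
    "original Horwitz": (0.02, 0.8495),
    "1997 study": (0.0166, 0.824),
    "present investigation": (0.039, 0.8891),
}


def predicted_sd(c: float, a: float, b: float) -> float:
    """Standard deviation predicted by a power-law precision function."""
    return a * c ** b


for c in (1e-6, 1e-4, 1e-2):
    rsds = {name: predicted_sd(c, a, b) / c for name, (a, b) in MODELS.items()}
    print(f"c = {c:.0e}: " + ", ".join(f"{n} RSD = {r:.3f}" for n, r in rsds.items()))
```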
Below a mass fraction of $10^{-7}$ the trend of the precision statistics conforms closely to an underlying constant reproducibility relative standard deviation of 0.22 (Fig. 2), which is consistent with other findings, specifications for fitness for purpose, and constraints imposed by detection capability.6 The deviation of the lowess fit from constant relative standard deviation below $10^{-9.2}$ in Fig. 2 is of no consequence given the sparseness of the data.
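The constant relative standard deviation of 0.22 can be related to the original Horwitz function, whose implied RSD, 0.02c^(−0.1505), grows as c falls. Solving for the mass fraction at which that RSD reaches 0.22 gives approximately 1.2 × 10⁻⁷, close to the boundary of 10⁻⁷ quoted above. This is only arithmetic on the published function, not a re-analysis of the data.

```python
import math

A, B = 0.02, 0.8495  # original Horwitz function, sigma_H = A * c**B
RSD_FLOOR = 0.22     # constant reproducibility RSD at low mass fractions


def horwitz_rsd(c: float) -> float:
    """Relative SD implied by the original Horwitz function: A * c**(B - 1)."""
    return A * c ** (B - 1.0)


# Solve A * c**(B - 1) = RSD_FLOOR for c
c_cross = (RSD_FLOOR / A) ** (1.0 / (B - 1.0))
print(f"crossover mass fraction ~ {c_cross:.2e}")  # 1.20e-07
print(f"log10 of crossover = {math.log10(c_cross):.2f}")
```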
Fig. 3 Boxplot of log10 Horwitz ratios. Boxes show ratios observed within individual trials, arranged in order of increasing mean ratio.
The within-trial mean ratios are summarised in Fig. 4. Some degree of positive skew would be expected, inter alia because the ratio is bounded at 1.0 on the low side, as σR ≥ σr. The long positive tail of the observed means cannot, however, be attributed to random variation, as might be thought. Large simulations, based on a model with 11 laboratories, a true ratio σR/σr = 2 (bearing in mind that the two estimated standard deviations are not independent), and normally distributed random measurement errors, show that the distribution of individual values of sR/sr is indeed long-tailed on the positive side. However, the distribution of the mean ratios in median-sized trials, that is, trials with six test materials, was only slightly skewed. Random variation therefore does not account for the observed dispersion of trial means.
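The simulation argument can be sketched as follows, assuming a balanced one-way random-effects model with 11 laboratories. Duplicate results per laboratory, σr = 1, the seed and the number of simulated trials are assumptions for illustration, not details taken from the study. Both sr and sR are estimated from the same data, so their dependence is preserved, and negative between-laboratory variance estimates are set to zero so that the ratio cannot fall below 1.

```python
import math
import random
import statistics


def simulated_ratio(rng, n_labs=11, n_reps=2, sigma_r=1.0, true_ratio=2.0):
    """One estimate of sR/sr from a simulated balanced collaborative trial."""
    # sigma_R**2 = sigma_L**2 + sigma_r**2 fixes the between-laboratory SD
    sigma_L = sigma_r * math.sqrt(true_ratio ** 2 - 1.0)
    lab_means, ss_within = [], 0.0
    for _ in range(n_labs):
        lab_bias = rng.gauss(0.0, sigma_L)
        reps = [lab_bias + rng.gauss(0.0, sigma_r) for _ in range(n_reps)]
        mean = statistics.fmean(reps)
        lab_means.append(mean)
        ss_within += sum((x - mean) ** 2 for x in reps)
    ms_within = ss_within / (n_labs * (n_reps - 1))
    ms_between = n_reps * statistics.variance(lab_means)
    s_L2 = max(0.0, (ms_between - ms_within) / n_reps)  # negative estimates set to zero
    return math.sqrt((s_L2 + ms_within) / ms_within)


def skewness(xs):
    """Moment coefficient of skewness (population form)."""
    m, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * sd ** 3)


rng = random.Random(2014)
n_trials, n_materials = 2000, 6  # six test materials: a median-sized trial
ratios = [simulated_ratio(rng) for _ in range(n_trials * n_materials)]
trial_means = [statistics.fmean(ratios[i:i + n_materials])
               for i in range(0, len(ratios), n_materials)]

print(f"median of individual ratios: {statistics.median(ratios):.2f}")
print(f"skewness, individual ratios: {skewness(ratios):.2f}")
print(f"skewness, trial means:       {skewness(trial_means):.2f}")
```

Averaging six ratios shrinks the skewness considerably, which is the point of the argument: a strongly skewed set of observed trial means needs an explanation beyond random variation.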
Another worthwhile observation is that the mean within-trial ratio shows no apparent dependence on the mean mass fraction of the analyte (Fig. 5).
Fig. 5 Plot of mean ratios found within trials vs. the mean mass fraction of the analyte. Each point represents a single trial.
Individually, the within-trial regressions of the ratio on concentration carried little information because of the (statistically) small number of test materials in each trial. Taken together, though, they support some worthwhile conclusions. In the majority of instances (44/50 trials) the slope of the regression line was not significantly different from zero at 95% confidence; that is, there was no suggestion that the Horwitz ratio depended linearly on the concentration of the analyte. This outcome hardly differs from that expected under an overall null hypothesis (i.e., that the ratio never varies with concentration), under which we would on average expect 47.5 of the 50 trials (0.95 × 50) to be non-significant. The near-uniform distribution of the p-values for the slope coefficient (Fig. 7) is consistent with this. The intercepts show a different pattern, with 16 instances (32%) significantly different from 2.0. (Note: strictly speaking, in instances like these, where the estimated regression coefficients tend to be strongly correlated, we should consider their joint confidence region rather than the individual confidence limits. Fig. 8 shows the example previously used in Fig. 6, where the null hypotheses (β = 0, α = 2) fall within the individual 95% confidence limits of the estimates (b, a) but outside the 95% joint confidence region. This refinement would not affect the broad conclusions of this study.)
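The distinction between individual and joint confidence regions can be made concrete for a straight line fitted to a handful of points. The sketch below, assuming ordinary least squares with independent, equal-variance errors, tests a hypothesised pair (α0, β0) both ways. It relies on the closed form of the upper quantile of F(2, ν), namely (ν/2)[(1 − level)^(−2/ν) − 1], so no statistics library is needed; the t quantile for the individual limits must be supplied (2.776 is the two-sided 95% value for ν = 4). The data and function names are hypothetical.

```python
import math


def ols_line(x, y):
    """Least-squares intercept a, slope b, residual variance s2, mean of x, Sxx."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return a, b, s2, xbar, sxx


def f_crit_2(nu, level=0.95):
    """Upper quantile of F(2, nu), using its CDF 1 - (1 + 2x/nu)**(-nu/2)."""
    return (nu / 2.0) * ((1.0 - level) ** (-2.0 / nu) - 1.0)


def in_joint_region(x, y, alpha0, beta0, level=0.95):
    """Is (alpha0, beta0) inside the joint confidence ellipse of (a, b)?"""
    a, b, s2, _, _ = ols_line(x, y)
    da, db = a - alpha0, b - beta0
    q = sum((da + db * xi) ** 2 for xi in x)  # (theta - theta0)' X'X (theta - theta0)
    return q <= 2.0 * s2 * f_crit_2(len(x) - 2, level)


def in_individual_limits(x, y, alpha0, beta0, t_crit):
    """Are alpha0 and beta0 each inside their own t-based confidence limits?"""
    a, b, s2, xbar, sxx = ols_line(x, y)
    se_a = math.sqrt(s2 * (1.0 / len(x) + xbar ** 2 / sxx))
    se_b = math.sqrt(s2 / sxx)
    return abs(a - alpha0) <= t_crit * se_a, abs(b - beta0) <= t_crit * se_b


conc = [1, 2, 3, 4, 5, 6]                  # hypothetical concentrations
ratio = [2.1, 1.9, 2.05, 1.95, 2.1, 1.9]   # hypothetical Horwitz ratios
print(in_joint_region(conc, ratio, alpha0=2.0, beta0=0.0))                      # True
print(in_individual_limits(conc, ratio, alpha0=2.0, beta0=0.0, t_crit=2.776))   # (True, True)
```

With strongly correlated coefficients the two tests can disagree, as in the example of Fig. 8, where the hypothesised point passes both individual tests but lies outside the ellipse.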
Fig. 8 Strongly correlated regression coefficients (cross) showing the null hypotheses H0: β = 0, α = 2 (solid circle), which in this instance falls outside the joint 95% confidence boundary (ellipse) of the coefficients but inside their individual 95% confidence limits (dashed lines). Same data as Fig. 6.
Among the 50 trials included in this part of the study, there was no suggestion of the ratios changing radically at concentrations approaching zero.
Fig. 9 Boxplot of mean Horwitz ratios from all 95 trials, classified by the type of analyte. The width of the boxes is proportional to the number of collaborative trials in each class.
It is interesting to consider the origin of these anomalously high ratios, which could result from either exceptionally low sr or unusually high sR. The former would be expected of a simple procedure involving, say, only a few high-precision measurements and no complex chemistry or skilful manipulation (loss on drying would be an example). The latter would be expected when environmental or organisational factors affecting the results differ substantially between laboratories. In the present study both circumstances are in play, as the following average standard deviations for the two anomalous subsets, expressed relative to those of the other analyte types, show.
Analyte type | Relative sr | Relative sR |
---|---|---|
Oil/fat | 0.45 | 1.16 |
Element | 0.78 | 1.44 |
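If the tabulated figures are read as each subset's average sr and sR divided by the corresponding averages for the other analyte types (our reading of the table), the factor by which each subset's Horwitz ratio exceeds the typical ratio is roughly the quotient of the two columns. This is only approximate, because a ratio of averages is not the average of ratios.

```python
# (relative sr, relative sR) for each anomalous subset, copied from the table above
RELATIVE_SD = {
    "Oil/fat": (0.45, 1.16),
    "Element": (0.78, 1.44),
}

# Approximate factor by which each subset's sR/sr exceeds the typical ratio
ratio_factor = {name: rel_sR / rel_sr for name, (rel_sr, rel_sR) in RELATIVE_SD.items()}

for name, factor in ratio_factor.items():
    print(f"{name}: Horwitz ratio roughly {factor:.1f} times typical")
```

The oil/fat subset owes its high ratio mainly to a low sr, the trace-element subset more to a high sR.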
Both analyte types show a lower-than-typical sr and a higher-than-typical sR. For oils and fats, as sR is only slightly above average, no great problem exists for the quality of analysis, despite the high Horwitz ratio. For trace elements, however, the elevated value of sR suggests that investigating its causes might lead to a useful improvement of quality in that area. In this sector, high values of sR have often been found to be caused by contamination or variable recovery, while the within-laboratory variation is small because the procedures are largely instrumental. The reverse effect is seen with methods of analysis for crude fibre: the procedure is usually highly manipulation-dependent, so within-laboratory results tend not to be very consistent. If the estimate of sr equals or exceeds that of sR, as can occasionally happen by chance, the ratio is conventionally set to unity. This is most likely to occur with highly manipulation-dependent procedures.
This particular classification by analyte type correlates strongly with classification by test material and by analytical method: the fatty acids were determined exclusively in oily test materials by gas chromatography, while the individual elements were determined largely by atomic spectrometric methods after destruction of the organic matrix. In short, nothing further is to be gained by classifying the ratios alternatively by matrix or by physical principle of analysis.
• In the great majority of procedures the ratio does not change significantly with the concentration of the analyte. In particular, there was no evidence that the ratio changed abruptly at concentrations near zero, so that mean values (robustified if necessary) within a trial were valid estimates of the zero-point ratio in most instances. In the small proportion of instances where a significant dependence of the ratio on concentration was observed, regression intercepts would provide alternative valid estimates of the zero-point ratio.
• There was no evidence to support the idea that the ratio tended towards unity at concentrations near zero.
• The mode of the ratios was close to 2.0. The strong positive skew in the observed mean ratios was apparently due to special circumstances prevailing in specific types of analysis. In a majority of individual trials the mean ratio observed was not greatly different from 2.0. An assumption of a value of 2.0 for the purpose of gauging the value of sR from sr would be safe in most instances in food analysis, but not in the determination of trace elements or constituents of oils and fats.
• Whether the assumption of a ratio close to 2.0 would be valid in application sectors other than food is unknown, as the relevant statistics (that is, those obtained from collaborative trials of specific procedures) are not currently produced in sufficient numbers to allow generalisation. Proficiency test statistics mostly cannot serve as alternatives, because they characterise not procedures but the performance of participants who are free to use any measurement principle or procedure.
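The conclusions refer to within-trial means "robustified if necessary" without specifying the estimator. One common choice in analytical chemistry is Huber's H15 estimator, the iterative winsorisation also given as Algorithm A in ISO 13528; it is sketched below as one possible option, not necessarily the one used in this study. The constants 1.5 and 1.134 belong to that algorithm; the tolerance, iteration cap and example ratios are arbitrary.

```python
import statistics


def huber_h15(xs, c=1.5, tol=1e-6, max_iter=100):
    """Robust mean and SD by iterative winsorisation (Huber H15 / ISO 13528 Algorithm A)."""
    mu = statistics.median(xs)
    sigma = 1.483 * statistics.median([abs(x - mu) for x in xs])  # scaled MAD
    if sigma == 0:
        return mu, 0.0
    for _ in range(max_iter):
        lo, hi = mu - c * sigma, mu + c * sigma
        w = [min(max(x, lo), hi) for x in xs]    # pull outlying values in to the limits
        new_mu = statistics.fmean(w)
        new_sigma = 1.134 * statistics.stdev(w)  # 1.134 corrects the SD for c = 1.5
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


ratios = [2.0, 2.1, 1.9, 2.0, 2.2, 6.0]  # hypothetical within-trial ratios, one outlier
print(f"arithmetic mean: {statistics.fmean(ratios):.2f}")  # 2.70
print(f"robust mean:     {huber_h15(ratios)[0]:.2f}")
```

The single outlying ratio inflates the arithmetic mean well above 2, whereas the robust mean stays close to the bulk of the values.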
This journal is © The Royal Society of Chemistry 2015