Assessing arsenic species in foods using regularized linear regression of the arsenic K-edge X-ray absorption near edge structure†
Abstract
The toxicity and bioavailability of arsenic depend heavily on its speciation. Robust and accurate methods are therefore needed to determine arsenic speciation profiles in materials relevant to public health initiatives, such as food safety. X-ray spectroscopies are attractive candidates because they provide in situ, nondestructive analyses of solid samples without perturbing the arsenic species therein. This work provides a speciation analysis of three certified reference materials for the food chemistry community, whose assigned values may be used to assess the merit of the X-ray spectroscopy results. In addition, extracts of SRM 3232 Kelp Powder, which is value-assigned for arsenic species, are measured to provide further evidence of the method's efficacy. These analyses are performed on As K-edge X-ray absorption near edge structure (XANES) measurements collected on each sample. Such analyses have traditionally relied on linear combination fitting of a minimal subset of empirical standards selected by stepwise regression. This approach is known to be problematic for compounds with meaningfully collinear spectra and can yield overestimates of the accuracy of the analysis. Therefore, the least absolute shrinkage and selection operator (lasso) regression method is used to reduce the risk of overfitting and increase the interpretability of statistical inferences. Because lasso is a biased statistical method, results and uncertainties are estimated using a bootstrap method that accounts for the dominant sources of variability. Finally, this method does not separate model and data selection from regression analysis. Indeed, a survey of many spectral influences is presented, including changes in the state of methylation, state of protonation, oxidation state, coordination geometry, and sample phase. The surveyed compounds were all included in the model's training set, preventing model over-simplification and enabling high-throughput and robust analyses.
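As a rough illustration of the kind of analysis summarized in the abstract, the following Python sketch fits a synthetic mixture spectrum against a small library of standards using a lasso penalty, then estimates uncertainties by bootstrap. It is not the authors' implementation. The toy spectral shapes (arctangent edge plus Gaussian white line), the edge energies and peak heights, the penalty `alpha`, the non-negativity constraint on the fractions, the coordinate-descent solver, and the residual-resampling bootstrap are all assumptions made for illustration, and every function and variable name is hypothetical.

```python
import math
import random


def toy_standard(energies, edge, peak, width=1.5):
    """Hypothetical XANES shape: arctangent edge step plus a Gaussian white line."""
    return [
        0.5 + math.atan((e - edge) / width) / math.pi
        + peak * math.exp(-0.5 * ((e - edge - 1.0) / width) ** 2)
        for e in energies
    ]


def nonneg_lasso(columns, y, alpha, n_sweeps=300, tol=1e-7):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*sum(w) subject to w >= 0
    by cyclic coordinate descent; `columns` holds one spectrum per standard."""
    n = len(y)
    w = [0.0] * len(columns)
    resid = list(y)  # residual y - Xw, starting from w = 0
    scale = [sum(x * x for x in col) / n for col in columns]
    for _ in range(n_sweeps):
        biggest_step = 0.0
        for j, col in enumerate(columns):
            # correlation of standard j with the residual that excludes its own term
            rho = sum(x * (r + x * w[j]) for x, r in zip(col, resid)) / n
            new_w = max(0.0, (rho - alpha) / scale[j])  # soft-threshold, clipped at 0
            step = new_w - w[j]
            if step != 0.0:
                resid = [r - x * step for x, r in zip(col, resid)]
                w[j] = new_w
                biggest_step = max(biggest_step, abs(step))
        if biggest_step < tol:
            break
    return w


def bootstrap_fractions(columns, y, alpha, n_boot=50, seed=0):
    """Residual bootstrap (an assumed scheme): resample fit residuals, refit,
    and report the point estimate plus bootstrap mean and standard deviation."""
    rng = random.Random(seed)
    w_hat = nonneg_lasso(columns, y, alpha)
    fitted = [sum(w * col[i] for w, col in zip(w_hat, columns)) for i in range(len(y))]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    draws = []
    for _ in range(n_boot):
        y_star = [f + rng.choice(resid) for f in fitted]
        draws.append(nonneg_lasso(columns, y_star, alpha))
    k = len(columns)
    mean = [sum(d[j] for d in draws) / n_boot for j in range(k)]
    sd = [math.sqrt(sum((d[j] - mean[j]) ** 2 for d in draws) / (n_boot - 1))
          for j in range(k)]
    return w_hat, mean, sd


# Synthetic demonstration: three made-up standards near the As K-edge region.
energies = [11855.0 + 0.5 * i for i in range(81)]
standards = [toy_standard(energies, edge, peak)
             for edge, peak in [(11865.0, 0.4), (11870.0, 1.0), (11875.0, 1.6)]]
true_fractions = [0.7, 0.0, 0.3]
noise = random.Random(1)
mixture = [sum(f * s[i] for f, s in zip(true_fractions, standards))
           + noise.gauss(0.0, 0.005) for i in range(len(energies))]

w_hat, w_mean, w_sd = bootstrap_fractions(standards, mixture, alpha=1e-3)
for name, w, sd in zip(("standard A", "standard B", "standard C"), w_hat, w_sd):
    print(f"{name}: fraction = {w:.3f} +/- {sd:.3f}")
```

Because the penalty acts on the whole library at once, no separate stepwise selection of standards is needed in this sketch; standards that do not contribute are driven to zero by the fit itself. Real spectra from closely related compounds are far more collinear than these toy shapes, which is where the choice of penalty and bootstrap scheme matters most.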