Impact of data reduction on multivariate classification models built on spectral data from bio-samples
Abstract
Multivariate data analysis methods have been used to evaluate single-shot spectral data, obtained by laser-induced breakdown spectroscopy (LIBS), from ten different biological samples (simulants and possible interferents in Biological Warfare Agent (BWA) detection applications). Spectral data in the form of echellograms (2D CCD images) and extracted 1D spectra were used, and the classification performance was studied as the number of input variables was varied. Principal component analysis (PCA) indicated that the samples could be separated on the basis of spectral differences, and partial least squares discriminant analysis (PLS-DA) was applied to study the predictability in more detail. For full-resolution 1D spectra, normalization of the data mainly produced visual effects in the PCA score plots, without significantly affecting the predictability of the PLS-DA models; however, normalization improved the predictability when the number of variables was heavily reduced. Considerable data (variable) reduction could be performed on both the 1D and 2D data without a significant loss of predictability. With similar numbers of variables, the prediction models performed better when built on the echellograms directly than on the extracted 1D spectra. The problem of spectral data shift (relative to ‘database’ spectra) was also investigated, and even small shifts caused the models to fail. However, after selecting important variables and allowing certain regions for these variables, the impact of shift on predictability could be reduced.
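The shift problem and its mitigation can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration, not the authors' data or procedure. The spectra are synthetic: three hypothetical classes with narrow Gaussian lines at made-up channel positions (`PEAKS`), random shot-to-shot intensity and additive noise, in place of the ten LIBS samples. Only 1D spectra are modelled; echellograms are not. Normalization is taken to be scaling to unit total intensity. PLS-DA is implemented as a NIPALS PLS2 regression on one-hot class indicators, with prediction by the largest response. Variable selection keeps the 20 channels with the largest absolute regression coefficients. "Allowing certain regions" is read here as replacing each selected channel by the maximum within ±`HALF_WIDTH` channels. The helper names (`make_spectra`, `windowed_features`, `pls_da_fit`) and all numeric settings are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_VARS = 400                      # hypothetical number of spectral channels
AXIS = np.arange(N_VARS)
# Hypothetical line positions (channel indices) for three synthetic sample classes.
PEAKS = [(50, 140, 260), (50, 180, 300), (100, 140, 340)]


def make_spectra(cls, n, shift=0):
    """Synthetic single-shot spectra: unit-width Gaussian lines, random
    shot-to-shot intensity and additive noise; `shift` displaces every line."""
    profile = sum(np.exp(-0.5 * (AXIS - (c + shift)) ** 2.0) for c in PEAKS[cls])
    scale = rng.uniform(0.5, 1.5, size=(n, 1))
    return scale * profile + rng.normal(0.0, 0.02, size=(n, N_VARS))


def normalize(X):
    """Scale every spectrum to unit total intensity (one common choice)."""
    return X / X.sum(axis=1, keepdims=True)


def dataset(n_per_class, shift=0):
    X = np.vstack([make_spectra(c, n_per_class, shift) for c in range(len(PEAKS))])
    y = np.repeat(np.arange(len(PEAKS)), n_per_class)
    return normalize(X), y


def pls_da_fit(X, labels, n_comp=2, tol=1e-10, max_iter=500):
    """PLS-DA as NIPALS PLS2 regression of one-hot class indicators on X."""
    Y = np.eye(labels.max() + 1)[labels]
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = F[:, [np.argmax(F.var(axis=0))]]       # start from the most variable Y column
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)                  # X weights
            t = E @ w                               # X scores
            q = F.T @ t / (t.T @ t)                 # Y loadings
            u_new = F @ q / (q.T @ q)               # Y scores
            converged = np.linalg.norm(u_new - u) < tol
            u = u_new
            if converged:
                break
        p = E.T @ t / (t.T @ t)                     # X loadings
        E = E - t @ p.T                             # deflate X
        F = F - t @ q.T                             # deflate Y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.inv(P.T @ W) @ Q.T            # regression coefficients
    return x_mean, y_mean, B


def pls_da_predict(model, X):
    x_mean, y_mean, B = model
    return np.argmax((X - x_mean) @ B + y_mean, axis=1)


def accuracy(model, X, y):
    return float(np.mean(pls_da_predict(model, X) == y))


def windowed_features(X, idx, half_width):
    """Hypothetical shift tolerance: each selected channel becomes the
    maximum intensity within +/- half_width channels around it."""
    return np.stack([X[:, max(j - half_width, 0): j + half_width + 1].max(axis=1)
                     for j in idx], axis=1)


X_train, y_train = dataset(30)
X_test, y_test = dataset(15)
X_shift, y_shift = dataset(15, shift=5)    # every line displaced by 5 channels

full = pls_da_fit(X_train, y_train)
acc_full = accuracy(full, X_test, y_test)
acc_full_shifted = accuracy(full, X_shift, y_shift)

# Variable reduction: keep the 20 channels with the largest |coefficient| in any class.
important = np.argsort(np.abs(full[2]).max(axis=1))[-20:]
HALF_WIDTH = 8                             # hypothetical allowed region, +/- channels
reduced = pls_da_fit(windowed_features(X_train, important, HALF_WIDTH), y_train)
acc_win_shifted = accuracy(reduced, windowed_features(X_shift, important, HALF_WIDTH), y_shift)

print(f"full spectra, unshifted test : {acc_full:.2f}")
print(f"full spectra, shifted test   : {acc_full_shifted:.2f}")
print(f"20 windowed vars, shifted    : {acc_win_shifted:.2f}")
```

The sketch is meant to show a contrast. Because the lines are narrow, a 5-channel displacement moves them almost entirely off the channels on which the full-spectrum model places its weight, so that model should fall to roughly chance on the shifted set. The reduced model should keep its accuracy, because each allowed region is wider than the applied shift. With a shift larger than `HALF_WIDTH`, the windowed model would be expected to fail as well. Selecting variables by coefficient magnitude is only one option; VIP scores or other selection criteria could be used instead.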