Addressing the sparsity of laser-induced breakdown spectroscopy data with randomized sparse principal component analysis
Abstract
Emission spectra yielded by laser-induced breakdown spectroscopy (LIBS) exhibit high dimensionality, redundancy, and sparsity. The high dimensionality is often addressed with principal component analysis (PCA), which creates a low-dimensional embedding of the spectra by projecting them into the score space. However, PCA does not effectively handle the sparsity of the analysed data, including LIBS spectra. Sparse PCA (SPCA) was proposed for the analysis of high-dimensional sparse data, yet it remains underutilized in LIBS applications. In this work, we show that SPCA combined with genetic algorithms offers marginal improvements in clustering and in quantification by multivariate calibration. More importantly, we show that SPCA significantly improves the interpretability of the loading spectra. We also show that the loading spectra yielded by SPCA differ from those yielded by sparse partial least squares regression. Finally, by carrying out SPCA with the randomized SPCA (RSPCA) algorithm, we indirectly demonstrate that the analysis of LIBS data can benefit greatly from the tools of randomized linear algebra: RSPCA offers a 20-fold increase in computation speed compared with PCA based on singular value decomposition.
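The two ideas behind the abstract, dense versus sparse loading spectra and randomized approximations of the SVD, can be illustrated with a minimal sketch, assuming NumPy and synthetic data. The sparse loading below comes from a simple soft-thresholded power-type iteration, and the randomized SVD from a basic randomized range finder. Both are illustrative stand-ins chosen for brevity, not the RSPCA algorithm or the LIBS data used in this work. The helper names `randomized_svd` and `sparse_loading`, the penalty `lam`, and the simulated emission-line channels are all hypothetical.

```python
import numpy as np

# Hypothetical illustrative helpers; NOT the RSPCA implementation used in the paper.


def randomized_svd(X, k, oversample=10, n_power=2, seed=0):
    """Approximate rank-k SVD of X with a basic randomized range finder."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((X.shape[1], k + oversample))
    Y = X @ omega                      # random sample of the column space of X
    for _ in range(n_power):           # power iterations emphasise the leading directions
        Y = X @ (X.T @ Y)
    Q, _ = np.linalg.qr(Y)             # orthonormal basis for the sampled range
    B = Q.T @ X                        # small (k + oversample) x p matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]


def sparse_loading(X, lam, n_iter=100):
    """First sparse loading via soft-thresholded alternating (power-type) updates."""
    v = np.linalg.svd(X, full_matrices=False)[2][0]   # start from the dense PCA loading
    for _ in range(n_iter):
        u = X @ v
        u /= np.linalg.norm(u)                         # unit-norm score vector
        z = X.T @ u
        v = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # soft-thresholding zeroes weak channels
        norm = np.linalg.norm(v)
        if norm == 0.0:                                # lam too large: every channel thresholded away
            break
        v /= norm
    return v


# Hypothetical synthetic "spectra": 60 samples x 200 channels with three
# emission lines whose intensity tracks a simulated concentration, plus weak noise.
rng = np.random.default_rng(1)
n, p = 60, 200
conc = rng.uniform(0.5, 2.0, n)
X = 0.01 * rng.standard_normal((n, p))
for channel, weight in [(40, 1.0), (41, 0.6), (120, 0.8)]:
    X[:, channel] += weight * conc
X -= X.mean(axis=0)                                    # mean-centre the columns, as in PCA

_, s_exact, Vt_exact = np.linalg.svd(X, full_matrices=False)
pca_nnz = np.count_nonzero(np.abs(Vt_exact[0]) > 1e-12)
sparse_support = np.flatnonzero(sparse_loading(X, lam=0.1))
_, s_rand, Vt_rand = randomized_svd(X, k=1)

print("nonzero PCA loadings:", pca_nnz)
print("sparse loading support:", sparse_support)
print("relative error in sigma_1:", abs(s_rand[0] - s_exact[0]) / s_exact[0])
```

With this noise level, the PCA loading carries small nonzero weights on nearly every channel, whereas the soft-thresholded loading keeps only the simulated line channels, which is the kind of interpretability gain the abstract refers to. The toy problem is far too small to say anything about the reported speed-up, so no timings are shown.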