Data augmentation using GANN in the quantitative LIBS analysis of scarce samples: a case study on polymetallic nodules from 5000 m ocean depth
Abstract
As the world transitions towards renewable energy, the demand for critical resources such as nickel (Ni), cobalt (Co), and lithium (Li) in energy storage systems is ever more pronounced. The abundance of these elements in deep-sea polymetallic nodules provides an alternative to land-based resources. However, the scarcity of deep-sea nodule samples makes it difficult to obtain sufficient Laser-Induced Breakdown Spectroscopy (LIBS) data to train machine learning models for quantitative analysis. In this work, a Generative Adversarial Neural Network (GANN) with physical loss constraints was designed to augment the spectral database. Unsupervised classification techniques, including Principal Component Analysis (PCA) and Density-Based Spatial Clustering of Applications with Noise (DBSCAN), were employed to assess the similarity between experimental and generated spectra. Four machine learning models, namely Backpropagation Neural Networks (BPNN), Support Vector Machines (SVM), Extreme Gradient Boosting (XGBoost), and Convolutional Neural Networks (CNN), were selected to represent a broad spectrum of current machine learning methods. Both the experimental and the expanded spectral datasets were used to train these models for quantitative elemental analysis. Model prediction performance was validated by comparing the results with those of inductively coupled plasma mass spectrometry (ICP-MS). The results demonstrated that augmenting the spectral database with GANN-generated spectra improves the accuracy of machine learning models in the quantitative analysis of Ni, Co, and Li in deep-sea polymetallic nodules, providing a valuable approach for LIBS-based analysis of scarce samples.