Jie Wu‡a, Fei Li‡b, Jing-Wen Zhoud, Hongmei Lid, Zilong Wanga, Xian-Ming Guod, Yue-Jiao Zhangd, Lin Zhang*c, Pei Liang*a, Shisheng Zheng*d and Jian-Feng Li*de
aCollege of Optical and Electronic Technology, China Jiliang University, Hangzhou 310018, China. E-mail: plianghust@gmail.com
bSchool of Optoelectronics, University of Chinese Academy of Sciences, Beijing 101408, China
cInstitute of Chemical Defense, Academy of Military Sciences, Beijing 102205, China. E-mail: zhanglin_zju@aliyun.com
dState Key Laboratory of Physical Chemistry of Solid Surfaces, iChEM, College of Chemistry and Chemical Engineering, College of Energy, College of Materials, College of Electronic Science and Engineering, College of Physical Science and Technology, Fujian Key Laboratory of Ultrafast Laser Technology and Applications, Xiamen University, Xiamen 361005, P. R. China. E-mail: zhengss@xmu.edu.cn; li@xmu.edu.cn
eInnovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen 361005, P. R. China
First published on 20th March 2025
Rapid and reliable detection of chemical warfare agents (CWAs) is essential for military defense and counter-terrorism operations. Although Raman spectroscopy provides a non-destructive method for on-site detection, existing methods struggle with complex spectral overlap and concentration variation when analyzing highly complex mixtures or mixtures containing trace components. Drawing on convolutional neural networks and multi-layer perceptrons, this study proposes a deep-learning algorithm for qualitative and quantitative Raman spectral analysis (RS-MLP). A reference feature library is built from pure-substance spectral features, multi-head attention adaptively captures mixture weights, and an MLP-Mixer then performs hierarchical feature matching for qualitative identification and quantitative analysis. The recognition rate for the four combination types used in validation reached 100%, with an average root mean square error (RMSE) below 0.473% for the concentration prediction of the three components. The model remained robust even under highly overlapping spectra, while also offering improved interpretability. Its accuracy and robustness in identifying components and predicting concentrations in complex mixtures make it a practical solution for rapid, non-contact detection of persistent chemicals in complex environments.
Raman spectroscopy is a spectroscopic technique that analyzes molecular structures by probing their rotational and vibrational modes through inelastic scattering.14,15 The technique is straightforward to operate, highly sensitive, and requires neither sample pretreatment nor preparation of experimental reagents, meeting the urgent needs of chemical defense, counter-terrorism, and emergency monitoring of sudden accidents.16–18 As a non-destructive analytical technique, Raman spectroscopy can quickly characterize molecular structures and is widely used in chemical analysis, biological detection, and materials science.19 Although each substance displays unique Raman spectral features that in theory allow the identification and quantification of components in mixtures, in practice, especially when detecting substances at very low concentrations, Raman measurements suffer from poor sensitivity and are susceptible to environmental interference. Interactions and reactions among the substances in a mixture further complicate the identification of Raman spectral features, introducing complex spectral overlap, non-linear combination of pure-substance spectra, and unavoidable noise, which together make mixture analysis by Raman spectroscopy a major challenge.20
In the past, traditional chemometric methods such as partial least squares regression (PLSR), partial least squares discriminant analysis (PLS-DA), and spectral peak matching algorithms have typically been used to analyze Raman spectra of mixtures.21,22 These methods usually require preprocessing steps, including denoising (e.g., SG smoothing or wavelet transforms), baseline correction, and normalization, to mitigate the effects of instrument noise and environmental interference; the Raman spectra are then analyzed. The PLSR approach first divides the spectrum into subintervals by slope comparison and combines the LMF algorithm to extract peak parameters. PLSR then extracts the latent variables (LVs) that maximize covariance, and cross-validation determines the optimal number of components. Finally, an overdetermined system of equations is constructed from Beer's law, allowing qualitative identification of components in the mixture. This approach achieves relatively high accuracy for mixtures with few spectral peaks, but its effectiveness diminishes significantly as the number of peaks and the diversity of components increase.23,24
The emergence of machine learning has brought revolutionary advances to spectral analysis. Unlike traditional chemometric methods, machine learning can automatically extract complex features and handle nonlinear relationships, allowing more accurate identification of spectral features and component information in complex mixtures, particularly for classifying and identifying chemical warfare agent simulants.25,26 Chen et al.27 used simulated spectra to tackle challenges such as data scarcity and unseen spectral regions but did not validate their accuracy. Fan et al.28 applied DeepCID, which leverages convolutional neural networks (CNNs) for Raman spectral component analysis, achieving over 95% accuracy under controlled conditions but struggling with complex spectra. Models such as Long Short-Term Memory (LSTM) networks,29 K-Nearest Neighbors (KNN),30 Random Forests (RFs), and Backpropagation Artificial Neural Networks (BP-ANNs)31 have reduced training-sample requirements and improved accuracy in complex mixtures, often exceeding 95% even across wide concentration ranges. Despite their strengths in component identification, these methods are constrained by limited interpretability and versatility; their performance degrades markedly on unknown mixtures, with prediction errors rising by a factor of 3 to 4. Zhang et al.32 achieved a 4.1% improvement in recognition accuracy over CNNs by leveraging transfer learning for Raman spectroscopy. However, the limited interpretability of existing models continues to hinder their practical application.
This paper presents an RS-MLP framework, a novel multilayer perceptron architecture for Raman spectral analysis. This manuscript selects chemical agent simulants as the object of study, aiming to achieve qualitative and quantitative analyses of chemical agent simulant mixtures, thus providing a methodological reserve for future real agent analysis. Currently, using simulated agents to conduct methodological research is a common practice in the international academic community.33–36 Leveraging the principles of the MLP-Mixer architecture, this framework constructs a reference feature library from pure spectral fragments, combines attention mechanisms with hierarchical feature matching strategies, and performs dual-channel matching of convolution-extracted mixed spectral features against the reference library. Accurate analysis of mixtures is achieved through adaptive capture of spectral region contributions. RS-MLP ensures end-to-end interpretability via feature importance weighting and attention heatmaps, enabling traceable results. The model demonstrates exceptional performance in handling complex mixtures with high spectral overlap and significant concentration variations, particularly in the quantitative analysis of chemical agent simulant mixtures, achieving an extremely low concentration prediction error of only 0.473% RMSE. Furthermore, it maintains high accuracy even under extreme concentration ratios. Its stability and robustness are rigorously validated on low-concentration samples and independent datasets.
Initially, in the pure-substance feature extraction module, the framework constructs a reference feature library by using convolution for feature extraction and labeling pure-substance features. Specifically, eight key Raman peaks from the spectra of pure DIMP, DMMP, and TEP are labeled according to characteristics such as position, intensity, sharpness, width, and area. The spectra are then reduced to 64 feature segments by convolution. Together, these features build the reference library, providing the foundation for subsequent spectral matching and integration.
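As a rough illustration of this labeling step, the sketch below (plain NumPy; the function name, toy spectrum, and width/area estimators are our own, not the paper's implementation) characterizes local maxima by position, intensity, width, and area:

```python
import numpy as np

def characterize_peaks(spectrum, shifts, threshold=0.1, max_peaks=8):
    """Label up to `max_peaks` Raman peaks by position, intensity, width
    (distance between half-maximum crossings) and area. A simplified
    stand-in for the reference-library labeling step."""
    rel = spectrum / spectrum.max()
    step = shifts[1] - shifts[0]
    peaks = []
    for i in range(1, len(spectrum) - 1):
        if rel[i] >= threshold and spectrum[i] > spectrum[i - 1] and spectrum[i] >= spectrum[i + 1]:
            half = spectrum[i] / 2
            lo, hi = i, i
            while lo > 0 and spectrum[lo] > half:                  # left half-max
                lo -= 1
            while hi < len(spectrum) - 1 and spectrum[hi] > half:  # right half-max
                hi += 1
            peaks.append({
                "position": shifts[i],
                "intensity": spectrum[i],
                "width": shifts[hi] - shifts[lo],
                "area": float(spectrum[lo:hi + 1].sum() * step),   # rectangle rule
            })
    peaks.sort(key=lambda p: p["intensity"], reverse=True)
    return peaks[:max_peaks]

# toy spectrum: two Gaussian bands standing in for pure-substance peaks
shifts = np.linspace(0, 1500, 512)
spec = np.exp(-((shifts - 780) / 15) ** 2) + 0.5 * np.exp(-((shifts - 1450) / 20) ** 2)
feats = characterize_peaks(spec, shifts)
```

Sharpness, used in the paper alongside these four descriptors, could be approximated as intensity divided by width.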
Subsequently, we designed a simulation mixing algorithm to simulate linear and nonlinear mixing effects, concentration-dependent nonlinear responses, and pairwise spectral interactions. This algorithm was applied to the pure substance data to generate 45 binary and ternary mixtures with different concentration ratios. To address practical challenges like concentration gradients and extreme concentration scenarios, we used the Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP)26,37 to perform concentration gradient filling for 45 mixtures with different concentration ratios.
In the RS-MLP analysis network, Raman spectral data from the mixtures, augmented with simulated mixing and GAN-based concentration gradient filling, are the input. The network first extracts features using a multi-scale dilated convolutional network with residual blocks. The multi-head attention mechanism then focuses on the key peak positions and intensities. These features are labeled using position and intensity encoders and matched to a reference feature library through the MLP-Mixer's token and channel feature mixing. The output is a 0–1 probability for each of the three components, where 0 indicates absence and 1 indicates the presence of a pure substance, enabling both qualitative and quantitative analyses.
| Mixture type | Category | Concentration combinations (%) | Total combinations |
|---|---|---|---|
| Binary | Balanced | 75/25, 50/50, and 25/75 | 21 |
| Binary | Imbalanced | 10/90 and 5/95 | |
| Binary | Trace | 1/99 and 0.5/99.5 | |
| Ternary | Regular | 60/30/10, 60/10/30, 30/60/10, 30/10/60, 10/60/30, and 10/30/60 | 24 |
| Ternary | Balanced | 40/35/25, 40/25/35, 35/40/25, 35/25/40, 25/40/35, and 25/35/40 | |
| Ternary | Imbalanced | 80/15/5, 80/5/15, 15/80/5, 15/5/80, 5/80/15, and 5/15/80 | |
| Ternary | Trace | 90/9/1, 90/1/9, 9/90/1, 9/1/90, 1/90/9, and 1/9/90 | |
The binary and ternary mixture concentration gradients were expanded to 200 samples per ratio using WGAN-GP and merged with the original dataset for use in this study. The data were then divided into training, validation, and test sets. The test set was uniformly sampled across the concentration gradient at a ratio of 1/5, ensuring that every concentration was represented; it was held out completely and took no part in training or validation. The remaining data were randomly split by index into training and validation sets at a ratio of 7:3.
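The hold-out scheme described above can be sketched in a few lines (NumPy; the group labels and counts are illustrative, not the paper's actual dataset):

```python
import numpy as np

rng = np.random.default_rng(0)

def split_dataset(labels, test_frac=0.2, train_frac=0.7):
    """Uniform per-concentration-ratio test sampling (1/5 of each group),
    then a random 7:3 train/validation split of the remainder."""
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    test_parts = []
    for g in np.unique(labels):
        members = idx[labels == g]
        n_test = max(1, int(round(test_frac * len(members))))
        test_parts.append(rng.choice(members, size=n_test, replace=False))
    test = np.concatenate(test_parts)
    rest = rng.permutation(np.setdiff1d(idx, test))   # remaining indices, shuffled
    n_train = int(round(train_frac * len(rest)))
    return rest[:n_train], rest[n_train:], test

# e.g. 3 concentration ratios with 200 GAN-augmented samples each
labels = np.repeat([0, 1, 2], 200)
train, val, test = split_dataset(labels)
```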
The experimental measurements were carried out using a Beijing Zhuo Li Han Guang FE1064-Pro Handheld Raman spectrometer.
For binary mixtures composed of DIMP, DMMP, and TEP, we defined seven concentration ratios, including extreme values as shown in Table 1, based on practical use scenarios. The specified concentration gradients were systematically combined to generate 21 unique binary mixtures. These mixtures were constructed as weighted linear combinations of the concentration ratios, while also accounting for nonlinear concentration effects. The binary mixing process is modeled as:
Binarymix = ∑(Ci(1 − αCi)Si) | (1) |
For the ternary mixture, we further improved the fidelity of the simulated spectrum by incorporating both linear and nonlinear mixing effects, as well as concentration-dependent nonlinear responses and pairwise spectral interactions. The ternary mixing process is represented as:
Ternarymix = ∑Ci(1 − αCi)Si + β∑i<jCiCj·M(Si,Sj)·I(Si,Sj) | (2) |
In this model, α = 0.05 regulates the concentration's nonlinear response, and β = 0.02 determines the strength of interactions between components. The term I(Si,Sj) represents the spectral interaction between compounds i and j. M(Si,Sj) serves as a masking function, confining interactions to regions where either compound exhibits significant spectral peaks, defined as greater than 10% of the maximum intensity. This formulation strikes a balance between preserving the integrity of individual spectral peaks and accounting for realistic minor interactions between components, enhancing both the model's physical relevance and predictive accuracy.
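A minimal NumPy sketch of this mixing model follows. The exact functional form of I(Si,Sj) is not specified in the text, so the geometric-mean interaction and the CiCj scaling used here are assumptions made for illustration; α = 0.05, β = 0.02, and the 10%-of-maximum mask come from the description above.

```python
import numpy as np

ALPHA, BETA = 0.05, 0.02   # nonlinearity and interaction strengths from the text

def mix_spectra(concs, spectra):
    """Weighted nonlinear mixing per eqn (1), plus masked pairwise
    interactions for ternary-style mixtures. The geometric-mean form of
    I(Si, Sj) and the Ci*Cj scaling are illustrative assumptions."""
    spectra = np.asarray(spectra, dtype=float)
    concs = np.asarray(concs, dtype=float)
    mixed = np.zeros(spectra.shape[1])
    for c, s in zip(concs, spectra):
        mixed += c * (1 - ALPHA * c) * s          # concentration-dependent nonlinearity
    for i in range(len(concs)):
        for j in range(i + 1, len(concs)):
            si, sj = spectra[i], spectra[j]
            # masking: only where either compound has a peak > 10% of its maximum
            mask = (si > 0.1 * si.max()) | (sj > 0.1 * sj.max())
            interaction = np.sqrt(si * sj)        # assumed form of I(Si, Sj)
            mixed += BETA * concs[i] * concs[j] * mask * interaction
    return mixed

x = np.arange(512)
s1 = np.exp(-((x - 100) / 8.0) ** 2)
s2 = np.exp(-((x - 300) / 8.0) ** 2)
binary = mix_spectra([0.75, 0.25], [s1, s2])      # a 75/25 binary mixture
```

With well-separated toy peaks the interaction term is negligible, so each peak height reduces to Ci(1 − αCi) times the pure intensity.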
Subsequently, we verified the accuracy and authenticity of the simulated mixtures and examined whether the simulation method could effectively overcome the scarcity of Raman spectral experimental data and the challenges associated with data acquisition by accurately replicating the spectral data of chemical agent simulants. Binary mixtures were simulated as weighted linear combinations of pure component spectra based on concentration ratios, while ternary mixtures incorporated both linear and nonlinear mixing effects to emulate real spectral interactions. For validation, we selected a balanced binary mixture and an extreme ternary mixture, simulating the mixing of pure components according to specified ratios to verify against real mixed data. For the validation and comparison results between simulated and real mixtures, refer to Fig. S4.†
Finally, we designed 24 combinations of ternary mixtures with different concentrations, which were divided into four groups: conventional mixtures, mixtures with extreme components, mixtures with medium to trace components and mixtures with equilibrium components. Due to space limitations, the dataset is not presented here. For the complete visualization and data comparison of all binary and ternary simulated mixtures, please refer to Fig. S1–S3.†
Fig. 2 GAN network architecture diagram: (a) the network structure of the generator and (b) the network structure of the discriminator.
The generator architecture comprises three core components: an initial upsampling module that expands the input noise vector dimensions via transposed convolution; a feature refinement module with three residual blocks and a self-attention layer to capture complex spectral features and correlations; and a progressive upsampling path that enhances spectral resolution over five stages while preserving spectral characteristics. Residual connections and batch normalization in each stage ensure training stability and maintain spectral details, while a final smoothing layer ensures spectral continuity.
The discriminator is designed with a complementary architecture that progressively down-samples the input spectrum through four convolutional layers and one residual block to extract hierarchical features. Batch normalization ensures training stability, and activation functions enable the model to capture complex nonlinear relationships. A dropout layer is incorporated to prevent mode collapse in the GAN. To improve gradient flow and stabilize training, the Wasserstein distance with gradient penalty is employed as the primary loss function, which is formulated as:
LD = Ex̃∼Pg[D(x̃)] − Ex∼Pr[D(x)] + λEx̂∼Px̂[(‖∇x̂D(x̂)‖2 − 1)2] | (3) |
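The toy sketch below evaluates this critic objective for a linear critic D(x) = w·x, whose gradient with respect to any input is simply w, so the gradient penalty can be computed without an autodiff framework. All names and sizes are illustrative; a real WGAN-GP would backpropagate through the convolutional discriminator described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic_loss(w, real, fake, lam=10.0):
    """WGAN-GP critic objective for a toy linear critic D(x) = x @ w.
    For a linear critic the input gradient is w everywhere, so the
    penalty depends only on ||w|| (illustration only)."""
    eps = rng.uniform(size=(len(real), 1))
    x_hat = eps * real + (1 - eps) * fake     # interpolated samples (x_hat term)
    grad_norm = np.linalg.norm(w)             # ||grad D(x_hat)|| for every x_hat
    penalty = lam * (grad_norm - 1.0) ** 2
    return (fake @ w).mean() - (real @ w).mean() + penalty

real = rng.normal(1.0, 0.1, size=(64, 512))   # stand-ins for measured spectra
fake = rng.normal(0.0, 0.1, size=(64, 512))   # stand-ins for generated spectra
w = np.full(512, 1 / np.sqrt(512))            # unit-norm critic: zero penalty
loss = critic_loss(w, real, fake)
```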
Subsequently, ResBlock with pre-activation architecture is used to perform deep feature extraction (Fig. S6†). This architecture not only enables deeper feature propagation but also preserves original spectral information while allowing the network to learn complementary features for enhanced spectral data extraction. The feature extraction module employs stride-2 convolution operations to systematically downsample the 512-dimensional spectral vector to a 64-dimensional feature representation. Each feature corresponds to an 8.26 cm−1 (512/64) spectral region.
The network maintains stable feature distributions across different scales through batch normalization in ConvBlock and layer normalization in ResBlock. ConvBlock combines dilated convolution and progressive channel expansion for multi-scale feature extraction, while ResBlock's pre-activation architecture ensures effective feature propagation and enhancement. The final output is a feature map of shape (128, 64), where each spectral segment is represented by a 128-dimensional feature vector.
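A minimal NumPy stand-in for this strided downsampling path is shown below. Kernel size, channel schedule before the final 128, and the random weights are illustrative; only the 512-point input, stride-2 reduction, and (128, 64) output shape follow the text.

```python
import numpy as np

def conv1d_stride2(x, kernels):
    """Zero-padded 1-D convolution with stride 2. x: (C_in, L),
    kernels: (C_out, C_in, K). Halves the spectral length, standing in
    for one strided ConvBlock."""
    c_out, c_in, k = kernels.shape
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    L_out = x.shape[1] // 2
    out = np.zeros((c_out, L_out))
    for t in range(L_out):
        out[:, t] = np.tensordot(kernels, xp[:, 2 * t:2 * t + k],
                                 axes=([1, 2], [0, 1]))
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 512))                  # one 512-point spectrum
for c_in, c_out in zip([1, 32, 64], [32, 64, 128]):
    x = np.maximum(conv1d_stride2(x, rng.normal(size=(c_out, c_in, 3)) * 0.1), 0)  # ReLU
# three stride-2 stages: 512 -> 256 -> 128 -> 64 segments, 128 channels each
```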
In the subsequent feature matching and analysis network, the mixture features after convolutional feature extraction are processed by multi-head attention, and the two sets of multi-head attention focus on Raman shift and peak intensity features, respectively. Combined with pure substance feature encoding, a mixture of token and channel is used in the MLP, and the similarity between the mixture features and each pure substance is computed through the multi-head attention mechanism, so that the mixture can be analyzed qualitatively and quantitatively.
Raman spectra carry two distinct types of basic information: peak positions, which are determined by molecular structure, and peak intensities, which reflect component concentrations.39 To capture these properties efficiently, the network uses two parallel multi-head attention modules, each with 8 heads, focusing on position and intensity features, respectively. In each attention head, the input features are projected into query, key, and value spaces through linear transformations. The attention score is calculated according to the following formula:
Attention(Q, K, V) = softmax(QKT/√dk)V | (4) |
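A compact NumPy sketch of one such 8-head module follows; the projection weights, feature sizes, and concatenation-by-slicing scheme are illustrative rather than the paper's exact implementation.

```python
import numpy as np

def multi_head_attention(x, wq, wk, wv, heads=8):
    """Scaled dot-product attention per head: softmax(Q K^T / sqrt(d_k)) V.
    x: (T, D); wq, wk, wv: (D, D). A minimal sketch of one of the two
    parallel 8-head modules (position / intensity) described above."""
    T, D = x.shape
    dk = D // heads
    q, k, v = x @ wq, x @ wk, x @ wv
    out = np.zeros_like(x)
    for h in range(heads):
        sl = slice(h * dk, (h + 1) * dk)
        scores = q[:, sl] @ k[:, sl].T / np.sqrt(dk)
        scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)       # each row sums to one
        out[:, sl] = attn @ v[:, sl]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 128))        # 64 spectral segments, 128-dim features
wq, wk, wv = (rng.normal(size=(128, 128)) * 0.05 for _ in range(3))
y = multi_head_attention(x, wq, wk, wv)
```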
To further improve feature-matching accuracy and strengthen global correlations across the spectrum, this study uses a multi-layer perceptron (MLP-Mixer) module to align the extracted spectral features with the reference pure-component features. The module encodes position and intensity information separately, using two independent encoders for the pure substances. The encoded features are processed by a token-wise and channel-wise hybrid network. The token-wise network processes each spectral segment independently, combines the mixture's multi-head attention with the pure substances' position encoding, and deeply captures correlation features to achieve preliminary matching. The channel-wise network captures correlations between feature channels, enhancing global feature matching and fusion. Both networks use residual connections and layer normalization to maintain a stable feature distribution. Using the pure-substance feature encoding, the MLP performs token and channel mixing, and the multi-head attention mechanism then computes the similarity between the mixture features and each pure substance, enabling both qualitative and quantitative analyses of the mixture.
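The token-wise and channel-wise mixing steps can be sketched as one simplified Mixer block (NumPy; real Mixer blocks use two-layer MLPs per step, whereas single matrices keep this sketch short, and the weights are illustrative):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize each token's feature vector to zero mean, unit variance."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def mixer_block(x, w_token, w_channel):
    """One simplified MLP-Mixer block: token mixing across the 64 spectral
    segments, then channel mixing across the 128 feature channels, each
    with a residual connection and layer norm."""
    y = x + w_token @ np.maximum(layer_norm(x), 0)          # token-wise mixing
    return y + np.maximum(layer_norm(y), 0) @ w_channel     # channel-wise mixing

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 128))                              # (segments, channels)
y = mixer_block(x, rng.normal(size=(64, 64)) * 0.05,
                   rng.normal(size=(128, 128)) * 0.05)
```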
The final prediction phase consists of a multi-task learning framework built with three fully connected networks, each targeting specific aspects of mixture analysis. The component prediction module uses a stepwise dimension reduction structure to transform high-dimensional spectral features into concentration predictions for the three substances, generating probability scores for each chemical component. The concentration prediction network employs a three-tier architecture to optimize feature transformation and ensure precise outputs. The first layer enriches the representation space while maintaining the original feature dimensions, followed by dimensional compression to retain critical information for concentration estimation. The final layer maps the refined features to predicted concentrations, ensuring that output values represent the effective proportions of the three components, constrained within the range of 0 to 1. The uncertainty quantification module shares features through dimensionality reduction and generates confidence scores (0–1) for each component, providing a quantitative measure of prediction reliability.
The model's output consists of the predicted concentration of each pure substance in the mixture, represented as probabilities ranging from 0% to 100%. To achieve robust performance across various mixture types, we designed a comprehensive loss function that integrates multiple components, each addressing a specific aspect of the problem. This loss function is constructed to optimize the model holistically, enabling it to perform well in both classification and concentration prediction tasks.
The primary component of the loss function is the mean squared error (MSE) loss, formulated as:
Lconc. = (1/N)∑(ŷi − yi)2 | (5) |
To enhance the model's sensitivity to trace components, defined as those with concentrations below 15%, a dynamic weighting mechanism is introduced. This mechanism assigns higher loss contributions to low-concentration samples, ensuring that they are not overshadowed by dominant components. The dynamic weight is computed as:
wi = 1 + λw·1(yi < 0.15) | (6) |
Llow_conc. = (1/N)∑wi(ŷi − yi)2 | (7) |
This ensures that trace components contribute proportionally more to the total loss, improving model sensitivity to low-abundance species.
Beyond concentration prediction, uncertainty quantification plays a crucial role in ensuring the reliability of model outputs. To penalize predictions with high uncertainty, an uncertainty-weighted error loss is incorporated, given as follows:
Luncertainty = E[|ŷ − y| × uncertainty] | (8) |
Runcertainty = 0.01 × E[uncertainty2] | (9) |
In addition to predicting concentration values, the model must correctly classify whether a component is present or absent in a given sample. To achieve this, binary cross-entropy (BCE) loss is employed for presence detection:
Lpres = −(1/N)∑[pi log(p̂i) + (1 − pi)log(1 − p̂i)] | (10) |
The final total loss function integrates these components into a single objective:
Ltotal = Lconc. + 2 × Llow_conc. + 0.1 × (Luncert + Runcert) + 0.1 × Lpres | (11) |
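The full objective can be assembled as below (NumPy). The stated 1 / 2 / 0.1 / 0.1 weights, the 15% trace threshold, and the uncertainty terms follow the text; the precise form of the trace weighting is our assumption, and the sample values are illustrative.

```python
import numpy as np

def total_loss(y_pred, y_true, p_pred, p_true, uncert, w_trace=1.0):
    """Total training objective: concentration MSE, a weighted
    trace-component term (components present below 15%), the uncertainty
    penalty and regularizer, and BCE presence loss, combined with the
    stated 1 / 2 / 0.1 / 0.1 weights. Trace-weighting form is assumed."""
    err = y_pred - y_true
    l_conc = np.mean(err ** 2)
    trace = (y_true < 0.15) & (p_true > 0)          # present but below 15%
    l_low = np.mean(w_trace * trace * err ** 2)
    l_unc = np.mean(np.abs(err) * uncert)           # uncertainty-weighted error
    r_unc = 0.01 * np.mean(uncert ** 2)             # uncertainty regularizer
    q = np.clip(p_pred, 1e-7, 1 - 1e-7)
    l_pres = -np.mean(p_true * np.log(q) + (1 - p_true) * np.log(1 - q))
    return l_conc + 2 * l_low + 0.1 * (l_unc + r_unc) + 0.1 * l_pres

# an illustrative 90/9/1 trace-type ternary sample
loss = total_loss(y_pred=np.array([0.89, 0.10, 0.01]),
                  y_true=np.array([0.90, 0.09, 0.01]),
                  p_pred=np.array([0.99, 0.95, 0.90]),
                  p_true=np.array([1.0, 1.0, 1.0]),
                  uncert=np.array([0.05, 0.05, 0.10]))
```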
To validate the model's performance, we uniformly sampled 20% of the binary and ternary mixtures at different concentration gradients, totaling 1800 test samples. These were excluded from the training and validation processes to rigorously assess the model's robustness and generalization.
The model is designed to identify components in the mixture and predict their concentrations with high accuracy. To evaluate this, we classify all mixtures in the test set based on their components. For binary mixtures, we used three component combinations: DMMP + DIMP, DMMP + TEP, and DIMP + TEP. Ternary mixtures, containing all three components, form a single combination. These mixtures were then categorized based on the components identified during the data processing phase.
The model's effectiveness is measured by its classification accuracy and concentration prediction. Fig. 4 illustrates the model's evaluation of classification and prediction accuracy. Fig. 4(a) shows that the model correctly identifies the component combinations, achieving 100% classification accuracy using multi-head attention mechanisms to focus on Raman shifts. The confusion matrix confirms the model's accurate classification, with no errors in categorizing the mixtures.
The model's error metrics for concentration prediction, illustrated in Fig. 4(b), highlight its predictive accuracy. The exceptionally low MAE, MSE, and RMSE values across all components underscore the model's precision, with MAE values of 0.0035 for DIMP, 0.0032 for DMMP, and 0.0031 for TEP. This robust performance demonstrates the model's ability to predict component concentrations accurately and consistently across diverse mixture types.
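For reference, these metrics can be computed with a small helper (the sample values below are illustrative, not the paper's test data):

```python
import numpy as np

def regression_metrics(y_pred, y_true):
    """MAE, MSE, RMSE and R2, the error metrics reported for the
    concentration predictions."""
    err = y_pred - y_true
    mse = np.mean(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {"MAE": np.mean(np.abs(err)),
            "MSE": mse,
            "RMSE": np.sqrt(mse),
            "R2": 1 - np.sum(err ** 2) / ss_tot}

# illustrative predictions for one component across four test samples
m = regression_metrics(np.array([0.253, 0.497, 0.751, 0.948]),
                       np.array([0.25, 0.50, 0.75, 0.95]))
```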
This strong classification accuracy provides a solid foundation for the concentration prediction of each component. Fig. 4(c) demonstrates that the predicted concentrations exhibit near-perfect linear correlations with the true values (R2 > 0.99), with regression slopes approaching unity for DIMP (0.999), DMMP (0.998), and TEP (1). This result reflects the model's ability to match spectral features with those from the reference feature library, enabled by its attention-based architecture. Additionally, the incorporation of uncertainty quantification further enhances the model's interpretability, with the color gradient representation in Fig. 4(c) illustrating the model's confidence in its predictions.
To further validate the model's performance, a comparative analysis of 20 representative samples is presented in Fig. 4(d). This analysis underscores the model's accuracy and stability in handling both binary and ternary mixtures. The close agreement between the predicted values (orange) and true values (blue) confirms the efficacy of the hierarchical feature extraction and matching strategy. This success is closely tied to the interpretable architecture adopted in the study, which employs convolutional residual features to identify characteristic peak positions, followed by position-intensity matching via an attention mechanism. Such a structured approach ensures not only high predictive accuracy but also a deeper understanding of the underlying decision-making process, making the model highly applicable to complex mixtures in real-world scenarios. Monitoring of the learning process shows a steady decline in training and validation losses (Fig. S13(a)†), indicating effective learning. Although occasional fluctuations were observed in the validation loss, the small gap between training and validation losses suggests strong generalization and no overfitting. The final model achieved a balance between loss minimization and learning-rate adjustment.
Subsequently, attention from different layers, with dimensions of 256, 128, and 64, respectively, was combined and projected onto a unified attention layer for aggregation. To preserve the precision of the attention weight distribution in the first and second layers, point correspondence mapping was applied to align their attention to the spectral dimension, preventing the loss of fine positional information. Finally, a masking technique was employed to map the 512 data points from the first and second layers onto the attention weights of the third layer. This step accurately captures the true weight distribution of attention across spectral bands, which is crucial for final component discrimination and concentration prediction.
The attention weights at the corresponding positions across all three layers were then accumulated, and the total weights were renormalized to ensure consistency within a unified visual range. Finally, using linear interpolation, the 64 attention weight blocks were mapped to their corresponding spectral regions, providing detailed visualization of the weight distribution within the analysis matching network and enhancing the interpretability of the model. Here we show the visualization and analysis of binary and ternary mixtures to illustrate the inference process of the model.
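The accumulate-renormalize-interpolate pipeline can be sketched as follows (NumPy; the block-centre mapping and uniform toy weights are illustrative choices):

```python
import numpy as np

def aggregate_attention(layer_weights, n_points=512):
    """Map per-layer attention weights (e.g. 256-, 128- and 64-block
    layers) onto the full 512-point spectral axis by linear interpolation
    between block centres, accumulate, and renormalize so the result is a
    single weight distribution over the spectrum."""
    axis = np.arange(n_points)
    total = np.zeros(n_points)
    for w in layer_weights:
        centres = (np.arange(len(w)) + 0.5) * n_points / len(w)
        total += np.interp(axis, centres, w)   # values beyond the ends are clamped
    return total / total.sum()

rng = np.random.default_rng(0)
layers = [rng.uniform(size=n) for n in (256, 128, 64)]
agg = aggregate_attention(layers)
```

The resulting 512-point vector can be plotted directly over the Raman shift axis as the heatmap described above.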
To achieve spectral alignment and enhance the clarity of visualization, all spectra were aligned to the origin of the coordinate axis to ensure consistency. The inference process of the model is depicted in Fig. 5. Fig. 5(a) presents a heatmap of attention weight distribution over the mixture spectrum during the analysis. By visualizing the attention weights, the network's focus on mixture features during the matching analysis phase can be intuitively observed.
Although linear interpolation was applied to the aggregated attention weights, the mapping relationships remain neither entirely intuitive nor detailed due to changes in feature dimensions after convolution. Nevertheless, it is evident that the attention weights exhibit significant emphasis around Raman shifts of 0–100 cm−1, 600–780 cm−1, 1100–1220 cm−1, and approximately 1450 cm−1. These regions, highlighted in the figure, serve as the foundation for subsequent network matching analysis phases. Given the high similarity in the characteristic peaks of the three pure substances, Fig. 5(b) further illustrates the network's stepwise matching analysis process. The differently colored triangles in the figure represent distinct substances, and their marked positions within the mixture's Raman spectrum correspond to the results of attention mechanism matching with Raman shifts in the reference feature library. The heights of the dashed lines represent the network's comparison of peak intensities between the mixture's features and the reference feature library. These positions are crucial indicators for the network to determine the presence of specific substances in the mixture. The dashed line heights further reflect the network's analysis of mixture concentrations based on the peak intensities of pure substances and the attention weights. These two figures illustrate the inference process and working principles of the network: after assigning weights to regions with significant spectral peak variations in the mixture using a multi-head attention mechanism, the reference feature library is introduced. By independently encoding the Raman peak positions and intensities in the reference feature library, the network implements a dual-channel matching strategy between the attention weights of the mixture and the reference feature library. 
Positional correspondence determines the components of the mixture, while proportional correspondence in intensity reveals the concentration ratios of individual components. For example, in the peak matching analysis plot, at the highest peak corresponding to the Raman shift of 780 cm−1, the network identifies the characteristic peaks of three substances, with significant differences in the dashed line heights for each substance. By normalizing the Raman characteristic peak intensities in the reference feature library to 100%, it is evident that the network derives the concentration ratios of each component by comparing the peak intensities with those in the reference feature library. The height proportions of the dashed lines can be visually correlated with the predicted component concentration distribution of the mixture in Fig. 5(c).
Fig. 5(d–f) shows the attention weight distributions, their positions, and the concentration prediction histograms for the spectra of binary mixtures. Since the analysis of ternary mixtures has already been described in detail, and the binary case is closely analogous, it is not repeated here.
The network employs a dual-channel matching strategy between the attention weights and the reference feature library to identify the components and concentration information of pure substances within the mixture, enabling interpretable analysis of the mixture. Although cumulative effects lead to a significant overall increase in attention weights, this does not impede the visualization of regions with high attention.
To further validate the applicability of the RS-MLP algorithm and evaluate its generalization capability, we tested the model using a hazardous chemical dataset from Raman spectra and compared the results with those of CNNs,41 Residual Networks (ResNet),42 and Long Short-Term Memory (LSTM) + CNNs.43 A self-constructed hazardous chemical dataset comprising 75 substances was used for evaluation, including 15 pure substances, 20 binary mixtures, 15 ternary mixtures, 15 quaternary mixtures, and 10 quinary mixtures, with 50 samples collected per category. The dataset and detailed validation information can be found in Fig. S15.†
Following the modeling procedures outlined earlier in this manuscript, we collected data and established models. To further validate the model's qualitative analysis capability for hazardous chemical mixtures in practical application scenarios, random baseline noise was introduced as an interference signal. The validation was divided into 75 categories based on different component combinations. For a clear and comparative evaluation of the model, these categories were grouped into five major classes. Multiple tests were conducted under identical experimental conditions, and the validation results are summarized in Table 2, where “F” denotes values below the threshold or recognition errors.
Table 2  Validation results: entries are accuracy (RS-MLP entries give accuracy/R2)

| Concentration | GAN & random baseline noise | | | | No GAN & random baseline noise | | | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| | CNN | ResNet | LSTM + CNN | RS-MLP | CNN | ResNet | LSTM + CNN | RS-MLP |
| Pure | 0.954 | 0.966 | 0.961 | 0.998/0.999 | 0.931 | 0.92 | 0.92 | 0.982/0.99 |
| Binary | 0.932 | 0.944 | 0.878 | 0.992/0.999 | 0.863 | 0.76 | 0.823 | 0.974/0.982 |
| Ternary | 0.898 | 0.91 | 0.866 | 0.99/0.999 | 0.799 | 0.76 | 0.75 | 0.963/0.969 |
| Quaternary | 0.729 | 0.833 | 0.752 | 0.984/0.99 | 0.5 | F | 0.72 | 0.952/0.96 |
| Quinary | F | 0.712 | 0.7 | 0.98/0.986 | F | F | F | 0.938/0.96 |
Table 2 highlights the significant advantage gained by incorporating GAN-generated data. For every mixture category, the accuracy and R2 values improved substantially with the GAN relative to the baseline without it. For example, in ternary and quinary mixtures, where spectral overlap and concentration gradients pose considerable challenges, GAN augmentation increased data diversity and mitigated overfitting, thereby enhancing generalization. The RS-MLP model benefited most from this augmentation, reaching the highest accuracy and R2 values across all mixture types: 0.998/0.999 for pure substances and 0.98/0.986 for quinary mixtures.
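For reference, the two figures of merit reported in Table 2 can be computed with their conventional definitions (assumed here, since the exact formulas are not spelled out in the text):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples whose component combination is identified correctly."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def r2_score(c_true, c_pred):
    """Coefficient of determination for predicted concentrations."""
    c_true = np.asarray(c_true, dtype=float)
    c_pred = np.asarray(c_pred, dtype=float)
    ss_res = np.sum((c_true - c_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((c_true - c_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot
```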
The RS-MLP model consistently outperformed ResNet, LSTM + CNNs, and CNNs under identical experimental conditions, highlighting its superior capability for feature extraction and matching. This capability is crucial for handling complex chemical mixtures. In challenging scenarios, such as quinary mixtures, RS-MLP maintained high prediction accuracy, indicating its robustness and adaptability. By contrast, models like ResNet and LSTM + CNN faced challenges in capturing subtle spectral variations and conducting feature matching, resulting in lower accuracy and R2 values, particularly in high-order mixtures.
RS-MLP excels at handling complex Raman spectra, particularly in mixtures with overlapping peaks, by combining multi-head attention with convolutional and residual blocks. The multi-head attention allows the model to focus on different parts of the spectrum simultaneously, while the residual blocks refine predictions by learning from errors. This structure enables RS-MLP to capture nonlinear interactions and efficiently extract both positional and intensity features, which are crucial for accurately identifying and quantifying components in mixtures.
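A minimal numpy sketch of the two ingredients named above, multi-head self-attention over spectral patches and a residual refinement step, using random untrained weights purely to show the data flow (all names and shapes are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, n_heads=4):
    """Self-attention over spectral patches; X has shape (n_patches, d_model).
    Projection weights are random stand-ins for learned parameters."""
    n, d = X.shape
    d_head = d // n_heads
    out = np.empty_like(X)
    for h in range(n_heads):
        Wq, Wk, Wv = (rng.standard_normal((d, d_head)) / np.sqrt(d)
                      for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = softmax(Q @ K.T / np.sqrt(d_head))       # (n, n) attention map
        out[:, h * d_head:(h + 1) * d_head] = A @ V  # each head sees the whole spectrum
    return out

def residual_block(X, f):
    """x + f(x): refines features while preserving the input signal."""
    return X + f(X)

X = rng.standard_normal((16, 32))   # 16 spectral patches, 32 features each
Y = residual_block(X, multi_head_attention)
```

Each head attends to the full patch sequence independently, which is what lets the model weigh several spectral regions at once; the residual connection keeps the raw features available to later layers.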
Unlike traditional methods that extract all spectral features at once, potentially missing small but important variations, RS-MLP separates the positional and intensity information. By using a reference library of pure substance features and an MLP-based matching technique, it combines these two aspects, offering a more comprehensive and accurate representation of the spectral data.
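The separation of positional and intensity information can be illustrated with a simple local-maximum peak extractor, a stand-in for the network's learned feature channels (`peak_features` is a hypothetical helper, not part of RS-MLP):

```python
import numpy as np

def peak_features(spectrum, prominence=0.1):
    """Split a spectrum into a positional channel (peak indices) and an
    intensity channel (peak heights) with a simple local-maximum test."""
    s = np.asarray(spectrum, dtype=float)
    interior = s[1:-1]
    is_max = (interior > s[:-2]) & (interior > s[2:]) & (interior > prominence)
    positions = np.flatnonzero(is_max) + 1   # shift back to full-array indices
    return positions, s[positions]

# Toy spectrum with two peaks of different heights
spectrum = np.zeros(10)
spectrum[3], spectrum[7] = 1.0, 0.5
pos, height = peak_features(spectrum)
```

Treating `pos` and `height` as separate channels mirrors the text's point: positions drive component identification while intensities drive concentration estimation.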
In contrast, traditional models like CNNs, ResNet, and LSTM + CNNs struggle with nonlinear interactions, lack multi-scale feature matching, and do not account for the nuanced positional and intensity relationships in Raman spectra.44 These limitations make them less effective for analyzing complex mixtures, where RS-MLP outperforms them by handling intricate spectral patterns and extracting crucial information more precisely.
In summary, the results demonstrate that integrating the GAN with RS-MLP significantly enhances the model's capability to perform precise qualitative and quantitative analyses of hazardous chemical mixtures, even under challenging conditions. Future work should focus on optimizing data augmentation strategies and further refining network architectures to improve model performance, especially in quinary mixtures where slight discrepancies remain. This would ensure greater accuracy and robustness in practical applications, such as real-time CWA detection and analysis.
Footnotes
† Electronic supplementary information (ESI) available: Fig. S1–S14. See DOI: https://doi.org/10.1039/d5an00075k
‡ These authors contributed equally. |
This journal is © The Royal Society of Chemistry 2025 |