Shunfan Hu a,
Jianming Ding a,
Yan Dong b,
Tianlong Zhang a,
Hongsheng Tang *a and
Hua Li *ac
a Key Laboratory of Synthetic and Natural Functional Molecular Chemistry of Ministry of Education, College of Chemistry & Material Science, Northwest University, Xi'an, 710127, China. E-mail: tanghongsheng@nwu.edu.cn; huali@nwu.edu.cn
b China Certification & Inspection Group Shan Dong Co., Ltd, Qing Dao, 266000, China
c College of Chemistry and Chemical Engineering, Xi'an Shiyou University, Xi'an, 710065, China
First published on 21st May 2024
Driven by the “double carbon” strategy, short-term demand for petroleum coke as a feedstock for artificial graphite negative electrode materials is growing rapidly. The analysis of the physicochemical properties of petroleum coke, including key indicators such as ash content, volatile matter and calorific value, has always been an important part of its research. A strategy based on laser-induced breakdown spectroscopy (LIBS) combined with chemometrics is proposed for the rapid and accurate quantification of these properties. LIBS spectra of 46 petroleum coke samples were collected, and an initial random forest (RF) calibration model was constructed by optimizing the pretreatment parameters. The RF calibration model was then further optimized using variable importance measures (VIM) and variable importance in projection (VIP). After variable selection, the elemental spectral lines relevant to modeling ash content, volatile matter and calorific value were screened out, providing an initial exploration of the correlation between these properties and the elements. Under the optimized spectral pretreatment method, VI threshold and model parameters, the mean relative errors of the prediction set (MREP) for ash content, volatile matter and calorific value were 0.0881, 0.0527 and 0.0060, the root mean square errors of the prediction set (RMSEP) were 0.0471%, 0.6178% and 0.2697 MJ kg−1, and the determination coefficients of the prediction set (RP2) were 0.9187, 0.9820 and 0.9510, respectively. The combination of LIBS and chemometric methods can provide a powerful technical means for analyzing and evaluating the physicochemical properties of petroleum coke.
Laser-induced breakdown spectroscopy (LIBS), a mature atomic spectral analysis technique,17,18 offers the clear advantages of requiring no complicated sample pretreatment and enabling simultaneous multi-element analysis. It is also real-time, fast, in situ and minimally destructive, and has significant application value in key scientific fields such as deep space exploration, archaeological science, metallurgical analysis, environmental monitoring, geological exploration and biomedicine.19–24 Petroleum molecular analysis is an important application field of LIBS. Currently, LIBS is mainly used for elemental analysis in petroleum coke research.25,26 In 2021, Zhang et al. applied LIBS to the detection of V, Fe, and Ni in petroleum coke.25 Subsequently, Lu et al. optimized the content of stearic acid binder in petroleum coke tablets and found that LIBS performance improved significantly when stearic acid accounted for 30 wt%.26 However, the prediction of ash content, volatile matter, and calorific value of petroleum coke using LIBS has not yet been reported.
Recently, the combination of spectroscopy and chemometrics has shown high feasibility in the analysis of petroleum molecules, targeting material elements, properties and structures.27,28 Given the diverse chemical composition and complex spectra of petroleum coke samples, a random forest (RF) calibration model, based on ensemble learning and decision trees, is used here for the rapid quantitative analysis of the physicochemical properties of petroleum coke. Chen et al. compared RF, partial least squares (PLS), and least squares support vector machine (LSSVM) for predicting Zn, Cu, and Ni in a single micro-scale suspended particle.29 RF exhibited superior performance, with mean relative error (MRE) values of 0.0862, 0.1020, and 0.1323, respectively. Furthermore, Wang et al. compared the performance of multiple linear regression (MLR), RF, and deep fully connected neural network (DNN) in predicting coal structures.30 In tests conducted with 300 and 1200 sample groups, RF achieved the highest accuracy, 83% and 86%, respectively, highlighting its advantages in precision and noise resistance.
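The out-of-bag (oob) statistics quoted later for the RF models come from bootstrap aggregation: each tree is trained on a bootstrap resample, and the samples it did not see are used to test it. The following is a minimal sketch of that idea only, not the model used in this work. It assumes depth-one regression stumps in place of full decision trees, and the names `fit_stump`, `predict_stump` and `oob_predictions`, the tree count and the toy data are all hypothetical.

```python
import random

def fit_stump(X, y, features):
    """Best single-split regression stump over the given feature indices."""
    mean_y = sum(y) / len(y)
    best = (float("inf"), None, None, mean_y, mean_y)  # sse, feature, cut, left, right
    for j in features:
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [t for row, t in zip(X, y) if row[j] <= cut]
            right = [t for row, t in zip(X, y) if row[j] > cut]
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if sse < best[0]:
                best = (sse, j, cut, ml, mr)
    return best[1:]

def predict_stump(stump, row):
    j, cut, ml, mr = stump
    if j is None:  # no valid split was found: fall back to the bag mean
        return ml
    return ml if row[j] <= cut else mr

def oob_predictions(X, y, n_trees=200, n_sub_features=1, seed=0):
    """Average, for each sample, the predictions of the learners that never saw it."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap resample
        in_bag = set(idx)
        feats = rng.sample(range(p), n_sub_features)    # random feature subset
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        for i in range(n):
            if i not in in_bag:                         # out-of-bag sample
                sums[i] += predict_stump(stump, X[i])
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Toy data (assumption): one variable, a step in the response.
X = [[float(i)] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
oob = oob_predictions(X, y)
```

Comparing `oob` with `y` gives error estimates without a separate validation set, which is the role the oob-subscripted metrics play in the parameter optimization.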
To meet the practical demand for rapid and accurate analysis of petroleum coke properties, an optimized RF calibration model is proposed that integrates a combined pretreatment strategy with variable importance measures (VIM) and variable importance in projection (VIP) for the rapid determination of ash content, volatile matter, and calorific value. Various petroleum coke types were collected, including sponge coke, shot coke, needle coke, calcined coke and pitch coke. Firstly, LIBS spectra were acquired from 46 samples, and an initial RF calibration model was constructed by optimizing the pretreatment parameters. Secondly, appropriate thresholds for VIM and VIP were selected to further optimize the model, eliminating irrelevant variables and retaining important variables related to analyte composition. Model performance was evaluated using the determination coefficient (R2), MRE, and the root mean square error (RMSE). Finally, model stability was verified by calculating the ratio of the standard deviation of the response variable to the RMSEP (RPD), the ratio of the range to the RMSEP (RER), and the relative standard deviation (RSD). Together, these steps provide a novel approach to the analysis of petroleum coke properties.
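The three accuracy metrics named above can be written compactly. This section does not give their formulae, so the sketch below assumes the commonly used definitions (R2 as one minus the ratio of residual to total sum of squares, MRE as the mean of absolute errors relative to the reference values, RMSE as the root of the mean squared error). The function names and the toy numbers are illustrative only.

```python
import math

def r2(ref, pred):
    """Determination coefficient: 1 - SS_res / SS_tot (assumed definition)."""
    mean_ref = sum(ref) / len(ref)
    ss_res = sum((r - p) ** 2 for r, p in zip(ref, pred))
    ss_tot = sum((r - mean_ref) ** 2 for r in ref)
    return 1 - ss_res / ss_tot

def mre(ref, pred):
    """Mean relative error: average of |pred - ref| / |ref| (assumed definition)."""
    return sum(abs(p - r) / abs(r) for r, p in zip(ref, pred)) / len(ref)

def rmse(ref, pred):
    """Root mean square error."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(ref, pred)) / len(ref))

# Toy reference and predicted values (not data from this study).
ref = [10.0, 20.0, 40.0]
pred = [11.0, 18.0, 40.0]
print(r2(ref, pred), mre(ref, pred), rmse(ref, pred))
```

Applied to out-of-bag predictions these give the oob-subscripted values, and applied to the prediction set they give the P-subscripted values.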
For standard samples, powders were weighed in different proportions, then ball-milled and mixed evenly in a QM-3SP2 planetary ball mill (Nanjing Laibu Technology Industrial Co., Ltd). For actual samples, solid block samples were weighed and ground in the planetary ball mill. Mixing was carried out in bidirectional interval alternation mode for 60 minutes at a rotating speed of 450 rpm, so that the samples were finely ground, and the powder was then passed through a 200 mesh sieve. From each sample, 20 g of sieved powder was taken for the determination of the reference values of ash content, volatile matter and calorific value. Before LIBS spectral acquisition, each sieved petroleum coke powder sample was pressed into a tablet with a PC-24 tablet press (Pinchuang Technology Co., Ltd; maximum pressure 30 MPa) at 20 MPa for 5 min.
No. | Ash (wt%) | Volatile matter (wt%) | Calorific value (MJ kg−1) | No. | Ash (wt%) | Volatile matter (wt%) | Calorific value (MJ kg−1) |
---|---|---|---|---|---|---|---|
1 | 0.35 | 18.88 | 36.26 | 24a | 0.49 | 15.57 | 35.72 |
2 | 0.34 | 10.48 | 35.01 | 25 | 0.54 | 16.62 | 35.86 |
3 | 0.30 | 14.08 | 35.65 | 26a | 0.42 | 17.37 | 36.06 |
4 | 0.26 | 4.02 | 33.75 | 27 | 0.47 | 10.85 | 35.06 |
5a | 0.28 | 11.85 | 34.96 | 28a | 0.42 | 11.94 | 35.23 |
6 | 0.28 | 15.03 | 35.63 | 29a | 0.50 | 12.58 | 35.26 |
7 | 0.30 | 6.34 | 34.28 | 30a | 0.50 | 11.00 | 35.57 |
8 | 0.40 | 8.32 | 34.59 | 31a | 0.25 | 9.23 | 35.51 |
9a | 0.25 | 5.67 | 34.08 | 32 | 0.24 | 10.15 | 34.90 |
10a | 0.34 | 8.94 | 34.54 | 33 | 1.33 | 13.22 | 35.53 |
11 | 0.28 | 11.52 | 35.12 | 34 | 0.87 | 10.59 | 34.92 |
12 | 0.35 | 7.86 | 33.81 | 35 | 0.40 | 15.99 | 35.64 |
13 | 0.24 | 11.74 | 34.63 | 36 | 0.20 | 12.16 | 35.77 |
14 | 0.33 | 15.39 | 35.44 | 37a | 0.88 | 1.65 | 31.89 |
15 | 0.36 | 4.17 | 33.40 | 38 | 1.12 | 1.46 | 32.46 |
16a | 0.38 | 6.55 | 33.83 | 39 | 0.66 | 1.29 | 32.54 |
17 | 0.36 | 8.46 | 34.37 | 40 | 1.10 | 1.15 | 32.55 |
18 | 0.38 | 5.73 | 33.52 | 41 | 0.72 | 1.39 | 32.64 |
19 | 0.46 | 8.69 | 34.22 | 42 | 1.87 | 1.33 | 32.41 |
20 | 0.50 | 11.59 | 34.89 | 43 | 0.68 | 1.19 | 32.17 |
21 | 0.36 | 14.00 | 35.54 | 44 | 0.58 | 9.24 | 34.99 |
22a | 0.36 | 16.15 | 35.83 | 45 | 0.45 | 54.74 | 37.31 |
23 | 0.39 | 17.62 | 36.02 | 46 | 0.59 | 42.12 | 38.09 |

a Sample selected for the prediction set.
Fig. 2 Averaged LIBS spectrum of petroleum coke sample no. 3 after noise reduction ((a) 200–400 nm; (b) 400–600 nm; (c) 600–800 nm; (d) 800–935 nm).
Taking the derivative of the spectral matrix is an effective means of eliminating baseline drift and resolving overlapping peaks. The parameter optimization process for the first derivative (D1st) is depicted in Fig. 3. As shown in Fig. 3(a), as the number of smoothing points increases, the MREoob and RMSEoob of the RF calibration model for ash content first decrease, then increase, and then decrease again. With 13 smoothing points, the RF calibration model for ash content gives its best predictions (Roob2 = 0.8233, MREoob = 0.1558, RMSEoob = 0.1585). For volatile matter analysis, the optimal number of smoothing points is 25 (Roob2 = 0.9695, MREoob = 0.1708, RMSEoob = 2.2174). For calorific value analysis, the optimal number of smoothing points is also 25 (Roob2 = 0.9837, MREoob = 0.0036, RMSEoob = 0.1836).
Fig. 3 Prediction results of RF calibration model based on different D1st smoothing points ((a) ash content; (b) volatile matter; (c) calorific value).
The impact of the number of smoothing points for the second derivative (D2nd) on the RF calibration models was examined in the same way. Fig. 4 shows how different D2nd smoothing points affect the accuracy of the calibration models for ash content, volatile matter, and calorific value. As the number of smoothing points increases, the MREoob of the RF calibration models for all three properties first increases and then decreases. For ash content analysis, the RF calibration model gave its best predictions (Roob2 = 0.8518, MREoob = 0.1550, RMSEoob = 0.1455) with 15 smoothing points. For volatile matter analysis, the optimal number of smoothing points was 9 (Roob2 = 0.9521, MREoob = 0.2109, RMSEoob = 2.5631). For calorific value analysis, the optimal number of smoothing points was 19 (Roob2 = 0.9838, MREoob = 0.0037, RMSEoob = 0.1866). In summary, optimizing the number of smoothing points for D1st and D2nd can significantly improve the prediction accuracy of the RF calibration models for ash content, volatile matter, and calorific value.
Fig. 4 Prediction results of RF calibration model based on different D2nd smoothing points ((a) ash content; (b) volatile matter; (c) calorific value).
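The role of the smoothing points can be illustrated with a small derivative filter. The text does not state which derivative algorithm was used. The sketch below assumes a Savitzky-Golay first derivative with a local linear or quadratic fit, for which the filter weights reduce to k / Σk² over the window, and treats "smoothing points" as the odd window width. The function name `smoothed_first_derivative` is hypothetical, and only interior points are returned.

```python
def smoothed_first_derivative(y, points, step=1.0):
    """Savitzky-Golay first derivative (linear/quadratic local fit).

    `points` is the odd window width; `step` is the spacing between
    neighbouring channels. Edge channels without a full window are dropped.
    """
    if points % 2 == 0 or points < 3:
        raise ValueError("points must be an odd integer >= 3")
    m = points // 2
    norm = sum(k * k for k in range(-m, m + 1)) * step
    return [
        sum(k * y[i + k] for k in range(-m, m + 1)) / norm
        for i in range(m, len(y) - m)
    ]

# Toy signal (assumption): y = x^2, whose exact derivative is 2x.
signal = [float(x * x) for x in range(10)]
deriv = smoothed_first_derivative(signal, points=5)
```

A wider window averages over more channels, which suppresses noise but also flattens narrow emission lines. That trade-off is why the optimum differs between properties. A second derivative would need a different weight set and is not shown here.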
Because spectral interferences have diverse causes and effects, a single pretreatment method may be insufficient to suppress their influence on modeling effectiveness.37 The impact of combined spectral preprocessing strategies on the predictive performance of the RF calibration models was therefore explored further (Fig. 5). For the ash content model, SNV corresponds to the minimum MREoob and RMSEoob, and D2nd to the maximum Roob2. Consequently, the combination of D2nd and SNV was selected to train the ash content RF model. For the volatile matter model, D1st combined with MSC corresponds to the minimum RMSEoob and MREoob and the maximum Roob2. However, its prediction result is inferior to that of MSC alone, which may be due to over-fitting of the training data caused by the optimization of model parameters, reducing the generalization ability of the model. MSC alone was therefore retained for the volatile matter model, as reflected in the final MSC-VIM-RF model. For the calorific value model, D1st combined with MSC corresponds to the minimum RMSEoob and MREoob and the maximum Roob2, so this combination was selected to train the calorific value RF model.
Fig. 5 Effects of different preprocessing methods on the predictive performance of RF calibration model (note: Ash: ash content, V.M.: volatile matter, C.V.: calorific value).
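For readers unfamiliar with the two scatter corrections compared above, the sketch below shows standard normal variate (SNV) and multiplicative scatter correction (MSC) in their textbook forms. It is an illustration under assumptions, not the processing code of this study: spectra are plain lists of intensities, the MSC reference defaults to the mean spectrum, and the function names are hypothetical.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and std."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / std for v in spectrum]

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum).

    Each spectrum s is fitted as s = offset + slope * reference by least squares,
    then corrected to (s - offset) / slope.
    """
    n = len(spectra[0])
    if reference is None:
        reference = [sum(s[j] for s in spectra) / len(spectra) for j in range(n)]
    ref_mean = sum(reference) / n
    ref_ss = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for s in spectra:
        s_mean = sum(s) / n
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(reference, s)) / ref_ss
        offset = s_mean - slope * ref_mean
        corrected.append([(v - offset) / slope for v in s])
    return corrected

# Toy spectra (assumption): the second is a scaled and shifted copy of the first.
base = [1.0, 4.0, 2.0, 8.0]
spectra = [base, [2.0 * v + 1.0 for v in base]]
aligned = msc(spectra)
scaled = snv(base)
```

In this toy case both spectra collapse onto the same corrected curve, which is the intended effect when intensity differences come from scattering or shot-to-shot fluctuation rather than composition.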
Based on the optimal spectral preprocessing combination strategy, the influence of the VI thresholds in the VIM/VIP algorithms on the predictive results of the RF calibration model was compared. Fig. 7 shows the performance of the VIM/VIP-RF models under different VI thresholds for predicting the three properties of petroleum coke. For ash content and volatile matter analysis, the VIM-RF model achieved better predictive performance. As Fig. 7(a) shows, for ash content the MREoob values first decrease and then increase as the VI threshold increases. The minimum MREP is achieved at a threshold of 0.006 (RP2 = 0.9187; MREP = 0.0881; RMSEP = 0.0471). Therefore, the variables selected at a VI threshold of 0.006 are used as the inputs for constructing the VIM-RF (ash content) calibration model. In contrast, for volatile matter analysis, the MREoob value first increases and then decreases with increasing VI threshold. At a threshold of 0.08, MREP is at its minimum (RP2 = 0.9820; MREP = 0.0527; RMSEP = 0.6178). Therefore, the variables selected at a VI threshold of 0.08 are used as the inputs for constructing the VIM-RF (V.M.) calibration model. For calorific value analysis, the VIP-RF model achieves better prediction performance. Fig. 7(b) shows that as the VI threshold increases, the MREoob value first increases and then decreases. At a threshold of 1.0, MREP is at its minimum (RP2 = 0.9510; MREP = 0.0060; RMSEP = 0.2697). Thus, the variables selected at a VI threshold of 1.0 are used as the inputs for constructing the VIP-RF (C.V.) calibration model.
Fig. 7 Predictive performance of the RF model of ash content, volatile matter, and calorific values under different thresholds ((a) VIM; (b) VIP).
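The threshold search described above amounts to a simple loop: for each candidate VI threshold, keep the variables whose importance reaches it, refit the model, and record the error. The sketch below shows only that loop. It assumes a variable is retained when its VI is greater than or equal to the threshold, and it takes the model-fitting step as a user-supplied callable. The names `select_variables`, `best_threshold` and `toy_error`, and all numbers in the toy example, are hypothetical.

```python
def select_variables(importance, threshold):
    """Indices of variables whose importance is at least the threshold (assumed rule)."""
    return [j for j, vi in enumerate(importance) if vi >= threshold]

def best_threshold(importance, thresholds, evaluate):
    """Return (error, threshold, selected) for the threshold with the lowest error.

    `evaluate(selected)` is a caller-supplied function that refits a model on the
    selected variables and returns an error such as MRE (lower is better).
    """
    results = []
    for t in thresholds:
        selected = select_variables(importance, t)
        if selected:  # skip thresholds that would remove every variable
            results.append((evaluate(selected), t, selected))
    return min(results, key=lambda r: r[0])

# Toy stand-in for model refitting (assumption): pretend two variables is ideal.
def toy_error(selected):
    return abs(len(selected) - 2) * 0.1 + 0.05

importance = [0.001, 0.02, 0.5, 0.004, 0.09]
error, threshold, selected = best_threshold(
    importance, [0.0, 0.006, 0.08, 1.0], toy_error
)
```

In practice `evaluate` would train the RF model on the retained spectral channels and return the chosen error metric, so each threshold costs one model fit.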
After comparing the feature-selected spectral lines with the original spectra, the elemental spectral lines related to ash content modeling were identified, such as Fe I 358.12 nm and 385.99 nm, Al I 394.40 nm and 396.15 nm, Ca I 824.88 nm, and S I 796.39 nm. Notably, the Fe spectral lines have large VI values. Interestingly, although a significant VI value was observed near the Si peak at approximately 250 nm, no Si peak is evident in Fig. 2. It is speculated that the Si peak may be obscured in Fig. 2 by excessive noise reduction. For volatile matter, the spectral lines of C, H, O and N show the greatest correlation with modeling after variable selection, with VI values of about 40. The spectral lines of the metal elements Fe, Ca and Ti also show some correlation, but with lower VI values, indicating that non-metallic elements may play a more dominant role in the modeling of volatile matter. For calorific value, the spectral lines with large VI values are mainly those of C, O, N, and CN, which indicates a strong correlation between these species and calorific value; metal elements such as Fe, Ca and Na also show some correlation.
Property | Model | Roob2 | MREoob | RMSEoob | RP2 | MREP | RMSEP | RPD | RSD (%) | RER |
---|---|---|---|---|---|---|---|---|---|---|
Ash | D2nd-SNV-VIM-RF | 0.8924 | 0.1167 | 0.1186 | 0.9187 | 0.0881 | 0.0471 | 3.82 | 2.90 | 14.62 |
V.M. | MSC-VIM-RF | 0.9781 | 0.1153 | 1.6333 | 0.9820 | 0.0527 | 0.6178 | 7.21 | 3.88 | 25.45 |
C.V. | D1st-MSC-VIP-RF | 0.9838 | 0.0031 | 0.1796 | 0.9510 | 0.0060 | 0.2697 | 4.20 | 0.22 | 15.46 |
The table shows that applying VIM/VIP variable selection after spectral preprocessing is feasible. Fig. 8 shows the relationship between reference values and predicted values under different RF calibration models. The RP2 of ash content, volatile matter and calorific value increased by 35.8%, 7.46% and 9.15%, respectively. MREP decreased significantly, by 55.2%, 43.5% and 41.7%, and RMSEP decreased by 50.6%, 60.7% and 35.1%, respectively. For ash content analysis, a D2nd-SNV-VIM-RF calibration model was established, achieving a low prediction RSD of 2.90%. The RPD value of the model, defined as the ratio of the standard deviation of the response variable to RMSEP,38 was 3.82, indicating high accuracy; generally, an RPD value exceeding 3 indicates acceptable prediction results. Furthermore, the RER value (RER = Rn/RMSEP,39 where Rn is the concentration range) reached 14.62, well above the threshold of 10, demonstrating the robustness of the model and its suitability for quality control applications. For volatile matter analysis, the MSC-VIM-RF model gave a prediction RSD of 3.88%, an RPD of 7.21 and an RER of 25.45. For calorific value analysis, the D1st-MSC-VIP-RF model gave a prediction RSD of 0.22%, an RPD of 4.20 and an RER of 15.46, again indicating high precision and strong robustness.
Fig. 8 The relationship between the reference value and predictive value obtained by different RF calibration models ((a) ash content; (b) volatile matter; (c) calorific value).
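The RER definition given above (Rn divided by RMSEP) can be checked by hand against the tabulated data. The sketch below assumes that Rn is the range of the reference values of the samples marked as prediction samples in the reference-value table. Under that assumption, the volatile matter and calorific value ranges are 15.72 wt% and 4.17 MJ kg−1, giving RER values of about 25.45 and 15.46. An RPD helper is included for completeness, assuming the sample standard deviation; no attempt is made here to reproduce the reported RPD values. The function names are hypothetical.

```python
import statistics

def rer(reference_values, rmsep):
    """Range error ratio: range of the reference values divided by RMSEP."""
    return (max(reference_values) - min(reference_values)) / rmsep

def rpd(reference_values, rmsep):
    """Ratio of the standard deviation of the response to RMSEP.
    The sample standard deviation is an assumption of this sketch."""
    return statistics.stdev(reference_values) / rmsep

# Reference values of the 12 prediction samples (5, 9, 10, 16, 22, 24, 26, 28,
# 29, 30, 31, 37), copied from the reference-value table.
vm_pred = [11.85, 5.67, 8.94, 6.55, 16.15, 15.57, 17.37, 11.94, 12.58, 11.00, 9.23, 1.65]
cv_pred = [34.96, 34.08, 34.54, 33.83, 35.83, 35.72, 36.06, 35.23, 35.26, 35.57, 35.51, 31.89]

rer_vm = rer(vm_pred, 0.6178)  # RMSEP for volatile matter (wt%)
rer_cv = rer(cv_pred, 0.2697)  # RMSEP for calorific value (MJ/kg)
print(round(rer_vm, 2), round(rer_cv, 2))
```

Because both ratios divide by RMSEP, they reward models whose error is small relative to the spread of the property, which is why they are read as indicators of suitability for screening or quality control rather than as accuracy measures in their own right.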
Furthermore, the model prediction results were compared with those reported for similar complex systems. In the LIBS-based analysis of petroleum coke, the models established here for ash content, volatile matter, and calorific value gave RSD values of 2.90%, 3.88%, and 0.22%, respectively, lower than the 3.65%, 4.38%, and 5.53% reported by Lu et al. for V, Na, and Ca element analysis.26 This indicates the superior stability of the current models. Additionally, the RMSEP values of the current models were 0.0471% for ash content, 0.6178% for volatile matter, and 0.2697 MJ kg−1 for calorific value. For comparison with proximate analysis results for coal, He et al. reported an RMSEP of 0.9687% for ash content and 1.3218% for volatile matter using a KELM model based on a primary spectral fusion strategy.40 Zhang et al. employed four calibration models, namely partial least squares regression (PLSR), support vector regression (SVR), artificial neural networks (ANN), and principal component regression (PCR), to quantitatively analyze 40 coal samples, achieving RMSEP values of 0.69% for ash content, 0.87% for volatile matter, and 0.56 MJ kg−1 for calorific value.41 The results obtained in this research further validate the effectiveness of the established calibration models.
Finally, the validity and limitations of the model for ash content, volatile matter and calorific value are discussed. Considering the solid flake characteristics of petroleum coke samples, SNV and MSC effectively eliminate spectral differences arising from particle size and surface scattering. Additionally, derivative techniques correct for matrix effects and background interference. VIM and VIP feature selection facilitates the identification of key spectral features, simplifying the model structure and enhancing prediction accuracy. Furthermore, the RF algorithm excels at handling high-dimensional data and nonlinear relationships, and provides valuable variable importance assessments. Nevertheless, the quality and integrity of the data are crucial, and different datasets may require adjustments to the preprocessing methods. Moreover, the RF algorithm may suffer from overfitting and limited computational efficiency. These factors should therefore be considered comprehensively when applying the model to practical petroleum coke detection.
Footnote |
† Electronic supplementary information (ESI) available. See DOI: https://doi.org/10.1039/d4ra02873b |
This journal is © The Royal Society of Chemistry 2024 |