A hybrid random forest method fusing wavelet transform and variable importance for the quantitative analysis of K in potassic salt ore using laser-induced breakdown spectroscopy
Abstract
Potash is the main raw material for the production of agricultural fertilizers. Herein, random forest (RF) models fusing variable importance (VI) and wavelet transform (WT) are proposed to determine the K content in potassic salt ore. Specifically, 53 potassic salt samples were analyzed, of which 37 were used as the calibration set. An original RF model was first developed for regression with optimized ntree and mtry parameters. However, its RP2 (0.7399) and modeling time (251.8 s) were not satisfactory. Thus, we first explored the effect of different VI thresholds on the quantitative results. When the VI threshold was set to 0.090, the number of input variables of the resulting model (defined as VIRF) was reduced from 27 620 to 3355, but VIRF showed no significant improvement in the other performance parameters such as RMSEP and RP2. Then, wavelet transform was adopted to screen the input variables of the RF model (defined as WTRF). Relative to the original RF model, WTRF improved the performance parameters by 16% (RP2 from 0.7399 to 0.8555), 38% (RMSEP from 0.1798 to 0.1106), 62% (MRE from 0.2740 to 0.1032), and 11% (MRSD from 0.0686 to 0.0613), and the modeling time was shortened by about three orders of magnitude. When variable importance was further applied to the WTRF model (defined as WT-VIRF), no additional variables were removed, because all the input variables selected by wavelet transform contributed significantly to the quantitative results; the WT-VIRF model therefore gave exactly the same results as the WTRF model. Overall, the results demonstrate that the RF model combined with WT is a promising method for the quantitative analysis of the K content in potassic salt ores.
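
The improvement percentages quoted above can be reproduced from the reported before and after values. The following is a minimal arithmetic sketch, not the authors' code: the function name `relative_improvement` and the dictionary layout are hypothetical, and it assumes each percentage is the change divided by the original RF value, with RP2 treated as higher-is-better and the three error metrics as lower-is-better.

```python
# Reproduce the reported improvement ratios of WTRF over the original RF model.
# Assumption: improvement = |after - before| / before, signed so that a
# positive value always means "better".

def relative_improvement(before, after, higher_is_better):
    """Return the fractional improvement going from `before` to `after`."""
    change = (after - before) if higher_is_better else (before - after)
    return change / before

# (RF value, WTRF value, higher_is_better), taken from the abstract.
metrics = {
    "RP2":   (0.7399, 0.8555, True),
    "RMSEP": (0.1798, 0.1106, False),
    "MRE":   (0.2740, 0.1032, False),
    "MRSD":  (0.0686, 0.0613, False),
}

improvements = {
    name: relative_improvement(before, after, higher_is_better)
    for name, (before, after, higher_is_better) in metrics.items()
}

for name, value in improvements.items():
    print(f"{name}: {value:.1%}")

# Fraction of input variables removed by the VI threshold of 0.090
# (27 620 variables reduced to 3355).
vi_removed_fraction = 1 - 3355 / 27620
print(f"Variables removed by VI screening: {vi_removed_fraction:.1%}")
```

Under these assumptions the computed values round to the 16%, 38%, 62%, and 11% stated in the abstract.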