Laser induced breakdown spectroscopy combined with hybrid variable selection for the prediction of the environmental risk Nemerow index of heavy metals in oily sludge
Abstract
Oily sludge is a by-product pollutant of crude oil exploitation, transportation, processing and subsequent treatment. It contains a large number of toxic components, including heavy metals, aromatic hydrocarbons, aged crude oil and bacteria. Predicting the environmental pollution risk level of heavy metals in oily sludge is therefore of great scientific significance for the prevention of environmental pollution and for the management of the ecological environment by the petroleum industry. In this work, a method for predicting the risk level of heavy metals in oily sludge is proposed, based on laser induced breakdown spectroscopy (LIBS) and hybrid variable selection. LIBS spectra of 30 oily sludge samples were obtained, and the corresponding Nemerow index was calculated. The effects of different data processing methods on the LIBS spectra were explored. A filter-wrapper hybrid variable selection method called mutual information-variable importance measurement (MI-VIM) was proposed for LIBS spectra, in which mutual information (MI) was used for preliminary variable selection and variable importance measurement (VIM) was then employed for further variable screening. Finally, a random forest (RF) model was established on the basis of the optimized model parameters and the selected feature variables to predict the environmental risk caused by heavy metals in oily sludge. During model construction, 10-fold cross validation (CV) was used to choose the spectral preprocessing method, the mutual information threshold and the variable importance threshold, and to optimize the model parameters. To further verify the prediction performance of MI-VIM, the results of RF models based on different methods were compared; the comparison shows that the combination of LIBS and MI-VIM-RF is a feasible method for the prediction of the Nemerow risk index of heavy metals in oily sludge.
Compared with the RF model based on the original LIBS spectra, the MI-VIM-RF model increased the coefficient of determination of the prediction set (Rp²) from 0.9564 to 0.9681, decreased the root mean square error of the prediction set (RMSEP) from 0.7920 to 0.6009, and shortened the modeling time from 103.1 s to 16.7 s. In conclusion, LIBS combined with MI-VIM-RF is an effective method for predicting the Nemerow index of oily sludge, and can provide new ideas and strategies for environmental risk estimation and restoration in the petroleum industry.
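
The two-stage MI-VIM selection followed by an RF model can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions: scikit-learn is used, MI is estimated with `mutual_info_regression`, VIM is approximated by the impurity-based `feature_importances_` of a random forest (the importance measure used in the work may differ), the function names `mi_vim_select` and `cv_rmse` are hypothetical, the threshold values are arbitrary, and the data are synthetic rather than LIBS spectra.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import cross_val_predict


def mi_vim_select(X, y, mi_threshold, vim_threshold, n_trees=100, seed=0):
    """Return column indices of X that pass an MI filter, then an RF importance screen."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Stage 1 (filter): keep variables whose estimated MI with y exceeds the threshold.
    mi = mutual_info_regression(X, y, random_state=seed)
    mi_keep = np.flatnonzero(mi > mi_threshold)
    if mi_keep.size == 0:
        raise ValueError("MI threshold removed every variable")

    # Stage 2 (wrapper): fit an RF on the survivors and keep those whose
    # importance exceeds the threshold (impurity-based importance, an assumption).
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    rf.fit(X[:, mi_keep], y)
    vim_keep = mi_keep[rf.feature_importances_ > vim_threshold]
    if vim_keep.size == 0:
        raise ValueError("VIM threshold removed every variable")
    return vim_keep


def cv_rmse(X, y, n_trees=100, seed=0, folds=10):
    """Root mean square error of k-fold cross-validated RF predictions."""
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=seed)
    y_hat = cross_val_predict(rf, X, y, cv=folds)
    return float(np.sqrt(np.mean((np.asarray(y) - y_hat) ** 2)))


# Synthetic stand-in for spectra: 30 samples x 40 variables,
# with only column 3 carrying information about the response.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 40))
y_demo = 4.0 * X_demo[:, 3] + 0.1 * rng.normal(size=30)

selected = mi_vim_select(X_demo, y_demo, mi_threshold=0.05, vim_threshold=0.05)
print("selected variables:", selected)
print("CV RMSE, all variables:     ", cv_rmse(X_demo, y_demo))
print("CV RMSE, selected variables:", cv_rmse(X_demo[:, selected], y_demo))
```

In this sketch the selection is run once on all samples for brevity. To tune the two thresholds by cross validation, as described above, the call to `mi_vim_select` would need to be repeated inside each training fold so that the held-out samples do not influence which variables are kept.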