Discretized butterfly optimization algorithm for variable selection in the rapid determination of cholesterol by near-infrared spectroscopy
Abstract
The blood cholesterol level is strongly associated with cardiovascular disease, so a rapid method for determining the cholesterol concentration of blood is needed. In this study, a discretized butterfly optimization algorithm-partial least squares (BOA-PLS) method combined with near-infrared (NIR) spectroscopy is proposed for the first time for the rapid determination of cholesterol concentration in blood. In the discretized BOA, each element of a butterfly's position vector takes the value 1 or 0, indicating whether the corresponding variable is selected or not. During optimization, four transfer functions, i.e., arctangent, V-shaped, improved arctangent (I-atan) and improved V-shaped (I-V), are introduced and compared for discretizing the butterfly positions. A partial least squares (PLS) model is then established between the selected NIR variables and the cholesterol concentrations. The effects of the iteration number, the transfer function and the performance of the butterflies are investigated. The proposed method is compared with full-spectrum PLS, multiplicative scatter correction-PLS (MSC-PLS), max–min scaling-PLS (MMS-PLS), MSC-MMS-PLS, uninformative variable elimination-PLS (UVE-PLS), Monte Carlo uninformative variable elimination-PLS (MCUVE-PLS) and randomization test-PLS (RT-PLS). The results show that the I-V function is the best transfer function for discretization. Both preprocessing and variable selection improve the prediction performance of PLS, and the BOA-based variable selection methods outperform the statistics-based ones. Furthermore, I-V-BOA-PLS achieves the highest predictive accuracy among the seven variable selection methods, and MSC-MMS preprocessing further improves its prediction ability. Therefore, BOA-PLS combined with NIR spectroscopy is a promising approach for the rapid determination of cholesterol concentration in blood.
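The discretization step described above, in which a continuous butterfly update is turned into a 0/1 variable-selection vector through a transfer function, can be sketched as follows. This is a minimal illustration only, assuming the forms of the arctangent and V-shaped transfer functions commonly used in binary metaheuristics, |(2/π)·arctan((π/2)·x)| and |x/√(1+x²)|, together with a bit-flip rule in which each bit is complemented with probability T(x). The abstract gives neither the exact formulas used here nor those of the improved I-atan and I-V variants, so the improved variants are not reproduced, and all function names below are hypothetical.

```python
import numpy as np

def arctan_transfer(x):
    """Assumed arctangent transfer function |(2/pi) * arctan((pi/2) * x)|, in [0, 1)."""
    return np.abs((2.0 / np.pi) * np.arctan((np.pi / 2.0) * x))

def v_transfer(x):
    """Assumed V-shaped transfer function |x / sqrt(1 + x^2)|, in [0, 1]."""
    return np.abs(x / np.sqrt(1.0 + x ** 2))

def discretize(step, current_bits, transfer, rng):
    """Hypothetical bit-flip rule: flip each bit with probability transfer(step).

    step         -- continuous BOA position update, one value per NIR variable
    current_bits -- 0/1 array, where 1 means the variable is selected
    transfer     -- a transfer function mapping step values to probabilities
    rng          -- a numpy.random.Generator
    """
    prob = transfer(step)
    flip = rng.random(step.shape) < prob  # a zero step never flips a bit
    return np.where(flip, 1 - current_bits, current_bits)

def selected_columns(bits):
    """Indices of the NIR variables whose bit is 1."""
    return np.flatnonzero(bits)

# Usage: one discretization step for a butterfly over six wavelengths.
rng = np.random.default_rng(0)
bits = np.array([1, 0, 1, 0, 0, 1])
step = np.array([0.0, 0.0, 3.0, -3.0, 0.1, 0.0])
new_bits = discretize(step, bits, v_transfer, rng)
print(new_bits, selected_columns(new_bits))
```

The indices returned by `selected_columns` would pick out the wavelength columns of the spectral matrix passed to the PLS model. How each candidate subset is scored (for example, by a cross-validated prediction error of that PLS model) is an assumption here and is not shown.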