A new ensemble modeling method for multivariate calibration of near infrared spectra
Abstract
Ensemble modeling has gained increasing attention as a way to improve the performance of quantitative models in near infrared (NIR) spectral analysis. Based on Monte Carlo (MC) resampling, the least absolute shrinkage and selection operator (LASSO) and partial least squares (PLS), a new ensemble strategy named MC-LASSO-PLS is proposed for multivariate calibration of NIR spectra. In this method, the training subsets used to build the sub-models are generated by sampling in both the sample and the variable dimensions, which ensures diversity among the sub-models. Specifically, a given number of samples is randomly selected from the training set to form a sample subset. LASSO is then used to shrink the variables of the sample subset, yielding the training subset on which a PLS sub-model is built. This process is repeated N times to obtain N sub-models. Finally, the predictions of the sub-models are combined by simple averaging to produce the final prediction. The prediction ability of the proposed method was compared with that of the LASSO-PLS, MC-PLS and PLS models on NIR spectra of corn, blend oil and orange juice samples. The results demonstrate the superior prediction ability of MC-LASSO-PLS.