Molecular spectroscopic wavelength selection using combined interval partial least squares and correlation coefficient optimization
Abstract
Wavelength selection plays a vital role in near-infrared spectroscopic analysis of samples. Existing wavelength selection algorithms each have drawbacks, some of which can be mitigated by combining algorithms. In this study, we used a combination of algorithms to quantitatively analyze corn components from near-infrared spectroscopy data. We combined Savitzky–Golay (SG) preprocessing, the correlation coefficient (CC) method, and the synergy interval partial least squares (SiPLS) algorithm to propose the CC-SiPLS and CC-SG-SiPLS methods. We then compared the results of applying full-spectrum partial least squares (PLS), correlation coefficient partial least squares (CC-PLS), SiPLS, CC-SiPLS, and CC-SG-SiPLS to near-infrared spectral wavelength selection. The models built from the spectral data after wavelength selection using CC, SiPLS, CC-SiPLS, and CC-SG-SiPLS were simpler than the full-spectrum model, using 33.6% (CC), 14.3% (SiPLS), 11.1% (CC-SiPLS), and 6.3% (CC-SG-SiPLS) of the full-spectrum number of wavelengths. Prediction accuracy for corn oil content also improved relative to full-spectrum PLS. The CC-SG-SiPLS wavelength selection algorithm, combined with the preprocessing method, reduced the number of wavelengths from 700 to 44 and yielded the simplest model. Its root mean square error of prediction (RMSEP) and relative percent deviation (RPD) were 0.0552 and 2.5706, respectively, demonstrating adequate prediction accuracy. This result indicates that a combination strategy provides an effective way to select multiple wavebands, and that CC-SG-SiPLS can provide high analysis accuracy using molecular absorption bands composed of several wavelength intervals. Thus, this algorithm is an effective and robust wavelength selection strategy.
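As an illustration of the CC step and of the two reported metrics, the following is a minimal Python sketch, not the implementation used in this study. It assumes that CC selection keeps the wavelengths whose absolute Pearson correlation with the reference values meets a threshold, and that RPD is computed as the standard deviation of the reference values divided by RMSEP; neither formula is stated in the abstract. The function names (`cc_select`, `rmsep`, `rpd`), the threshold value, and the synthetic data are hypothetical.

```python
import numpy as np


def cc_select(X, y, threshold):
    """Return (indices, r): wavelengths with |Pearson r| >= threshold.

    X is (n_samples, n_wavelengths); y is (n_samples,).
    Assumed form of the CC method; the source does not give its exact rule.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    num = (Xc * yc[:, None]).sum(axis=0)
    # Constant columns have zero variance; treat their correlation as 0.
    r = np.divide(num, denom, out=np.zeros_like(num), where=denom > 0)
    return np.flatnonzero(np.abs(r) >= threshold), r


def rmsep(y_true, y_pred):
    """Root mean square error of prediction."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rpd(y_true, y_pred):
    """Assumed definition: sample SD of reference values / RMSEP."""
    return float(np.std(y_true, ddof=1) / rmsep(y_true, y_pred))


# Synthetic stand-in for spectra: 80 samples, 700 wavelengths (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 700))
y = 2.0 * X[:, 100] + X[:, 101] + 0.1 * rng.normal(size=80)

selected, r = cc_select(X, y, threshold=0.6)
print(f"kept {selected.size} of {X.shape[1]} wavelengths")
```

In a pipeline like the one described above, the retained columns `X[:, selected]` would then be passed to the interval-based selection and PLS modelling steps, with `rmsep` and `rpd` evaluated on a held-out prediction set.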