Comparison of chemometric approaches for near-infrared spectroscopic data
Abstract
Near-infrared (NIR) spectroscopy has demonstrated great potential in the analysis of complex samples because it is simple, rapid and nondestructive. In this investigation, we compare the abilities of six popular multivariate classification techniques: extreme learning machine (ELM), support vector machine (SVM), semi-supervised SVM (S3VM), twin support vector machine (TWSVM), regularized logistic regression (RLR) and minimax probability machine (MPM). Two NIR spectroscopy datasets are used for the classification comparison, and the 5000–10 000 cm−1 NIR spectral region is chosen. When the dataset contains sufficient labeled data, experimental results on different spectral regions show that all six methods perform very well in identifying the hardness of licorice seeds, while four of the methods (ELM, SVM, TWSVM and S3VM) are also very effective in recognizing the purity of maize seeds. When relatively few labeled data are available, S3VM can improve generalization by incorporating unlabeled data in training for licorice seed classification. Compared with traditional linear discriminant analysis, the six methods achieve better performance on the two NIR datasets. These results show that the methods are feasible and effective for the analysis of NIR spectral data. We hope that the results can help further investigations of chemometrics and NIR spectroscopy data.