Evaluation of chemometric methodologies for the classification of Coffea canephora cultivars via FT-NIR spectroscopy and direct sample analysis
Abstract
This work presents a study of chemometric tools for classifying Coffea canephora cultivars (whole beans) by in situ direct sample analysis using near-infrared (NIR) spectroscopy. Sample pretreatment consisted of washing and sun-drying the coffee beans and then selecting only non-defective beans. A near-infrared reflectance accessory was used for direct sample analysis. The NIR spectra were baseline-aligned and mean-centered, smoothed with a Savitzky–Golay filter (polynomial order zero, 15-point window), and then converted to their first derivative. Principal component analysis (PCA) revealed distinct clusters. Plots of the loadings versus wavenumber related these clusters to the presence of lipids, water, caffeine, chlorogenic acids, sugars, proteins, and carbohydrates. Partial least squares discriminant analysis (PLS-DA), soft independent modeling of class analogy (SIMCA), self-organizing maps (SOM), and support vector machines (SVM) were applied to classify the coffee cultivars. SOM gave the best results, correctly identifying 100.0% of the validation samples, whereas PLS-DA, SIMCA, SVM (3 PCs), and SVM (4 PCs) achieved 82.9, 99.6, 82.9, and 99.6%, respectively. The performance of each method was also evaluated using the Matthews correlation coefficient (MCC), which confirmed that SOM performed best for all classes.
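The preprocessing chain and the per-class evaluation metric described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. Baseline alignment is assumed to mean subtracting each spectrum's minimum. The Savitzky–Golay edges are assumed to use a truncated window. The first derivative is assumed to be a simple finite difference. All function names (`baseline_align`, `savgol_order0`, `preprocess`, `mcc_one_vs_rest`, etc.) are hypothetical. With polynomial order zero, a least-squares Savitzky–Golay fit of a constant reduces to the window mean, so the smoothing step is a centered 15-point moving average. Per-class MCC is computed one-vs-rest from true/false positive and negative counts.

```python
import math
from typing import List, Sequence

Spectrum = List[float]


def baseline_align(spectrum: Sequence[float]) -> Spectrum:
    """Shift a spectrum so its minimum is zero (assumed form of baseline alignment)."""
    offset = min(spectrum)
    return [x - offset for x in spectrum]


def mean_center(spectra: Sequence[Sequence[float]]) -> List[Spectrum]:
    """Subtract the mean spectrum (column-wise mean over all samples) from each spectrum."""
    n = len(spectra)
    mean = [sum(col) / n for col in zip(*spectra)]
    return [[x - m for x, m in zip(row, mean)] for row in spectra]


def savgol_order0(spectrum: Sequence[float], window: int = 15) -> Spectrum:
    """Savitzky-Golay smoothing with polynomial order zero.

    Fitting a constant by least squares over each window gives the window mean,
    so this is a centered moving average. Edges use a truncated window (assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(spectrum)
    out = []
    for i in range(n):
        seg = spectrum[max(0, i - half):min(n, i + half + 1)]
        out.append(sum(seg) / len(seg))
    return out


def first_derivative(spectrum: Sequence[float], step: float = 1.0) -> Spectrum:
    """Finite-difference derivative: central differences inside, one-sided at the ends."""
    n = len(spectrum)
    if n < 2:
        return [0.0] * n
    d = [(spectrum[1] - spectrum[0]) / step]
    for i in range(1, n - 1):
        d.append((spectrum[i + 1] - spectrum[i - 1]) / (2 * step))
    d.append((spectrum[-1] - spectrum[-2]) / step)
    return d


def preprocess(spectra, window: int = 15, step: float = 1.0) -> List[Spectrum]:
    """Baseline-align, mean-center, smooth, then differentiate (order as in the abstract)."""
    aligned = [baseline_align(s) for s in spectra]
    centered = mean_center(aligned)
    smoothed = [savgol_order0(s, window) for s in centered]
    return [first_derivative(s, step) for s in smoothed]


def mcc_one_vs_rest(y_true, y_pred, label) -> float:
    """Matthews correlation coefficient for one class treated as positive, all others negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    y_true = ["A", "A", "B", "B", "C"]
    y_pred = ["A", "B", "B", "B", "C"]
    for c in ("A", "B", "C"):
        print(c, round(mcc_one_vs_rest(y_true, y_pred, c), 3))
```

With these toy labels (true A, A, B, B, C; predicted A, B, B, B, C), the one-vs-rest MCC is about 0.612 for A, about 0.667 for B, and 1.0 for C. An MCC of 1 indicates perfect agreement and 0 indicates performance no better than chance. In practice, `scipy.signal.savgol_filter(x, window_length=15, polyorder=0)` and `sklearn.metrics.matthews_corrcoef` offer library versions of these steps. SciPy's default edge mode differs from the truncated window assumed here, and scikit-learn returns a single multiclass MCC unless the labels are binarized per class.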