Fuxiang Wang,ᵃ Chunguang Wang*ᵃ and Shiyong Songᵇ
ᵃSchool of Mechanical and Electrical Engineering, Inner Mongolia Agricultural University, Hohhot, Inner Mongolia, China. E-mail: jdwcg@imau.edu.cn
ᵇInner Mongolia Lvtao Detection Technology Company Limited, Hohhot, Inner Mongolia, China. E-mail: songshiyong_880606@126.com
First published on 13th April 2021
Fresh-cut potatoes are popular with consumers because they are healthy, hygienic, and convenient. Currently, starch content is mainly determined by chemical methods, which are time-consuming and laborious and may have adverse effects on the human body. Therefore, methods for the rapid and accurate detection of starch content are required. In this study, Zihuabai and Atlantic potatoes were used as experimental samples. The potatoes were sliced with stainless-steel blades, and hyperspectral images of the slices were acquired. The spectra were preprocessed using different methods. Competitive adaptive reweighted sampling (CARS) and the successive projection algorithm (SPA) were used to extract characteristic wavelengths. Partial least squares regression (PLSR) models were constructed to predict the starch content from the preprocessed full spectrum and from the spectrum at the characteristic wavelengths. Among the full-spectrum models, the model built after standard normal variate (SNV) preprocessing performed best, with a correlation coefficient in the calibration set (Rc) of 0.9020, a root mean square error of calibration (RMSEC) of 2.06, and a residual predictive deviation (RPD) of 2.33. The characteristic-wavelength multiplicative scatter correction (MSC)-CARS-PLSR model outperformed the full-spectrum PLSR models, with an Rc of 0.9276, an RMSEC of 1.76, a correlation coefficient in the prediction set (Rp) of 0.9467, a root mean square error of prediction (RMSEP) of 1.63, and an RPD of 2.95. The starch content in fresh-cut potatoes was visualized by applying the best model with pseudocolor mapping.
The results indicate that hyperspectral imaging is effective for mapping the spatial distribution of starch content, providing a theoretical basis for the grading and online monitoring of fresh-cut potato slices.
After potatoes are subjected to mechanical cutting, the structure of the epidermal cell wall is destroyed, the interlayer structure of the cells is changed, and the material of the cell wall degrades, which results in tissue softening.7 These fresh-cut potatoes consume their own nutrients to maintain their metabolic activity, which leads to a continuous decline in their appearance, color, and quality.8–10
Mechanical cutting also damages starch-containing cells, and their subsequent metabolic activity changes the physical properties and content of the starch. Moreover, starch is unevenly distributed within a potato, so slices cut from the same tuber can differ in starch content. Starch content affects the eating quality of potatoes: if it is too high, the potatoes are rough and hard, and if it is too low, they are not crisp. With the growing popularity of science-based diets, producers need to know the starch content of potatoes to price them rationally, and consumers need it to plan their diets. Therefore, a rapid method for detecting the starch content of fresh-cut potatoes is required to assess their quality and to provide a theoretical basis for quality monitoring and food grading.
Starch content is conventionally determined by acid hydrolysis, enzymatic hydrolysis,11 and spectrophotometry.12 Although these methods quantify starch content accurately, sample preparation is complicated, the procedures are time-consuming and laborious,13 and skilled operators are required. A faster method for detecting the starch content of fresh-cut potatoes is therefore needed.
Hyperspectral imaging (HSI) integrates conventional imaging and spectroscopy and acquires spatial and spectral information simultaneously: each pixel in the image contains a one-dimensional spectrum. Because each pixel carries its own spectral information, both the content and the spatial distribution of components can be analyzed in a single measurement, which makes the detection process more efficient. HSI is a powerful analytical tool and has been widely used to study fruit maturity,14,15 crop varieties,16–19 and meat quality.20,21 HSI is also used to inspect potato quality. Qiao22 and Jiang23 used hyperspectral equipment to predict the moisture content and the starch content of potatoes, respectively. Bai24 detected residual sulfur dioxide on the surface of fresh-cut potato chips. Rady25 detected the sugar content in potatoes, and Sun26 used HSI to predict the moisture content of purple sweet potato slices during drying. Su27 used HSI to monitor the moisture content of potatoes during drying in real time. Anders28 predicted the starch, soluble sugar, and amino acid contents of potatoes. Xiao et al.29 used HSI to predict the water content of fresh-cut potatoes and visualized its distribution from the model predictions. Although hyperspectral studies of potatoes have made some progress, to our knowledge no study has reported the prediction and visualization of starch content in fresh-cut potatoes.
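To make the "one spectrum per pixel" structure concrete, the following is a minimal sketch, assuming the hyperspectral image is held as a NumPy array of shape (rows, columns, bands). The image size, the synthetic reflectance values, and the rectangular region of interest (ROI) are illustrative assumptions, not the acquisition settings of this study; only the band count (428) is taken from the full-spectrum models reported later.

```python
import numpy as np

def pixel_spectrum(cube: np.ndarray, row: int, col: int) -> np.ndarray:
    """Return the 1-D spectrum stored at one pixel of a (rows, cols, bands) cube."""
    return cube[row, col, :]

def roi_mean_spectrum(cube: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Average the spectra of all pixels where the boolean (rows, cols) mask is True."""
    return cube[mask].mean(axis=0)  # cube[mask] has shape (n_pixels, bands)

# Hypothetical cube: 50 x 60 pixels with 428 bands of synthetic reflectance
rng = np.random.default_rng(0)
cube = rng.uniform(0.1, 0.9, size=(50, 60, 428))

# Hypothetical rectangular ROI standing in for the area covered by a potato slice
mask = np.zeros(cube.shape[:2], dtype=bool)
mask[10:40, 15:45] = True

print(pixel_spectrum(cube, 0, 0).shape)     # (428,)
print(roi_mean_spectrum(cube, mask).shape)  # (428,)
```

Averaging over an ROI is one common way to obtain a single spectrum per sample for calibration, while the per-pixel spectra remain available for mapping.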
Therefore, in this study, we used hyperspectral image information to detect the starch content of fresh-cut potato slices rapidly. The objectives were (1) to acquire hyperspectral images of fresh-cut potatoes, (2) to determine the optimal wavelengths by using competitive adaptive reweighted sampling (CARS) and the successive projection algorithm (SPA), (3) to construct calibration models by using the full spectrum and the optimal wavelengths, (4) to improve the accuracy and robustness of the models by comparing different spectral preprocessing methods and their combinations, and (5) to visualize the distribution of starch content in fresh-cut potatoes.
Li, Liang, Xu, and Cao33 simplified and improved the original CARS method, which is based on the "survival of the fittest" principle of Darwin's theory of evolution; this improved CARS method was adopted in the present study. In each iteration, wavelengths with small absolute regression coefficients in the partial least squares regression (PLSR) model were eliminated and those with large coefficients were retained, and the subset with the smallest cross-validated root mean square error (RMSE) was selected as the optimal variable subset. In this study, 50 Monte Carlo sampling runs and 10-fold cross-validation were used.
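The CARS loop described above can be sketched as follows. This is a simplified, from-scratch NumPy illustration of the published idea (an exponentially decreasing function that forces out wavelengths with small absolute PLS coefficients, followed by adaptive reweighted sampling), not the libPLS routine used in the study. The 80% Monte Carlo sampling ratio, the helper names `pls1_fit`, `cv_rmse`, and `cars`, and the NIPALS PLS1 implementation are assumptions; the defaults mirror the settings stated in this section (50 runs, 10-fold cross-validation, at most 10 latent variables).

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 regression; returns (b, b0) such that y ≈ X @ b + b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)           # weight vector
        t = Xk @ w                          # scores of this latent variable
        tt = t @ t
        p, q = Xk.T @ t / tt, yk @ t / tt   # X and y loadings
        Xk, yk = Xk - np.outer(t, p), yk - q * t  # deflation
        W.append(w); P.append(p); Q.append(q)
    W, P = np.column_stack(W), np.column_stack(P)
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return b, y_mean - x_mean @ b

def cv_rmse(X, y, n_lv, folds=10, seed=0):
    """Root mean square error of k-fold cross-validation (RMSECV)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        b, b0 = pls1_fit(X[train], y[train], n_lv)
        sq_err += np.sum((y[test] - X[test] @ b - b0) ** 2)
    return np.sqrt(sq_err / len(y))

def cars(X, y, n_runs=50, max_lv=10, folds=10, seed=0):
    """Simplified CARS; returns (best RMSECV, indices of the selected wavelengths)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Exponentially decreasing function: keeps all p variables in run 1 and 2 in the last run
    a = (p / 2) ** (1 / (n_runs - 1))
    k = np.log(p / 2) / (n_runs - 1)
    subset = np.arange(p)
    best_rmse, best_subset = np.inf, subset
    for i in range(1, n_runs + 1):
        cal = rng.choice(n, size=int(0.8 * n), replace=False)  # Monte Carlo sample
        b, _ = pls1_fit(X[np.ix_(cal, subset)], y[cal], min(max_lv, subset.size))
        w = np.abs(b) / np.abs(b).sum()  # weight of each remaining wavelength
        keep = min(subset.size, max(2, int(round(a * np.exp(-k * i) * p))))
        top = np.argsort(w)[::-1][:keep]  # forced removal of small-coefficient wavelengths
        # Adaptive reweighted sampling: draw with replacement in proportion to weight
        drawn = rng.choice(top, size=keep, replace=True, p=w[top] / w[top].sum())
        subset = subset[np.unique(drawn)]
        rmse = cv_rmse(X[:, subset], y, min(max_lv, subset.size), folds, seed)
        if rmse < best_rmse:
            best_rmse, best_subset = rmse, subset.copy()
    return best_rmse, best_subset
```

Usage is `rmsecv, selected = cars(X, y)` with `X` holding the preprocessed spectra (samples × wavelengths) and `y` the measured starch content; `selected` indexes columns of `X`.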
The SPA is a forward variable selection method that uses simple projection operations to obtain a subset of variables with minimal collinearity. It extracts characteristic wavelengths from the full band, which eliminates most of the redundant information in the original spectral matrix and improves the modeling conditions. Starting from one wavelength, the SPA projects the remaining wavelength vectors onto the subspace orthogonal to the most recently selected wavelength and adds the wavelength with the largest projection, repeating until the preset number of wavelengths is reached; the subset with the least redundancy is then retained.34 The range for the number of characteristic wavelengths can be set in advance. In this study, the minimum and maximum numbers of variables selected by the SPA were 1 and 30, respectively.
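A minimal sketch of the projection step, written in NumPy for illustration: `spa_chain` is a hypothetical helper, not the SPA implementation the authors ran in MATLAB. It builds one chain of wavelengths from a given starting column. The full SPA additionally repeats this for every starting wavelength and every chain length in the allowed range (1 to 30 here) and keeps the subset with the lowest validation error; that evaluation step is omitted for brevity.

```python
import numpy as np

def spa_chain(X: np.ndarray, start: int, n_select: int) -> list[int]:
    """Successive projections: greedily add the least collinear wavelength.

    X is (samples, wavelengths); start is the index of the first wavelength.
    n_select should not exceed the number of samples (the rank limit).
    """
    Xp = X.astype(float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        last = Xp[:, selected[-1]].copy()
        # Project every column onto the subspace orthogonal to the last pick
        Xp = Xp - np.outer(last, last @ Xp) / (last @ last)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0  # never re-select a wavelength
        selected.append(int(np.argmax(norms)))
    return selected

# Hypothetical data: column 1 is almost a copy of column 0 (strong collinearity)
rng = np.random.default_rng(3)
X = rng.normal(size=(30, 6))
X[:, 1] = X[:, 0] + 1e-3 * rng.normal(size=30)
print(spa_chain(X, start=0, n_select=3))  # column 1 is skipped as redundant
```

Because each new wavelength is chosen after removing the part explained by the previous one, near-duplicate wavelengths end up with tiny projected norms and are not selected.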
PLSR is one of the most widely used linear regression algorithms35,36 and is well suited to constructing prediction models from spectral data. PLSR relates the spectral data matrix (X) to the starch content (y) and handles data with many highly collinear variables by transforming the original variables into a small number of uncorrelated latent variables (LVs). The optimal number of LVs was determined by minimizing the cross-validation RMSE, which prevents overfitting or underfitting of the model. In this study, the maximum number of LVs was set to 10, and threefold cross-validation was used to determine the optimal number of LVs.
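The LV selection described above can be sketched as follows. This is a generic NIPALS PLS1 written in NumPy; `pls1_fit` and `choose_n_lv` are hypothetical names, and the study itself used the libPLS toolbox in MATLAB. For each candidate number of LVs from 1 to `max_lv`, the k-fold cross-validation RMSE is computed and the number with the smallest error is returned.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """NIPALS PLS1 regression; returns (b, b0) such that y ≈ X @ b + b0."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)           # weight vector
        t = Xk @ w                          # latent-variable scores
        tt = t @ t
        p, q = Xk.T @ t / tt, yk @ t / tt   # X and y loadings
        Xk, yk = Xk - np.outer(t, p), yk - q * t  # deflation
        W.append(w); P.append(p); Q.append(q)
    W, P = np.column_stack(W), np.column_stack(P)
    b = W @ np.linalg.solve(P.T @ W, np.array(Q))
    return b, y_mean - x_mean @ b

def choose_n_lv(X, y, max_lv=10, folds=3, seed=0):
    """Return (best number of LVs, RMSECV for 1..max_lv) using k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    rmsecv = []
    for n_lv in range(1, max_lv + 1):
        sq_err = 0.0
        for test in np.array_split(idx, folds):
            train = np.setdiff1d(idx, test)
            b, b0 = pls1_fit(X[train], y[train], n_lv)
            sq_err += np.sum((y[test] - X[test] @ b - b0) ** 2)
        rmsecv.append(np.sqrt(sq_err / len(y)))
    return int(np.argmin(rmsecv)) + 1, rmsecv
```

With as many LVs as linearly independent wavelengths, PLS1 reproduces ordinary least squares, so capping the number of LVs and choosing it by cross-validation is what keeps the model from fitting noise.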
Characteristic wavelength extraction and PLSR modeling were performed in MATLAB R2014a (MathWorks, Natick, MA, USA). The PLSR code used is contained in the libPLS_1.98 toolbox.
Fig. 1 The raw and pretreated spectral curves of all potato samples via different methods: (a) raw; (b) SNV; (c) SG; (d) MSC; (e) SG-SNV; and (f) SG-MSC.
The visible–NIR spectrum of a sample depends on the vibrations of molecular bonds such as C–H, O–H, and N–H and can therefore be used to predict the quality attributes of the sample quantitatively.38 As shown in Fig. 1(a), the spectral curves fall into two groups because two potato varieties were used. A large absorption peak is observed at 410 nm, which may be attributed to carbohydrate absorption.39 A small absorption peak at approximately 450 nm is considered to be caused by carotenoids;40 both varieties used in this study are yellow-fleshed and have a high carotenoid content. Clear valleys are observed around 980 nm, which may be due to the second overtone of O–H stretching in water, as the water content of potatoes exceeds 70%.41
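Fig. 1 compares the raw spectra with the SNV, SG, and MSC pretreatments and their combinations. The following NumPy sketch shows the conventional form of each transform. The function names, the SG window length (11 points), and the polynomial order (2) are illustrative assumptions because the settings used in the study are not given in this section, and repeating the end values at the spectrum edges is one of several possible edge treatments.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    mu = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mu) / sd

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, s in enumerate(spectra):
        slope, offset = np.polyfit(ref, s, deg=1)  # s ≈ slope * ref + offset
        corrected[i] = (s - offset) / slope
    return corrected

def savgol_smooth(spectra, window=11, polyorder=2):
    """Savitzky-Golay smoothing along the wavelength axis (edges padded by repetition)."""
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, polyorder + 1, increasing=True)  # columns x^0 .. x^polyorder
    coeffs = np.linalg.pinv(A)[0]  # least-squares weights for the window's central point
    padded = np.pad(spectra, ((0, 0), (half, half)), mode="edge")
    return np.array([np.correlate(row, coeffs, mode="valid") for row in padded])

# Combined pretreatments apply smoothing first, e.g. SG-SNV and SG-MSC:
def sg_snv(spectra):
    return snv(savgol_smooth(spectra))

def sg_msc(spectra):
    return msc(savgol_smooth(spectra))
```

For prediction on new samples, MSC would normally reuse the calibration-set reference spectrum (the `reference` argument) rather than the mean of the new spectra.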
Fig. 2 Wavelength selection results on the pretreated spectral data via the CARS method. (a) SNV; (b) SG; (c) MSC; (d) SG-SNV; (e) SG-MSC.
Fig. 3 Wavelength selection results on the pretreated spectral data via the SPA method. (a) SNV; (b) SG; (c) MSC; (d) SG-SNV; (e) SG-MSC.
Table 1 Characteristic wavelengths selected by CARS and the SPA after different pretreatments
Pre-processing technique | Method | Number of wavelengths | Wavelengths (nm)
---|---|---|---
SNV | CARS | 31 | 382, 389, 390, 392, 393, 395, 403, 406, 419, 425, 426, 504, 535, 831, 834, 836, 843, 845, 846, 899, 902, 903, 912, 914, 915, 927, 930, 932, 945, 972, 990
SNV | SPA | 21 | 382, 386, 389, 393, 397, 399, 403, 417, 502, 677, 741, 788, 833, 846, 887, 908, 920, 927, 949, 982, 1003
MSC | CARS | 13 | 386, 389, 392, 395, 403, 406, 425, 485, 864, 914, 915, 927, 964
MSC | SPA | 16 | 386, 389, 390, 392, 395, 399, 401, 408, 414, 473, 572, 717, 818, 870, 915, 927
SG | CARS | 20 | 395, 404, 407, 426, 470, 471, 484, 485, 843, 845, 903, 914, 915, 929, 932, 949, 972, 973, 976, 993
SG | SPA | 16 | 382, 395, 407, 419, 477, 505, 736, 911, 932, 954, 981, 984, 988, 993, 996, 1000
SG-SNV | CARS | 14 | 395, 404, 407, 424, 426, 504, 843, 845, 912, 914, 915, 932, 973, 979
SG-SNV | SPA | 11 | 382, 397, 407, 418, 481, 567, 817, 827, 851, 917, 932
SG-MSC | CARS | 28 | 382, 389, 395, 404, 406, 407, 421, 484, 505, 666, 667, 668, 670, 842, 843, 845, 846, 900, 902, 903, 905, 906, 912, 914, 915, 929, 932, 933
SG-MSC | SPA | 11 | 382, 392, 403, 414, 574, 695, 741, 769, 839, 902, 994
As shown in Fig. 2 and 3 and Table 1, the SPA selected fewer characteristic wavelengths than CARS after every pretreatment except MSC (16 vs. 13). In this respect, the SPA compressed the characteristic variables more strongly than CARS, and the degree of compression varied with the pretreatment method. After SPA variable selection, the number of spectral variables was reduced by 95.09%, 96.26%, 96.26%, 97.43%, and 97.43% for the SNV, MSC, SG, SG-SNV, and SG-MSC pretreatments, respectively, which demonstrates the effectiveness of the SPA for dimensionality reduction. After wavelength selection, the spectral reflectance values at the selected wavelengths were extracted and used, instead of the full spectrum, as inputs to simplified regression models.
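The reduction percentages follow directly from the SPA counts in Table 1 and the 428 bands of the full spectrum (Table 3). The short check below uses only those tabulated counts; `reduction_percent` is a hypothetical helper name.

```python
# Number of bands in the full spectrum (Table 3) and SPA selections (Table 1)
FULL_BANDS = 428
SPA_COUNTS = {"SNV": 21, "MSC": 16, "SG": 16, "SG-SNV": 11, "SG-MSC": 11}

def reduction_percent(n_selected: int, n_total: int = FULL_BANDS) -> float:
    """Percentage of spectral variables removed by wavelength selection."""
    return round(100 * (1 - n_selected / n_total), 2)

for pretreatment, n in SPA_COUNTS.items():
    print(f"{pretreatment}: {reduction_percent(n):.2f}% of variables removed")
# SNV 95.09, MSC 96.26, SG 96.26, SG-SNV 97.43, SG-MSC 97.43
```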
Table 2 Statistics of starch content in the sample sets
Dataset | N | Range (g kg−1) | Mean (g kg−1) | SD (g kg−1)
---|---|---|---|---
Purple and white (Zihuabai) | 62 | 47.7–136 | 100.1 | 2.68
Atlantic | 34 | 172–228 | 188.3 | 1.29
Calibration set | 64 | 47.7–228 | 125.6 | 4.75
Prediction set | 32 | 50.4–202 | 128.9 | 4.81
N: number of samples; SD: standard deviation.
Table 3 Performance of full-spectrum PLSR models after different pretreatments
Pretreatment method | Parameters | Rc | RMSEC | Rp | RMSEP | RPD
---|---|---|---|---|---|---
SNV | N = 428, LVs = 7 | 0.9020 | 2.06 | 0.9069 | 2.06 | 2.33
MSC | N = 428, LVs = 3 | 0.8641 | 2.38 | 0.8572 | 2.44 | 1.97
SG | N = 428, LVs = 6 | 0.8689 | 2.46 | 0.8796 | 2.26 | 2.13
SG-SNV | N = 428, LVs = 3 | 0.8685 | 2.36 | 0.8624 | 2.4 | 2.00
SG-MSC | N = 428, LVs = 3 | 0.8651 | 2.37 | 0.861 | 2.41 | 2.00
N: number of spectral variables; LVs: number of latent variables; Rc: correlation coefficient in calibration; RMSEC: root mean square error in calibration; Rp: correlation coefficient in prediction; RMSEP: root mean square error in prediction; RPD: residual predictive deviation in the prediction set. Rc and RMSEC refer to the calibration set; Rp, RMSEP, and RPD refer to the prediction set.
As shown in Table 3, the preprocessing method affected model performance. Only the full-spectrum model built after MSC pretreatment had an RPD below 2; the models built with the other pretreatment methods had RPD values of at least 2.00. These results indicate that the full-spectrum models can predict starch content reasonably well. The full-spectrum model built after SNV pretreatment performed best, with an Rc of 0.9020, RMSEC of 2.06, Rp of 0.9069, RMSEP of 2.06, and RPD of 2.33. However, the full-spectrum model uses all 428 spectral bands as inputs, which is not conducive to rapid detection. Therefore, characteristic-wavelength models were also constructed.
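The RPD values in Table 3 are consistent with the conventional definition RPD = SD/RMSEP when SD is the prediction-set standard deviation listed in the sample statistics table (4.81). The sketch below assumes that definition, which this section does not state explicitly; `rmse`, `corr`, and `rpd` are generic helper names introduced for illustration.

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root mean square error (RMSEC on the calibration set, RMSEP on the prediction set)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def corr(y_true, y_pred) -> float:
    """Pearson correlation between measured and predicted values (Rc or Rp)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def rpd(sd_reference: float, rmsep: float) -> float:
    """Residual predictive deviation: SD of the reference values divided by RMSEP."""
    return sd_reference / rmsep

SD_PREDICTION = 4.81  # SD of the prediction set
TABLE3_RMSEP = {"SNV": 2.06, "MSC": 2.44, "SG": 2.26, "SG-SNV": 2.4, "SG-MSC": 2.41}
for name, err in TABLE3_RMSEP.items():
    print(f"{name}: RPD = {rpd(SD_PREDICTION, err):.2f}")
# SNV 2.33, MSC 1.97, SG 2.13, SG-SNV 2.00, SG-MSC 2.00, matching Table 3
```

The same ratio with RMSEP = 1.63 gives the RPD of 2.95 reported for the MSC-CARS-PLSR model.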
Table 4 Performance of characteristic-wavelength PLSR models after different pretreatments
Pretreatment method | Wavelength selection | LVs | Rc | RMSEC | Rp | RMSEP | RPD
---|---|---|---|---|---|---|---
SNV | CARS | 6 | 0.9186 | 1.87 | 0.9258 | 1.81 | 2.66
SNV | SPA | 6 | 0.8803 | 2.25 | 0.9229 | 1.82 | 2.64
MSC | CARS | 9 | 0.9276 | 1.76 | 0.9467 | 1.63 | 2.95
MSC | SPA | 10 | 0.8905 | 2.17 | 0.9272 | 1.82 | 2.64
SG | CARS | 8 | 0.9076 | 1.98 | 0.9242 | 1.81 | 2.66
SG | SPA | 10 | 0.8857 | 2.21 | 0.892 | 2.17 | 2.22
SG-SNV | CARS | 7 | 0.9250 | 1.79 | 0.9259 | 1.80 | 2.67
SG-SNV | SPA | 5 | 0.8807 | 2.24 | 0.8546 | 2.49 | 1.93
SG-MSC | CARS | 7 | 0.9020 | 2.06 | 0.9069 | 2.06 | 2.33
SG-MSC | SPA | 6 | 0.8637 | 2.4 | 0.8970 | 2.18 | 2.21
LVs: number of latent variables; Rc: correlation coefficient in calibration; RMSEC: root mean square error in calibration; Rp: correlation coefficient in prediction; RMSEP: root mean square error in prediction; RPD: residual predictive deviation in the prediction set.
As shown in Table 4, the PLSR models built on the characteristic wavelengths achieved high prediction accuracy. Only the SG-SNV-SPA-PLSR model had an RPD below 2; all other models had RPD values above 2. Furthermore, the SNV-CARS-PLSR, SNV-SPA-PLSR, MSC-CARS-PLSR, MSC-SPA-PLSR, SG-CARS-PLSR, and SG-SNV-CARS-PLSR models had RPD values above 2.5. This indicates that the characteristic wavelength extraction algorithms used in this study can retain the useful information in the full spectrum while eliminating redundant information. The MSC-CARS-PLSR model performed best, with an Rc of 0.9276, RMSEC of 1.76, Rp of 0.9467, RMSEP of 1.63, and RPD of 2.95.
As shown in Table 4, models built with the same preprocessing method but different wavelength selection algorithms performed differently. Under every pretreatment, the CARS-based model outperformed the SPA-based model (higher Rp and RPD), which indicates that the wavelengths selected by CARS were more informative about starch content than those selected by the SPA in this experiment.
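The comparisons drawn from Table 4 can be checked directly against the tabulated prediction-set values. The snippet below transcribes Rp, RMSEP, and RPD from Table 4 (no new results are computed) and reproduces the best model, the RPD thresholds of 2 and 2.5, and the CARS-versus-SPA comparison; the variable and function names are illustrative.

```python
# Prediction-set results from Table 4: (pretreatment, selection) -> (Rp, RMSEP, RPD)
TABLE4 = {
    ("SNV", "CARS"): (0.9258, 1.81, 2.66), ("SNV", "SPA"): (0.9229, 1.82, 2.64),
    ("MSC", "CARS"): (0.9467, 1.63, 2.95), ("MSC", "SPA"): (0.9272, 1.82, 2.64),
    ("SG", "CARS"): (0.9242, 1.81, 2.66), ("SG", "SPA"): (0.892, 2.17, 2.22),
    ("SG-SNV", "CARS"): (0.9259, 1.80, 2.67), ("SG-SNV", "SPA"): (0.8546, 2.49, 1.93),
    ("SG-MSC", "CARS"): (0.9069, 2.06, 2.33), ("SG-MSC", "SPA"): (0.8970, 2.18, 2.21),
}
PRETREATMENTS = ["SNV", "MSC", "SG", "SG-SNV", "SG-MSC"]

def name(key):
    """Build a model label such as 'MSC-CARS-PLSR'."""
    return f"{key[0]}-{key[1]}-PLSR"

best = max(TABLE4, key=lambda key: TABLE4[key][2])  # highest RPD
rpd_below_2 = [name(k) for k, v in TABLE4.items() if v[2] < 2.0]
rpd_above_2_5 = [name(k) for k, v in TABLE4.items() if v[2] > 2.5]
cars_better = {
    p: TABLE4[(p, "CARS")][0] > TABLE4[(p, "SPA")][0]
    and TABLE4[(p, "CARS")][2] > TABLE4[(p, "SPA")][2]
    for p in PRETREATMENTS
}

print("Best model:", name(best))            # MSC-CARS-PLSR
print("RPD < 2:", rpd_below_2)              # ['SG-SNV-SPA-PLSR']
print("RPD > 2.5:", rpd_above_2_5)
print("CARS beats SPA (Rp and RPD):", cars_better)
```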
Among the full-spectrum models, SNV outperformed MSC, SG, SG-SNV, and SG-MSC for rapidly predicting the starch content of potato slices. Although the SNV full-spectrum model gave acceptable values of Rc (0.9020), RMSEC (2.06), Rp (0.9069), RMSEP (2.06), and RPD (2.33), full-spectrum modeling is time-consuming and laborious and is therefore unsuitable for practical application;42 hence, characteristic-wavelength models were established. Under the same pretreatment, the characteristic-wavelength models generally outperformed the full-spectrum models (the only exception being the SG-SNV-SPA-PLSR model, with an RPD of 1.93 versus 2.00 for the SG-SNV full-spectrum model), which indicates that modeling on characteristic wavelengths is preferable. CARS generally selected more wavelengths than the SPA (MSC being the exception), but still far fewer than the full spectrum. Nevertheless, the PLSR models built with CARS outperformed those built with the SPA under every pretreatment method, which suggests that the wavelengths extracted by CARS are more strongly correlated with starch content than those extracted by the SPA. Among all the models investigated in this study, the MSC-CARS-PLSR model performed best, with an Rc of 0.9276, RMSEC of 1.76, Rp of 0.9467, RMSEP of 1.63, and an RPD of 2.95 (close to 3), which indicates that this model is well suited to starch content prediction.
Currently, NIR spectroscopy is the main technique used to predict the starch content of potatoes, and few studies have used hyperspectral technology for this purpose. Wei et al.43 used hyperspectral equipment and a random frog-PLSR model to detect the starch content of potatoes; their model had an Rc2 of 0.8514, RMSEC of 0.3259, Rp2 of 0.8348, and RMSEP of 0.2906. The best model in our study (MSC-CARS-PLSR) had an Rc2 of 0.8604, Rp2 of 0.8962, RMSEC of 1.76, RMSEP of 1.63, and RPD of 2.95, so its performance was comparable to that of the model of Wei et al. Su et al.44 used hyperspectral equipment to predict the starch content of potatoes and sweet potatoes with an FMCIA-Es-PLSR model, which had an Rp2 of 0.963 and an RMSEP of 0.023; its detection performance was similar to that of the best model in the present study. However, these studies targeted whole potatoes rather than fresh-cut potato slices. Xiao et al.29 used HSI to predict the water content of fresh-cut potatoes, but their optimal model was built on the full wavelength range rather than on characteristic wavelengths. To our knowledge, no study has yet reported starch content detection in fresh-cut potatoes.
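For reference, the R² values quoted for the MSC-CARS-PLSR model correspond to squaring its correlation coefficients from Table 4, assuming R² here denotes the squared correlation coefficient:

```latex
R_c^2 = 0.9276^2 \approx 0.8604, \qquad R_p^2 = 0.9467^2 \approx 0.8962
```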
Fig. 4 Distribution maps of the starch content in fresh-cut potatoes: (a) 11.8 g kg−1; (b) 13.5 g kg−1; (c) 19.0 g kg−1; and (d) 21.6 g kg−1.
The distribution maps in Fig. 4 clearly show differences in starch content among fresh-cut potato samples. Each color represents a different predicted starch content, corresponding to differences in the spectral characteristics of individual pixels, so both the differences in total starch content between samples and the spatial distribution of starch within each slice can be seen. Therefore, hyperspectral imaging combined with distribution mapping can predict the content and spatial distribution of starch in fresh-cut potatoes, providing a rapid method for studying the internal composition of fresh-cut potatoes and methods for their storage and preservation.
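Distribution maps of this kind are typically produced by applying the calibrated model to every pixel spectrum. A minimal sketch, assuming the fitted model is linear (a fitted PLSR model reduces to y = x·b + b0), that the pixel spectra have already been preprocessed in the same way as the calibration spectra (for MSC, against the calibration reference spectrum), and that a background mask is available; `predict_starch_map`, the coefficients, and the synthetic cube are hypothetical.

```python
import numpy as np

def predict_starch_map(cube, coef, intercept, band_idx=None, mask=None):
    """Apply a linear model y = x @ coef + intercept to every pixel spectrum.

    cube: (rows, cols, bands) array of preprocessed reflectance spectra.
    band_idx: optional indices of the characteristic wavelengths used by the model.
    mask: optional boolean (rows, cols) array; False pixels (background) become NaN.
    """
    if band_idx is not None:
        cube = cube[:, :, band_idx]  # keep only the model's wavelengths
    rows, cols, bands = cube.shape
    flat = cube.reshape(-1, bands)   # one spectrum per row
    pred = (flat @ coef + intercept).reshape(rows, cols)
    if mask is not None:
        pred = np.where(mask, pred, np.nan)
    return pred

rng = np.random.default_rng(0)
cube = rng.uniform(0.1, 0.9, size=(4, 5, 10))  # hypothetical 4 x 5-pixel image, 10 bands
coef = np.array([2.0, -1.0, 0.5])              # hypothetical coefficients for 3 bands
mask = np.ones((4, 5), dtype=bool)
mask[0, 0] = False                             # pretend this pixel is background
starch_map = predict_starch_map(cube, coef, 1.0, band_idx=[1, 4, 7], mask=mask)
print(starch_map.shape)  # (4, 5)
```

The resulting 2-D array could then be rendered as a pseudocolor image, for example with `matplotlib.pyplot.imshow(starch_map, cmap="jet")` plus a colorbar, so that each color corresponds to a predicted starch level.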
This journal is © The Royal Society of Chemistry 2021 |