Yao Liu (a), Fu Qiao (b), Shuwen Wang* (a), Runtao Wang (a) and Lele Xu (c)
(a) School of Electronic and Electrical Engineering, Lingnan Normal University, 29 Cunjin Road, Chikan District, Zhanjiang 524048, Guangdong Province, China. E-mail: wangshuwen@lingnan.edu.cn
(b) School of Computer Science and Intelligence Education, Lingnan Normal University, Zhanjiang 524048, China
(c) School of Life Science and Technology, Lingnan Normal University, Zhanjiang 524048, China
First published on 15th November 2021
Ingesting Ruditapes philippinarum contaminated with heavy metals poses a serious health hazard to humans, and it is therefore important to identify heavy metal contaminated Ruditapes philippinarum. This study investigates the feasibility of using hyperspectral imaging to rapidly identify heavy metal contamination in Ruditapes philippinarum. To reduce the effects of noise, four different spectral pretreatments were applied to the original spectra. To select characteristic wavebands for identification, four waveband selection algorithms based on neighbourhood rough set theory were proposed, using mutual information, consistency measure, dependency measure, and variable precision as evaluation criteria. The selected wavebands were input to an extreme learning machine to construct classification models. The results demonstrated that multiplicative scatter correction pretreatment was suitable for Ruditapes philippinarum hyperspectral imaging datasets. The identification models distinguished healthy Ruditapes philippinarum from those contaminated by both individual and multiple heavy metals with satisfactory performance. Cd and Pb contaminated samples were identified more accurately than Cu and Zn contaminated samples. When the number of training samples decreased, the identification performance decreased, but not significantly. The results showed that hyperspectral imaging combined with pattern recognition analysis can be used to distinguish healthy Ruditapes philippinarum samples from those contaminated by heavy metals, even with only a small number of training samples. This model is suitable for rapid, non-destructive analysis of large numbers of shellfish.
At present, standard methods to detect heavy metals include flame atomic absorption spectrometry, graphite furnace atomic absorption spectrometry, and atomic fluorescence spectrometry,4,5 which can precisely measure the types and concentrations of the heavy metals. However, these methods do not facilitate rapid food safety inspection because of their high cost, intensive labour needs, complex preparation requirements, and time-consuming nature.6 Thus, a fast, low-cost, and reliable method is required to detect heavy metal contamination.
Although the near-infrared spectroscopy (NIRS) technique facilitates a fast and low-cost qualitative or quantitative determination of heavy metal levels, and has been applied extensively in quality control, the technique requires sample pretreatment and the use of chemical reagents. NIRS reflects the vibrations of chemical bonds at different wavelengths. When samples are illuminated by a spectrophotometer, different chemical bond types within the organisms absorb or emit light at different wavelengths.7 Although heavy metals usually display no infrared activity and almost no characteristic peaks, the NIRS technique can detect heavy metals indirectly. The heavy metals in contaminated Ruditapes philippinarum induce the synthesis of detoxification proteins and inhibit antioxidant enzymes. These effects modify the structures and concentrations of relevant biological molecules, and the molecular vibration information in the infrared spectrum is obtained from these changes.8 Therefore, the infrared spectral information arising from the interactions between heavy metal ions and enzymes can be used to detect heavy metals indirectly. With the development of spectral technology, hyperspectral imaging (HSI) combines spectral analysis with traditional computer imaging, using spatial and spectral information to create a three-dimensional dataset that contains numerous images of the sample at different wavelengths. Hence, HSI has the potential to become a powerful technique for food quality and safety evaluation.9,10
HSI technology is suitable for quality control. Recently, it has been applied to agricultural products such as meat products, fruits, vegetables, dairy products, and cereals.11,12 For aquatic products, the HSI technology has been used to evaluate the safety and quality of various products, including the detection of fish fillet substitution and mislabelling, automated sorting of sea bass by size and sex, and automatic evaluation of the freshness of gilthead sea bream.13,14 To date, few studies have investigated NIRS for the identification of heavy metal contaminated shellfish. Hu et al. used mid-infrared spectroscopy to estimate Cu content in Tegillarca granosa.3 Chen et al. applied an infrared spectroscopy approach to distinguish healthy Tegillarca granosa from samples contaminated by unspecified heavy metals.8 To identify contaminated Tegillarca granosa samples, competitive adaptive reweighted sampling methods, genetic algorithms, and successive projection algorithms have been used.4 In these studies, the samples were freeze-dried and ground into a powder prior to collecting the spectra. These studies indicate that NIRS is a feasible method for the identification of heavy metal contaminated shellfish. However, there is little documented information on the use of HSI to identify heavy metal contaminated Ruditapes philippinarum, or shellfish more generally.
In this study, the HSI technology was developed to rapidly detect heavy metal contaminated Ruditapes philippinarum samples. Healthy Ruditapes philippinarum samples were manually contaminated by Zn, Cu, Cd, and Pb; then the ability of HSI technology to identify heavy metals was evaluated. To the best of our knowledge, this is the first study to identify Ruditapes philippinarum contaminated with heavy metals with the use of HSI technology and chemometrics. Compared with traditional laboratory-based techniques, the HSI technology neither requires chemical reagents nor pollutes the environment. In addition, it has been shown to improve efficiency by saving time, reducing labour needs, and removing the need for sample pretreatment.15
Unfortunately, noise in the sample information is unavoidable in the spectra obtained from the HSI technology. Thus, before the setup of the models, the spectra need to be pretreated to reduce noise and physical factors, such as particle size and path length. Additionally, in the HSI datasets, there is substantial irrelevant or redundant information. To increase the accuracy and the speed of the models, it is crucial to extract effective wavebands from the available range of wavebands.
The objectives of this study were to (1) investigate the effects of four spectral pretreatments, namely standard normal variate (SNV), multiplicative scatter correction (MSC), Savitzky–Golay smoothing (SG), and first derivative (DER), and hence identify the optimal pretreatment method; (2) extract characteristic wavebands that identify Ruditapes philippinarum samples contaminated with heavy metals with the application of the neighbourhood rough set (NRS) theory; (3) build extreme learning machine (ELM) classification models with selected wavebands to identify heavy metal contaminated Ruditapes philippinarum samples; (4) analyse the influence of neighbourhood size on the number of selected wavebands and classification accuracy; and (5) examine the effect of reducing the number of training samples on the identification results. The main aim of the study was to assess the HSI technique as a qualitative tool to rapidly detect heavy metal contamination in Ruditapes philippinarum, and thereby improve food safety.
During the experiment, seawater was continuously aerated and filtered through an aquarium pump that was connected to a PVC box (55 × 10.2 × 46 cm) containing filtration materials. The Ruditapes philippinarum were fed daily with spirulina algae powder, and the filter was shut off for four hours during the feeding. Fresh seawater with heavy metals was added to the tanks to compensate for spillage and evaporation. All Ruditapes philippinarum samples were reared for 10 d to allow heavy metal accumulation. Some Ruditapes philippinarum died during this period. Those exposed to Cd suffered the highest mortality, whereas those exposed to Zn suffered the lowest. After the rearing period, 60 contaminated samples per heavy metal tank (Cu, Cd, Pb, and Zn), and 120 healthy (uncontaminated) Ruditapes philippinarum samples were collected for spectral measurements.
Ruditapes philippinarum were removed from the seawater tanks and dried with a towel, and the shells were opened. An individual Ruditapes philippinarum, now in half of its shell, was placed on the platform. The HSI information of each sample was collected with the HSI system and HyperScanner2.0 data acquisition software. To reduce external interference, the acquisition process was completed in a dark room. Fig. 2 shows the hyperspectral images of the Ruditapes philippinarum control group samples and of those contaminated with heavy metals (Cd, Cu, Pb, and Zn). With the SRAnal710 software, a standard calibration procedure was performed on the acquired hyperspectral images, including spectral calibration, radiometric calibration, and reflectance normalisation.
For hyperspectral images of Ruditapes philippinarum samples, the region of interest (ROI) was a rectangular area (1000 pixels) around the centre of a sample. The ROI was selected with ENVI 4.8 software (Research Systems Inc., USA). The pixel spectra within each ROI were extracted and averaged to represent the original sample spectrum. The spectra of all the Ruditapes philippinarum samples from the five groups (healthy, and contaminated by Zn, Cd, Pb, and Cu) were presented in Fig. 3. Because all five groups belonged to the same species, the spectra of the samples were similar. To show the differences in the spectra among the five groups of Ruditapes philippinarum more clearly, the average spectral curve of each group was shown in Fig. 4. Similar trends were observed in the average spectral curves; however, there were some differences in the spectral reflectance values.
In the region of 700 to 1000 nm, samples contaminated by Cd had the lowest average spectral reflectance, followed by the healthy samples. The spectral curves of samples contaminated by Cu, Zn, and Pb overlapped substantially and lay above the spectra of the healthy samples and samples contaminated by Cd. However, in the region of 400 to 650 nm, the spectral reflectance of samples contaminated by Zn was lower than that of the healthy samples. The spectra of these biological samples were extremely complex due to the light absorption and scattering processes associated with their microstructure. It was difficult to identify obvious visual differences in the spectra caused by heavy metal contamination. Therefore, to distinguish among the groups, chemometrics and pattern recognition methods were employed. In our study, different pretreatment methods were employed to accentuate the differences between samples and amplify changes in the spectra. Because the overlapping spectra made it difficult to select useful wavebands, waveband selection methods based on the NRS theory were utilized.
The collected HSI data usually contained noise. Therefore, before the classification models were built, the spectra were pretreated to reduce the noise. In this study, four pretreatment methods (MSC, SG, SNV, and DER) were applied. The SG was an effective tool to reduce the high-frequency noise component, and an SG method using a third-order polynomial and a 21-point window was employed.17 The MSC corrected for the additive and multiplicative effects of light scattering.18 The DER resolved overlapping spectral peaks and corrected the baseline effect.19 For the SNV, the mean and standard deviation of each spectrum were calculated, and the spectrum was then centred and scaled by these values.20 The spectra obtained by pretreatments were used for further data analyses. It was found that spectral pretreatments were effective in improving the accuracy of classification models.
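As a rough illustration of three of these pretreatments, the sketch below applies SNV, MSC, and a first-difference derivative to a small synthetic set of spectra. It is a minimal NumPy sketch and not the implementation used in this study: the function names, the choice of the mean spectrum as the MSC reference, the difference-based derivative, and the synthetic data are all assumptions made for illustration. SG smoothing is left out; in SciPy it is available as `scipy.signal.savgol_filter`.

```python
import numpy as np

def snv(spectra):
    """Standard normal variate: centre and scale each spectrum by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std

def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum is regressed on a reference spectrum (assumed here to be the
    mean spectrum) and the fitted offset and slope are removed.
    """
    ref = spectra.mean(axis=0) if reference is None else reference
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ intercept + slope * ref
        corrected[i] = (row - intercept) / slope
    return corrected

def first_derivative(spectra):
    """First derivative approximated by differences between adjacent wavebands."""
    return np.diff(spectra, axis=1)

# Synthetic example: one underlying spectrum distorted by additive and
# multiplicative scatter effects (all values are made up for illustration).
base = np.sin(np.linspace(0.0, 3.0, 50)) + 1.5
offsets = np.array([[0.00], [0.10], [-0.05], [0.20]])
gains = np.array([[1.00], [1.20], [0.80], [1.10]])
raw = offsets + gains * base

msc_spectra = msc(raw)
snv_spectra = snv(raw)
der_spectra = first_derivative(raw)

# Spread between spectra before and after MSC (largest range over all wavebands).
print(np.ptp(raw, axis=0).max(), np.ptp(msc_spectra, axis=0).max())
```

In this synthetic case every spectrum is an exact linear function of the reference, so MSC collapses all four spectra onto the mean spectrum; real spectra would only be brought closer together.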
A dataset is an information system IS = (U, C ∪ D), in which C and D are the condition and decision features, respectively, with C ∩ D = ϕ. U = {x1, x2, ⋯, xn} is the universe set. For HSI datasets, the class labels of s samples are denoted by D = {d1, d2, ⋯, ds}, where di = k (i = 1, 2, ⋯, s) indicates the sample i belongs to class k (k = 1, 2, ⋯, c). S = {s1, s2, ⋯, sn} is a set of samples, and W = {w1, w2, ⋯, wm} is a set of wavebands. C = {cij|i = 1, 2, ⋯, n; j = 1, 2, ⋯, m} is a hyperspectral-waveband matrix, in which cij is the expression of waveband j in sample i.
Definition 1. Given a neighbourhood relation N over U and a set of samples, a neighbourhood decision system (NDS) is defined as (U, C ∪ D, N).
Different evaluation criteria yield different waveband selection algorithms. The consistency measure (CON-NRS),22 dependency measure (DM-NRS),23 mutual information (MI-NRS)24 and variable precision (VP-NRS)25 were employed to search for wavebands that identify contaminated Ruditapes philippinarum.
To adopt neighbourhood dependency, the DM-NRS algorithm was introduced with the following framework.
Definition 2. Given an NDS (U, C ∪ D, N), B ⊆ C, the neighbourhood dependency of B with respect to D is defined as
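$\gamma_B(D) = \dfrac{|\underline{N}_B D|}{|U|}$ | (1) |
In the formula, |·| is the cardinality of a set. $\underline{N}_B D$ (where $\underline{N}_B D = \bigcup_{k=1}^{c} \underline{N}_B X_k$, $X_k$ is the set of samples in class $k$, and $\underline{N}_B X = \{x_i \in U \mid \delta_B(x_i) \subseteq X\}$) is defined as the lower approximation of D.23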
Building on the DM-NRS algorithm, the VP-NRS algorithm made the calculated results less sensitive by allowing a certain degree of misclassification. With a precision coefficient β, the samples in a given condition class could be assigned to the same decision class even when that class was not perfectly consistent. Based on a previous study,25 the value of the precision coefficient β was set to 0.7.
Definition 3. Given any subset X ⊆ U in an NDS (U, C ∪ D, N), the variable precision lower and upper approximations of X are defined as
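$\underline{N}^{\beta} X = \{x_i \in U \mid |\delta(x_i) \cap X| / |\delta(x_i)| \geq \beta\}$ | (2) |
$\overline{N}^{\beta} X = \{x_i \in U \mid |\delta(x_i) \cap X| / |\delta(x_i)| \geq 1 - \beta\}$ | (3) |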
To estimate the significance of the subsets, the dependency measure in the DM-NRS algorithm was also used in the VP-NRS algorithm.
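The dependency measure of Definition 2 and its variable precision variant can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name, the Euclidean distance in the selected waveband subspace, the inclusion of each sample in its own neighbourhood, and the toy data are all hypothetical. A sample counts towards the positive region when at least a fraction `beta` of its neighbours belongs to a single class; `beta = 1.0` gives the strict lower approximation.

```python
import numpy as np

def neighbourhood_dependency(X, y, bands, delta, beta=1.0):
    """Share of samples lying in the (beta-)lower approximation of some class.

    beta = 1.0 : strict neighbourhood dependency (whole neighbourhood in one class)
    beta < 1.0 : variable precision version that tolerates some disagreement
    """
    sub = X[:, bands]
    # Pairwise Euclidean distances in the selected waveband subspace (assumed metric).
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    neighbours = dist <= delta                      # row i marks delta(x_i), including x_i
    classes = np.unique(y)
    member = (y[:, None] == classes[None, :])       # member[j, k]: sample j is in class k
    # frac[i, k] = share of the neighbours of sample i that belong to class k
    frac = neighbours.astype(float) @ member.astype(float)
    frac /= neighbours.sum(axis=1, keepdims=True)
    in_lower = (frac >= beta).any(axis=1)
    return float(in_lower.mean())

# Toy data: one waveband, two tight groups, and one class-0 sample inside the class-1 group.
X = np.array([[0.00], [0.05], [0.50], [0.52], [0.54], [0.56], [0.53]])
y = np.array([0, 0, 1, 1, 1, 1, 0])

gamma_strict = neighbourhood_dependency(X, y, bands=[0], delta=0.1)
gamma_vp = neighbourhood_dependency(X, y, bands=[0], delta=0.1, beta=0.7)
print(gamma_strict, gamma_vp)
```

With the strict measure, the single conflicting sample removes the whole second group from the positive region; with β = 0.7 the group is retained because 80% of each neighbourhood agrees on one class.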
In the CON-NRS algorithm, the samples in the positive region were counted, together with the samples of the majority class in the boundary regions. The ratio of these consistent samples to the entire set of samples was denoted as consistency.22
Definition 4. Given an NDS (U, C ∪ D, N), the neighbourhood of xi is δ(xi) and the class probability of class ωj is P(ωj|δ(xi)) (j = 1, 2, ⋯, c). If $P(\omega_l|\delta(x_i)) = \max_j P(\omega_j|\delta(x_i))$, the neighbourhood decision of xi is denoted as ND(xi) = ωl, where P(ωj|δ(xi)) = nj/K, K is the number of samples in the neighbourhood δ(xi), and nj is the number of samples in δ(xi) with decision ωj.
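For misclassified samples, the 0–1 loss function is defined as
$\lambda(\omega(x_i) \mid ND(x_i)) = \begin{cases} 0, & \omega(x_i) = ND(x_i) \\ 1, & \omega(x_i) \neq ND(x_i) \end{cases}$ | (4) |
where ω(xi) is the true class of xi.
Definition 5. The neighbourhood decision error rate (NDER)22 is defined as
$NDER = \dfrac{1}{n} \sum_{i=1}^{n} \lambda(\omega(x_i) \mid ND(x_i))$ | (5) |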
For convenience, the neighbourhood recognition rate (NRR) was used to denote 1-NDER.22
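The neighbourhood decision of Definition 4 and the error rate of Definition 5 can be sketched as a majority vote inside each neighbourhood followed by a 0–1 loss count. The function name, the Euclidean distance, the tie-breaking rule, and the toy data below are assumptions made for illustration only.

```python
import numpy as np

def neighbourhood_decision_error_rate(X, y, bands, delta):
    """NDER: share of samples whose neighbourhood majority class differs from their own."""
    sub = X[:, bands]
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)  # assumed metric
    classes = np.unique(y)
    errors = 0
    for i in range(len(y)):
        labels = y[dist[i] <= delta]                     # labels in delta(x_i), x_i included
        probs = [(labels == c).mean() for c in classes]  # P(w_j | delta(x_i)) = n_j / K
        decision = classes[int(np.argmax(probs))]        # ties go to the first class (assumption)
        errors += int(decision != y[i])                  # 0-1 loss
    return errors / len(y)

# Toy data: one waveband, two tight groups, and one class-0 sample inside the class-1 group.
X = np.array([[0.00], [0.05], [0.50], [0.52], [0.54], [0.56], [0.53]])
y = np.array([0, 0, 1, 1, 1, 1, 0])

nder = neighbourhood_decision_error_rate(X, y, bands=[0], delta=0.1)
nrr = 1.0 - nder
print(nder, nrr)
```

Only the class-0 sample surrounded by class-1 neighbours receives the wrong neighbourhood decision, so one of the seven samples contributes to the error rate.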
In the MI-NRS algorithm, mutual information (MI) was used as the evaluation standard to estimate the correlation between wavebands and class labels.
Definition 6. Given S, R ⊆ C are two different subsets, the neighbourhood MI (NMI) of R and S is defined as
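$NMI(R;S) = -\dfrac{1}{n} \sum_{i=1}^{n} \log \dfrac{|\delta_R(x_i)| \cdot |\delta_S(x_i)|}{n\,|\delta_{S \cup R}(x_i)|}$ | (6) |
When D is the class labels of samples, the NMI of D and R is
$NMI(R;D) = -\dfrac{1}{n} \sum_{i=1}^{n} \log \dfrac{|\delta_R(x_i)| \cdot |\delta_D(x_i)|}{n\,|\delta_{R \cup D}(x_i)|}$ | (7) |
where δR(xi) is the neighbourhood of xi computed on the wavebands in R, δD(xi) is the set of samples with the same class label as xi, and δR∪D(xi) = δR(xi) ∩ δD(xi). The NMI(R;D) estimates the amount of information that the waveband subset R contains about the decision D.24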
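A small numerical sketch may help to show how the NMI between a waveband subset and the class labels behaves. It assumes the form commonly used in the neighbourhood mutual information literature, NMI(R;D) = −(1/n) Σ log(|δR(xi)|·|δD(xi)| / (n·|δR(xi) ∩ δD(xi)|)), where δD(xi) is the set of samples sharing the class of xi. The function name, the Euclidean distance, the natural logarithm, and the toy data are assumptions made here.

```python
import numpy as np

def neighbourhood_mutual_information(X, y, bands, delta):
    """NMI between a waveband subset and the class labels (assumed formulation)."""
    n = len(y)
    sub = X[:, bands]
    dist = np.linalg.norm(sub[:, None, :] - sub[None, :, :], axis=2)
    in_r = dist <= delta                   # delta_R(x_i): neighbours in the waveband subspace
    in_d = y[:, None] == y[None, :]        # delta_D(x_i): samples with the same class label
    size_r = in_r.sum(axis=1)
    size_d = in_d.sum(axis=1)
    size_rd = (in_r & in_d).sum(axis=1)    # never zero, because x_i belongs to both sets
    return float(-np.mean(np.log(size_r * size_d / (n * size_rd))))

# Toy data: waveband 0 separates the two classes, waveband 1 is constant.
X = np.array([[0.00, 0.5], [0.02, 0.5], [0.04, 0.5],
              [1.00, 0.5], [1.02, 0.5], [1.04, 0.5]])
y = np.array([0, 0, 0, 1, 1, 1])

nmi_informative = neighbourhood_mutual_information(X, y, bands=[0], delta=0.1)
nmi_constant = neighbourhood_mutual_information(X, y, bands=[1], delta=0.1)
print(nmi_informative, nmi_constant)
```

Under these assumptions, a waveband whose neighbourhoods coincide with the classes reaches the entropy of the balanced two-class labels (log 2), whereas a constant waveband carries no information about the decision.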
Definition 7. Given an NDS (U, C ∪ D, N), B ⊂ C, a ∈ C − B, the significance of waveband a is defined as
(1) For the DM-NRS and VP-NRS algorithms,
SIG(a,B,D) = γB∪a(D) − γB(D). | (8) |
(2) For the CON-NRS algorithm,
SIG(a,B,D) = NRRB∪a(D) − NRRB(D). | (9) |
(3) For the MI-NRS algorithm,
SIG(a,B,D) = NMI(B ∪ {a};D) − NMI(B;D). | (10) |
The forward-greedy search method26 was described as follows: first, an NDS was established. Then, beginning with an empty subset, new wavebands were added to the subset one at a time. At each step, the waveband with the maximum significance was selected. Finally, an ELM classifier was used to evaluate the classification performance of the subsets.
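The forward-greedy search can be sketched generically as follows. This is an illustrative outline rather than the authors' implementation: `evaluate` is a hypothetical callback that returns the criterion value of a waveband subset (for example a dependency, NRR, or NMI value), and the stopping rule (stop when no candidate increases the criterion by more than `min_gain`) is an assumption, since the stopping condition is not stated in the text.

```python
def forward_greedy_select(n_bands, evaluate, min_gain=1e-9):
    """Greedy forward search over waveband indices 0..n_bands-1."""
    selected = []
    current = evaluate(selected)
    remaining = list(range(n_bands))
    while remaining:
        # Significance of each candidate: gain in the criterion when it is added.
        gains = [(evaluate(selected + [a]) - current, a) for a in remaining]
        best_gain = max(g for g, _ in gains)
        if best_gain <= min_gain:          # assumed stopping rule: no candidate helps
            break
        best_band = next(a for g, a in gains if g == best_gain)  # lowest index wins ties
        selected.append(best_band)
        remaining.remove(best_band)
        current = evaluate(selected)
    return selected

# Toy criterion: each waveband "explains" a set of samples and the criterion
# is the share of samples explained by the subset (made-up data).
explained_by = {0: {0, 1, 2, 3}, 1: {3, 4, 5}, 2: {0, 1}, 3: {6}}
N_SAMPLES = 7

def toy_criterion(subset):
    covered = set().union(*(explained_by[a] for a in subset))
    return len(covered) / N_SAMPLES

chosen = forward_greedy_select(4, toy_criterion)
print(chosen)  # [0, 1, 3]
```

Waveband 2 is never selected in this toy case because everything it explains is already covered by waveband 0, which mirrors how redundant wavebands receive zero significance.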
The neighbourhood δ played an important role in the NRS. When the value of the neighbourhood was adjusted, the significance of wavebands varied accordingly. The optimum value of the neighbourhood depended on the research objects and was determined experimentally. Based on previous studies22,24,25 and the results of pre-experiments, the value of neighbourhood δ was varied between 0.01 and 0.50 in increments of 0.01 in this experiment.
The idea behind an ELM classifier was presented as follows.28
For N arbitrary distinct samples {(xi,ti)}, i = 1, 2, ⋯, N, a standard ELM, with n inputs, m outputs, L hidden neurons and activation function g(x) is mathematically modelled by
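$\sum_{i=1}^{L} \beta_i\, g(\omega_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \cdots, N$ | (11) |
where ωi is the input weight vector of hidden neuron i, bi is its bias, βi is its output weight vector, and oj is the network output for sample xj. The ELM can approximate these N samples with zero error, meaning that $\sum_{j=1}^{N} \|o_j - t_j\| = 0$, and there exist {βi, ωi, bi} such that
$\sum_{i=1}^{L} \beta_i\, g(\omega_i \cdot x_j + b_i) = t_j, \quad j = 1, 2, \cdots, N$ | (12) |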
The N equations above can be written compactly as
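Hβ = T, | (13) |
where H is the N × L hidden layer output matrix with entries $H_{ji} = g(\omega_i \cdot x_j + b_i)$, β is the matrix of output weights, and T is the matrix of targets. For fixed arbitrary input weights ωi and hidden layer biases bi, training an ELM is equivalent to finding a least-squares solution of the linear system Hβ = T. The unique smallest-norm least-squares solution of the linear system is
$\hat{\beta} = H^{\dagger} T$, | (14) |
where H† is the Moore–Penrose generalised inverse of H.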
The procedure of an ELM was summarized as follows.
Step 1: assign arbitrary input weights ωi and biases bi, i = 1, 2, ⋯, L.
Step 2: calculate the hidden layer output matrix H.
Step 3: calculate the output weights $\hat{\beta} = H^{\dagger} T$.
In this study, the sigmoidal function g(x) = 1/[1 + exp(−x)] was chosen as the activation function. The number of hidden neurons L was set to the value that gave the best results in the experiments.
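The three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this study: the class name, the uniform range of the random weights, the one-hot target coding, the number of hidden neurons, and the synthetic two-class data are all assumptions.

```python
import numpy as np

class ELMClassifier:
    """Single-hidden-layer ELM with sigmoid activation and one-hot targets."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden_output(self, X):
        # H[j, i] = g(w_i . x_j + b_i) with g(x) = 1 / (1 + exp(-x))
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        # Step 1: random input weights and biases (the uniform range is an assumption).
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        # Step 2: hidden layer output matrix H.
        H = self._hidden_output(X)
        # Step 3: output weights from the Moore-Penrose pseudo-inverse of H.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        scores = self._hidden_output(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]

# Synthetic, well-separated two-class data standing in for selected wavebands.
rng = np.random.default_rng(42)
X_train = np.vstack([rng.normal(0.0, 0.3, (30, 2)), rng.normal(3.0, 0.3, (30, 2))])
y_train = np.repeat([0, 1], 30)
X_test = np.vstack([rng.normal(0.0, 0.3, (10, 2)), rng.normal(3.0, 0.3, (10, 2))])
y_test = np.repeat([0, 1], 10)

model = ELMClassifier(n_hidden=10).fit(X_train, y_train)
test_accuracy = float(np.mean(model.predict(X_test) == y_test))
print(test_accuracy)
```

Because only the output weights are fitted, and in closed form, training reduces to one pseudo-inverse; this is what makes repeated retraining over many waveband subsets inexpensive.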
In this study, DM-NRS, VP-NRS, CON-NRS, and MI-NRS waveband selection methods were used to select the characteristic spectral variables. Then, an ELM classification model was built with the selected wavebands as the input variables. There were five datasets, each of which contained 60 contaminated Ruditapes philippinarum samples per heavy metal (Zn, Pb, Cu, and Cd) and 60 healthy samples. For both the contaminated and the healthy samples, 45 samples served as the training dataset and 15 samples as the testing dataset. The datasets containing healthy samples and those contaminated with Cu were used to investigate the effects of various pretreatments.
Owing to the random selection of samples for the training and testing datasets, the ELM model was built 30 times to reduce random errors. The classification performance was evaluated with the maximum and average values of the classification accuracy over the 30 runs. The classification results of the healthy samples and Cu contaminated samples were illustrated in Fig. 5–8 for the MSC, DER, SNV, and SG pretreatment methods, respectively. The subgraphs in the first row showed the classification accuracy, including the maximum classification accuracy (MCA) and average classification accuracy (ACA). The subgraphs in the second row showed the number of selected wavebands. The abscissa of each graph was the size of neighbourhood δ. The number of selected wavebands changed with the size of the neighbourhood. To achieve dimension reduction, the optimal waveband subset was selected from subsets with fewer than 20 wavebands. The waveband subsets with the five highest ACA values were selected to compare the performance of the MI-NRS, VP-NRS, CON-NRS, and DM-NRS waveband selection algorithms.
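The evaluation protocol described above (random 45/15 splits per class, 30 repetitions, then the maximum and average accuracy) can be sketched as follows. The function names are hypothetical, the data are synthetic, and a nearest-centroid rule is used purely as a stand-in classifier so that the example stays self-contained; it is not the ELM used in this study.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (assumption): assign each test sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dist, axis=1)]

def repeated_split_accuracy(X, y, fit_predict, n_train_per_class=45, n_repeats=30, seed=0):
    """Return (MCA, ACA, SD) of the test accuracy over repeated random splits."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(n_repeats):
        train_parts, test_parts = [], []
        for c in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == c))
            train_parts.append(idx[:n_train_per_class])   # e.g. 45 training samples per class
            test_parts.append(idx[n_train_per_class:])    # remaining samples are for testing
        train, test = np.concatenate(train_parts), np.concatenate(test_parts)
        predicted = fit_predict(X[train], y[train], X[test])
        accuracies.append(np.mean(predicted == y[test]))
    accuracies = np.array(accuracies)
    return accuracies.max(), accuracies.mean(), accuracies.std(ddof=1)

# Synthetic data: 60 "healthy" and 60 "contaminated" samples with 3 selected wavebands.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(60, 3)), rng.normal(5.0, 0.5, size=(60, 3))])
y = np.repeat([0, 1], 60)

mca, aca, sd = repeated_split_accuracy(X, y, nearest_centroid_predict)
print(mca, aca, sd)
```

Reporting both MCA and ACA separates the best achievable split from typical performance, which is why the ACA is the more informative figure when comparing waveband subsets.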
Fig. 5 Identification results of healthy samples and samples contaminated by Cu in MSC pretreatment method.
Fig. 6 Identification results of healthy samples and samples contaminated by Cu in DER pretreatment method.
Fig. 7 Identification results of healthy samples and samples contaminated by Cu in SNV pretreatment method.
Fig. 8 Identification results of healthy samples and samples contaminated by Cu in SG pretreatment method.
In Fig. 5–8, the ACA for identifying healthy samples and Cu contaminated samples exceeded 85%, and the MCA exceeded 93%. In a few cases, the performance of the MSC pretreatment method was slightly inferior to that of the other pretreatment methods. However, a comparison of the figures showed that the best overall classification result was obtained with the MSC pretreatment. For identifying samples contaminated with the other three heavy metals, the MSC pretreatment method was also superior to the other pretreatment methods overall. Owing to space limitations, the detailed experimental results were not provided here. Throughout the subsequent parts of our study, the MSC was used as the pretreatment method.
Among the waveband selection algorithms, the CON-NRS algorithm usually selected fewer wavebands than other algorithms; however, its classification performance was similar to that of other algorithms. As for the DER pretreatment method, the classification performance of a 12-waveband subset selected by the CON-NRS algorithm was as effective as that of 17 and 19 wavebands selected by the VP-NRS and MI-NRS algorithms, respectively.
The number of wavebands changed with the size of neighbourhood δ, as shown in Fig. 9(a). With an increase in δ, the number of wavebands did not show an increasing or decreasing trend but fluctuated within the range of 2 to 11. In some cases, the number of selected wavebands was the same despite different values of neighbourhood δ. By applying the CON-NRS waveband selection algorithm, the dimensionality of the hyperspectral data was reduced from the original 450 wavebands to fewer than 10 wavebands, providing significant dimensionality reduction.
Fig. 9 (a) The variation of the number of selected wavebands with neighbourhood δ. (b) The variation of MCA and ACA with neighbourhood δ.
For the tasks of classifying healthy samples and samples contaminated by the other heavy metals, the trend in the number of wavebands selected by the other waveband selection algorithms was similar to this example. Because the number of wavebands alone was insufficient to determine the appropriate value of neighbourhood δ, the classification performance also had to be considered.
For classifying healthy samples and Cd contaminated samples, an ELM classification model was built based on the selected wavebands. Fig. 9(b) showed the variation of the MCA and ACA with neighbourhood δ. The MCA and ACA did not change linearly with the increase in neighbourhood δ. When the neighbourhood was 0.18, the best identification results were achieved. In this case, the MCA value was 100%, the ACA value 96.89%, and the number of wavebands 8.
In combination with Fig. 9(a), the classification accuracy did not increase with an increase in the number of wavebands. For example, when δ was 0.26, the ACA was 93.56% with 5 wavebands. However, when δ was 0.25, the ACA was only 90.44% with 7 wavebands. This comparison demonstrated that the classification performance did not necessarily improve when the number of wavebands increased. Therefore, to obtain a satisfactory classification performance, it was crucial to identify a reasonable value of neighbourhood δ. Thus, in Section 3.1, the selected neighbourhoods were those that corresponded to the subsets whose ACA ranked in the top five.
The classification models performed satisfactorily with MCA values reaching or approaching 100%. The identification for Cd and Pb contaminated samples was more accurate than that for samples contaminated with Cu and Zn. The model identified Cd and Pb contaminated samples with ACA values of over 95% and 93.5%, respectively. However, when identifying Cu and Zn contaminated samples, the ACA values were over 86% and 87.5%, respectively. Heavy metal contamination indirectly changed the vibrational spectrum by affecting the synthesis of detoxifying proteins and the activity levels of antioxidant enzymes. Because Cu and Zn were essential nutrients, it was postulated that the changes in spectral information induced by Cu and Zn contamination may be smaller than those induced by Cd and Pb contamination. The overall spectra of the Cu and Zn contaminated samples were slightly different from those of healthy samples. Therefore, there was a disparity in the identification effect on Cd and Pb contaminated samples versus Cu and Zn contaminated samples.
In the identification of a specific heavy metal, different waveband selection algorithms showed limited differences in identification performance but a significant difference in the number of wavebands. In general, the CON-NRS algorithm selected the lowest number of wavebands, and the DM-NRS algorithm selected the highest. As an example, for the healthy and Pb contaminated sample datasets, the CON-NRS algorithm selected approximately 10 wavebands, but the DM-NRS algorithm selected more than 15 wavebands. For the MI-NRS algorithm, the ACA was 94.23% with 9 selected wavebands, but 93.67% with 18 selected wavebands. In this case, an increase in the number of selected wavebands did not improve the identification accuracy but slightly reduced it. These results were consistent with those in Section 3.2. Overall, to achieve both dimension reduction and identification, the neighbourhood value should be selected according to the number of wavebands and the identification results.
For a more detailed description of identification results, the average accuracy and standard deviation (SD) of the task of identifying Cd contaminated and healthy samples were shown in Table 1. The results were obtained after the samples were randomly divided into testing and training sets and were processed 30 times. The five cases of different waveband selection algorithms were listed in Table 1, which corresponded to the five cases of different neighbourhood sizes in Fig. 10. The SDs of the various waveband selection algorithms were between 2.1% and 3.3%. This indicated that the algorithms were stable, with a small SD. The mean of the SDs of the VP-NRS algorithm was the smallest, 2.83%, and that of the CON-NRS algorithm was the greatest, 2.97%. For the samples contaminated by the other heavy metals, the SDs were approximately 3%, similar to the results in Table 1.
No. | VP-NRS | MI-NRS | DM-NRS | CON-NRS |
---|---|---|---|---|
1 | 95.89% ± 3.11% | 96.89% ± 3.15% | 97.00% ± 2.53% | 96.89% ± 2.77% |
2 | 95.78% ± 2.13% | 96.33% ± 2.89% | 96.56% ± 3.09% | 96.22% ± 2.87% |
3 | 95.56% ± 2.73% | 96.00% ± 2.54% | 95.56% ± 3.29% | 95.89% ± 3.12% |
4 | 95.44% ± 3.05% | 96.00% ± 3.08% | 95.56% ± 2.83% | 95.33% ± 3.11% |
5 | 95.11% ± 3.12% | 95.78% ± 3.02% | 95.11% ± 3.00% | 94.89% ± 2.99% |
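The SD summary quoted for Table 1 can be checked with a few lines of arithmetic. The sketch below simply copies the SD values from the table and averages them per algorithm; the variable names are chosen here for illustration.

```python
from statistics import mean

# SD values (in %) copied from Table 1, rows 1-5 for each algorithm.
sd_percent = {
    "VP-NRS": [3.11, 2.13, 2.73, 3.05, 3.12],
    "MI-NRS": [3.15, 2.89, 2.54, 3.08, 3.02],
    "DM-NRS": [2.53, 3.09, 3.29, 2.83, 3.00],
    "CON-NRS": [2.77, 2.87, 3.12, 3.11, 2.99],
}

mean_sd = {name: round(mean(values), 2) for name, values in sd_percent.items()}
all_sds = [v for values in sd_percent.values() for v in values]

print(mean_sd)                      # mean SD per algorithm
print(min(all_sds), max(all_sds))   # overall range of the SDs
```

The computed means reproduce the values stated in the text (2.83% for VP-NRS and 2.97% for CON-NRS), and all individual SDs fall between 2.1% and 3.3%.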
Fig. 10–12 showed the number of selected wavebands but not specific wavelengths. To visualize wavelengths or ranges of wavelengths selected by different waveband selection algorithms, Fig. 13 showed the specific wavelengths contained in selected subsets for identifying Cd contamination and healthy Ruditapes philippinarum samples. For a particular algorithm, many selected wavelengths overlapped across different neighbourhood values. This indicated that these wavelengths were more useful when differentiating the specific type of heavy metal contamination. For the different neighbourhood values, although the subsets might vary in size, some common representative wavelengths were selected, or given priority.
For the different waveband selection algorithms, some characteristic wavelengths were always selected, such as wavelengths in the vicinity of 400 nm and 800 nm. However, wavelengths between 500 and 700 nm were rarely selected. The prioritised wavelengths reflected the molecular structural changes caused by heavy metal contamination in Ruditapes philippinarum.
Tables 2–5 showed the identification results for the VP-NRS, MI-NRS, DM-NRS, and CON-NRS waveband selection algorithms, respectively. In each table, the identification results for 45 training samples corresponded to the earlier data in Fig. 10. As can be seen from the tables, the identification accuracy generally declined when the number of training samples was reduced. This was to be expected and was consistent with the general behaviour of classification algorithms in pattern recognition.
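Table 2 Identification results for the VP-NRS waveband selection algorithm
δ | MCA (45 training samples) | ACA (45 training samples) | MCA (30 training samples) | ACA (30 training samples) | MCA (15 training samples) | ACA (15 training samples) |
---|---|---|---|---|---|---|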
0.47 | 100% | 95.89% | 100% | 94.33% | 100% | 92.56% |
0.44 | 100% | 95.78% | 100% | 93.89% | 100% | 92.44% |
0.09 | 100% | 95.56% | 100% | 93.56% | 100% | 92.22% |
0.15 | 100% | 95.44% | 100% | 93.44% | 100% | 91.44% |
0.24 | 100% | 95.11% | 100% | 93.33% | 96.67% | 91.11% |
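Table 3 Identification results for the MI-NRS waveband selection algorithm
δ | MCA (45 training samples) | ACA (45 training samples) | MCA (30 training samples) | ACA (30 training samples) | MCA (15 training samples) | ACA (15 training samples) |
---|---|---|---|---|---|---|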
0.20 | 100% | 96.89% | 100% | 96.56% | 100% | 94.44% |
0.19 | 100% | 96.33% | 100% | 95.89% | 100% | 93.56% |
0.21 | 100% | 96.00% | 100% | 95.56% | 100% | 93.33% |
0.18 | 100% | 96.00% | 100% | 94.78% | 100% | 91.78% |
0.17 | 100% | 95.78% | 100% | 94.11% | 100% | 91.67% |
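Table 4 Identification results for the DM-NRS waveband selection algorithm
δ | MCA (45 training samples) | ACA (45 training samples) | MCA (30 training samples) | ACA (30 training samples) | MCA (15 training samples) | ACA (15 training samples) |
---|---|---|---|---|---|---|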
0.19 | 100% | 97.00% | 100% | 96.44% | 100% | 93.33% |
0.22 | 100% | 96.56% | 100% | 95.89% | 100% | 93.11% |
0.23 | 100% | 95.56% | 100% | 94.33% | 100% | 92.22% |
0.16 | 100% | 95.56% | 100% | 94.11% | 100% | 92.11% |
0.20 | 100% | 95.11% | 100% | 93.67% | 100% | 91.56% |
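Table 5 Identification results for the CON-NRS waveband selection algorithm
δ | MCA (45 training samples) | ACA (45 training samples) | MCA (30 training samples) | ACA (30 training samples) | MCA (15 training samples) | ACA (15 training samples) |
---|---|---|---|---|---|---|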
0.18 | 100% | 96.89% | 100% | 95.22% | 100% | 92.56% |
0.19 | 100% | 96.22% | 100% | 95.00% | 100% | 92.22% |
0.20 | 100% | 95.89% | 100% | 94.89% | 100% | 91.78% |
0.14 | 100% | 95.33% | 100% | 94.33% | 100% | 90.89% |
0.33 | 100% | 94.89% | 100% | 93.33% | 96.67% | 90.11% |
When the number of training samples decreased from 45 to 30, the identification performance did not decrease significantly. The smallest decrease occurred for the MI-NRS algorithm when δ was 0.20, where the ACA decreased by only 0.33%. The classification accuracy of the VP-NRS algorithm decreased the most. When the number of training samples decreased from 30 to 15, the identification performance decreased more noticeably. The identification performance of the CON-NRS algorithm decreased the most, while that of the VP-NRS algorithm decreased the least. However, even with only 15 training samples, the MCA of all the waveband selection algorithms was at least 96.67% and the ACA was over 90%, indicating that the model was suitable for identifying Cd contaminated Ruditapes philippinarum. Overall, the results showed that the HSI technology was suitable for identifying Cd contamination in Ruditapes philippinarum even with only a small number of training samples. Similar results were obtained in the identification of Ruditapes philippinarum contaminated by the other heavy metals.
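The comparisons in this paragraph can be reproduced directly from the ACA columns of Tables 2–5. The sketch below copies those values and computes the drop in ACA for each reduction in training set size; averaging the drops over the five neighbourhood values of each algorithm is an assumption made here about how "decreased the most" and "decreased the least" are to be read.

```python
from statistics import mean

# (delta, ACA with 45, 30, and 15 training samples) in %, copied from Tables 2-5.
aca = {
    "VP-NRS": [(0.47, 95.89, 94.33, 92.56), (0.44, 95.78, 93.89, 92.44),
               (0.09, 95.56, 93.56, 92.22), (0.15, 95.44, 93.44, 91.44),
               (0.24, 95.11, 93.33, 91.11)],
    "MI-NRS": [(0.20, 96.89, 96.56, 94.44), (0.19, 96.33, 95.89, 93.56),
               (0.21, 96.00, 95.56, 93.33), (0.18, 96.00, 94.78, 91.78),
               (0.17, 95.78, 94.11, 91.67)],
    "DM-NRS": [(0.19, 97.00, 96.44, 93.33), (0.22, 96.56, 95.89, 93.11),
               (0.23, 95.56, 94.33, 92.22), (0.16, 95.56, 94.11, 92.11),
               (0.20, 95.11, 93.67, 91.56)],
    "CON-NRS": [(0.18, 96.89, 95.22, 92.56), (0.19, 96.22, 95.00, 92.22),
                (0.20, 95.89, 94.89, 91.78), (0.14, 95.33, 94.33, 90.89),
                (0.33, 94.89, 93.33, 90.11)],
}

# Mean drop in ACA per algorithm for each reduction in training set size.
drop_45_30 = {name: mean(a45 - a30 for delta, a45, a30, a15 in rows)
              for name, rows in aca.items()}
drop_30_15 = {name: mean(a30 - a15 for delta, a45, a30, a15 in rows)
              for name, rows in aca.items()}

# Smallest single drop from 45 to 30 training samples: (drop, algorithm, delta).
smallest = min((a45 - a30, name, delta)
               for name, rows in aca.items()
               for delta, a45, a30, a15 in rows)

print({k: round(v, 2) for k, v in drop_45_30.items()})
print({k: round(v, 2) for k, v in drop_30_15.items()})
print(round(smallest[0], 2), smallest[1], smallest[2])
```

Under this reading, the mean drops agree with the statements above: VP-NRS loses the most accuracy from 45 to 30 training samples, and from 30 to 15 the largest loss is for CON-NRS and the smallest for VP-NRS.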
The identification models achieved a satisfactory performance, as indicated by MCA values of over 95%. All ACA values were above 90%, except for the DM-NRS algorithm when δ was 0.36. The results showed that the model can be used to distinguish contaminated samples from healthy samples. The CON-NRS algorithm selected fewer wavebands than the other waveband selection algorithms. The minimum number of wavebands selected by the CON-NRS algorithm was 7, with an ACA of 90.63% and an MCA of 95%. As an example, in one result with a classification accuracy of 91.67%, 4 healthy samples were identified as contaminated and 1 contaminated sample was identified as healthy. When misclassification does occur, it is preferable for an identification model to identify healthy samples as contaminated rather than the reverse, because this type of error does not cause any food safety issue. The model explored in this study followed that characteristic.
In this study, Ruditapes philippinarum were exposed to high concentrations of heavy metals for a short time. The concentrations of heavy metals in the culture seawater were much higher than those found in polluted seawater. Therefore, the heavy metal content in the cultured Ruditapes philippinarum may also be much higher than that in Ruditapes philippinarum harvested from polluted seawater. In future work, the model's ability to distinguish Ruditapes philippinarum reared in progressively lower concentrations of heavy metals should be studied further.
This journal is © The Royal Society of Chemistry 2021