Zengwei Zheng (a, c), Yi Liu (d), Mengzhu He (a, c), Dan Chen (a, c), Lin Sun (a, c) and Fengle Zhu* (b)
(a) School of Computer & Computing Science, Zhejiang University City College, Hangzhou 310015, China
(b) College of Mechanical Engineering, Zhejiang University of Technology, Hangzhou 310023, China. E-mail: zhufl@zjut.edu.cn
(c) Intelligent Plant Factory of Zhejiang Province Engineering Lab, Hangzhou 310015, China
(d) College of Computer Science & Technology, Zhejiang University, Hangzhou 310027, China
First published on 21st March 2022
The selection of effective and representative spectral bands is extremely important for eliminating redundant information and reducing the computational burden in potential real-time applications of hyperspectral imaging. However, current band selection methods act as a separate procedure before model training and are implemented merely on extracted average spectra, without incorporating spatial information. In this paper, an end-to-end trainable network framework that combines band selection, feature extraction and model training was proposed, based on a 3D convolutional neural network (CNN) with an attention mechanism embedded in its first layer. The learned band attention vector was adopted as the basis of a band importance indicator to select effective bands. The proposed network was evaluated on two datasets: a regression dataset for predicting the relative chlorophyll content (soil and plant analyzer development, SPAD) of basil leaves and a classification dataset for detecting the drought stress of pepper leaves. A number of calibration models, including SVM, 1D-CNN, 2B-CNN (two-branch CNN), 3D ResNet and the developed network, were established for performance comparison. The results showed that the effective bands selected by the proposed attention-based model achieved higher regression R2 values and classification accuracies than both the full-spectrum data and the comparative band selection methods, including the traditional SPA (successive projections algorithm) and GA (genetic algorithm) methods and the latest 2B-CNN algorithm. In addition, unlike the traditional methods, the proposed band selection algorithm can select bands while carrying out model training and can simultaneously take advantage of the original spectral–spatial information. The results confirmed the usefulness of the proposed attention mechanism-based convolutional network for selecting the most effective band combination of hyperspectral images.
Many researchers have devoted considerable effort to effective band selection. At present, the most popular and widely applied methods are still the traditional spectral feature selection algorithms, such as the genetic algorithm (GA),14 the successive projections algorithm (SPA)15 and uninformative variable elimination (UVE).16
In recent years, with the rapid development of deep learning and its wide application in various fields, many researchers have also started to select effective spectral bands using convolutional neural networks (CNNs). In addition to the average spectrum, a CNN can directly take the original hyperspectral images as input, without flattening each sample in advance as the traditional methods require, and it is effective in both classification and regression tasks.17 Acquarelli et al.18 stated that a CNN could effectively identify important spectral regions and that iterative training based on deep learning could successfully realize the selection of effective bands. Although both the traditional methods and the approach proposed by Acquarelli et al.18 contributed greatly to effective band selection, they were all based on the average spectrum manually extracted from a region of interest (ROI) in each sample and did not take full advantage of the rich spatial information of hyperspectral images.
Recently, a few studies have proposed effective band selection based on spectral–spatial integrated CNN models. Feng et al.19 combined a mathematical method with a CNN and pointed out that the importance of bands could be differentiated by using a hard thresholding function. Liu et al.1 proposed the 2B-CNN (two-branch CNN) model, consisting of a 1D CNN and a 2D CNN for extracting spectral and spatial features respectively. The weights of the first convolution layer in the 2D spatial branch were used as the indicator of effective bands, and the model was successfully applied to three classification datasets, achieving better results than the traditional selection methods based on extracted mean spectra. Similar to 2B-CNN, Torres-Tello et al.20 introduced SHAP (SHapley Additive exPlanations) to identify the effective spectral bands based on a spectral–spatial fusion neural network for predicting moisture content in canola and wheat. For plant disease identification based on hyperspectral images, Nagasubramanian et al.21 used a 3D CNN model to extract rich and consecutive information from the spectral and spatial domains simultaneously. At the same time, they implemented band selection based on the calculated gradient magnitude of each band, which reflected the band importance. In addition, Ortiz et al.22 proposed the ILFS (Integrated Learning and Feature Selection) framework based on an FCN (Fully Convolutional Network) to automatically screen important input features while training the model. In this process, they used the chain rule to calculate the derivative of the loss function with respect to the selected effective bands as the basis for determining the importance of bands. These five models demonstrate that a CNN can perform band selection and end-to-end modelling simultaneously, and our method also makes full use of this advantage.
In 2014, the Google DeepMind team proposed the attention mechanism for image classification and achieved good performance.23 The attention mechanism is a problem-solving method that imitates human attention. It enables the model to focus on important information and learn it thoroughly, so that high-value information can be rapidly screened from a large amount of data. In recent years, the attention mechanism has been increasingly applied in deep learning, for example in image recognition, semantic segmentation and machine translation.24 Some researchers have pointed out that effective band selection of hyperspectral images can be realized through the attention mechanism. For example, Lorenzo and Tulczyjew et al.12 proposed an attention-based CNN architecture in which an attention module was embedded in each convolutional layer to extract an attention heat map, which reflected the importance of each band in the training process and thus served as the basis for selecting effective bands. Also, Cai and Liu et al.13 proposed the BS-Net architecture, consisting of an attention module and a reconstruction module, through which they explicitly modelled the nonlinear interdependence between spectral bands. Experimental results showed that this method was superior to the most popular band selection methods. These two studies focused on analysing remote sensing images at the pixel level, whereas an attention-based band selection method for object-scale hyperspectral image analysis has not been reported.
In this paper, we explored an attention mechanism-based 3D CNN that selects effective bands during model training while taking advantage of the spatial–spectral continuity of hyperspectral images in object-scale analysis. To evaluate the effectiveness of the proposed band selection approach, a regression task and a classification task were carried out on two hyperspectral image datasets respectively. The former used basil leaves to predict the leaf SPAD (soil and plant analyzer development) value, which is highly correlated with the chlorophyll content of plants.25,26 Chlorophyll plays a central role in light absorption during plant photosynthesis. The latter used pepper leaves to detect leaf drought stress, which greatly affects photosynthesis and plant growth. A number of calibration models, including SVM, 1D-CNN, 2B-CNN, 3D ResNet and the developed network, were established to compare the performance of the selected effective bands with that of the full-spectrum data. The effectiveness of the proposed band attention method was also assessed by comparing it with the traditional selection algorithms SPA and GA, as well as the latest 2B-CNN algorithm, based on the performance of the calibration models.
(1) |
Note that the regression and classification analyses shared exactly the same network structure except for the output size of the last linear layer: the output size was 1 for regression and 2 for classification.
The implementation details of the band attention module are shown in Fig. 5. The procedure worked as follows: (1) two different spectral context descriptors, $H^{b}_{\mathrm{avg}}$ and $H^{b}_{\mathrm{max}}$, were computed from the original input hyperspectral image cube through average-pooling and max-pooling respectively; (2) the obtained descriptors were fed in turn into a shared network to generate the corresponding one-dimensional spectral features. The shared network consisted of an MLP (multilayer perceptron) with one hidden layer (note that the output dimension of the shared network was consistent with the dimension of the input descriptor); (3) the output vectors of the shared MLP were added up to generate the band attention map; (4) the obtained attention map was used to generate a band-weight-adjusted hyperspectral image cube by element-wise multiplication. The generation of the band attention map can be described as:
$M_b(H) = \sigma(\mathrm{MLP}(\mathrm{AvgPool}(H)) + \mathrm{MLP}(\mathrm{MaxPool}(H))) = \sigma(W_1(W_0(H^{b}_{\mathrm{avg}})) + W_1(W_0(H^{b}_{\mathrm{max}})))$ | (2) |
After the generation of the band attention map, the final multiplication operation can be summarized as:
$H' = M_b(H) \otimes H$ | (3) |
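The four steps and eqn (2) and (3) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: σ is taken to be the sigmoid function, a ReLU is assumed between W0 and W1 (the text only says the MLP has one hidden layer), the weights are random rather than trained, and `top_k_bands` is a hypothetical rule for turning the attention vector into a band subset. All function names are illustrative.

```python
import math
import random


def band_descriptors(cube):
    """Return per-band average- and max-pooled descriptors of cube[band][row][col]."""
    avg_desc, max_desc = [], []
    for band in cube:
        pixels = [value for row in band for value in row]
        avg_desc.append(sum(pixels) / len(pixels))
        max_desc.append(max(pixels))
    return avg_desc, max_desc


def matvec(weights, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def shared_mlp(descriptor, w0, w1):
    """One-hidden-layer MLP; the ReLU between W0 and W1 is an assumption."""
    hidden = [max(0.0, h) for h in matvec(w0, descriptor)]
    return matvec(w1, hidden)


def band_attention(cube, w0, w1):
    """Eqn (2): sigmoid of the summed shared-MLP outputs of both descriptors."""
    avg_desc, max_desc = band_descriptors(cube)
    summed = [a + m for a, m in zip(shared_mlp(avg_desc, w0, w1),
                                    shared_mlp(max_desc, w0, w1))]
    return [1.0 / (1.0 + math.exp(-s)) for s in summed]


def reweight(cube, attention):
    """Eqn (3): scale every pixel of band b by that band's attention weight."""
    return [[[a * value for value in row] for row in band]
            for a, band in zip(attention, cube)]


def top_k_bands(attention, k):
    """Hypothetical selection rule: indices of the k largest weights, sorted."""
    ranked = sorted(range(len(attention)), key=lambda b: attention[b], reverse=True)
    return sorted(ranked[:k])


# Toy example: 6 bands of 4 x 4 pixels, hidden size 3, random (untrained) weights.
random.seed(0)
n_bands, hidden_size = 6, 3
cube = [[[random.random() for _ in range(4)] for _ in range(4)]
        for _ in range(n_bands)]
w0 = [[random.uniform(-1, 1) for _ in range(n_bands)] for _ in range(hidden_size)]
w1 = [[random.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(n_bands)]

attention = band_attention(cube, w0, w1)
weighted_cube = reweight(cube, attention)
print(top_k_bands(attention, 3))
```

Because the shared MLP maps a descriptor of length B back to length B, the attention vector holds one weight per band, which is what allows it to serve as a band importance indicator.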
(4) |
To demonstrate the usefulness of the proposed attention-based method for selecting effective bands, we compared it with the common traditional algorithms SPA and GA and with the novel 2B-CNN algorithm, based on the training of SVM, 1D-CNN, 3D ResNet and the proposed network. For the sake of fairness, the number of effective bands selected by each comparison method was kept the same as that selected by the attention-based model.
(5) |
(6) |
(7) |
(8) |
(9) |
Dataset | Metric | SVM | 1D-CNN | 2B-CNN | 3D ResNet | Proposed model |
---|---|---|---|---|---|---|
Basil | RMSE | 2.890 | 5.552 | 4.458 | 2.420 | 2.379 |
Basil | R2 | 0.825 | 0.354 | 0.583 | 0.878 | 0.881 |
Pepper | Accuracy (%) | 56.79 | 57.38 | 60.00 | 71.11 | 73.89 |
Pepper | Precision (%) | 55.65 | 56.19 | 59.44 | 63.91 | 77.63 |
Pepper | Sensitivity (%) | 62.31 | 68.14 | 66.06 | 95.51 | 66.29 |
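For readers who wish to reproduce metrics of the kind tabulated above, the following sketch computes them from predictions. It assumes the standard textbook definitions (RMSE as the root of the mean squared error, R2 as the coefficient of determination, and accuracy, precision and sensitivity derived from the binary confusion matrix); the exact formulas used in the paper are not recoverable from the text here, and the toy values below are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def binary_metrics(labels, predictions, positive=1):
    """Accuracy, precision and sensitivity (recall) as fractions in [0, 1]."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Undefined ratios are reported as NaN rather than raising an error.
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
    }


# Made-up toy data: four regression targets and six binary labels.
print(rmse([1, 2, 3, 4], [1, 2, 3, 5]), r_squared([1, 2, 3, 4], [1, 2, 3, 5]))
print(binary_metrics([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```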
In addition, the effect of adding the band attention module to the 3D ResNet on the consistency and convergence of model training deserves discussion. For the basil leaf and pepper leaf datasets, the original hyperspectral images were fed into the 3D ResNet with and without the band attention module for comparison, with all other hyper-parameter configurations kept the same. The results showed that adding the attention module did not increase the number of epochs or the training time; instead, the performance of the regression and classification tasks was even improved to some extent. Moreover, embedding the band attention module allowed the most effective bands to be selected during training. The embedding therefore improved the overall modeling performance.
Fig. 7 Effective bands of basil leaf dataset (a) and pepper leaf dataset (b) identified by attention-based band selection model. |
It can be clearly seen from Fig. 7 that the effective band subset obtained by the band attention module contained few contiguous bands and that the bands were distributed relatively uniformly. Because adjacent spectral bands are usually highly correlated, the selected bands should contain less redundancy, which might benefit the prediction performance of the proposed model. In addition, these bands were located at positions with large fluctuations in the average spectrum curve, indicating that the most important information of the hyperspectral images was captured by the proposed band selection method.
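The observation that few selected bands are contiguous can be checked directly on the band indices reported in this section for the two datasets. The short sketch below counts pairs of directly neighbouring indices; the helper name `adjacent_pairs` is illustrative only.

```python
def adjacent_pairs(bands):
    """Return the pairs of selected band indices that are direct neighbours."""
    ordered = sorted(bands)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a == 1]


# Band indices (labelled 0 to 139) selected by the attention-based model.
basil = [5, 6, 8, 25, 30, 45, 58, 73, 78, 82, 89, 111, 116]
pepper = [0, 2, 6, 10, 19, 31, 58, 59, 65, 70, 79, 94, 114]

print(adjacent_pairs(basil))   # [(5, 6)]
print(adjacent_pairs(pepper))  # [(58, 59)]
```

Each 13-band subset therefore contains only one pair of neighbouring bands.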
To analyse the selected effective bands in more depth, wavelength assignment was carried out. The bands were labelled from 0 to 139. For the basil leaf dataset, the selected band subset was [5, 6, 8, 25, 30, 45, 58, 73, 78, 82, 89, 111, 116], corresponding to the wavelengths of 486 nm, 489 nm, 496 nm, 553 nm, 569 nm, 625 nm, 665 nm, 713 nm, 728 nm, 741 nm, 764 nm, 828 nm and 841 nm respectively. These wavelengths were consistent with the strong reflection of green light (553 nm, 569 nm), the absorption of red light (625 nm, 665 nm) and the abrupt increase in reflection in the red edge region (728 nm, 741 nm, 764 nm) of a typical leaf's spectral profile, which arise from the porphyrin ring of the chlorophyll molecules in basil leaves (Walsh 2020). In addition, the effective wavelengths of 728 nm, 741 nm and 764 nm could be attributed to the fourth overtone of the stretching vibration of the methyl (–CH3), methylene (–CH2) and methine (–CH) groups of the chlorophyll molecules.29 For the pepper leaf dataset, the selected subset was [0, 2, 6, 10, 19, 31, 58, 59, 65, 70, 79, 94, 114], corresponding to wavelengths of 468 nm, 476 nm, 489 nm, 501 nm, 534 nm, 573 nm, 665 nm, 667 nm, 688 nm, 703 nm, 732 nm, 780 nm and 836 nm. These wavelengths were correlated with the biochemical (moisture, pigments, etc.) and cell structural changes in pepper leaves undergoing drought stress. The 468 nm and 476 nm bands in the blue light region were attributed to the absorption of chlorophylls, carotenes and xanthophylls. The 534 nm and 573 nm bands in the green light region were due to the combined effects of chlorophyll reflection and anthocyanin absorption. The 665 nm, 667 nm and 688 nm bands in the red light region were ascribed to the absorption of chlorophylls.30 The 732 nm, 780 nm and 836 nm bands were related to the third overtone of the O–H group, the fourth overtone of the C–H group and the third overtone of the N–H group respectively in various biochemical components (moisture, amino acids, carbohydrates, etc.) in pepper leaves.29 The correlation of the effective wavelengths with the absorption of chemical functional groups in leaf biochemical components demonstrated that the proposed attention-based convolutional network could select key spectral wavelengths that were not only representative and interpretable, but also oriented towards solving specific analytical problems.
To better evaluate the efficacy and importance of the effective bands extracted by the developed attention-based band selection model, hyperspectral image data before and after band extraction were fed into the same models for performance comparison, including the spectral–spatial models of 2B-CNN, 3D ResNet and attention-based 3D ResNet and the spectral models of SVM and 1D-CNN. The results are shown in Table 2. For all modelling methods on both datasets, the band subset selected by the proposed method gave a lower RMSE and a higher R2 and accuracy than the original hyperspectral image data (without band selection), although precision and sensitivity did not improve in every case. A likely explanation is that the consecutive spectral bands of the original hyperspectral image data contain considerable redundancy and noise, which interfere with the prediction analysis. The proposed band selection removed this interfering information while retaining the most important and representative bands, which not only compressed the data and improved training efficiency, but also improved the performance of the prediction analysis.
Dataset | Metric | SVM (full) | SVM (subset) | 1D-CNN (full) | 1D-CNN (subset) | 2B-CNN (full) | 2B-CNN (subset) | 3D ResNet (full) | 3D ResNet (subset) | Proposed model (full) | Proposed model (subset) |
---|---|---|---|---|---|---|---|---|---|---|---|
Basil | RMSE | 2.890 | 2.600 | 5.552 | 5.206 | 4.458 | 3.499 | 2.420 | 2.133 | 2.379 | 2.046 |
Basil | R2 | 0.825 | 0.858 | 0.354 | 0.432 | 0.583 | 0.743 | 0.878 | 0.904 | 0.881 | 0.912 |
Pepper | Accuracy (%) | 56.79 | 60.95 | 57.38 | 61.14 | 60.00 | 65.56 | 71.11 | 76.67 | 73.89 | 76.90 |
Pepper | Precision (%) | 55.65 | 60.56 | 56.19 | 59.76 | 59.44 | 61.82 | 63.91 | 71.17 | 77.63 | 73.17 |
Pepper | Sensitivity (%) | 62.31 | 61.22 | 68.14 | 68.26 | 66.06 | 70.23 | 95.51 | 88.76 | 66.29 | 85.31 |
For the two datasets, the band subsets (bands labelled from 0 to 139) selected by the three comparative methods and by the attention-based method, together with their corresponding wavelengths, are shown in Table 3. The effective bands selected by these algorithms did not show great similarity, owing to the differences in their selection principles. SPA aimed to find the bands with maximal discriminative power in the spectral domain, while GA aimed to select the combination of bands with the best performance through a random selection process. Both SPA and GA relied on the average spectrum, so only spectral-domain information was considered in band selection. The 2B-CNN, in contrast, was built on the spatial–spectral fusion of hyperspectral images for the modeling task, but its effective bands were selected only from the 2D CNN spatial branch. The method proposed in this study took both spatial and spectral information into account for effective band selection while carrying out prediction analysis.
Dataset | Method | Selected subset (labelled bands from 0 to 139) | Corresponding wavelengths (nm) |
---|---|---|---|
Basil | SPA | 1, 4, 10, 11, 15, 36, 46, 47, 52, 59, 67, 115, 129 | 471, 483, 501, 505, 519, 591, 629, 632, 647, 667, 694, 839, 876 |
Basil | GA | 5, 25, 28, 38, 45, 55, 59, 68, 85, 96, 121, 129, 137 | 486, 553, 563, 596, 625, 656, 667, 697, 750, 785, 854, 876, 896 |
Basil | 2B-CNN | 1, 9, 10, 27, 59, 64, 69, 77, 85, 92, 106, 111, 139 | 471, 500, 501, 559, 667, 684, 700, 726, 750, 774, 814, 828, 902 |
Basil | Proposed | 5, 6, 8, 25, 30, 45, 58, 73, 78, 82, 89, 111, 116 | 486, 489, 496, 553, 569, 625, 665, 713, 728, 741, 764, 828, 841 |
Pepper | SPA | 0, 2, 3, 4, 6, 9, 12, 26, 47, 58, 65, 74, 129 | 468, 476, 479, 483, 489, 500, 508, 556, 632, 665, 688, 716, 876 |
Pepper | GA | 12, 33, 41, 50, 51, 56, 65, 75, 85, 91, 104, 110, 114 | 508, 580, 606, 641, 643, 659, 688, 719, 750, 771, 808, 825, 836 |
Pepper | 2B-CNN | 14, 18, 34, 47, 74, 75, 91, 95, 96, 104, 116, 123, 127 | 515, 531, 583, 632, 716, 719, 771, 783, 785, 808, 841, 859, 871 |
Pepper | Proposed | 0, 2, 6, 10, 19, 31, 58, 59, 65, 70, 79, 94, 114 | 468, 476, 489, 501, 534, 573, 665, 667, 688, 703, 732, 780, 836 |
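The limited similarity among the subsets in Table 3 can be quantified by counting the bands that the proposed method shares with each comparative method. The sketch below does this with Python sets and also reports the Jaccard index (shared bands divided by the union); the choice of the Jaccard index is ours for illustration and is not a measure used in the paper.

```python
# Band indices copied from Table 3 (bands labelled 0 to 139).
SUBSETS = {
    "Basil": {
        "SPA": {1, 4, 10, 11, 15, 36, 46, 47, 52, 59, 67, 115, 129},
        "GA": {5, 25, 28, 38, 45, 55, 59, 68, 85, 96, 121, 129, 137},
        "2B-CNN": {1, 9, 10, 27, 59, 64, 69, 77, 85, 92, 106, 111, 139},
        "Proposed": {5, 6, 8, 25, 30, 45, 58, 73, 78, 82, 89, 111, 116},
    },
    "Pepper": {
        "SPA": {0, 2, 3, 4, 6, 9, 12, 26, 47, 58, 65, 74, 129},
        "GA": {12, 33, 41, 50, 51, 56, 65, 75, 85, 91, 104, 110, 114},
        "2B-CNN": {14, 18, 34, 47, 74, 75, 91, 95, 96, 104, 116, 123, 127},
        "Proposed": {0, 2, 6, 10, 19, 31, 58, 59, 65, 70, 79, 94, 114},
    },
}


def shared_bands(a, b):
    """Sorted list of band indices present in both subsets."""
    return sorted(a & b)


def jaccard(a, b):
    """Jaccard index: size of the intersection over size of the union."""
    return len(a & b) / len(a | b)


for dataset, methods in SUBSETS.items():
    proposed = methods["Proposed"]
    for name in ("SPA", "GA", "2B-CNN"):
        common = shared_bands(proposed, methods[name])
        print(dataset, name, common, round(jaccard(proposed, methods[name]), 3))
```

On the basil dataset the proposed subset shares no bands with SPA, three with GA and one with 2B-CNN; on the pepper dataset it shares five with SPA, two with GA and none with 2B-CNN.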
To assess the effectiveness of the band subsets selected by the different methods, calibration models were established; the results are shown in Table 4 and visualized in Fig. 8. Comparing the calibration models, the proposed attention-based 3D ResNet model performed best on both datasets, as it did for the full-spectrum results in Table 1, although the difference between it and the 3D ResNet was insignificant. Overall, the 3D CNN models performed clearly better than the others, demonstrating the merit of joint feature extraction from the spectral and spatial dimensions of hyperspectral images.
Model | Band selection method | Basil RMSE | Basil R2 | Pepper accuracy (%) | Pepper precision (%) | Pepper sensitivity (%) |
---|---|---|---|---|---|---|
SVM | SPA | 2.829 | 0.832 | 56.61 | 55.60 | 59.89 |
SVM | GA | 2.792 | 0.837 | 57.38 | 58.49 | 60.32 |
SVM | 2B-CNN | 2.698 | 0.847 | 59.76 | 57.38 | 65.53 |
SVM | Proposed | 2.600 | 0.858 | 60.95 | 60.56 | 61.22 |
1D-CNN | SPA | 5.470 | 0.373 | 58.57 | 57.38 | 64.13 |
1D-CNN | GA | 5.423 | 0.383 | 59.17 | 58.57 | 60.49 |
1D-CNN | 2B-CNN | 5.319 | 0.407 | 60.36 | 57.98 | 77.16 |
1D-CNN | Proposed | 5.206 | 0.432 | 61.14 | 59.76 | 68.26 |
2B-CNN | SPA | 4.399 | 0.594 | 61.11 | 65.93 | 65.64 |
2B-CNN | GA | 4.241 | 0.623 | 63.32 | 59.67 | 77.95 |
2B-CNN | 2B-CNN | 3.575 | 0.732 | 64.45 | 61.67 | 72.31 |
2B-CNN | Proposed | 3.499 | 0.743 | 65.56 | 61.82 | 70.23 |
3D ResNet | SPA | 2.582 | 0.860 | 71.67 | 69.39 | 76.40 |
3D ResNet | GA | 2.315 | 0.887 | 72.78 | 67.24 | 87.64 |
3D ResNet | 2B-CNN | 2.308 | 0.888 | 73.33 | 65.17 | 77.33 |
3D ResNet | Proposed | 2.133 | 0.904 | 76.67 | 71.17 | 88.76 |
Proposed model | SPA | 2.314 | 0.887 | 74.22 | 73.21 | 76.91 |
Proposed model | GA | 2.343 | 0.885 | 75.00 | 68.97 | 89.89 |
Proposed model | 2B-CNN | 2.217 | 0.897 | 76.11 | 77.38 | 73.03 |
Proposed model | Proposed | 2.046 | 0.912 | 76.90 | 73.17 | 85.31 |
Comparing the band selection methods, the proposed band attention method gave the lowest RMSE and the highest R2 and accuracy for all models, outperforming SPA, GA and 2B-CNN on both the regression and classification tasks. Specifically, for the basil leaf dataset, the R2 values obtained by SVM, 1D-CNN, 2B-CNN, 3D ResNet and the proposed model with the spectral subset selected by the band attention module were 0.858, 0.432, 0.743, 0.904 and 0.912 respectively. The relative improvement of the attention-based method over SPA, GA and 2B-CNN, averaged across all models, was 10.39%, 7.91% and 2.48% respectively. For the pepper leaf dataset, the accuracies achieved by SVM, 1D-CNN, 2B-CNN, 3D ResNet and the proposed model with the spectral subset selected by the attention module were 60.95%, 61.14%, 65.56%, 76.67% and 76.90% respectively. The relative improvement of the attention-based method over SPA, GA and 2B-CNN, averaged across all models, was 5.98%, 4.19% and 2.12% respectively. Notably, although 2B-CNN did not perform well as a calibration model in the regression analysis on the basil leaf dataset, the band subset selected by 2B-CNN still performed well on both datasets and was superior to those of the traditional GA and SPA methods. This was probably because 2B-CNN was originally designed for classification and band selection.1 Thus, the advantage of the proposed model over 2B-CNN could be seen not only in effective band selection, but also in both classification and regression tasks. Moreover, for both GA and SPA, the effective bands were selected from average spectra and the prediction analysis was then carried out as a separate procedure, which required considerable manual intervention. In comparison, the proposed band attention module was embedded directly in the convolutional network, so that model training and band selection were conducted simultaneously while achieving good results.
Therefore, our proposed band selection model is of practical significance both in performance and convenience, and may be applicable to other hyperspectral datasets in other plant science tasks, such as disease detection, heat stress identification and so on. In addition, with the largely reduced number of bands, more portable spectral imaging equipment could be developed to improve data collection efficiency in practical application. In future work, more samples need to be collected to further improve the performance of the proposed band selection model.
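The averaged relative improvements quoted above can be recomputed from Table 4. The sketch below assumes that the relative improvement is calculated per calibration model as (proposed − baseline) / baseline and then averaged over the five models; this reading is our inference, but it appears to reproduce the reported percentages to two decimal places.

```python
# R2 on the basil dataset and accuracy (%) on the pepper dataset, copied from
# Table 4: one row per calibration model, one entry per band selection method.
R2_BASIL = {
    "SVM": {"SPA": 0.832, "GA": 0.837, "2B-CNN": 0.847, "Proposed": 0.858},
    "1D-CNN": {"SPA": 0.373, "GA": 0.383, "2B-CNN": 0.407, "Proposed": 0.432},
    "2B-CNN": {"SPA": 0.594, "GA": 0.623, "2B-CNN": 0.732, "Proposed": 0.743},
    "3D ResNet": {"SPA": 0.860, "GA": 0.887, "2B-CNN": 0.888, "Proposed": 0.904},
    "Proposed model": {"SPA": 0.887, "GA": 0.885, "2B-CNN": 0.897, "Proposed": 0.912},
}
ACC_PEPPER = {
    "SVM": {"SPA": 56.61, "GA": 57.38, "2B-CNN": 59.76, "Proposed": 60.95},
    "1D-CNN": {"SPA": 58.57, "GA": 59.17, "2B-CNN": 60.36, "Proposed": 61.14},
    "2B-CNN": {"SPA": 61.11, "GA": 63.32, "2B-CNN": 64.45, "Proposed": 65.56},
    "3D ResNet": {"SPA": 71.67, "GA": 72.78, "2B-CNN": 73.33, "Proposed": 76.67},
    "Proposed model": {"SPA": 74.22, "GA": 75.00, "2B-CNN": 76.11, "Proposed": 76.90},
}


def mean_relative_improvement(scores, baseline, reference="Proposed"):
    """Average over models of (reference - baseline) / baseline, in percent."""
    gains = [(row[reference] - row[baseline]) / row[baseline]
             for row in scores.values()]
    return 100.0 * sum(gains) / len(gains)


for label, table in (("Basil R2", R2_BASIL), ("Pepper accuracy", ACC_PEPPER)):
    for method in ("SPA", "GA", "2B-CNN"):
        print(label, method, round(mean_relative_improvement(table, method), 2))
```

Averaging the per-model ratios gives every calibration model equal weight, so the large relative gains of the weaker models (1D-CNN and 2B-CNN on the basil dataset) contribute strongly to the averaged figures.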
For both datasets, the effective bands selected by the proposed method were representative and interpretable, and achieved better prediction performance than the full-spectrum data. The proposed attention-based selection method also performed better than both the traditional SPA and GA methods and the latest 2B-CNN algorithm, which demonstrated the good band feature extraction ability of the attention mechanism. The overall results indicated that the proposed framework not only selected the key and representative wavelengths effectively, but was also convenient and flexible to implement during model training. In summary, the proposed attention-based 3D ResNet is a promising band selection method for hyperspectral images and has great development potential.
This journal is © The Royal Society of Chemistry 2022 |