Ruizhao Yang,^a Yun Li,^c Binyi Qin,*^ab Di Zhao,^a Yongjin Gan^a and Jincun Zheng^a
^a School of Physics and Telecommunication Engineering, Yulin Normal University, Yulin, China. E-mail: qby207@163.com
^b Guangxi Colleges and Universities Key Laboratory of Complex System Optimization and Big Data Processing, Yulin Normal University, Yulin, China
^c College of Chemistry and Food Science, Yulin Normal University, Yulin, China
First published on 11th January 2022
Feature extraction is a key factor in detecting pesticides using terahertz (THz) spectroscopy. Compared with traditional methods, deep learning can extract complex data features at high levels of abstraction. However, reports on the application of deep learning to THz spectroscopy are rare, mainly because the available learning samples are too few. In this study, we propose a WGAN-ResNet method, which combines two deep learning networks, the Wasserstein generative adversarial network (WGAN) and the residual neural network (ResNet), to detect carbendazim based on THz spectroscopy. The WGAN and pretraining model technology were employed to solve the problem of insufficient learning samples for training the ResNet: the WGAN generated additional learning samples, while pretraining reduced the number of trainable parameters and thereby avoided overfitting of the ResNet. The proposed method achieved an accuracy rate of 91.4%, which is better than those of the support vector machine, k-nearest neighbor, naïve Bayes and ensemble learning models. In summary, the proposed method demonstrates the potential of deep learning for pesticide residue detection and expands the application of THz spectroscopy.
Terahertz (THz) spectroscopy is considered a promising detection method due to its low photon energy, high resolution and penetrability.9–11 Because THz radiation is sensitive to vibrational modes, THz spectra contain abundant information on the vibrational modes of the target. In recent years, researchers have combined THz fingerprints with chemometric techniques to detect foreign bodies,12,13 toxic and harmful compounds,10,14,15 pesticides,16–18 antibiotics,19,20 microorganisms21–23 and adulteration.24 These studies show that feature extraction is a key factor affecting the detection results. Compared with traditional methods, deep learning can extract complex data features at high levels of abstraction. The residual neural network (ResNet), proposed by He et al.,25 is a deep learning network that solves the degradation problem of traditional deep convolutional networks. It has been applied to the real-time quality assessment of pediatric MRI images,26 the clinical diagnosis of COVID-19 patients,27 the identification of cashmere and sheep wool fibers,28 rotating machinery fault diagnosis29 and so on. However, there are few reports on the application of deep learning to THz spectroscopy, mainly because the available THz spectral samples are too few to meet the requirements of deep learning: a deep learning model trained with insufficient samples performs poorly. Measuring more THz spectra is one way to solve this problem, but it is costly and time-consuming.
The generative adversarial network (GAN) is a sample generation model proposed by Goodfellow et al.30 in 2014. It can generate new samples with the same distribution as the real samples, thereby expanding the size of a labeled dataset.31 A GAN contains a generator and a discriminator, and it learns the distribution of the real samples through the game between them: during training, the generator tries to fit the distribution of the real samples and generate new samples, while the discriminator tries to distinguish the real samples from the generated ones. In recent years, GANs have been used to generate new conversation data,32 new samples for the minority classes of imbalanced datasets33 and new high-quality images.34 When training data are scarce, it is difficult to train a deep learning model from scratch. Fine-tuning a deep learning model that has been pretrained on a large set of labeled natural images is a promising way to overcome the shortage of learning samples. It has been applied successfully to various computer vision tasks such as food image recognition,35 mammogram image classification36 and multi-label legal document classification.37
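For reference, the adversarial game described above is commonly formalized by the minimax objective of Goodfellow et al.;30 the following standard form is quoted from the literature rather than reproduced from this paper:

% Standard GAN objective: the generator G and the discriminator D play a minimax game
% over real samples x ~ p_data and noise inputs z ~ p_z.
\min_{G}\max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]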
Carbendazim is a broad-spectrum benzimidazole fungicide that is commonly used to control plant diseases in cereals and fruits. Rice, which is cultivated over a wide area, is an important food crop for human beings, and a large amount of carbendazim is used to prevent rice blast and rice sheath blight. Studies have shown that high doses of carbendazim can damage the testes and cause infertility.38,39 In this study, we proposed the WGAN-ResNet method, which combines two deep learning networks, the Wasserstein generative adversarial network (WGAN) and the residual neural network (ResNet), to detect carbendazim based on THz spectroscopy. The WGAN was employed to generate new learning samples, solving the problem of poor learning results caused by insufficient samples. The ResNet was applied to quantify carbendazim samples of different concentrations. At the same time, pretraining model technology was employed to reduce the number of trainable parameters in the ResNet. The results demonstrate that our proposed method shows the potential application of deep learning in pesticide residue detection, expanding the application of THz spectroscopy.
Fig. 2 shows the flow chart of the whole process, which includes sample preparation, data acquisition, generation of new samples and the ResNet model. Firstly, the samples were made into tablets after drying and sieving. Secondly, the absorption coefficients of the samples were calculated from the THz time-domain spectra, and the absorption coefficients were then translated into two-dimensional images. Thirdly, new samples were generated by the WGAN to increase the number of training samples. Finally, the ResNet was trained and employed to quantify the samples. The details of these procedures are described in the following sections.
[Eqn (1) and (2): calculation of the absorption coefficient of the samples from the measured THz time-domain spectra.]
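The exact expressions of eqn (1) and (2) are not reproduced in this excerpt. Purely as an illustration, the sketch below implements the widely used thick-pellet approximation for extracting the refractive index and absorption coefficient from the sample and reference THz time-domain pulses; the symbols (sample thickness d, amplitude ratio rho, phase difference dphi) and this particular approximation are assumptions and not necessarily the formulas used by the authors.

import numpy as np

def thz_optical_parameters(E_sample, E_reference, dt, d):
    """Illustrative THz-TDS parameter extraction (common approximation).

    E_sample, E_reference : time-domain THz pulses (same length, sampling step dt in s)
    d : sample thickness in m
    Returns the frequency axis (Hz), refractive index n and absorption
    coefficient alpha (m^-1).
    """
    c = 2.998e8                                   # speed of light, m/s
    freq = np.fft.rfftfreq(len(E_sample), dt)     # frequency axis
    S = np.fft.rfft(E_sample)
    R = np.fft.rfft(E_reference)
    H = S / R                                     # complex transfer function
    rho = np.abs(H)                               # amplitude ratio
    dphi = -np.unwrap(np.angle(H))                # phase difference (sign depends on FFT convention)
    omega = 2 * np.pi * freq
    with np.errstate(divide="ignore", invalid="ignore"):
        n = 1 + c * dphi / (omega * d)            # refractive index
        alpha = (2 / d) * np.log(4 * n / (rho * (n + 1) ** 2))  # absorption coefficient
    return freq, n, alpha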
The depth of a network is very important for the performance of the model: as the number of layers increases, the network can extract more complex feature patterns. However, deep networks suffer from a degradation problem, in which the accuracy saturates or even decreases as the depth increases. The ResNet solves this degradation problem and performs well in two-dimensional image recognition tasks.29,42–44 To apply the proposed WGAN-ResNet to THz spectra, we first translated the one-dimensional absorption coefficient into a two-dimensional image as follows,
[Eqn (3): mapping of the one-dimensional absorption coefficient into a two-dimensional image.]
After the above calculation, we obtained 429 images, which we call actual images. These actual images were then fed into the WGAN to generate new image samples for 13 concentration gradients (0%, 2%, 4%, 6%, 8%, 10%, 15%, 20%, 25%, 30%, 40%, 50% and 100%), with 3495 new image samples generated for each concentration. To distinguish them from the actual images, the new images are called generated images.
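Because eqn (3) is not reproduced here, the exact 1D-to-2D mapping used by the authors is unknown. As a minimal sketch only, the snippet below shows one common way to turn a one-dimensional absorption-coefficient vector into a square grayscale image (min-max normalisation, resampling and reshaping); the target image size and the normalisation are illustrative assumptions.

import numpy as np

def spectrum_to_image(alpha, size=64):
    """Illustrative 1D -> 2D conversion: normalise, resample and reshape the
    absorption-coefficient vector into a size x size grayscale image."""
    alpha = np.asarray(alpha, dtype=float)
    # Min-max normalisation to [0, 1]
    norm = (alpha - alpha.min()) / (alpha.max() - alpha.min() + 1e-12)
    # Resample to size*size points by linear interpolation
    x_old = np.linspace(0.0, 1.0, norm.size)
    x_new = np.linspace(0.0, 1.0, size * size)
    resampled = np.interp(x_new, x_old, norm)
    # Reshape to a 2D image and scale to 8-bit grayscale
    return (resampled.reshape(size, size) * 255).astype(np.uint8)

# Example: a toy spectrum with 500 frequency points
image = spectrum_to_image(np.random.rand(500))
print(image.shape)  # (64, 64)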
W(P_r, P_g) = inf_{γ ∈ Π(P_r, P_g)} E_{(x, y) ∼ γ}[‖x − y‖]    (4)
where P_r and P_g denote the distributions of the real and generated samples, respectively, and Π(P_r, P_g) is the set of all joint distributions whose marginals are P_r and P_g.
Eqn (4) can be transformed using the Kantorovich–Rubinstein duality:
W(P_r, P_g) = sup_{‖f‖_L ≤ 1} E_{x ∼ P_r}[f(x)] − E_{x ∼ P_g}[f(x)]    (5)
where the supremum is taken over all 1-Lipschitz functions f, which in the WGAN is approximated by the discriminator network.
[Eqn (6)–(9): the WGAN training objectives (discriminator and generator losses) derived from eqn (5).]
The architectures of the generator and the discriminator in the WGAN are illustrated in Fig. 3. The generator produced generated images from random noise, and the discriminator then decided whether an image was an actual image or a generated one. The generator was trained to generate images more similar to the actual images, while the discriminator was trained to discriminate between images more accurately; the two networks were adversaries of each other. The training was finished when the discriminator could not decide whether an image was a generated image or an actual image.
Fig. 3 The architectures of the generator and the discriminator in the WGAN. (a) The architecture of the generator. (b) The architecture of the discriminator.
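The generator and discriminator architectures of Fig. 3 are not reproduced in this excerpt, so the sketch below uses small placeholder fully connected networks. It only illustrates the adversarial training loop of a WGAN with weight clipping, in the spirit of eqn (4)–(9); the layer sizes, clipping threshold, learning rate and number of critic updates are illustrative assumptions, not the authors' settings.

import torch
import torch.nn as nn

# Placeholder networks; the real WGAN uses the architectures of Fig. 3.
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 64 * 64), nn.Tanh())
D = nn.Sequential(nn.Linear(64 * 64, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
opt_D = torch.optim.RMSprop(D.parameters(), lr=5e-5)
clip = 0.01          # weight-clipping threshold (assumed)
n_critic = 5         # discriminator updates per generator update (assumed)

def train_step(real_images):                       # real_images: (batch, 64*64)
    batch = real_images.size(0)
    for _ in range(n_critic):
        z = torch.randn(batch, 100)
        fake = G(z).detach()
        # Discriminator (critic) loss: maximise E[D(real)] - E[D(fake)], i.e. minimise the negative
        loss_D = D(fake).mean() - D(real_images).mean()
        opt_D.zero_grad(); loss_D.backward(); opt_D.step()
        for p in D.parameters():                   # enforce the Lipschitz constraint by clipping
            p.data.clamp_(-clip, clip)
    z = torch.randn(batch, 100)
    # Generator loss: maximise E[D(G(z))], i.e. minimise the negative
    loss_G = -D(G(z)).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()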
Let the desired underlying mapping be denoted as H(x), where x denotes the input of the first of these layers. The stacked nonlinear layers fit another mapping, F(x) ≔ H(x) − x, so the original mapping is recast as F(x) + x. Finally, the formulation F(x) + x can be realized by feedforward neural networks with identity shortcut connections, as shown in Fig. 4.
Fig. 4 Residual learning: a building block from ref. 25.
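As a minimal sketch of the building block in Fig. 4 (not the authors' implementation), a basic two-layer residual block with an identity shortcut can be written as follows; the channel count and kernel size are illustrative.

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """A basic residual block: output = F(x) + x, with F two 3x3 convolutions."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                              # identity shortcut connection
        out = self.relu(self.bn1(self.conv1(x)))  # first stacked nonlinear layer
        out = self.bn2(self.conv2(out))           # second layer: F(x)
        return self.relu(out + identity)          # F(x) + x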
In this study, we selected two ResNets (an 18 layer ResNet and a 152 layer ResNet). To match our classification task, we changed the output of the last fully connected layer of the original ResNet from 1000 to 13. The network architectures of the 18 layer and 152 layer ResNets are listed in Table 1, where the number beside the brackets represents the number of stacked blocks. Down-sampling was performed by conv3_1, conv4_1 and conv5_1 with a stride of 2.
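As a brief sketch (assuming a PyTorch/torchvision implementation, which the paper does not specify), loading an ImageNet-pretrained ResNet and replacing its 1000-way output layer with a 13-way layer looks like this:

import torch.nn as nn
from torchvision import models

# Load ResNets pretrained on ImageNet (see the pretraining discussion below)
resnet18 = models.resnet18(pretrained=True)
resnet152 = models.resnet152(pretrained=True)

# Replace the last fully connected layer: 1000 ImageNet classes -> 13 concentrations
resnet18.fc = nn.Linear(resnet18.fc.in_features, 13)
resnet152.fc = nn.Linear(resnet152.fc.in_features, 13)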
Pretraining model technology can be used to train a large target network without overfitting when the target dataset is smaller than the base dataset.47 In our experiment, the ResNet was first trained on the ImageNet dataset and then transferred to the target network, which was trained on our target dataset. ImageNet is a large-scale hierarchical image database containing 3.2 million cleanly annotated images spread over 5247 categories.48 It has been the most influential dataset in computer vision.25,49–51
Fig. 6 (a)–(c) Actual images and (d)–(f) generated images (0%, 2% and 100% concentration, respectively).
The structural similarity index (SSIM)52 was employed to quantitatively measure the similarity between the generated and actual images. The more similar the generated and actual images are, the closer the SSIM value is to 1. For the 0%, 2% and 100% concentration samples, the SSIM values are 0.92, 0.94 and 0.98, respectively. These results show that the generated images preserve the key features of the actual images well.
[Eqn (10)]
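As an illustration (not taken from the paper), the SSIM comparison described above can be reproduced with the standard implementation in scikit-image; the image arrays used here are random placeholders.

import numpy as np
from skimage.metrics import structural_similarity as ssim

# Placeholder 64x64 grayscale images; in practice these would be an actual image
# and the corresponding WGAN-generated image for the same concentration.
actual_img = np.random.rand(64, 64)
generated_img = actual_img + 0.05 * np.random.rand(64, 64)

score = ssim(actual_img, generated_img,
             data_range=float(actual_img.max() - actual_img.min()))
print(f"SSIM = {score:.2f}")   # values close to 1 indicate high similarity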
For the ResNet, the quantification accuracy rate of the 152 layer model is 2.57% higher than that of the 18 layer model. For the WGAN-ResNet, the quantification accuracy rate of the 152 layer model is 5.98% higher than that of the 18 layer model. For both the ResNet and the WGAN-ResNet, the 152 layer models perform better than the 18 layer models, which is consistent with the previous report.25 This indicates that the deeper network has better feature extraction ability.
The 18 layer ResNet has 1.8 GFLOPs, and the 152 layer ResNet has 11.3 GFLOPs,25 which means that the model complexity of the 152 layer ResNet is higher than that of the 18 layer ResNet. Training such models therefore requires a large number of samples. When the ResNet was trained without the WGAN, only 429 training samples were available, which could not meet the needs of model training. To provide more samples for training the ResNet models, the WGAN was employed. The results show that the quantification accuracy rates of the WGAN-ResNet are better than those of the ResNet: for the 18 layer model, the accuracy rate of the WGAN-ResNet is 0.86% higher than that of the ResNet, and for the 152 layer model it is 4.27% higher. This is because the ResNet parameters were trained sufficiently with the new images generated by the WGAN, and it indicates that the WGAN is a feasible way to augment data when learning samples are scarce.
To avoid model overfitting, we introduced pretraining model technology to reduce the number of trainable ResNet parameters. The task of training the ResNet on the ImageNet dataset is called task A, and the task of training the ResNet on our dataset is called task B. The networks AnB_ResNet_18 and AnB_ResNet_152 are based on the 18 layer and 152 layer ResNets, respectively; in each network, the model parameters of the first n layers are copied from task A and frozen, while the parameters of the remaining 5 − n layers are randomly initialized and trained on task B. To train the ResNet networks, we used the Adam method with a learning rate of 1e−4 and a batch size of 128.
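As a minimal sketch of this AnB scheme (assuming a PyTorch/torchvision ResNet whose five layer groups correspond to conv1 and conv2_x–conv5_x in Table 1), freezing the first n groups and training the rest with Adam could look like the following; the helper name and the grouping are illustrative assumptions.

import torch
import torch.nn as nn
from torchvision import models

def build_AnB_resnet18(n_frozen, num_classes=13):
    """Illustrative AnB network: copy ImageNet weights (task A), freeze the
    first n_frozen of the five layer groups, and train the rest on task B."""
    model = models.resnet18(pretrained=True)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    # The five groups of Table 1: conv1 (with bn1) and conv2_x ... conv5_x
    groups = [nn.Sequential(model.conv1, model.bn1),
              model.layer1, model.layer2, model.layer3, model.layer4]
    for group in groups[:n_frozen]:
        for p in group.parameters():
            p.requires_grad = False               # frozen: copied from task A
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.Adam(trainable, lr=1e-4)
    return model, optimizer

# Example: A1B_ResNet_18 (first group frozen), trained with batches of 128
model, optimizer = build_AnB_resnet18(n_frozen=1)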
The quantification accuracy rates with different numbers of frozen layers are displayed in Fig. 8. When the number of frozen layers n is 0, the model parameters in all five layer groups shown in Table 1 change during training. When n is 1, the first layer conv1 in Table 1 is frozen; when n is 2, the first two layers conv1 and conv2_x are frozen, and so on. When n is 5, the model parameters in conv1, conv2_x, conv3_x, conv4_x and conv5_x cannot be changed by training. As shown in Fig. 8, the accuracy rate first rises as the number of frozen layers increases, but falls as more layers are frozen. This is because the features of the samples in our dataset are quite different from those in the ImageNet dataset; as more layers are frozen, the co-adaptation between layers and the feature extraction capability deteriorate. The best accuracy rate is 91.4%, obtained by the 152 layer ResNet with the first layer frozen.
Fig. 8 Accuracy rate of the 18 layer and 152 layer WGAN-ResNet with different numbers of frozen layers.
For further analysis, we compared our proposed WGAN-ResNet with a support vector machine (SVM),53 k-nearest neighbor (KNN),54 a naïve Bayes model55 and ensemble learning.56 The hyper-parameters of the SVM were optimized by a genetic algorithm (GA)57 and particle swarm optimization (PSO).58,59 SVM, KNN, the naïve Bayes model and ensemble learning can be considered shallow learning methods, which use low-level sample features (generally edge and texture features). In contrast, the ResNet can extract not only low-level features but also high-level features,25 which are built on the low-level features and carry richer semantic information. The accuracy rates of the above methods are shown in Fig. 9. Our proposed WGAN-ResNet achieved a 91.4% accuracy rate, which is higher than those of the compared methods.
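For illustration only (the paper does not state its implementation), the shallow baselines above could be set up with scikit-learn as follows; the feature matrix X (e.g. flattened spectra or images) and label vector y are placeholders, the random forest stands in for an unspecified ensemble method, and the hyper-parameters shown are defaults rather than the GA/PSO-optimized values.

import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: 429 samples, flattened spectra/images, 13 concentration classes
X = np.random.rand(429, 4096)
y = np.random.randint(0, 13, size=429)

baselines = {
    "SVM": SVC(kernel="rbf", C=1.0, gamma="scale"),         # C and gamma would be tuned by GA/PSO
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Ensemble": RandomForestClassifier(n_estimators=100),   # one common ensemble choice
}

for name, clf in baselines.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")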
Trace detection and the extraction of more features from complex samples will be our future research directions. To extract more features from complex samples, we will use an information fusion method, which fuses additional spectral parameters (such as the refractive index and dielectric constant) as the input of the WGAN-ResNet. For trace detection, we will enhance the interaction between the pesticide and the terahertz wave using metamaterials.