Abstract:
The identification and prevention of crop diseases play a major role in promoting agricultural development. For crop disease identification based on deep learning, the key is to focus on the subtle discriminative details that distinguish similar classes from one another. Traditional attention mechanisms address this requirement implicitly and improve recognition accuracy by reweighting features: by emphasizing relevant feature associations, they suppress irrelevant information and focus on the more discriminative regions of the image. However, the softmax activation function used to normalize the attention coefficients yields overly sparse activations at the output, which weakens the reinforcement effect. Inspired by the grouping used in AlexNet, this study proposed a group attention module based on a grouping strategy to strengthen the output activations. The module places features of the same semantic concept in the same group and reinforces each group independently, reducing the mutual inhibition between groups of different semantic concepts; this grouping strategy greatly suppresses the negative impact of the softmax activation function. Moreover, traditional attention mechanisms cannot effectively reinforce low-level features, because low-level features lack effective semantic information. To reinforce low-level features, the group attention module computes their attention coefficients from high-level features. The experimental results showed that the group attention module had a stronger reinforcement effect than traditional attention mechanisms. Based on the group attention module, this study proposed a real-time, efficient semantic segmentation model for crop disease leaves that combines the advantages of encoder-decoder and multi-branch semantic segmentation frameworks.
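The grouping principle can be illustrated with a minimal NumPy sketch. It assumes that each group is a contiguous block of channels, that the attention logits for the low-level channels come from global average pooling of the high-level features followed by a linear projection (`weight`, hypothetical), and that the coefficients simply rescale the low-level channels. The abstract does not specify these details, so this illustrates why grouping counteracts softmax sparsity rather than reproducing the authors' module.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_attention(logits):
    # Conventional case: one softmax over all C channels, so every
    # channel competes with every other and the C coefficients sum to 1.
    return softmax(logits)


def group_attention(logits, groups):
    # Grouped case (assumption: groups are contiguous channel blocks).
    # Each group is normalized on its own, so a channel only competes with
    # channels of the same group and each group's coefficients sum to 1.
    c = logits.shape[-1]
    if c % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    return softmax(logits.reshape(groups, c // groups), axis=-1).reshape(c)


def reinforce_low_level(low, high, weight, groups):
    # low:    (C, H, W) low-level feature map to be reinforced
    # high:   (C_h, h, w) high-level feature map that supplies semantics
    # weight: (C, C_h) hypothetical learned projection
    context = high.mean(axis=(1, 2))   # global average pooling -> (C_h,)
    logits = weight @ context          # logits for the low-level channels -> (C,)
    coeff = group_attention(logits, groups)
    return low * coeff[:, None, None]  # reweight each low-level channel


rng = np.random.default_rng(0)
logits = rng.normal(size=8)
print("global: ", np.round(global_attention(logits), 3))
print("grouped:", np.round(group_attention(logits, groups=4), 3))

low = rng.normal(size=(8, 16, 16))
high = rng.normal(size=(32, 4, 4))
weight = rng.normal(size=(8, 32))
print(reinforce_low_level(low, high, weight, groups=4).shape)  # (8, 16, 16)
```

Because each coefficient is normalized only against the channels in its own group rather than against all C channels, every grouped coefficient is at least as large as its global counterpart. With all-zero logits and C = 8, a single softmax assigns 1/8 to each channel, whereas four groups of two assign 1/2.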
Encoder-decoder frameworks boost performance by using deconvolution layers, but at a high computational cost. Multi-branch frameworks, by contrast, enlarge the receptive field by fusing features from different levels, striking a balance between speed and accuracy. To achieve real-time performance, this study first adopted a lightweight general-purpose architecture as the feature extraction network: ResNet18, pre-trained on the PlantVillage dataset, served as the backbone because of its balance between efficiency and accuracy. Next, the deconvolution layers were replaced by lightweight bilinear upsampling layers to recover the spatial resolution of the input. To improve accuracy, the low-level features were enhanced by the high-level features within the group attention module. Finally, the receptive field was enlarged by fusing features from different levels in a novel fashion. Combining features from different levels significantly boosted performance, because high-level features provide global context information while low-level features provide detailed information. With the ResNet18 backbone, the proposed model outperformed previous real-time semantic segmentation models, achieving a pixel accuracy of 93.9% and a mean intersection over union (mIoU) of 78.6%. Furthermore, it ran at 130.1 frames per second at a resolution of 900×600 pixels on a single NVIDIA GTX 1080 Ti graphics card, meeting the requirements of real-time operation. In summary, the model achieves a good balance of efficiency and accuracy for semantic segmentation of crop disease leaves and can serve as a reference for modern agricultural applications such as disease identification, automatic fertilization, and precision irrigation.
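The two reported metrics, pixel accuracy and mIoU, have standard definitions that can be computed from a per-pixel confusion matrix, as in the NumPy sketch below. The abstract does not state the exact evaluation protocol (for example, which classes are averaged or whether counts are accumulated over the whole test set), so those choices, the toy label maps, and the class names in the comments are assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    # Rows index the ground-truth class, columns the predicted class.
    # Pixels whose ground-truth label lies outside [0, num_classes)
    # (e.g. an "ignore" label) are skipped.
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = (target >= 0) & (target < num_classes)
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    # Correctly classified pixels divided by all counted pixels.
    return np.diag(cm).sum() / cm.sum()


def mean_iou(cm):
    # Per-class IoU = TP / (TP + FP + FN), averaged over the classes that
    # occur in either the prediction or the ground truth.
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return (tp[present] / union[present]).mean()


# Toy 2x3 label maps with three hypothetical classes
# (0 = background, 1 = healthy leaf, 2 = lesion).
target = np.array([[0, 0, 1], [1, 2, 2]])
pred = np.array([[0, 1, 1], [1, 2, 0]])
cm = confusion_matrix(pred, target, num_classes=3)
print(pixel_accuracy(cm))  # 4 correct pixels out of 6
print(mean_iou(cm))        # (1/3 + 2/3 + 1/2) / 3
```

For a test set, a common convention is to sum the confusion matrices of all images before computing the ratios; averaging per-image scores instead can give different numbers.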