Abstract:
Accurate crop classification can greatly support efficient decision-making on resource allocation and yield estimation in modern precision agriculture. Farmers and agricultural organizations frequently need to identify crop types accurately so that irrigation, fertilization, and pest control can be optimized, ultimately enhancing productivity and sustainability. This study aims to improve the accuracy and generalization of crop classification. An advanced deep semantic segmentation model, termed AF-DBUNet (Attention Feature Fusion and Dual-Branch Upsampling Network), was proposed, and Sentinel-2 satellite images were used to achieve high-precision classification of corn and peanut crops; its applicability to large-scale agricultural monitoring was then verified. The experimental areas were Pingyu and Runan Counties of Zhumadian City and Tanghe County of Nanyang City, Henan Province. A high-precision crop label dataset was constructed by integrating multi-temporal Sentinel-2 L2A-level images (10 m resolution) with RTK-measured field data. SNAP 10.0 software was used for image preprocessing and resampling to ensure consistent data quality, and crop distribution labels with precise spatial positioning were generated in ArcMap, with each crop assigned a specific color code to assist precise labeling. To help the model learn accurate spatial and spectral features, feature selection was performed with the Relief-F algorithm. Ten spectral features were first extracted from the original Sentinel-2 imagery, including the NIR (near-infrared) band and key vegetation indices such as the NDVI (normalized difference vegetation index), RVI (ratio vegetation index), and EVI (enhanced vegetation index). The Relief-F algorithm then ranked these features according to their contribution to classification performance, and the top three most informative features were selected as model input, effectively reducing redundant spectral information while preserving the ability to distinguish between crop types. In addition, data augmentation was applied to the satellite images and their labels, including horizontal flipping, vertical flipping, diagonal mirroring, and Gaussian blur, which exposed the model to diverse spatial variations during training, improved its generalization, and prevented overfitting. AF-DBUNet adopts an encoder-decoder architecture and introduces two components: the A-CFM (attention-guided cross-fusion module) and the dual-branch upsampling fusion module. An improved ResNet50, with the global average pooling layer and the fully connected layer removed to enhance deep feature extraction, serves as the encoder. The A-CFM performs multi-scale feature fusion using residual connections and attention mechanisms, allowing key crop areas to be classified accurately after fusion, while the dual-branch upsampling fusion module combines bilinear interpolation and transposed convolution to reconstruct spatial features. The model was implemented in the PyTorch framework and optimized end to end with a Dice Loss + Focal Loss hybrid loss function and cosine annealing learning rate scheduling, which effectively alleviated model bias under sample imbalance. Experimental results showed that AF-DBUNet significantly outperformed the PSPNet, DeepLabv3+, and U-Net models in the training-area test.
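As a minimal illustration of the feature-selection step described above (the abstract does not specify the band combinations or the Relief-F implementation used), the sketch below computes NDVI, RVI, and EVI from Sentinel-2 surface-reflectance arrays and ranks candidate features with the ReliefF class from the third-party `skrebate` package; the array names, neighbor count, and package choice are assumptions, not the study's actual code.

```python
import numpy as np
from skrebate import ReliefF   # third-party Relief-F implementation (assumed choice)

def vegetation_indices(nir, red, blue):
    """Standard index formulas applied to Sentinel-2 surface-reflectance arrays."""
    eps = 1e-6                                    # avoid division by zero
    ndvi = (nir - red) / (nir + red + eps)        # normalized difference vegetation index
    rvi = nir / (red + eps)                       # ratio vegetation index
    evi = 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0 + eps)  # enhanced VI
    return ndvi, rvi, evi

def select_top_features(X, y, k=3):
    """Rank candidate features with Relief-F and keep the k highest-ranked ones.

    X: (n_samples, n_features) stack of the candidate spectral features,
    y: integer crop labels sampled from the field-verified label raster.
    """
    relief = ReliefF(n_neighbors=10)              # neighbor count is an assumption
    relief.fit(X, y)
    order = np.argsort(relief.feature_importances_)[::-1]
    return order[:k]                              # column indices of the selected features
```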
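The abstract names a Dice Loss + Focal Loss hybrid objective with cosine annealing learning rate scheduling but gives no weights or hyperparameters, so the following PyTorch sketch is only one plausible formulation: the loss weights, focal gamma, scheduler period, and class count are assumptions, and `AF_DBUNet` is a hypothetical stand-in for the authors' model class.

```python
import torch
import torch.nn.functional as F

def dice_focal_loss(logits, target, num_classes, gamma=2.0, w_dice=0.5, w_focal=0.5):
    """Hybrid Dice + Focal loss for multi-class segmentation (weights assumed)."""
    probs = torch.softmax(logits, dim=1)                       # (N, C, H, W)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()

    # Dice term: penalizes poor region overlap, robust to class imbalance.
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    union = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    dice = 1.0 - ((2.0 * inter + 1.0) / (union + 1.0)).mean()

    # Focal term: down-weights easy pixels so rare classes are not ignored.
    ce = F.cross_entropy(logits, target, reduction="none")     # per-pixel cross entropy
    pt = torch.exp(-ce)
    focal = ((1.0 - pt) ** gamma * ce).mean()

    return w_dice * dice + w_focal * focal

# Assumed training setup (model class and class count are hypothetical):
# model = AF_DBUNet(num_classes=3)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
# per batch: loss = dice_focal_loss(model(x), y, num_classes=3)
#            loss.backward(); optimizer.step()
# per epoch: scheduler.step()
```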
Specifically, the mPA (mean pixel accuracy) reached 92.13%, which was 5.65, 2.75, and 2.92 percentage points higher than PSPNet, U-Net, and DeepLabv3+, respectively; the mIoU (mean intersection over union) was 85.17%, 8.41, 3.15, and 4.03 percentage points higher than the same three models; and the OA (overall accuracy) of AF-DBUNet was 92.30%, 2.42 to 4.74 percentage points higher than the other models. With respect to the misclassification and omission of peanut and corn, AF-DBUNet achieved the highest UA (user's accuracy) and PA (producer's accuracy) in all categories, enabling more accurate identification of the target crops. In the cross-county independent test area, AF-DBUNet achieved the best generalization performance among the four models, with an mIoU of 81.18%, mPA of 89.16%, and OA of 88.89%; the UA and PA for peanut were 87.85% and 90.50%, and those for corn were 87.59% and 88.07%, respectively. In the cross-city and cross-year independent test area evaluation (2023 Tanghe County data), AF-DBUNet generalized relatively stably, with its overall accuracy remaining at 80.42%, fully verifying its generalization ability. In summary, through the collaborative optimization of the attention-guided feature fusion and dual-branch upsampling fusion modules, AF-DBUNet effectively improved the accuracy and generalization of crop classification. Its high accuracy (OA > 92%) and strong generalization (cross-region OA > 80%) provide a reliable tool for large-scale remote sensing monitoring in modern agriculture.