    Sun Jun, Yang Kaifeng, Luo Yuanqiu, Shen Jifeng, Wu Xiaohong, Qian Lei. Method for the multiscale perceptual counting of wheat ears based on UAV images[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(23): 136-144. DOI: 10.11975/j.issn.1002-6819.2021.23.016

    Method for the multiscale perceptual counting of wheat ears based on UAV images

    • Abstract: Wheat is one of the world's most important food crops, and timely, accurate estimation of its yield is vital to global food security. Because the number of wheat ears is a key datum for yield estimation, this study constructed a general-purpose Wheat Ear Counting Network (WECnet) for accurate counting and density estimation of wheat at the grain-filling stage. Wheat ear images of different varieties from several countries were selected for training, and the dataset was augmented to ensure the diversity of wheat ears. WECnet was built on the crowd counting network CSRnet and adapted to the characteristics of wheat images. At the front end, the first 12 layers of VGG19 are used for feature extraction and fused with contextual semantic features to fully extract wheat ear information; the back-end network uses dilated convolutions with different dilation rates to enlarge the receptive field and output high-quality density maps. To verify the transferability and universality of the model, the model trained on the Global Wheat dataset was used to count wheat field images captured by UAV. The experimental results show that, on the Global Wheat dataset, the coefficient of determination, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) of the WECnet model reached 0.95, 6.1, and 4.78, respectively. In UAV image counting, the coefficient of determination reached 0.886, the overall error rate was only 0.23%, and the average counting time per wheat image was 32 ms, showing excellent counting speed and accuracy. The universal field wheat counting model WECnet can provide a data reference for accurate counting and density estimation of wheat in images acquired by UAV.
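
    The front-end/back-end design described in the abstract can be illustrated with a short sketch. The PyTorch code below is a minimal, hypothetical illustration (not the authors' released implementation): a truncated VGG19 front end feeding a back end of 3×3 dilated convolutions that regresses a single-channel density map whose sum gives the ear count. The layer slice, channel widths, and dilation rates are assumptions made for illustration only.

    ```python
    # Minimal sketch of a CSRnet-style counting network as described in the
    # abstract; layer slice, channel widths and dilation rates are assumptions.
    import torch
    import torch.nn as nn
    from torchvision import models

    class WECnetSketch(nn.Module):
        def __init__(self):
            super().__init__()
            # Front end: truncated VGG19 feature extractor (the paper uses the
            # first 12 layers; slicing up to the 12th conv layer is assumed here).
            # In practice ImageNet-pretrained weights would normally be loaded.
            vgg = models.vgg19(weights=None)
            self.frontend = nn.Sequential(*list(vgg.features.children())[:27])

            # Back end: 3x3 dilated convolutions enlarge the receptive field
            # without further downsampling.
            def dilated(in_c, out_c, rate):
                return nn.Sequential(
                    nn.Conv2d(in_c, out_c, kernel_size=3, padding=rate, dilation=rate),
                    nn.ReLU(inplace=True),
                )

            self.backend = nn.Sequential(
                dilated(512, 512, 2),
                dilated(512, 256, 2),
                dilated(256, 128, 3),
                dilated(128, 64, 3),
            )
            # 1x1 convolution producing the single-channel density map.
            self.output_layer = nn.Conv2d(64, 1, kernel_size=1)

        def forward(self, x):
            return self.output_layer(self.backend(self.frontend(x)))

    # The predicted ear count of an image is the integral (sum) of its density map.
    if __name__ == "__main__":
        model = WECnetSketch().eval()
        with torch.no_grad():
            density = model(torch.randn(1, 3, 512, 512))
        print(f"predicted count: {density.sum().item():.1f}")
    ```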

       

      Abstract: Wheat is one of the most important food crops, with an annual worldwide consumption of about 750 million tons. Timely and accurate estimation of wheat production is in high demand for food security, as grain demand keeps rising with a growing population under a changing climate. In this study, a wheat ear counting network (WECnet) was constructed to accurately estimate wheat ear numbers and density from Unmanned Aerial Vehicle (UAV) images. Wheat images of different varieties were collected from several countries for training, and the training set was filtered and augmented to ensure the diversity of wheat ears. Four methods were compared to verify the performance of WECnet. Among them, object detection marks the position of each target with a rectangular box and therefore gives the most intuitive output, while CSRnet, an end-to-end crowd counting network, generates high-quality density maps, is easy to train, and enlarges the receptive field with dilated convolutions; its overall counting performance was better than that of earlier networks, which often missed dense and severely occluded targets. In object detection, however, post-processing may output only a single target from multiple predictions when positive samples are selected, and in density-map counting the multi-column MCNN model trains its columns separately, carries a large number of parameters, copes poorly with targets of different sizes, and is difficult to train. CSRnet was therefore improved to address these issues according to the characteristics of wheat. At the front end of the network, the first 12 layers of VGG19 were used for feature extraction and fused with contextual semantic features to fully extract wheat ear information; the back-end network used convolutions with different dilation rates to enlarge the receptive field and output high-quality density maps. In addition, to verify the transferability and universality of the model, it was trained on the Global Wheat dataset and then used to count wheat field images taken by UAV at two sites. The experiments showed that the coefficient of determination, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) of the trained model on the Global Wheat dataset reached 0.95, 6.1, and 4.78, respectively, improvements of 4.4%, 13.2%, and 9.8% over the original crowd counting network. In the counting of UAV images, the coefficient of determination of the optimal model was 0.886, and 3 871 ears were counted against a total of 3 880 ears in the 46 images, an error rate of only 0.23%, better than previous results. The average counting time for a single wheat image was 32 ms, showing excellent counting speed and accuracy. The universal field wheat density model can therefore provide a data reference for accurate counting and density prediction from UAV wheat images.
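
      The metrics reported above (coefficient of determination, RMSE, MAE, and the overall error rate across all test images) can be computed from per-image true and predicted counts. The sketch below uses the standard definitions of these metrics; the abstract does not spell out the exact formulas the authors used, so treat this as an assumption, and the example counts are made up for illustration.

      ```python
      # Standard counting metrics assumed to match those reported in the abstract:
      # R^2 (coefficient of determination), RMSE and MAE over per-image counts,
      # plus the overall error rate taken over the summed counts of all images.
      import numpy as np

      def counting_metrics(true_counts, pred_counts):
          t = np.asarray(true_counts, dtype=float)
          p = np.asarray(pred_counts, dtype=float)
          mae = np.mean(np.abs(p - t))                      # Mean Absolute Error
          rmse = np.sqrt(np.mean((p - t) ** 2))             # Root Mean Square Error
          ss_res = np.sum((t - p) ** 2)                     # residual sum of squares
          ss_tot = np.sum((t - t.mean()) ** 2)              # total sum of squares
          r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
          overall_error = abs(p.sum() - t.sum()) / t.sum()  # overall error rate
          return {"R2": r2, "RMSE": rmse, "MAE": mae, "overall_error": overall_error}

      # Illustrative (made-up) per-image counts for a handful of UAV images.
      metrics = counting_metrics(true_counts=[84, 91, 77, 102], pred_counts=[81, 93, 78, 99])
      print({k: round(v, 3) for k, v in metrics.items()})
      ```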

       
