Method for detecting and counting wheat ears using RT-WEDT

    • Abstract: Wheat is one of the most important food crops, and wheat ear counting is essential for predicting wheat yield. To address the insufficient detection accuracy and large parameter counts of existing detection and counting methods in complex field environments, this study proposes a lightweight wheat ear detection model, RT-WEDT (real-time wheat ear detection transformer). First, the transformer-based lightweight network EfficientFormerV2 was selected as the backbone of RT-WEDT to improve feature-extraction efficiency while learning long-range features of wheat ear images. Second, a triple feature fusion (TFF) module was designed and a scale sequence feature fusion (SSFF) module was introduced to construct a multi-scale enhanced hybrid encoder (MSEHE), so that shallow and deep features are fully fused and detection accuracy is improved across scales. Finally, the WIoUv3 loss was adopted as the bounding-box loss to improve the localization accuracy for wheat ear targets. Experiments on the Global Wheat Head Detection dataset show that RT-WEDT achieves an average precision at an IoU threshold of 0.50 (AP50) of 90.2%, higher than conventional object-detection models. On a self-built drone perspective wheat spike dataset (DPWSD), the model reaches an AP50 of 96.8%, verifying its good generalizability. In addition, the model has 12M parameters and a detection speed of 79.7 frames/s, meeting the needs of high-throughput real-time wheat ear detection. This work provides technical support for efficient and rapid wheat yield estimation and is of significance for advancing smart agriculture.

       

      Abstract: Wheat is one of the most widely cultivated staple food crops in the world, and its yield prediction has profound implications for food security. Deep learning can be used to detect and count wheat spikes and thereby rapidly predict wheat yields. However, existing methods still suffer from low detection accuracy and large numbers of model parameters in complex agricultural environments. This study proposes a lightweight wheat ear detection model, RT-WEDT (Real-Time Wheat Ear Detection Transformer), built on RT-DETR. Firstly, EfficientFormerV2 was selected as the backbone network of RT-WEDT to capture both the long-range and local features of wheat ear images with high computational efficiency. Secondly, a multi-scale enhanced hybrid encoder (MSEHE) was introduced, which takes as input the feature maps at four scales output by the four downsampling stages of the backbone. The MSEHE consists of three sub-modules: the attention-based intra-scale feature interaction (AIFI) module acts on the smallest feature map to extract global features of the image; the scale sequence feature fusion (SSFF) module uses multi-scale fusion and 3D convolution to extract information about wheat ear targets at different scales; and the outputs of these two modules are fed into the enhanced feature fusion module (EFFM), which integrates the global and local information of the wheat ear image. Additionally, the WIoUv3 loss function was employed as the bounding-box loss to improve the quality of the anchor boxes and the localization accuracy for wheat ear targets. Experiments were conducted on the Global Wheat Head Detection dataset. The results demonstrate that the RT-WEDT model has 12M parameters, 33.1 G floating-point operations, an average precision of 90.2%, and a detection speed of 79.7 frames/s.
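The AP50 metric reported above counts a predicted box as a true positive when its intersection-over-union (IoU) with a ground-truth box is at least 0.50. As an illustrative sketch (not code from the paper), IoU for axis-aligned boxes in (x1, y1, x2, y2) form can be computed as:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two unit-overlap 2x2 boxes: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429, below the 0.50 threshold
```

Under the AP50 criterion this pair would not match; AP50-95 averages the same precision computation over IoU thresholds from 0.50 to 0.95.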
Compared with RT-DETR, the RT-WEDT model has 62.5% fewer parameters, 68% fewer floating-point operations, an AP50-95 that is 0.6 percentage points higher, an AP50 that is 0.5 percentage points higher, and a 22.4% higher detection speed. Compared with YOLOv5, YOLOv8, and YOLOX at similar parameter scales, its AP50-95 is higher by 8.2, 2.4, and 1.7 percentage points, respectively, and its AP50 by 4.6, 1.1, and 0.7 percentage points, respectively. Furthermore, samples from the Global Wheat Head Detection dataset were classified by scenario to evaluate performance on wheat ear targets under various conditions. The results indicate that dense and overlapping wheat ears are the most significant factor affecting model performance, followed by image blurriness, while the lighting intensity during image capture has a minimal effect on detection. To verify the robustness of the improved RT-WEDT, a drone perspective wheat spike dataset (DPWSD) covering two growth periods was constructed, and RT-WEDT was tested on it directly. Specifically, the model achieved 60.2% AP50-95 and 97.4% AP50 during the filling stage, and 61.0% AP50-95 and 96.1% AP50 during the maturity stage. Counting experiments were conducted on the test set of the global wheat dataset and on the self-built DPWSD to validate the counting effectiveness of RT-WEDT. The R2 of RT-WEDT on the DPWSD was 0.949, indicating an excellent fit between predicted and actual values. Therefore, RT-WEDT is highly accurate for wheat ear detection and counting. The improved model significantly reduces complexity while maintaining a high average precision, enabling real-time wheat ear detection. These findings provide technical support for the efficient and rapid estimation of wheat yields in smart agriculture.
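The counting fit is reported as R2, the coefficient of determination between predicted and actual ear counts per image. A minimal sketch of how this metric is computed (the function name and the sample counts below are illustrative, not data from the paper):

```python
def r_squared(actual, predicted):
    """Coefficient of determination R^2 between actual and predicted counts."""
    mean_a = sum(actual) / len(actual)
    # Residual sum of squares: error of the predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: variance of the actual counts around their mean.
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Hypothetical per-image ear counts (actual vs. model-predicted).
print(r_squared([10, 20, 30], [12, 19, 29]))  # 0.97
```

An R2 close to 1 (0.949 on the DPWSD) means the predicted counts track the actual counts almost perfectly across images.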

       
