Method for detecting and counting wheat ears using RT-WEDT

LI Jie; YANG Zihao; ZHENG Quan; QIAO Jiangwei; TU Jingmin

doi:10.11975/j.issn.1002-6819.202405200

LI Jie, YANG Zihao, ZHENG Quan, et al. Method for detecting and counting wheat ears using RT-WEDT[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(21): 146-156. DOI: 10.11975/j.issn.1002-6819.202405200

Abstract

Wheat is one of the most widely cultivated staple food crops in the world, and predicting its yield has profound implications for food security. Deep learning can be used to detect and count wheat ears and thereby predict wheat yield rapidly. However, existing models still suffer from low detection accuracy and large parameter counts in complex agricultural environments. This study proposes a lightweight wheat ear detection model, RT-WEDT (Real-Time Wheat Ear Detection Transformer), built on RT-DETR. Firstly, EfficientFormerV2 was selected as the backbone network of RT-WEDT to capture both the long-range and local features of wheat ear images with high computational efficiency. Secondly, a multiscale enhanced hybrid encoder (MSEHE) was introduced, which takes as input the feature maps at four scales output by the four downsampling stages of the backbone network. The MSEHE consists of three sub-modules. The attention-based intra-scale feature interaction (AIFI) module acts on the smallest feature map to extract global features of the image. The scale sequence feature fusion (SSFF) module uses multiscale fusion and 3D convolution to extract information about wheat ear targets at different scales. The outputs of these two modules are fed into the enhanced feature fusion module (EFFM), which integrates the global and local information of the wheat ear image. Additionally, to improve localization accuracy for wheat ear targets, the WIoUv3 loss function was adopted as the bounding box loss to improve the quality of the anchor boxes. Experiments were conducted on the global wheat head detection dataset. The results show that RT-WEDT has 12 M parameters and 33.1 G floating-point operations, and achieves an average precision of 90.2% at a detection speed of 79.7 frames/s.
Compared with RT-DETR, RT-WEDT has 62.5% fewer parameters and 68% fewer floating-point operations, while AP₅₀₋₉₅ increases by 0.6 percentage points, AP₅₀ increases by 0.5 percentage points, and detection speed increases by 22.4%. Compared with YOLOv5, YOLOv8, and YOLOX at a similar parameter count, AP₅₀₋₉₅ improves by 8.2, 2.4, and 1.7 percentage points, respectively, and AP₅₀ improves by 4.6, 1.1, and 0.7 percentage points, respectively. Furthermore, samples from the global wheat head detection dataset were classified by scenario, and the performance on wheat ear targets was evaluated in each scenario. The results indicate that dense and overlapping wheat ears affect the performance of the model most strongly, followed by image blurriness, whereas the light intensity at the time of photography has only a minimal effect on detection. To verify the robustness of RT-WEDT, a drone perspective wheat spike dataset (DPWSD) covering two growth periods was constructed, and RT-WEDT was tested directly on it. The model achieved 60.2% AP₅₀₋₉₅ and 97.4% AP₅₀ at the filling stage, and 61.0% AP₅₀₋₉₅ and 96.1% AP₅₀ at the maturity stage. To validate the counting effectiveness of RT-WEDT, counting experiments were conducted on the test set of the global wheat head detection dataset and on the self-built DPWSD. The R² value of RT-WEDT on the DPWSD was 0.949, indicating an excellent fit between predicted and actual counts. Therefore, RT-WEDT is highly accurate for wheat ear detection and counting. The improved model substantially reduces complexity while maintaining a high average precision, which enables real-time detection of wheat ears. These findings can provide technical support for the efficient and rapid estimation of wheat yield in smart agriculture.
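
The counting evaluation described above can be illustrated with a minimal sketch. It assumes that a per-image count is obtained by keeping detections whose confidence reaches a threshold, and that R² is the coefficient of determination between actual and predicted counts. The abstract does not state the threshold or the exact R² definition used, so the function names `count_ears` and `r_squared`, the default threshold of 0.5, and the sample numbers are all hypothetical and are not results from the paper.

```python
from typing import Sequence


def count_ears(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Count detections whose confidence is at least `threshold`.

    The threshold value is an assumption for illustration only.
    """
    return sum(1 for s in scores if s >= threshold)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares between ground truth and prediction.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares of the ground truth around its mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("actual counts have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical per-image confidence scores from a detector.
    per_image_scores = [
        [0.91, 0.84, 0.47, 0.66],
        [0.95, 0.52, 0.31],
        [0.88, 0.79, 0.73, 0.61, 0.58],
    ]
    # Hypothetical manually annotated ear counts for the same images.
    actual_counts = [3, 2, 6]
    predicted_counts = [count_ears(s) for s in per_image_scores]
    print(predicted_counts, r_squared(actual_counts, predicted_counts))
```

An R² close to 1 means the predicted counts track the annotated counts closely across images. A study may instead report the squared correlation of a fitted regression line, which can differ from the definition sketched here.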
