CottonBud-YOLOv5s lightweight cotton bud detection algorithm

赵露强; 彭强吉; 兰玉彬; 康建明; 张敬文; 代建龙; 陈玉龙

doi:10.11975/j.issn.1002-6819.202407140

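Abstract

Abstract: Mechanical cotton topping on edge mobile devices suffers from limited computing power and poor real-time performance, and motion blur and occluded small targets make detection difficult. To address these problems, this study proposes CottonBud-YOLOv5s, a lightweight cotton bud detection model based on YOLOv5s. The model replaces the original modules with the ShuffleNetv2 backbone network and the DySample dynamic upsampling module to reduce the amount of computation and raise detection speed. The ASFFHead detection head and the GC (global context) attention module are introduced into the head and the neck, respectively, to strengthen scale invariance and contextual feature extraction, which improves detection on images with occluded small targets and motion blur. Ablation tests and model comparison tests were carried out to verify the feasibility of the CottonBud-YOLOv5s model. The results show that, after the ASFFHead detection head and the GC global attention mechanism were introduced, the average precision AP_0.5:0.95 and average recall AR_0.5:0.95 increased by 3.6 and 2.1 percentage points for small targets, by 4.1 and 3.5 percentage points for medium targets, and by 6.5 and 5.9 percentage points for large targets. Compared with the Faster-RCNN, TOOD, RTDETR, YOLOv3s, YOLOv5s, YOLOv9s, and YOLOv10s detection models, detection speed increased by 26.4, 26.7, 24.2, 24.8, 11.5, 18.6, and 15.6 frames/s, mean average precision increased by 14.0, 13.3, 5.5, 0.9, 0.8, 0.2, and 1.5 percentage points, and recall increased by 16.8, 16.0, 3.2, 2.0, 0.8, 0.5, and 1.2 percentage points, respectively. The CottonBud-YOLOv5s model reached a mean average precision of 97.9%, a recall of 97.2%, and a CPU detection speed of 27.9 frames/s. Visualization analysis shows that its overall detection performance in single-plant, multi-plant, motion-blur, and small-target occlusion scenes is better than that of the other detection models. The model offers high detection accuracy, robustness, and detection speed, is suited to the precise detection of cotton buds in densely planted environments, and can provide a visual detection basis for mechanized cotton topping.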

Abstract: Mechanical topping is one of the most important cultural practices for improving cotton yield. In topping, the shoots are cut at about 10–20 cm from the top of the plant. At present, however, mechanical topping is constrained by the limited computing power and poor real-time performance of edge mobile devices, and detection is further hampered by motion blur and the occlusion of small targets. In this study, a lightweight cotton bud detection model, CottonBud-YOLOv5s, was proposed on the basis of the YOLOv5s architecture, with the aim of balancing accuracy and efficiency when detecting cotton buds in complex field environments. The ShuffleNetv2 backbone network replaced the original backbone to reduce computational complexity while maintaining high detection accuracy. The DySample dynamic upsampling module replaced the original upsampling module, which further reduced computational cost and improved detection speed. As a result, the improved model ran more efficiently on edge devices with limited computing power and achieved real-time performance during mechanical topping. Moreover, the ASFFHead detection head and the GC (global context) attention mechanism were introduced into the head and the neck, respectively, to handle varying object scales and complex contextual information. These additions strengthened scale invariance and the extraction of contextual features, which is crucial for detecting small targets that are occluded or blurred by motion in the field, and thereby improved the robustness of the model under real-world conditions. A series of ablation and comparison tests was conducted to validate the CottonBud-YOLOv5s model.
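The abstract names ShuffleNetv2 as the replacement backbone but does not describe its internals. As a minimal sketch of the channel shuffle operation that ShuffleNet-style blocks rely on, the Python below interleaves channels from two concatenated branches so that information mixes across groups at no arithmetic cost. The function name `channel_shuffle` and the use of a plain list in place of a feature-map tensor are assumptions made for illustration, not the authors' code.

```python
def channel_shuffle(channels, groups=2):
    """Interleave channels so that each group contributes in turn.

    `channels` is a plain list standing in for the channel axis of a
    feature map (hypothetical simplification); a real implementation
    would reshape and transpose a tensor instead.
    """
    if groups <= 0 or len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    # View the list as a (groups x per_group) grid, transpose it,
    # then flatten it row by row.
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


if __name__ == "__main__":
    # Two branches of three channels each, concatenated as a0..a2, b0..b2.
    print(channel_shuffle(["a0", "a1", "a2", "b0", "b1", "b2"]))
    # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

The shuffle is a pure reindexing, which is one reason such backbones are attractive when computing power is limited.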
The experimental results showed that introducing the ASFFHead detection head and the GC global attention mechanism improved detection accuracy. For small targets, the average precision (AP_0.5:0.95) increased by 3.6 percentage points and the average recall (AR_0.5:0.95) by 2.1 percentage points. For medium-sized targets, AP_0.5:0.95 and AR_0.5:0.95 increased by 4.1 and 3.5 percentage points, respectively, and for large targets by 6.5 and 5.9 percentage points, respectively. The improved model therefore gained accuracy across all target sizes. Furthermore, compared with the Faster-RCNN, TOOD, RTDETR, YOLOv3s, YOLOv5s, YOLOv9s, and YOLOv10s detection models, the detection speed of CottonBud-YOLOv5s was higher by 26.4, 26.7, 24.2, 24.8, 11.5, 18.6, and 15.6 frames per second, respectively. Its mean average precision (mAP) was higher by 14.0, 13.3, 5.5, 0.9, 0.8, 0.2, and 1.5 percentage points, and its recall was higher by 16.8, 16.0, 3.2, 2.0, 0.8, 0.5, and 1.2 percentage points, respectively. Overall, the CottonBud-YOLOv5s model achieved an mAP of 97.9%, a recall of 97.2%, and a CPU detection speed of 27.9 frames per second. Visual analysis showed that its overall detection performance was better than that of the other models in single-plant, multi-plant, motion-blur, and small-target occlusion scenes, which are common in real field environments.
In conclusion, the CottonBud-YOLOv5s model combines high detection accuracy, robustness, and detection speed, and is suitable for the precise, real-time detection of cotton buds in densely planted environments. The findings can provide a visual detection basis for mechanized cotton topping.
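The AP_0.5:0.95 and AR_0.5:0.95 figures quoted in the abstract are averaged over a range of intersection-over-union (IoU) thresholds. The sketch below assumes the common COCO-style convention of ten thresholds from 0.50 to 0.95 in steps of 0.05, which the abstract does not spell out. The helper names `iou`, `iou_thresholds`, and `matched_fraction` and the example boxes are hypothetical; the code shows only how one predicted box fares across the thresholds, not the full AP computation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_thresholds():
    """Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95."""
    return [i / 100 for i in range(50, 100, 5)]


def matched_fraction(pred, truth):
    """Fraction of thresholds at which `pred` counts as a match for `truth`."""
    value = iou(pred, truth)
    thresholds = iou_thresholds()
    return sum(value >= t for t in thresholds) / len(thresholds)


if __name__ == "__main__":
    pred = (0, 0, 10, 10)    # hypothetical predicted bud box
    truth = (0, 0, 10, 8)    # hypothetical ground-truth box
    print(iou(pred, truth))  # overlap 80 over union 100
    print(matched_fraction(pred, truth))
```

Because a loosely fitting box passes the 0.50 threshold but fails the stricter ones, a metric averaged over 0.5:0.95 rewards tight localization more than a metric evaluated at a single IoU threshold of 0.5.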
