Abstract:
Mechanical topping is one of the most important cultural practices for improving cotton yield during production. In mechanical topping, the apical shoots are cut at about 10–20 cm from the top of the plant. At present, however, the performance of mechanical topping is limited by the computing power and real-time processing capability of mobile edge devices, and detection is further hampered by motion blur and the occlusion of small targets. In this study, a lightweight cotton bud detection model (CottonBud-YOLOv5s) was proposed based on the well-known YOLOv5s architecture, with both accuracy and efficiency optimized for detecting cotton buds in complex field environments. The ShuffleNetv2 backbone was adopted to reduce computational complexity while maintaining high detection accuracy. In addition, the DySample dynamic upsampling module replaced the original upsampling layers, further lowering computational cost and improving detection speed, so that the improved model runs efficiently and in real time on edge devices with limited computing power during mechanical topping. Moreover, the ASFFHead detection head and the GC (global context) attention mechanism were introduced into the head and neck, respectively, to handle varying object scales and complex contextual information. Scale invariance and context-aware feature extraction were thereby improved, which is crucial for detecting small targets that are occluded or blurred by motion in the field, and the robustness of the model under real-world conditions was enhanced. A series of ablation and comparison experiments was conducted to validate the efficacy of the CottonBud-YOLOv5s model. The results showed that introducing the ASFFHead detection head and the GC attention mechanism led to notable improvements in detection accuracy: at the IoU range of 0.5:0.95, the average precision (AP) and average recall (AR) increased by 3.6 and 2.1 percentage points for small targets, by 4.1 and 3.5 percentage points for medium targets, and by 6.5 and 5.9 percentage points for large targets, respectively, so the improved model detected targets well across a range of sizes. Furthermore, compared with state-of-the-art detectors, including Faster R-CNN, TOOD, RT-DETR, YOLOv3s, YOLOv5s, YOLOv9s, and YOLOv10s, the CottonBud-YOLOv5s model increased the detection speed by 26.4, 26.7, 24.2, 24.8, 11.5, 18.6, and 15.6 frames per second, the mean average precision (mAP) by 14.0, 13.3, 5.5, 0.9, 0.8, 0.2, and 1.5 percentage points, and the recall by 16.8, 16.0, 3.2, 2.0, 0.8, 0.5, and 1.2 percentage points, respectively. Overall, the CottonBud-YOLOv5s model achieved a mAP of 97.9%, a recall of 97.2%, and a CPU detection speed of 27.9 frames per second, demonstrating strong performance in both accuracy and speed. Visual analysis confirmed that the model excelled in various detection scenarios, including single-plant, multi-plant, motion-blur, and small-target-occlusion conditions.
Its performance in these scenarios highlights its robustness and effectiveness in real-world agricultural environments, where such challenges are commonly encountered. In conclusion, the CottonBud-YOLOv5s model offers a promising solution for precise, real-time detection of cotton buds in densely planted fields, combining high detection accuracy, enhanced robustness, and efficient computation. These findings provide a solid visual detection basis for mechanized cotton topping in automated agricultural practice.
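As an illustration of the GC (global context) attention mechanism referenced above, the following is a minimal PyTorch sketch of a typical global-context block: context modeling by softmax attention pooling, a bottleneck transform, and additive fusion. The class name, reduction ratio, and insertion point are assumptions made for demonstration only and are not taken from the CottonBud-YOLOv5s implementation.

```python
import torch
import torch.nn as nn


class GCBlock(nn.Module):
    """Minimal global-context attention block (illustrative sketch).

    Pipeline: (1) pool spatial features with a learned softmax attention map,
    (2) transform the pooled context through a channel bottleneck,
    (3) add the transformed context back to every spatial position.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.context_mask = nn.Conv2d(channels, 1, kernel_size=1)
        self.softmax = nn.Softmax(dim=2)
        hidden = max(channels // reduction, 1)
        self.transform = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.LayerNorm([hidden, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Context modeling: a single-channel attention map pools spatial features.
        mask = self.context_mask(x).view(b, 1, h * w)         # (B, 1, HW)
        mask = self.softmax(mask).unsqueeze(-1)               # (B, 1, HW, 1)
        feats = x.view(b, c, h * w).unsqueeze(1)              # (B, 1, C, HW)
        context = torch.matmul(feats, mask).view(b, c, 1, 1)  # (B, C, 1, 1)
        # Transform the global context and fuse it into every position.
        return x + self.transform(context)


if __name__ == "__main__":
    x = torch.randn(2, 256, 20, 20)          # e.g., one neck feature map
    print(GCBlock(256)(x).shape)             # torch.Size([2, 256, 20, 20])
```

In a YOLOv5-style network, such a block would typically be placed after selected neck feature maps; the exact placement and configuration used in CottonBud-YOLOv5s are described in the methods section of the paper.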