Abstract:
Mechanical cotton topping relies on mobile edge devices whose limited computing power makes real-time detection difficult, and motion blur and the occlusion of small targets make the task harder still. To address these problems, this study proposes CottonBud-YOLOv5s, a lightweight cotton bud detection model built on the YOLOv5s architecture, with several modifications aimed at both accuracy and efficiency in complex field environments. The model adopts ShuffleNetv2 as its backbone network, chosen to reduce computational complexity while maintaining detection accuracy, and replaces the original upsampling modules with the DySample dynamic upsampling module, which further lowers computational cost and raises detection speed. Together, these changes let the model run efficiently on edge devices with limited computing power. To handle varying object scales and complex contextual information, the model also introduces the ASFFHead detection head in the head and the GC (global context) attention module in the neck. These modules strengthen the model's scale invariance and its ability to extract contextual features, which is crucial for detecting small targets that are occluded or blurred by motion.
These enhancements improve the model's robustness under challenging real-world conditions. To validate CottonBud-YOLOv5s, ablation studies and model comparison tests were conducted. The ablation results show that introducing the ASFFHead detection head and the GC attention module improved detection accuracy at every target size. For small targets, average precision (AP) at IoU thresholds 0.5:0.95 increased by 3.6 percentage points and average recall (AR) at the same thresholds by 2.1 percentage points. For medium-sized targets, AP increased by 4.1 percentage points and AR by 3.5 percentage points; for large targets, AP increased by 6.5 percentage points and AR by 5.9 percentage points. Compared with Faster-RCNN, TOOD, RTDETR, YOLOv3s, YOLOv5s, YOLOv9s, and YOLOv10s, CottonBud-YOLOv5s was faster by 26.4, 26.7, 24.2, 24.8, 11.5, 18.6, and 15.6 frames per second, respectively. Its mean average precision (mAP) was higher by 14.0, 13.3, 5.5, 0.9, 0.8, 0.2, and 1.5 percentage points, and its recall rate was higher by 16.8, 16.0, 3.2, 2.0, 0.8, 0.5, and 1.2 percentage points, respectively. Overall, CottonBud-YOLOv5s achieved an mAP of 97.9%, a recall rate of 97.2%, and a CPU detection speed of 27.9 frames per second.
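Because the comparison is reported as differences, the baseline figures are implied rather than stated. A minimal sketch of that arithmetic follows, assuming every gain was measured under the same setup as the final CottonBud-YOLOv5s figures (in particular, that the speed gains refer to CPU inference, which the abstract does not state explicitly). The variable and function names are illustrative only.

```python
# Figures reported in the abstract for CottonBud-YOLOv5s.
OURS = {"fps": 27.9, "map": 97.9, "recall": 97.2}

MODELS = ["Faster-RCNN", "TOOD", "RTDETR", "YOLOv3s",
          "YOLOv5s", "YOLOv9s", "YOLOv10s"]

# Reported gains of CottonBud-YOLOv5s over each model, in the order of MODELS.
GAINS = {
    "fps": [26.4, 26.7, 24.2, 24.8, 11.5, 18.6, 15.6],
    "map": [14.0, 13.3, 5.5, 0.9, 0.8, 0.2, 1.5],
    "recall": [16.8, 16.0, 3.2, 2.0, 0.8, 0.5, 1.2],
}


def implied_baselines():
    """Subtract each reported gain from our figure to recover the baseline.

    Assumption: every gain uses the same measurement setup as OURS.
    """
    table = {}
    for i, name in enumerate(MODELS):
        table[name] = {m: round(OURS[m] - GAINS[m][i], 1) for m in OURS}
    return table


if __name__ == "__main__":
    for name, row in implied_baselines().items():
        print(f"{name:12s} fps={row['fps']:5.1f}  "
              f"mAP={row['map']:5.1f}  recall={row['recall']:5.1f}")
```

Under this reading, the YOLOv5s baseline would run at about 16.4 frames per second with an mAP of about 97.1%, so the reported gain over the original architecture is mainly one of speed.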
Visual analysis further confirmed that CottonBud-YOLOv5s performs well in single-plant, multi-plant, motion-blur, and small-target-occlusion scenarios, all of which are common in real agricultural fields. In conclusion, CottonBud-YOLOv5s offers a promising solution for precise, real-time detection of cotton buds in densely planted environments. Its detection accuracy, robustness, and computational efficiency provide a visual detection foundation for mechanized cotton topping and support the further automation of agricultural practice.
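The abstract does not detail how ASFFHead works internally. As commonly described, adaptively spatial feature fusion blends feature maps from several pyramid levels, already resized to a common shape, using per-location weights that pass through a softmax so that they sum to one. The sketch below illustrates only that fusion step on single-channel maps stored as nested lists. In a real detection head the weight logits are learned, whereas here they are supplied by hand, and all names are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of logits."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def asff_fuse(features, logits):
    """Fuse same-sized 2D feature maps with per-location softmax weights.

    features: list of L maps, each H x W (nested lists of floats)
    logits:   list of L maps, each H x W, one weight logit per level
              and location (hand-supplied here; learned in practice)
    """
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Weights across levels sum to one at every location.
            weights = softmax([level[y][x] for level in logits])
            fused[y][x] = sum(w * f[y][x] for w, f in zip(weights, features))
    return fused


if __name__ == "__main__":
    small = [[1.0, 1.0], [1.0, 1.0]]
    medium = [[2.0, 2.0], [2.0, 2.0]]
    large = [[3.0, 3.0], [3.0, 3.0]]
    # Equal logits give equal weights, so each fused value is the plain mean.
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(asff_fuse([small, medium, large], [zeros, zeros, zeros]))
```

Because the weights are computed per location, each position can favor the pyramid level whose resolution best matches the object there, which is the intuition behind the scale-invariance claim.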