Detecting grape ripeness in complex environments using improved YOLOv7

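    • Summary: To achieve high-accuracy, fast detection of grape ripeness, this study proposes a lightweight detection method, YOLOv7MCA. Building on the YOLOv7 model, MobileNetV4 is introduced as the new backbone feature extraction network to reduce the number of parameters, improving the model's accuracy and efficiency while keeping the computational cost low. CBAM (convolutional block attention module) is introduced into the neck (the enhanced feature extraction network) of YOLOv7; by applying attention along the channel and spatial dimensions separately, CBAM strengthens the model's feature selection ability. In the feature fusion structure, the ASFF (adaptive spatial feature fusion) module is used to optimize the Head; by adaptively fusing feature maps of different scales, ASFF lets the model handle targets of different sizes better, improving the accuracy and efficiency of object detection. Experimental results show that the improved YOLOv7 model achieves a precision of 95.2%, a recall of 87.2%, and a mean average precision of 93.9% on the grape image test set, with an average detection time of 52.2 ms and a model memory footprint of 53.6 MB, all better than the Faster R-CNN, SSD, YOLOv5, YOLOv7, YOLOv8n, YOLOv9t, and YOLOv10n detection models. The improved YOLOv7MCA model uses little memory while maintaining high detection accuracy and reducing detection time, and can provide technical support for the automated mechanical picking of grapes.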

       

      Abstract: Efficient and accurate detection of grape maturity is required during harvesting in precision agriculture. This study proposes a lightweight, high-precision, and rapid method for detecting grape maturity using an improved YOLOv7 model, referred to as YOLOv7MCA. First, MobileNetV4 was introduced as the backbone feature extraction network of YOLOv7 to reduce the number of parameters. This significantly reduced the computational load while maintaining high detection accuracy and efficiency, balancing computational cost and performance in a way that suits real-time applications with limited computational resources. Second, the Convolutional Block Attention Module (CBAM) was incorporated into the neck of YOLOv7 to enhance feature extraction. CBAM applies attention in both the channel and spatial dimensions, so that the model focuses on the more relevant features while suppressing less important ones. This improved the selection of discriminative features and thereby the accuracy of grape maturity detection, especially in complex or cluttered images. Third, the Adaptive Spatial Feature Fusion (ASFF) module was introduced into the feature fusion structure to optimize the head of YOLOv7 through adaptive fusion of multi-scale feature maps. This adaptive fusion allowed the model to detect both small and large objects effectively, improving precision and efficiency across different types of images and covering the wider range of object scales found in real-world agricultural applications. Experimental results show that the improved YOLOv7MCA model achieved a precision of 95.2%, a recall of 87.2%, and a mean average precision of 93.9% on the grape image test set. The average detection time was 52.2 ms, and the memory usage of the improved model was 53.6 MB. The model outperformed existing detectors, including Faster R-CNN, SSD, YOLOv5, YOLOv7, YOLOv8n, YOLOv9t, and YOLOv10n. It reduced memory usage and detection time while maintaining high detection accuracy. Speed and memory efficiency are especially important for practical deployment on resource-limited embedded devices. The improved model is therefore suitable for real-time applications such as automated grape harvesting, where it provides rapid and accurate object detection, and it shows potential to enhance the effectiveness and scalability of automation systems in precision agriculture.
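
      The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, weight shapes, and toy sizes are all hypothetical. Two simplifications are assumed. The spatial attention uses a per-pixel mix of the average and max maps in place of the 7x7 convolution of the original CBAM design, and the ASFF-style fusion assumes the feature levels have already been resized to a common resolution and uses a single 1x1 projection per level to produce the fusion logits.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """CBAM-style channel attention on a feature map x of shape (C, H, W).

    w1 (C//r, C) and w2 (C, C//r) are the weights of a shared two-layer MLP
    (hypothetical values; in a real model they are learned).
    """
    avg = x.mean(axis=(1, 2))  # global average pooling, shape (C,)
    mx = x.max(axis=(1, 2))    # global max pooling, shape (C,)

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU between the two layers

    scale = sigmoid(mlp(avg) + mlp(mx))      # one weight in (0, 1) per channel
    return x * scale[:, None, None]


def spatial_attention(x, k_avg=1.0, k_max=1.0):
    """Simplified spatial attention (assumption: a 1x1 mix replaces the 7x7 conv)."""
    avg_map = x.mean(axis=0)  # channel-wise average, shape (H, W)
    max_map = x.max(axis=0)   # channel-wise max, shape (H, W)
    mask = sigmoid(k_avg * avg_map + k_max * max_map)  # one weight per pixel
    return x * mask[None, :, :]


def cbam(x, w1, w2):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2))


def asff_fuse(levels, w):
    """ASFF-style fusion of L feature maps already resized to the same shape.

    levels: (L, C, H, W); w: (L, C) hypothetical 1x1 projection per level.
    Returns the fused map (C, H, W) and the per-pixel weights (L, H, W).
    """
    logits = np.einsum("lc,lchw->lhw", w, levels)
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    alpha = np.exp(logits)
    alpha = alpha / alpha.sum(axis=0, keepdims=True)     # softmax over levels
    fused = (alpha[:, None, :, :] * levels).sum(axis=0)
    return fused, alpha


# Toy usage with random features and weights.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.normal(size=(C, H, W))
refined = cbam(x, rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r)))
levels = rng.normal(size=(3, C, H, W))
fused, alpha = asff_fuse(levels, rng.normal(size=(3, C)))
print(refined.shape, fused.shape, alpha.shape)
```

      Because the fusion weights are a softmax over levels, each fused pixel is a convex combination of the corresponding pixels from the input scales, which is what lets the network favor one scale over another at different locations.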

       
