Abstract:
Efficient and accurate detection of grape maturity is often required during harvesting in precision agriculture. This study proposes a lightweight, high-precision, and rapid method for detecting grape maturity using an improved YOLOv7 model, referred to as YOLOv7MCA. The YOLOv7 model was modified by introducing MobileNetV4 as the backbone network for feature extraction. MobileNetV4 reduces the number of parameters and thus the computational load while maintaining high detection accuracy and efficiency, balancing computational cost and performance so that the model is suitable for real-time applications under limited computational resources. In addition, the Convolutional Block Attention Module (CBAM) was incorporated into the neck of YOLOv7 to enhance feature extraction. CBAM applies attention in both the channel and spatial dimensions, focusing on the most relevant features while suppressing less important ones. This attention mechanism improves the selection of discriminative features, thereby enhancing the accuracy of grape maturity detection, especially in complex or cluttered images. Furthermore, an Adaptive Spatial Feature Fusion (ASFF) module was introduced into the feature fusion stage, and the head of the YOLOv7 model was optimized through adaptive fusion of multi-scale feature maps. This self-adaptive fusion mechanism effectively detects both small and large objects, making the model suitable for targets of different sizes, improving precision and efficiency across different types of images, and covering a wider range of object scales in real-world agricultural applications. Experimental results demonstrate that the improved YOLOv7MCA model achieved a precision of 95.2%, a recall of 87.2%, and a mean average precision of 93.9% on the grape image test set. The average detection time was 52.2 ms, and the memory usage of the improved model was 53.6 MB. Its detection performance outperformed existing models, including Faster R-CNN, SSD, YOLOv5, YOLOv7, YOLOv8n, YOLOv9t, and YOLOv10n. The improved YOLOv7MCA model reduced memory usage while maintaining high detection accuracy with a shorter detection time. Such speed and memory efficiency are especially important for practical deployment on embedded devices with limited resources. The improved model is therefore suitable for real-time applications such as automated grape harvesting, enabling rapid and accurate object detection in agricultural settings. These results highlight the potential of the YOLOv7MCA model to enhance the effectiveness and scalability of automation systems in precision agriculture.
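The abstract does not give implementation details, but the CBAM block added to the YOLOv7 neck follows a well-known pattern of channel attention followed by spatial attention. The PyTorch sketch below is purely illustrative; the module names, the reduction ratio of 16, and the 7x7 spatial kernel are assumptions and not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Channel attention: shared MLP over global average- and max-pooled features."""
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumed default
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Spatial attention: conv over channel-wise average and max maps."""
    def __init__(self, kernel_size=7):  # 7x7 kernel is an assumed default
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Apply channel attention, then spatial attention, to a feature map."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)  # reweight channels
        x = x * self.sa(x)  # reweight spatial locations
        return x

# Usage example on a dummy neck feature map
feat = torch.randn(1, 256, 40, 40)
out = CBAM(256)(feat)  # same shape as the input, attention-reweighted
```

Such a block can be inserted after a neck feature map without changing its shape, which is why attention modules of this kind are commonly used to refine features before multi-scale fusion.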