Citation: LONG Yan, YANG Zhiyou, HE Mengfei. Recognizing apple targets before thinning using improved YOLOv7[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(14): 191-199. DOI: 10.11975/j.issn.1002-6819.202305069

    Recognizing apple targets before thinning using improved YOLOv7

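    • Abstract (translated from Chinese): Detecting apples during the fruit-thinning period is a key problem that must be solved to mechanize and automate thinning. To detect apples accurately at this stage, this study took YOLOv7 as the base network, incorporated a window-based multi-head self-attention mechanism, and designed a deep learning network suited to detecting small targets whose color is close to that of the background. First, a Swin Transformer Block was added to the small-target detection layer of the YOLOv7 model to retain more feature information about small-scale targets. Second, the CIoU loss function in YOLOv7 was replaced with SIoU, so that the difference in direction between the predicted box and the ground-truth box is taken into account during training, improving detection accuracy. Finally, the Grad-CAM method was used to generate detection heat maps, visualizing the effective features and showing which regions the model attends to. In testing, the proposed model achieved a mean average precision of 95.2%, a precision of 92.7%, and a recall of 91.0%, and occupied 81 MB of memory; compared with the original model, mean average precision, precision, and recall improved by 2.3, 0.9, and 1.3 percentage points, respectively. The proposed model detects apples in the thinning period more effectively and robustly, and can provide technical support for research on young-fruit growth monitoring and mechanical thinning.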
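    To make the role of the box direction in SIoU concrete, the following is a minimal sketch of an SIoU-style loss for a single pair of axis-aligned boxes. It follows the commonly cited formulation (an angle cost that modulates a distance cost, plus a shape cost), not the authors' code. The function name `siou_loss`, the `(x1, y1, x2, y2)` box layout, and the default exponent `theta=4.0` are assumptions made for illustration.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-9):
    """SIoU-style loss for two boxes given as (x1, y1, x2, y2).

    Returns 1 - IoU + (distance_cost + shape_cost) / 2.
    Illustrative sketch only; theta and the box layout are assumptions.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offset between the two box centers.
    dx = (tx1 + tx2) / 2 - (px1 + px2) / 2
    dy = (ty1 + ty2) / 2 - (py1 + py2) / 2
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 0 when the centers are aligned with an axis,
    # 1 when the offset lies along the 45-degree diagonal.
    sin_alpha = min(abs(dy) / sigma, 1.0)
    angle_cost = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, weighted more heavily as the angle cost grows.
    gamma = 2 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes mismatched width and height.
    omega_w = abs(pw - tw) / (max(pw, tw) + eps)
    omega_h = abs(ph - th) / (max(ph, th) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2
```

    In this sketch, a prediction that coincides with the ground-truth box gives a loss near zero, and a prediction shifted sideways is penalized both through the lower IoU and through the distance cost.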

       

      Abstract: Fruit thinning is one of the most important steps in apple tree management. Experience shows that mechanical thinning can greatly reduce labor costs while retaining young apples, so replacing manual thinning with mechanical thinning is a promising trend in modern orchard management. Automated thinning requires the apples to be detected before thinning. However, young apples are small, grow in dense clusters, resemble the background in hue and texture, and are often occluded by other apples or leaves, which blurs their appearance in images. In this study, an improved YOLOv7 deep learning network integrating the Swin Transformer Block was designed for detecting small targets whose color is close to that of the background. First, a Swin Transformer Block was added to the small-target detection layer of the YOLOv7 model to retain more small-scale feature information and improve the network's accuracy on small targets. Second, the CIoU loss function in YOLOv7 was replaced by SIoU, which takes the difference in direction between the predicted box and the ground-truth box into account during training and thereby improves detection accuracy. Finally, detection heat maps were produced with the Grad-CAM method to visualize the effective features and show which regions the model attends to. Images were captured in two ways, at long range and at close range, and after labeling the data set was divided into a close-range subset and a long-range subset. The improved YOLOv7 achieved a mean average precision (mAP) of 96.1% on the close-range data set, 0.7 percentage points higher than YOLOv7, and 93.6% on the long-range data set, 3.5 percentage points higher than YOLOv7. On the whole data set, the improved YOLOv7 reached an mAP of 95.2%, a precision of 92.7%, and a recall of 91.0%, and the model occupied 81 MB of memory; compared with the original YOLOv7, mAP, precision, and recall improved by 2.3, 0.9, and 1.3 percentage points, respectively. The detection heat maps of the improved YOLOv7 showed higher activation values in the apple regions, indicating that more effective features were extracted. The contribution of the Swin Transformer Block to each detection layer was also evaluated by comparing its effect in the three detection layers on the ability to detect young apples. Adding the Swin Transformer Block to the small-target detection layer considerably enhanced detection performance, raising mAP, precision, and recall by 1.9, 0.3, and 1.6 percentage points, respectively. No significant difference in performance was observed when the Swin Transformer Block was added to the large, medium, and small detection layers. When the Swin Transformer Block was introduced only into the small-target detection layer, the model's memory use increased by 25 MB. In summary, the improved YOLOv7 network detects apples in the thinning period more accurately and more robustly. These findings can provide technical support for apple growth monitoring and mechanical thinning.
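      The gains in the abstract are stated in percentage points, that is, as absolute differences between two percentages. The short sketch below back-computes the YOLOv7 baseline implied by each reported figure and contrasts the absolute gain with the corresponding relative change. The baselines it prints are derived from the abstract's numbers rather than quoted from it, and the names `REPORTED`, `implied_baseline`, and `relative_gain` are illustrative.

```python
# (improved value in %, gain over YOLOv7 in percentage points), as reported above.
REPORTED = {
    "mAP, close-range data set": (96.1, 0.7),
    "mAP, long-range data set": (93.6, 3.5),
    "mAP, whole data set": (95.2, 2.3),
    "precision, whole data set": (92.7, 0.9),
    "recall, whole data set": (91.0, 1.3),
}


def implied_baseline(improved, gain_pp):
    """Percentage points are absolute differences: baseline = improved - gain."""
    return round(improved - gain_pp, 1)


def relative_gain(improved, gain_pp):
    """Relative change in percent of the baseline, for contrast with the absolute gain."""
    baseline = improved - gain_pp
    return round(100.0 * gain_pp / baseline, 2)


for name, (improved, gain) in REPORTED.items():
    print(
        f"{name}: implied YOLOv7 {implied_baseline(improved, gain)}% "
        f"-> improved {improved}% (+{gain} pp, "
        f"{relative_gain(improved, gain)}% relative)"
    )
```

      The long-range subset shows the largest implied change, which is consistent with the abstract's emphasis on small targets.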

       
