Apple fruit recognition in complex orchard environments based on improved YOLOv3

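    • Abstract: To enable picking robots to recognize fruit of different maturity quickly and accurately around the clock in complex orchard environments (varying illumination, overlap and occlusion, large fields of view), this study proposes a fruit recognition method based on an improved YOLOv3. First, the residual modules of the DarkNet53 network are combined with CSPNet (Cross Stage Partial Network), reducing the network's computational cost while maintaining detection accuracy. Second, an SPP (Spatial Pyramid Pooling) module is added to the detection network of the original YOLOv3 model to fuse global and local fruit features and raise recall for very small fruit targets. In addition, the Soft NMS (Soft Non-Maximum Suppression) algorithm replaces conventional NMS (Non-Maximum Suppression) to improve recognition of overlapping and occluded fruit. Finally, a joint loss function based on Focal Loss and CIoU Loss is used to optimize the model and improve recognition accuracy. Experiments on apples show that, after training on the dataset, the improved model reaches a mAP (Mean Average Precision) of 96.3% on the test set, 3.8 percentage points higher than the original model; its F1 score reaches 91.8%, also 3.8 percentage points higher; and its average detection speed on a GPU reaches 27.8 frames/s, 5.6 frames/s faster than the original model. Comparisons with several current advanced detectors such as Faster RCNN and RetinaNet, together with comparative tests under different fruit counts and lighting conditions, show that the method offers excellent detection accuracy as well as good robustness and real-time performance, and provides an important reference for accurate fruit recognition in complex environments.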
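The replacement of conventional NMS by Soft NMS can be illustrated with a minimal sketch. The abstract does not say which score-decay variant or which parameter values were used, so the Gaussian decay, `sigma`, `score_thresh`, and the function names below are assumptions chosen for illustration, not the authors' implementation.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (assumed variant).

    Instead of discarding every box that overlaps the current best box,
    each remaining score is multiplied by exp(-iou^2 / sigma), so heavily
    overlapping boxes are down-weighted rather than removed outright.
    Returns the kept (box, score) pairs in selection order.
    """
    candidates = list(zip(boxes, scores))
    kept = []
    while candidates:
        # Select the candidate with the highest current score.
        best_idx = max(range(len(candidates)), key=lambda i: candidates[i][1])
        best_box, best_score = candidates.pop(best_idx)
        kept.append((best_box, best_score))
        # Decay the scores of the remaining candidates by their overlap.
        rescored = []
        for box, score in candidates:
            overlap = iou(best_box, box)
            score *= math.exp(-(overlap ** 2) / sigma)
            if score >= score_thresh:
                rescored.append((box, score))
        candidates = rescored
    return kept


# Two heavily overlapping "apples" plus one isolated detection.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this toy case the second box overlaps the first with an IoU of about 0.82. Hard NMS with a typical 0.5 threshold would delete it, whereas the sketch keeps it with a reduced score, which is the behaviour that helps with overlapping fruit.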

       

      Abstract: Automatic fruit recognition is one of the most important steps for fruit-picking robots. In this study, a novel fruit recognition method based on an improved YOLOv3 was proposed so that a picking robot can identify fruit quickly and accurately in the complex environment of an orchard (different lighting, occlusion, adhesion, large field of view, bagging, and both mature and immature fruit). The specific procedure was as follows. 1) A total of 4000 apple images were captured in complex environments by orchard photography and Internet collection. After labeling with the LabelImg software, 3200 images were randomly selected as the training set, 400 as the validation set, and 400 as the test set. Mosaic data augmentation was also embedded in the model to enrich the input images and improve the generalization ability and robustness of the model. 2) The network model was improved. First, the residual module in the DarkNet53 network was combined with CSPNet to reduce the amount of network computation while maintaining detection accuracy. Second, an SPP module was added to the detection network of the original YOLOv3 model to fuse the global and local features of the fruit and raise the recall of the model for very small fruit targets. Third, Soft NMS was used in place of traditional NMS to improve the recognition ability of the model, particularly for overlapping fruit. Fourth, a joint loss function using Focal Loss and CIoU Loss was used to optimize the model for higher recognition accuracy. 3) After the dataset was prepared and the network built, the model was trained in the deep learning environment of a server and the training process was analyzed. The optimal weights and parameters were selected according to the loss curve and the performance indexes on the validation set.
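The two terms named for the joint loss can be sketched in plain Python. This follows the standard published definitions of CIoU Loss and Focal Loss. The abstract does not state how the two terms are weighted or combined, nor the hyperparameters, so `alpha`, `gamma`, `eps`, and the function names here are assumptions for illustration only.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (x1, y1, x2, y2).

    loss = 1 - IoU + rho^2 / c^2 + alpha * v, where rho is the distance
    between box centres, c the diagonal of the smallest enclosing box,
    and v measures the mismatch in aspect ratio.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centres.
    rho2 = ((px1 + px2) / 2 - (tx1 + tx2) / 2) ** 2 \
        + ((py1 + py2) / 2 - (ty1 + ty2) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps))
                              - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-9):
    """Binary focal loss for a predicted probability p and a label y in {0, 1}.

    The (1 - p_t)^gamma factor shrinks the loss of well-classified examples
    so that training concentrates on the hard ones.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t + eps)


perfect = ciou_loss((0, 0, 10, 10), (0, 0, 10, 10))
disjoint = ciou_loss((0, 0, 10, 10), (20, 20, 30, 30))
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

Unlike a plain IoU loss, the CIoU term still gives a useful signal when boxes do not overlap (the `disjoint` case exceeds 1 because of the centre-distance penalty), and the focal term assigns a far larger loss to the hard example than to the easy one.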
The best performance was achieved at the 109th training epoch, and the weights obtained in that epoch were taken as the final model weights: precision was 94.1%, recall 90.6%, F1 92.3%, and mean average precision 96.1%. The test set was then used to evaluate the optimal model. The mean average precision reached 96.3%, higher than the 92.5% of the original model; the F1 value reached 91.8%, higher than the 88.0% of the original model; and the average detection speed on a video stream under GPU was 27.8 frames/s, higher than the 22.2 frames/s of the original model. Furthermore, the improved model achieved the best comprehensive performance in a comparison with four advanced detection methods (Faster RCNN, RetinaNet, YOLOv5 and CenterNet), which verified the effectiveness of the improvements. Comparison experiments were also conducted under different fruit numbers and various lighting environments to further verify the effectiveness and feasibility of the improved model. The detection performance of the improved model was significantly better than that of the original YOLOv3 model for small-target apples and severely occluded, overlapping apples. In addition, deep-learning-based target detection was robust to illumination: changes in illumination had little impact on detection performance. Consequently, the excellent detection accuracy, robustness and real-time performance can be expected to provide important support for accurate fruit recognition in complex environments.
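As a quick consistency check on the reported figures, the F1 value is the harmonic mean of precision and recall, and the gains over the original model are simple differences. The short sketch below uses only numbers stated in the abstract; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# Validation-set figures at the selected epoch (percent).
val_f1 = round(f1_score(94.1, 90.6), 1)

# Test-set gains of the improved model over the original YOLOv3.
map_gain = round(96.3 - 92.5, 1)   # percentage points
f1_gain = round(91.8 - 88.0, 1)    # percentage points
fps_gain = round(27.8 - 22.2, 1)   # frames/s

print(val_f1)    # 92.3
print(map_gain)  # 3.8
print(f1_gain)   # 3.8
print(fps_gain)  # 5.6
```

The computed F1 of 92.3% matches the value reported for the validation set, and the differences match the 3.8-point and 5.6 frames/s improvements quoted for the test set.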

       
