Recognition of Camellia oleifera fruits in natural environment using multi-modal images

周宏平; 金寿祥; 周磊; 郭自良; 孙梦梦

doi:10.11975/j.issn.1002-6819.202303054

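Abstract: Camellia oleifera fruits grow under complex natural conditions, with heavy occlusion and overlap. To address this, a dual-backbone network model based on RGB-D (red green blue-depth) multi-modal images, YOLO-DBM (YOLO-dual backbone model), was proposed for recognizing and locating Camellia oleifera fruits. First, a lightweight feature extraction network was designed on the basis of CSP-Darknet53, the backbone of the YOLOv5s model. Second, two such lightweight feature extraction networks were used to extract color and depth features separately; an attention-based feature fusion module then fused the color and depth features level by level, the fused feature layers were fed into a feature pyramid network (FPN), and predictions were made. Test results showed that the YOLO-DBM model using RGB-D images achieved a precision P, recall R, and average precision A_P of 94.8%, 94.6%, and 98.4% on the test set, with an average detection time of 0.016 s per image. Compared with the YOLOv3, YOLOv5s, and YOLO-IR (YOLO-InceptionRes) models, A_P improved by 2.9, 0.1, and 0.3 percentage points, respectively, while the model size was only 6.21 MB, 46% of that of YOLOv5s. In addition, compared with a YOLO-DBM that used only concatenation fusion, the YOLO-DBM with the attention fusion mechanism improved P, R, and A_P by 0.2, 1.6, and 0.1 percentage points, respectively, further verifying the reliability and effectiveness of the proposed method. The results can serve as a reference for developing automatic Camellia oleifera fruit harvesters.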
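The precision P and recall R reported above follow the standard detection definitions, and the per-image detection time converts directly into a frame rate. The sketch below illustrates both under stated assumptions: the function name and the detection counts are hypothetical and are not taken from the paper, and A_P (the area under the precision-recall curve) is not computed here.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from detection counts at one threshold."""
    precision = true_pos / (true_pos + false_pos)  # share of detections that are real fruits
    recall = true_pos / (true_pos + false_neg)     # share of real fruits that were detected
    return precision, recall


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
p, r = precision_recall(true_pos=90, false_pos=10, false_neg=30)
print(p, r)  # 0.9 0.75

# The reported 0.016 s per image corresponds to about 62.5 images per second.
frames_per_second = 1.0 / 0.016
print(round(frames_per_second, 1))  # 62.5
```

A detector can trade one metric for the other by changing its confidence threshold, which is why the abstract reports P, R, and A_P together.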

Abstract: Accurate and rapid identification can greatly contribute to the automated harvesting of Camellia oleifera fruits. However, Camellia oleifera grown in the natural environment has dense branches and leaves, so fruits are severely occluded and often overlap. RGB images alone cannot fully meet the requirements of fruit recognition in modern agriculture. In this study, a dual backbone network model was proposed that uses Red Green Blue-Depth (RGB-D) multi-modal images for the recognition and localization of Camellia oleifera fruits. Firstly, a lightweight improved YOLOv5s model, YOLO-IR (YOLO-InceptionRes), was built to detect Camellia oleifera fruit targets. YOLO-IR introduced the InceptionRes module into the feature extraction network, which fuses multi-scale information using four convolution operations of different sizes followed by concatenation. At the same time, the FPN (Feature Pyramid Network) + PAN (Path Aggregation Network) module of YOLOv5s was simplified into an FPN module to reduce the network complexity. Furthermore, the depth and width of the model were compressed to reduce the number of parameters and limit the model size. Compared with YOLOv5s, the improved YOLO-IR showed a decrease of 0.2 percentage points in average precision A_P, but its model size decreased by 69%, which provided support for building a lightweight dual backbone model. Secondly, a dual backbone detection model for Camellia oleifera fruits, YOLO-DBM (YOLO-Dual Backbone Model), was constructed for RGB-D images on the basis of YOLO-IR. Two feature extraction networks identical to that of YOLO-IR were used to extract the color and depth features. A feature fusion module based on an attention mechanism was constructed to fuse the color and depth features hierarchically at different scales.
The attention module consisted of a spatial attention mechanism and a channel attention mechanism. Specifically, the spatial attention mechanism was used to increase the weight of effective regions in the depth feature layer and to reduce the interference of depth holes. The result was then concatenated with the RGB feature layer, and the channel attention mechanism was used to emphasize the contribution of effective channels in the fused feature layer. Finally, the fused feature layer was input into the prediction module. The experimental results showed that the precision P, recall R, and average precision A_P of the YOLO-DBM model using RGB-D images on the test set were 94.8%, 94.6%, and 98.4%, respectively. The average detection time for a single image was 0.016 s. Compared with the YOLOv3, YOLOv5s, and YOLO-IR models, the average precision A_P improved by 2.9, 0.1, and 0.3 percentage points, respectively, while the model size was only 6.21 MB, 46% of the YOLOv5s size. In addition, the YOLO-DBM model with the attention fusion module improved the precision P, recall R, and average precision A_P by 0.2, 1.6, and 0.1 percentage points, respectively, compared with the YOLO-DBM model with concatenation fusion. These results verified the effectiveness of the dual backbone network and the attention fusion module. The findings can provide a reference and a new approach for fruit recognition in automatic Camellia oleifera fruit harvesters.
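The fusion order described in the abstract (spatial attention on the depth features, concatenation with the RGB features, then channel attention) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names are hypothetical, the attention weights are plain sigmoids of mean activations rather than learned layers, and the toy feature maps are invented for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(depth_feat):
    """Scale every location of a [C][H][W] depth map by a weight in (0, 1).

    Assumption: the weight is a sigmoid of the channel-wise mean at that
    location; a learned convolution would normally produce it.
    """
    channels = len(depth_feat)
    height, width = len(depth_feat[0]), len(depth_feat[0][0])
    weights = [
        [sigmoid(sum(depth_feat[c][i][j] for c in range(channels)) / channels)
         for j in range(width)]
        for i in range(height)
    ]
    return [
        [[depth_feat[c][i][j] * weights[i][j] for j in range(width)]
         for i in range(height)]
        for c in range(channels)
    ]


def channel_attention(feat):
    """Scale each channel by a gate computed from its global average.

    Assumption: the gate is a plain sigmoid of the channel mean, with no
    learned layers in between.
    """
    gated = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gate = sigmoid(sum(values) / len(values))
        gated.append([[v * gate for v in row] for row in channel])
    return gated


def fuse(rgb_feat, depth_feat):
    """Spatial attention on depth, concatenate with RGB, then channel attention."""
    stacked = rgb_feat + spatial_attention(depth_feat)  # channel-wise concatenation
    return channel_attention(stacked)


# Toy 2-channel, 2x2 feature maps; the zeros at (0, 0) loosely mimic a depth hole.
rgb = [[[1.0, 2.0], [3.0, 4.0]], [[0.5, 0.5], [0.5, 0.5]]]
depth = [[[0.0, 2.0], [2.0, 2.0]], [[0.0, 1.0], [1.0, 1.0]]]
fused = fuse(rgb, depth)
print(len(fused), len(fused[0]), len(fused[0][0]))  # 4 2 2
```

The fused map has as many channels as the RGB and depth maps combined, and every gate lies between 0 and 1, so attention in this sketch can only attenuate activations, never amplify them.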
