Abstract: Fruit recognition and segmentation with deep neural networks contribute widely to the operation of picking robots in modern agriculture. However, most current models suffer from low recognition accuracy and slow running speed, mainly due to their large numbers of network parameters and computations. In this study, a high-resolution segmentation model for tomatoes at different ripeness levels in a greenhouse environment was proposed using an improved Mask R-CNN. Firstly, a Cross Stage Partial Network (CSPNet) was merged with the Residual Network (ResNet) in the Mask R-CNN model. The cross-stage splitting and concatenation strategies reduced repeated gradient features during backpropagation, improving the accuracy rate while reducing the amount of network computation. Secondly, a cross-entropy loss function with a weight factor was used to calculate the mask loss, improving the segmentation effect of the model given the class imbalance of the samples. An experiment was performed on test sets of tomato fruits at three ripeness levels. The results showed that the improved Mask R-CNN with CSP-ResNet50 as the backbone network achieved a mean average precision of 95.45%, a precision of 95.25%, a recall of 87.43%, an F1-score of 0.912, and an average segmentation time of 0.658 s. The mean average precision increased by 16.44, 14.95, and 2.29 percentage points over the Pyramid Scene Parsing Network (PSPNet), DeepLab v3+, and Mask R-CNN with a ResNet50 backbone, respectively. The average segmentation time increased by 14.83% and 27.52% compared with PSPNet and DeepLab v3+, respectively, but was reduced by 1.98% compared with Mask R-CNN with a ResNet50 backbone.
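The cross-stage split-and-merge strategy described above can be illustrated with a simplified sketch. The split ratio, block count, and the residual transform below are illustrative assumptions, since the abstract does not specify the exact CSP-ResNet50 configuration:

```python
import numpy as np

def residual_block(x):
    """Stand-in for a ResNet bottleneck block; a hypothetical
    identity-plus-transform so the CSP channel bookkeeping stays visible."""
    return x + 0.1 * x

def csp_stage(x, n_blocks=2):
    """CSP-style stage: split the channels into two parts, run only one
    part through the residual blocks, and concatenate both paths at the
    end. Input x has shape (channels, height, width)."""
    c = x.shape[0] // 2
    part1, part2 = x[:c], x[c:]          # cross-stage split
    for _ in range(n_blocks):
        part2 = residual_block(part2)    # residual (dense) path
    return np.concatenate([part1, part2], axis=0)  # cross-stage merge
```

Because half of the feature channels bypass the residual path entirely, fewer duplicated gradients flow through the blocks, which is the mechanism behind the reduced computation noted above.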
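The weighted cross-entropy mask loss can be sketched as below. The class weights `w_pos` and `w_neg` are hypothetical placeholders; the abstract does not give the paper's exact weighting scheme for the imbalanced samples:

```python
import numpy as np

def weighted_bce_mask_loss(pred, target, w_pos=2.0, w_neg=1.0, eps=1e-7):
    """Weighted binary cross-entropy over a predicted mask.

    pred:   predicted foreground probabilities in (0, 1)
    target: binary ground-truth mask (1 = fruit pixel, 0 = background)
    w_pos weights the (rarer) fruit pixels more heavily than background,
    counteracting the class imbalance mentioned in the text.
    """
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    per_pixel = -(w_pos * target * np.log(pred)
                  + w_neg * (1.0 - target) * np.log(1.0 - pred))
    return per_pixel.mean()
```

With `w_pos > w_neg`, errors on fruit pixels are penalized more, pushing the model toward better masks on the minority class.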
Additionally, the new model segmented green and half-ripe tomato fruits well under different light intensities, especially low light, compared with PSPNet and DeepLab v3+. Finally, the improved Mask R-CNN with the CSP-ResNet50 backbone was deployed on a picking robot to verify its recognition and segmentation of tomato fruits at different ripeness levels in a large glass greenhouse. At low fruit overlap rates, the model counted tomato fruits consistently with manual detection, with an accuracy above 90%. When the occlusion or overlap rate of tomato fruits exceeded 70%, particularly for distant targets, the improved Mask R-CNN achieved an accuracy of only 66.67%, a large gap from manual detection. With only a few blurred-pixel features available, the shape and color features of tomato fruits were difficult to extract; low light further increased the recognition difficulty. Correspondingly, picking became harder for the robot, with a relatively low success rate, as the overlap grew more severe; the picking success rate improved greatly as occlusion decreased. Consequently, integrating multiple technologies (such as image acquisition equipment, model performance, the end-effector design of the robotic arm, and automatic mechanization) can be expected to effectively improve the picking rate of mature tomatoes in the complex environment of a specific greenhouse. The new model also demonstrated strong robustness and applicability for the precise operation of tomato-picking robots in various complex environments.