Recognition of the position for partially occluded multiple tobacco leaves based on CSS-Cascade Mask R-CNN

朱波; 胡朋; 刘宇晨; 张冀武

doi:10.11975/j.issn.1002-6819.202311090

Abstract

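Summary: Leaf position is an important reference for tobacco leaf grading, and identifying it accurately is essential for intelligent grading. In practical intelligent grading applications, the grades of several leaves must be recognized simultaneously to improve grading efficiency. Because of constraints in current feeding methods, leaves recognized together often partially occlude one another, which complicates both object detection and position recognition. This study proposes a detection and recognition model, CSS-Cascade Mask R-CNN, based on an improved Cascade Mask R-CNN that fuses channel, non-local, and spatial attention mechanisms and introduces soft non-maximum suppression (Soft-NMS) and the SCYLLA-IoU (SIoU) loss function. The model improves Cascade Mask R-CNN in three ways. First, channel, non-local, and spatial attention are all added to the ResNet101 backbone so that the network attends more to salient regions that are unoccluded and show distinct position features. Second, the SmoothL1Loss loss function in Cascade Mask R-CNN is replaced with the SIoU loss, which brings the directional difference between the predicted and ground-truth boxes into training to improve detection accuracy. Third, conventional non-maximum suppression (NMS) is replaced with Soft-NMS when candidate boxes are filtered, avoiding the information loss caused by deleting candidate boxes outright. Experiments show that, for detection and position recognition of partially occluded multiple tobacco leaves, the proposed model reaches a detection-box mean average precision (bbox_mAP50) of 80.2%, 7.5 percentage points higher than the unimproved Cascade Mask R-CNN. It also exceeds several mainstream object detection models (YOLOvX, YOLOv3, YOLOv5, Mask R-CNN, and Cascade R-CNN) by 7.1, 10.2, 5.8, 9.2, and 8.4 percentage points, respectively, with a particularly clear advantage on lower leaves, which are harder to distinguish. The results can serve as a reference for detecting and recognizing the position of partially occluded multiple tobacco leaves.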
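The replacement of hard NMS with Soft-NMS can be illustrated with a small sketch. The abstract does not state which Soft-NMS variant or parameters the authors use, so the code below assumes the Gaussian score-decay form; the function names, `sigma`, and `score_threshold` are illustrative choices, not values from the paper.

```python
import math


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS (assumed variant).

    Instead of deleting boxes that overlap the current best box,
    their scores are decayed by exp(-IoU^2 / sigma). Returns a list
    of (box index, decayed score) pairs in selection order.
    """
    scores = list(scores)              # work on a copy
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        if scores[best] < score_threshold:
            break                      # everything left is negligible
        remaining.remove(best)
        kept.append((best, scores[best]))
        for i in remaining:
            overlap = box_iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
    return kept


if __name__ == "__main__":
    # Two heavily overlapping boxes (e.g. occluding leaves) and one separate box.
    demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    demo_scores = [0.9, 0.8, 0.7]
    for index, score in soft_nms(demo_boxes, demo_scores):
        print(index, round(score, 3))
```

With hard NMS at a typical IoU threshold of 0.5, the second box would be discarded; here it survives with a reduced score, which is the behavior that matters when two leaves genuinely overlap.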

Abstract: Object detection plays a pivotal role in agricultural informatics, where it has improved the precision and efficiency of tobacco leaf grading systems by enabling accurate identification and classification of leaf features. In images of multiple tobacco leaves, however, partial occlusion is common. This study proposes a recognition model based on an improved Cascade Mask Region-based Convolutional Neural Network (Cascade Mask R-CNN) that introduces three attention mechanisms (channel, non-local, and spatial) and a refined detection framework using the SIoU loss function to improve detection accuracy, with the goal of automatic tobacco grading from visual features. Cascade Mask R-CNN was chosen as the base model because it handles complex image recognition tasks, and three enhancements were made to improve overall performance. First, a Channel-Nonlocal-Space (CNS) attention mechanism was introduced into the ResNet101 backbone so that the model focuses mainly on unoccluded image regions with distinct, salient features and can better discern the subtle characteristics of tobacco leaves for classification. Second, the conventional SmoothL1Loss function, which Cascade Mask R-CNN typically uses for loss calculation but which ignores the directional difference between the predicted and ground-truth bounding boxes, was replaced with the SIoU loss function. As a result, detection accuracy improved and convergence accelerated, enabling more precise identification of tobacco leaf features. Third, traditional Non-Max Suppression (NMS) was replaced with Soft Non-Max Suppression (Soft NMS) to avoid the information loss caused by the outright deletion of candidate boxes in standard NMS, preserving more information for the final analysis. To verify the improved model, experiments were conducted on 1125 images of partially occluded multiple tobacco leaves acquired from an actual tobacco leaf grading system. The model achieved a bounding-box mean average precision (bbox_mAP50) of 80.2%, 7.5 percentage points higher than the original Cascade Mask R-CNN. It also outperformed several mainstream object detection models, namely YOLOvX, YOLOv3, YOLOv5, Mask R-CNN, and Cascade R-CNN, by 7.1, 10.2, 5.8, 9.2, and 8.4 percentage points, respectively, and performed especially well on lower tobacco leaves, which are difficult to identify. The findings provide an effective approach for position recognition of partially occluded multiple tobacco leaves.
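To make the "directional difference" in the SIoU loss concrete, here is a minimal single-box sketch. It follows the commonly cited SIoU formulation (IoU plus angle, distance, and shape costs); the abstract gives neither the authors' implementation nor their hyperparameters, so the function name, the shape exponent `theta=4.0`, and the `eps` stabilizer are assumptions.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-9):
    """SIoU loss for two boxes in (x1, y1, x2, y2) form (assumed formulation).

    loss = 1 - IoU + (distance_cost + shape_cost) / 2, where the distance
    cost is modulated by an angle cost derived from the direction of the
    line joining the two box centers.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU term.
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Smallest enclosing box, used to normalize the center offsets.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offsets between box centers.
    dx = (tx1 + tx2 - px1 - px2) * 0.5
    dy = (ty1 + ty2 - py1 - py2) * 0.5
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 0 when centers are axis-aligned, 1 at 45 degrees.
    sin_alpha = min(1.0, abs(dy) / sigma)
    angle_cost = math.sin(2.0 * math.asin(sin_alpha))

    # Distance cost, weighted by the angle cost through gamma.
    gamma = 2.0 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1.0 - math.exp(-gamma * rho_x)) + (1.0 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes mismatched width and height.
    omega_w = abs(pw - tw) / (max(pw, tw) + eps)
    omega_h = abs(ph - th) / (max(ph, th) + eps)
    shape_cost = (1.0 - math.exp(-omega_w)) ** theta + (1.0 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + 0.5 * (distance_cost + shape_cost)


if __name__ == "__main__":
    truth = (0.0, 0.0, 10.0, 10.0)
    print(siou_loss((2.0, 0.0, 12.0, 10.0), truth))   # small horizontal offset
    print(siou_loss((5.0, 0.0, 15.0, 10.0), truth))   # larger offset, larger loss
```

Unlike SmoothL1Loss, which treats each box coordinate independently, this loss depends on the overlap and on the direction and magnitude of the center offset together, which is the property the abstract credits for the accuracy gain.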
