Abstract:
Object detection plays a pivotal role in agricultural informatics, where it has improved the precision and efficiency of tobacco leaf grading systems by enabling accurate identification and classification of leaf features. However, partial occlusion commonly occurs in images that contain multiple tobacco leaves. In this study, a recognition model was proposed using an improved Cascade Mask Region-based Convolutional Neural Network (Cascade Mask R-CNN), so that tobacco could be graded automatically from visual features. Cascade Mask R-CNN was chosen as the baseline because it is suited to complex image recognition tasks, and a series of enhancements was then made to improve the overall performance. Firstly, a Channel-Nonlocal-Space (CNS) attention mechanism, which combines channel, nonlocal, and spatial attention, was introduced into the ResNet101 backbone. The improved model therefore focused mainly on the unoccluded regions of the image with distinct and salient features, and could better discern the nuanced characteristics of tobacco leaves for classification. Secondly, the conventional SmoothL1Loss function was replaced with the SIoU loss function. SmoothL1Loss is typically employed in Cascade Mask R-CNN for the bounding box loss calculation, but it does not consider the directional difference between the predicted and actual bounding boxes. As a result of the replacement, the improved model achieved higher detection accuracy and faster convergence, thus facilitating more precise identification of tobacco leaf features. 
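To illustrate how a loss can account for the directional difference between a predicted and an actual bounding box, the following is a minimal sketch of an SIoU-style loss for a single pair of boxes. The abstract gives no formula, so the angle, distance, and shape terms below follow the commonly published SIoU formulation and are an assumption, as are the function name `siou_loss`, the `(x1, y1, x2, y2)` box layout, and the shape exponent `theta=4.0`. It is not the implementation used in this study.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-7):
    """SIoU-style loss for two boxes given as (x1, y1, x2, y2).

    Assumed formulation: 1 - IoU + (distance_cost + shape_cost) / 2,
    where the distance cost is modulated by an angle cost.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Intersection over union
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Width and height of the smallest box enclosing both boxes
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offsets between the two box centres
    dx = (tx1 + tx2) / 2 - (px1 + px2) / 2
    dy = (ty1 + ty2) / 2 - (py1 + py2) / 2
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 0 when the centres are aligned with an axis,
    # 1 when the centre offset lies at 45 degrees
    sin_alpha = min(abs(dy) / sigma, 1.0)
    angle_cost = 1 - 2 * math.sin(math.asin(sin_alpha) - math.pi / 4) ** 2

    # Distance cost, weighted by the angle cost through gamma
    gamma = 2 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost: penalises mismatch in width and height
    omega_w = abs(pw - tw) / (max(pw, tw) + eps)
    omega_h = abs(ph - th) / (max(ph, th) + eps)
    shape_cost = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    return 1 - iou + (distance_cost + shape_cost) / 2


if __name__ == "__main__":
    target = (0.0, 0.0, 10.0, 10.0)
    for pred in [(0.0, 0.0, 10.0, 10.0), (2.0, 2.0, 12.0, 12.0), (5.0, 5.0, 15.0, 15.0)]:
        print(pred, round(siou_loss(pred, target), 4))
```

Unlike SmoothL1Loss, which penalises each coordinate independently, this loss changes with the overlap, the direction of the centre offset, and the mismatch in box shape.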
Thirdly, the traditional Non-Maximum Suppression (NMS) was replaced with Soft Non-Maximum Suppression (Soft NMS) to avoid the information loss caused by the outright deletion of candidate boxes in standard NMS, thereby preserving more usable information for the final analysis. A series of experiments was conducted to verify the effectiveness of the improved model on 1125 images of partially occluded multiple tobacco leaves, which were acquired from an actual tobacco leaf grading system. The improved model achieved a mean average precision (bbox_mAP50) of 80.2%, which was 7.5 percentage points higher than that of the original Cascade Mask R-CNN model. It also outperformed several mainstream object detection models, namely YOLOvX, YOLOv3, YOLOv5, Mask R-CNN, and Cascade R-CNN, by 7.1, 10.2, 5.8, 9.2, and 8.4 percentage points, respectively. In particular, outstanding performance was obtained for the lower tobacco leaves, which are difficult to identify. The findings can provide an effective approach for the position recognition of multiple partially occluded tobacco leaves.
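The difference between standard NMS and Soft NMS mentioned above can be sketched as follows. This is a minimal illustration that assumes the Gaussian score-decay variant of Soft NMS; the abstract does not state which variant or which parameter values were used, so the names `soft_nms` and `box_iou` and the values `sigma=0.5` and `score_thr=0.001` are hypothetical.

```python
import math


def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thr=0.001):
    """Gaussian Soft NMS: decay the scores of overlapping boxes instead of deleting them.

    Returns a list of (box_index, final_score) in the order the boxes were selected.
    """
    candidates = [(score, idx) for idx, score in enumerate(scores)]
    kept = []
    while candidates:
        # Select the remaining candidate with the highest current score
        best = max(range(len(candidates)), key=lambda k: candidates[k][0])
        score, idx = candidates.pop(best)
        kept.append((idx, score))
        # Decay the remaining scores according to their overlap with the selected box
        decayed = []
        for other_score, other_idx in candidates:
            overlap = box_iou(boxes[idx], boxes[other_idx])
            new_score = other_score * math.exp(-(overlap ** 2) / sigma)
            # Only boxes whose score falls below the threshold are discarded
            if new_score >= score_thr:
                decayed.append((new_score, other_idx))
        candidates = decayed
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    for idx, score in soft_nms(boxes, scores):
        print(idx, round(score, 4))
```

In this example the second box overlaps the first heavily. Standard NMS with a typical IoU threshold would delete it, whereas here it is retained with a reduced score, which is the behaviour that helps when neighbouring leaves partially occlude each other.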