Recognition and segmentation of maize seedlings in field based on dual attention semantic segmentation network

Wang Can (王璨); Wu Xinhui (武新慧); Zhang Yanqing (张燕青); Wang Wenjun (王文俊)

doi:10.11975/j.issn.1002-6819.2021.09.024

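Abstract: To recognize and delineate maize and weeds at the seedling stage accurately in complex field scenes, this study proposes an improved dual attention semantic segmentation method. The method recognizes and finely segments maize seedlings by capturing their morphological boundaries, and on this basis a morphological processing method identifies all weed regions in the image other than maize. First, six state-of-the-art semantic segmentation networks were compared to determine the original architecture of the model. A semantic segmentation model of seedling-stage maize was then built by improving the deep backbone network to strengthen features, introducing a dual attention mechanism to model scene-level semantic dependencies among features, assembling the model as an encoder-decoder with an auxiliary network added to optimize low-level features, improving the loss function to balance the model's overall performance, and formulating an improved transfer learning strategy. An image morphological processing method was proposed to generate the weed segmentation map from the maize pixel segmentation result. Test results showed that the mean intersection over union and mean pixel accuracy of the model were 94.16% and 95.68%, respectively, relative improvements of 1.47% and 1.08% over the original network, and the recognition and segmentation speed reached 15.9 frames/s. The method accurately recognizes and finely segments maize and weeds in complex field scenes. Because weeds are identified while only maize is recognized, it substantially reduces the image annotation workload, avoids the effect of weed species diversity on recognition accuracy, and solves the problem that overlapping maize and weed targets are difficult to separate at their morphological boundaries. The results can provide a reference for intelligent weeding equipment.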
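The dual attention mechanism pairs a spatial (recurrent criss-cross) attention module with a channel attention module. The abstract does not give the exact formulation, so the following is only a minimal sketch of a channel attention module in the style of DANet, assuming the feature map is flattened to a C x N matrix (N = H x W) and the residual weight `beta` is a plain scalar (it is learned in DANet). The names `softmax` and `channel_attention` are hypothetical and are not taken from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]  # C rows (channels), each a flattened H*W spatial map


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def channel_attention(features: Matrix, beta: float) -> Matrix:
    """DANet-style channel attention (illustrative sketch, not the paper's code).

    features: C x N matrix, one row per channel.
    beta: weight of the attended features (a learnable scalar in DANet).
    """
    c = len(features)
    n = len(features[0])
    # Channel affinity: energy[j][i] = dot(A_j, A_i), a C x C Gram matrix.
    energy = [
        [sum(a * b for a, b in zip(features[j], features[i])) for i in range(c)]
        for j in range(c)
    ]
    # Row-wise softmax: attn[j][i] is the influence of channel i on channel j.
    attn = [softmax(row) for row in energy]
    out = []
    for j in range(c):
        # Mix all channels according to their affinity with channel j ...
        mixed = [sum(attn[j][i] * features[i][k] for i in range(c)) for k in range(n)]
        # ... and add the result back to the input channel as a residual.
        out.append([beta * m + a for m, a in zip(mixed, features[j])])
    return out


feats = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0]]
print(channel_attention(feats, beta=0.5))
```

With `beta = 0` the module returns its input unchanged; DANet initializes this weight to zero so that training decides how much channel context to mix into each feature channel.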

Abstract: Weed control is an essential task in field management, and effective recognition of crops and weeds is a key basis for developing intelligent weeding equipment. Apart from the crop, however, the recognition targets in field images are not fixed, mainly because weeds vary widely in species and are randomly distributed. Conventional approaches must distinguish the crop from every category of weed in the image, so all weed targets have to be labeled in a dataset that covers every weed species. Human vision, by contrast, only needs to pick out the target crop among the weeds, without identifying the species or number of weeds. Moreover, crops and weeds often overlap in field images of complex scenes, which makes it difficult to segment the boundaries of individual objects accurately, especially when the generated anchor boxes overlap heavily over large areas. In this study, a dual attention network was used to recognize and semantically segment maize at the seedling stage with fine segmentation along morphological boundaries, and weeds were then identified on the premise of maize recognition. The main contents were as follows. 1) The original architecture of the model was determined by comparing six state-of-the-art semantic segmentation networks. The dual attention network architecture performed best on the training, validation, and test datasets, achieving pixel-wise recognition and segmentation of maize field images. On the validation set, the mean intersection over union (mIoU) and mean pixel accuracy (mPA) at the end of training were 92.73% and 96.88%, respectively. On the test set, the mIoU and mPA were 92.8% and 94.66%, respectively, and the segmentation speed was 15.2 frames/s.
2) A semantic segmentation model of maize at the seedling stage was established on the improved network architecture. The model performs binary classification of pixels into maize and all other pixels, which suits the recognition and morphological segmentation of seedling-stage maize in complex field scenes. An improved backbone enhanced the feature representation, retaining more feature detail while reducing the amount of computation. Recurrent criss-cross attention and channel attention modules were combined into a dual attention mechanism that constructs long-range contextual dependencies simultaneously in the spatial and channel dimensions of the feature map, which significantly improved the discriminability of the feature representation. The model was built on an encoder-decoder structure, with an auxiliary head attached to optimize the low-level features. The loss function was also improved, and a transfer learning strategy was formulated. 3) The weed segmentation map was obtained by image morphological processing of the seedling-stage maize segmentation map, so that weed regions were identified from the maize segmentation alone, without pixel-wise prediction of the weed regions. The results showed that the model outperformed the original network throughout training. At the end of training, the validation mIoU and mPA were 93.98% and 97.48%, relative increases of 1.35% and 0.62% over the original network. Region segmentation accuracy, pixel recognition accuracy, and segmentation speed all improved noticeably, indicating better overall performance. On the test set, the mIoU and mPA were 94.16% and 95.68%, relative increases of 1.47% and 1.08% over the baseline, and the segmentation speed reached 15.9 frames/s, 4.61% faster than the original network.
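The metrics and gains above can be reproduced with a short calculation. The sketch below assumes the usual definitions (per-class IoU = TP / (TP + FP + FN) and per-class pixel accuracy = TP / (TP + FN), each averaged over classes), since the abstract does not state its formulas; the function names are illustrative only. The second part recomputes the reported gains from the baseline and improved figures, which shows they are relative increases rather than percentage-point differences.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    # cm[g][p] counts pixels whose ground truth is g and prediction is p.
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def miou_mpa(cm: List[List[int]]) -> Tuple[float, float]:
    k = len(cm)
    ious, pas = [], []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp  # other pixels predicted as class c
        # Simplification: a class absent from both labels and predictions scores 0.
        ious.append(tp / (tp + fp + fn) if tp + fp + fn else 0.0)
        pas.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(ious) / k, sum(pas) / k


def relative_gain(new: float, old: float) -> float:
    # Percentage change relative to the baseline value.
    return (new - old) / old * 100.0


# (baseline, improved) pairs as reported in the abstract.
REPORTED: Dict[str, Tuple[float, float]] = {
    "validation mIoU (%)": (92.73, 93.98),
    "validation mPA (%)": (96.88, 97.48),
    "test mIoU (%)": (92.80, 94.16),
    "test mPA (%)": (94.66, 95.68),
    "speed (frames/s)": (15.2, 15.9),
}

# Toy binary example: class 0 = background, class 1 = maize.
gt = [0, 0, 0, 0, 1, 1]
pred = [0, 0, 0, 1, 1, 1]
print(miou_mpa(confusion_matrix(gt, pred, 2)))
for name, (old, new) in REPORTED.items():
    print(f"{name}: {old} -> {new}, +{new - old:.2f} absolute, "
          f"+{relative_gain(new, old):.2f}% relative")
```

For the toy labels the sketch gives mIoU = (0.75 + 2/3) / 2 and mPA = 0.875. For the reported figures the relative gains round to 1.35%, 0.62%, 1.47%, 1.08%, and 4.61%, matching the abstract, whereas the absolute differences are 1.25, 0.60, 1.36, and 1.02 percentage points and 0.7 frames/s.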
The findings can provide a useful reference for developing intelligent weeding equipment that accurately recognizes and segments maize and weeds at the seedling stage in complex field scenes.
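Item 3) derives the weed map from the maize map by morphological processing, but the abstract does not list the operations. The sketch below is one plausible pipeline, assumed purely for illustration and not the authors' method: a vegetation mask from the Excess Green index (ExG = 2g - r - b on chromaticity-normalized RGB) with an arbitrary threshold, a dilation of the predicted maize mask to absorb boundary pixels, and a set difference. All function names, the threshold, and the dilation radius are hypothetical.

```python
from typing import List, Tuple

Mask = List[List[int]]  # H x W binary mask, 1 = foreground
RGB = List[List[Tuple[int, int, int]]]  # H x W image of (r, g, b) values in 0..255


def exg_mask(rgb: RGB, threshold: float) -> Mask:
    # Hypothetical vegetation mask: Excess Green on normalized chromaticity.
    out = []
    for row in rgb:
        mrow = []
        for r, g, b in row:
            total = r + g + b
            if total == 0:
                mrow.append(0)  # black pixel: treat as non-vegetation
                continue
            rn, gn, bn = r / total, g / total, b / total
            mrow.append(1 if 2 * gn - rn - bn > threshold else 0)
        out.append(mrow)
    return out


def dilate(mask: Mask, radius: int = 1) -> Mask:
    # Binary dilation with a (2*radius + 1)^2 square structuring element.
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[yy][xx] = 1
    return out


def weed_mask(vegetation: Mask, maize: Mask, radius: int = 1) -> Mask:
    # Weeds = vegetation pixels that are not (dilated) maize pixels.
    grown = dilate(maize, radius)
    return [
        [1 if v and not g else 0 for v, g in zip(vrow, grow)]
        for vrow, grow in zip(vegetation, grown)
    ]


vegetation = [[1, 1, 0, 1, 1]]
maize = [[1, 0, 0, 0, 0]]  # e.g. the model's maize prediction
print(weed_mask(vegetation, maize))
```

Dilating the maize mask trades a few weed pixels that touch the crop for robustness to imperfect maize boundaries; in practice the radius and the ExG threshold would need tuning on real field images.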
