Abstract:
Image segmentation has been widely used in recent years for the rapid and accurate detection of plants by the various robots of modern agriculture. However, fully supervised learning cannot obtain sufficient, effective, and low-cost mask labels (manual annotation) as training samples for plant-image instance segmentation, due to the diversity of plant species and forms. In this study, an automatic labelling-based instance segmentation network (AutoLNet) was proposed to improve segmentation accuracy. Automatically generated weak labels were used to train a weakly supervised deep learning model, and the network was then applied to instance segmentation of maize at the seedling stage. Top-view images of maize seedlings were collected by an unmanned aerial vehicle (UAV), and data augmentation was used to improve sample diversity. A weak-label self-generation module was added in front of the backbone of the weakly supervised instance segmentation model. The module consisted of color space conversion, contour tracking, and minimum enclosing rectangle generation. Firstly, a color threshold range for maize plants was set to remove the background of the image, in order to eliminate the influence of ground shadow and soil on the foreground information. In the resulting binary image containing only foreground maize plants, the foreground plant regions were dilated and small noise points were removed. Secondly, edge detection was carried out on the binary image after threshold segmentation to obtain the contour point set of the foreground maize plants. Finally, the minimum enclosing rectangle of each foreground object was generated automatically in the original image from the coordinates of the contour point set, and the final bounding boxes were obtained by threshold filtering. The weak labels generated in this way were used instead of manual labels in network training.
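The label self-generation steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the green-pixel color rule stands in for the paper's HSV threshold range, the area filter stands in for noise removal, a plain flood fill stands in for contour tracking, and all numeric values are assumptions chosen for the synthetic example.

```python
import numpy as np
from collections import deque

def auto_box_labels(img, min_area=50):
    """Generate axis-aligned bounding-box weak labels for green foreground regions.

    img: HxWx3 uint8 RGB array. Returns a list of (x0, y0, x1, y1) boxes.
    The color rule and thresholds below are illustrative stand-ins, not the
    paper's settings.
    """
    r = img[..., 0].astype(int)
    g = img[..., 1].astype(int)
    b = img[..., 2].astype(int)
    # Foreground = clearly green pixels (suppresses soil and shadow background).
    mask = (g > r + 20) & (g > b + 20) & (g > 80)

    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                # BFS flood fill collects one connected plant region
                # (a simple substitute for contour tracking).
                q = deque([(sy, sx)])
                seen[sy, sx] = True
                ys, xs = [], []
                while q:
                    y, x = q.popleft()
                    ys.append(y)
                    xs.append(x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                # Area filter removes small noise blobs; the minimum enclosing
                # axis-aligned rectangle of the region becomes the weak label.
                if len(ys) >= min_area:
                    boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

# Synthetic top-view frame: brown soil background with two green "plants".
img = np.full((60, 80, 3), (120, 90, 60), dtype=np.uint8)
img[10:25, 10:30] = (40, 180, 50)   # plant 1
img[35:55, 50:70] = (40, 180, 50)   # plant 2
img[5, 75] = (40, 180, 50)          # single noisy pixel, filtered out

print(auto_box_labels(img))  # → [(10, 10, 29, 24), (50, 35, 69, 54)]
```

In practice the same pipeline would typically be written with OpenCV (HSV conversion, morphological dilation, `findContours`, and `boundingRect`); the pure-numpy version above only shows the logic of each stage.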
Instance segmentation of maize seedling images was thus realized without manual labels, greatly reducing the labor cost required for data annotation. The test results showed that the distance intersection over union (DIoU) and cosine similarity between the self-generated and manual labels reached 95.23% and 94.10%, respectively, so the quality of the labels fully met the requirements of weakly supervised training. The average precision of AutoLNet's predicted boxes and masks reached 68.69% and 35.07%, respectively, an increase of 10.83 and 3.42 percentage points over the weakly supervised model trained with manual labels. Compared with DiscoBox and Box2Mask, the average precision of the predicted boxes increased by 11.28 and 8.79 percentage points, respectively, while that of the masks increased by 12.75 and 10.72 percentage points, respectively. The projection and pairwise losses used during training improved the accuracy of weakly supervised learning in AutoLNet relative to the fully supervised models (CondInst and Mask R-CNN). The average precision of AutoLNet's predicted boxes and masks reached 94.32% and 83.14% of those of the CondInst model, and was 7.54 and 3.28 percentage points higher than that of the Mask R-CNN model. When the IoU threshold was greater than or equal to 0.5, the segmentation performance of AutoLNet was better than that of the fully supervised Mask R-CNN and similar to that of CondInst. Consequently, the improved AutoLNet can be expected to obtain maize plant labels automatically using the label self-generation module, replacing the manual labeling process and realizing instance segmentation of maize seedling images without the cost of manual annotation.
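The distance intersection over union used above to score label quality can be illustrated as follows. This is the standard DIoU formulation (IoU penalized by the normalized squared distance between box centers), assumed for illustration rather than taken from the authors' evaluation code.

```python
def diou(box_a, box_b):
    """Distance-IoU between two axis-aligned boxes given as (x0, y0, x1, y1).

    DIoU = IoU - d^2 / c^2, where d is the distance between the box centers
    and c is the diagonal of the smallest box enclosing both. Identical boxes
    score 1.0; the center-distance penalty lowers the score as boxes drift apart.
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Intersection and union areas.
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    iou = inter / (area_a + area_b - inter)
    # Squared distance between box centers.
    d2 = ((ax0 + ax1) / 2 - (bx0 + bx1) / 2) ** 2 \
       + ((ay0 + ay1) / 2 - (by0 + by1) / 2) ** 2
    # Squared diagonal of the smallest enclosing box.
    cx0, cy0 = min(ax0, bx0), min(ay0, by0)
    cx1, cy1 = max(ax1, bx1), max(ay1, by1)
    c2 = (cx1 - cx0) ** 2 + (cy1 - cy0) ** 2
    return iou - d2 / c2

print(diou((0, 0, 10, 10), (0, 0, 10, 10)))           # identical boxes → 1.0
print(round(diou((0, 0, 10, 10), (2, 2, 12, 12)), 3)) # shifted box → 0.443
```

Averaging such per-box scores between the self-generated and manual labels gives a label-quality figure of the kind reported above.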
These findings can provide a solution and technical support for the high-precision, low-cost instance segmentation of maize seedling images in field environments.