Abstract: Supervised deep learning has become one of the most important approaches for extracting plant phenotypic features in recent years. However, the cost and quality of manual labeling have become a bottleneck restricting the development of the technology, mainly due to the complexity of plant structure and detail. In this study, a Depth Mask Convolutional Neural Network (DM-CNN) was proposed to realize automatic training and segmentation for maize plants. Firstly, the original depth and color images of maize plants were collected in an indoor scene using a Kinect sensor, and the parallax between the depth and color cameras was reduced by aligning the display ranges of the depth and color images. Secondly, the depth and color images were cropped to the same size to maintain spatial and content consistency. A depth density function and nearest-neighbor pixel filling were used to remove the background of the depth images while retaining the maize plant pixels. In this way, a binary image of the maize plant was obtained, and the depth mask annotation was extracted as the maximum connected area. Finally, the depth mask annotations and color images were paired and used to train the DM-CNN, realizing automatic image labeling and segmentation for maize plants indoors. A field experiment was also designed to verify the trained DM-CNN. It was found that the training loss with depth mask annotations converged faster than that with manual annotations. Furthermore, the performance of the DM-CNN trained on depth mask annotations was slightly better than that of the network trained on manual annotations: the former achieved a mean Intersection over Union (mIoU) of 59.13% and a mean Recall Accuracy (mRA) of 65.78%, while the latter achieved an mIoU of 58.49% and an mRA of 65.78%. In addition, 10% of the depth mask samples in the dataset were replaced with manually annotated images taken in an outdoor scene, in order to verify the generalization ability of the DM-CNN.
After fine-tuning, the model achieved excellent segmentation performance on top-view images of outdoor maize seedlings; in particular, the mean pixel accuracy reached 84.54%. Therefore, the DM-CNN can be expected to automatically generate depth mask annotations from depth images in indoor scenes, thereby realizing supervised network training. More importantly, the model trained with depth mask annotations also outperformed the one trained with manual annotations in mean Intersection over Union while matching it in mean Recall Accuracy. The segmentation was also suitable for the different plant height ranges of the maize seedling stage, indicating an excellent generalization ability of the model. Moreover, the improved model could be transferred to complex outdoor scenes for better segmentation of top-view maize images when only 10% of the depth-mask training samples were replaced with manual annotations. Therefore, it is feasible to realize automatic annotation and training of deep learning models using depth mask annotations instead of manual labels. These findings can provide low-cost solutions and technical support for high-throughput, high-precision acquisition of maize seedling phenotypes.
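The depth-mask annotation step described above (background removal on the depth image, nearest-neighbor filling of invalid pixels, and selection of the maximum connected area) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `near`/`far` depth bounds are hypothetical placeholders standing in for the paper's depth density function, and the function name is invented for this example.

```python
import numpy as np
from scipy import ndimage

def depth_mask_annotation(depth, near=0.5, far=1.5):
    """Sketch of depth-mask generation: fill invalid depth pixels from
    their nearest valid neighbor, threshold to a foreground depth range
    (a stand-in for the depth density function), and keep only the
    largest connected region as the plant mask."""
    # Fill invalid depth readings (encoded as 0) with the nearest valid value.
    invalid = depth == 0
    if invalid.any():
        idx = ndimage.distance_transform_edt(
            invalid, return_distances=False, return_indices=True)
        depth = depth[tuple(idx)]
    # Keep pixels whose depth falls in the assumed foreground (plant) range.
    fg = (depth >= near) & (depth <= far)
    # Retain only the maximum connected area as the binary plant mask.
    labels, n = ndimage.label(fg)
    if n == 0:
        return np.zeros_like(fg)
    sizes = ndimage.sum(fg, labels, range(1, n + 1))
    return labels == (np.argmax(sizes) + 1)
```

The resulting binary mask would then be paired with the aligned color image as a training sample for the segmentation network.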