Multi-task segmentation network for plants on 3D point clouds

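    • Summary: In plant phenotyping research, plant organ segmentation is an important step toward non-destructive, high-throughput, automated phenotypic measurement. However, existing plant organ segmentation methods usually require reasonable threshold parameters to be set empirically, and they rarely perform semantic segmentation and instance segmentation at the same time. This study proposes a multi-task segmentation network for plants on 3D point clouds (MT-SegNet) which, combined with a multi-value conditional random field (MV-CRF) model, simultaneously performs semantic segmentation of stems and leaves and instance segmentation of leaves. Aggregating neighborhood point features by max pooling or average pooling may lose important information; to address this, the study proposes a multi-head attentive pooling module based on the attention mechanism. The module automatically learns the important neighborhood point features, which helps improve the segmentation performance of the network. MT-SegNet also splits into two branches: one predicts the semantic class of each point, and the other embeds the points into high-dimensional vectors so that they can be clustered into instances. Finally, MV-CRF is used to jointly optimize the multiple tasks. Experimental results on a colored-leaf taro point cloud dataset show that the mean intersection over union, precision, recall, and F1 score of stem and leaf semantic segmentation were 84.54%, 93.64%, 91.39%, and 92.48%, respectively, and that the mean precision, mean recall, mean instance coverage, and mean weighted instance coverage of leaf instance segmentation were 88.10%, 78.44%, 76.24%, and 76.93%, respectively, all better than existing deep learning networks such as PointNet and JSNet. The model is also applicable to point cloud segmentation tasks on similar plants, which helps provide the technical conditions needed for automated plant phenotypic measurement.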
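To illustrate why attention-based pooling can keep information that max pooling discards, the following is a minimal sketch in plain Python. It is not the MT-SegNet implementation: the function names, the dot-product scoring, and the even split of channels across heads are all assumptions made for illustration, and in a real network the scoring weights would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def max_pool(neighbors):
    """Channel-wise maximum over K neighbor feature vectors."""
    return [max(channel) for channel in zip(*neighbors)]


def attentive_pool(neighbors, score_weights):
    """Single-head attentive pooling (hypothetical formulation).

    Each neighbor gets a scalar score (dot product with score_weights),
    the scores are normalized with softmax over the K neighbors, and the
    output is the attention-weighted sum of the neighbor features.
    """
    scores = [sum(w * f for w, f in zip(score_weights, feat)) for feat in neighbors]
    attention = softmax(scores)
    dim = len(neighbors[0])
    return [sum(a * feat[c] for a, feat in zip(attention, neighbors)) for c in range(dim)]


def multi_head_attentive_pool(neighbors, head_weights):
    """Split channels evenly across heads, pool each slice, concatenate.

    head_weights holds one scoring vector per head (assumed layout).
    """
    dim = len(neighbors[0])
    n_heads = len(head_weights)
    if dim % n_heads != 0:
        raise ValueError("feature dimension must be divisible by the number of heads")
    width = dim // n_heads
    pooled = []
    for h, weights in enumerate(head_weights):
        head_slice = [feat[h * width:(h + 1) * width] for feat in neighbors]
        pooled.extend(attentive_pool(head_slice, weights))
    return pooled


# Three neighbors with four feature channels each (made-up values).
neighbors = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 1.0, 0.0, 0.0],
    [2.0, 5.0, 1.0, 2.0],
]
print(max_pool(neighbors))  # [3.0, 5.0, 2.0, 4.0]
print(multi_head_attentive_pool(neighbors, [[1.0, 0.0], [0.0, 1.0]]))
```

With all-zero scoring weights every neighbor receives the same attention, so the sketch reduces to average pooling; non-zero weights let each head emphasize different neighbors instead of keeping only the per-channel maximum.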

       

      Abstract: Plant-part segmentation is an important step toward non-destructive, high-throughput, fully automated phenotypic measurement in plant phenotyping. However, existing plant-part segmentation methods usually require reasonable threshold parameters to be set empirically, and they rarely perform semantic and instance segmentation simultaneously. In this study, a multi-task segmentation network for plants on 3D point clouds (MT-SegNet) was proposed and combined with a multi-value conditional random field (MV-CRF) model to perform stem and leaf semantic segmentation and leaf instance segmentation simultaneously. Because aggregating neighborhood point features with max or average pooling can lose important information, a multi-head attentive pooling module based on the attention mechanism was constructed in MT-SegNet. The module automatically learns the important neighborhood point features, which improves the segmentation performance of the network. Specifically, MT-SegNet adopted a U-Net-style encoder-decoder architecture with skip connections and consisted of three main modules: multi-head attentive pooling, downsampling, and upsampling. Each encoding layer contained a multi-head attentive pooling module and a downsampling module. The point cloud was downsampled by a factor of four at each layer, so that only 25% of the points were retained, while the feature dimensionality increased from layer to layer. After the encoder, the number of points was recovered by four decoder modules, each consisting of an upsampling module and a shared multilayer perceptron layer. At each scale, the high-level features came from the decoder and the low-level features came from the encoder at the corresponding scale.
The high-level and low-level features were then fused through a skip connection. MT-SegNet then split into two branches: one predicted the semantic class of each point, and the other embedded the points into high-dimensional vectors so that they could be clustered into instances. Finally, the multiple tasks were jointly optimized using MV-CRF. Experimental results on the colored-leaf taro point cloud dataset showed that the mean intersection over union, precision, recall, and F1 score for stem and leaf semantic segmentation with the proposed method were 84.54%, 93.64%, 91.39%, and 92.48%, respectively, and that the mean precision, mean recall, mean coverage, and mean weighted coverage for leaf instance segmentation were 88.10%, 78.44%, 76.24%, and 76.93%, respectively. All of these results were better than those of existing deep learning networks such as PointNet and JSNet. In addition, MV-CRF improved the segmentation results of MT-SegNet by combining multi-task learning with joint optimization. A separate ablation test was designed to verify the effectiveness of individual MT-SegNet modules for plant segmentation, including the position encoding module in downsampling, the multi-head attentive pooling module, and the residual modules. These modules improved the segmentation performance of the network, and adding the multi-head attentive pooling module improved it significantly: the mean semantic segmentation metrics (intersection over union, precision, recall, and F1 score) each increased by about 2 percent, and the instance segmentation metrics (mean precision, mean recall, mean coverage, and mean weighted coverage) increased by 5 to 9 percent.
Guided by the learned attention weights, the multi-head attentive pooling module captures the important features among the neighborhood points and aggregates them into global features, which effectively improves both semantic and instance segmentation performance. The model is also applicable to point cloud segmentation tasks for plants similar to colored-leaf taro, particularly for the automatic measurement of plant phenotypic parameters.
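For readers unfamiliar with the semantic segmentation metrics reported above, the sketch below computes intersection over union, precision, recall, and F1 score per class from per-point labels and then averages them over the classes. The definitions are the standard ones; the function names, the toy labels, and the choice to average with equal weight per class are assumptions for illustration and may differ from the evaluation protocol used in the study.

```python
def class_metrics(pred, truth, cls):
    """IoU, precision, recall, and F1 for one class from per-point labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    fp = sum(1 for p, t in zip(pred, truth) if p == cls and t != cls)
    fn = sum(1 for p, t in zip(pred, truth) if p != cls and t == cls)
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"iou": iou, "precision": precision, "recall": recall, "f1": f1}


def mean_metrics(pred, truth, classes):
    """Unweighted mean of each metric over the given classes (assumed averaging)."""
    per_class = [class_metrics(pred, truth, c) for c in classes]
    return {
        name: sum(m[name] for m in per_class) / len(per_class)
        for name in ("iou", "precision", "recall", "f1")
    }


# Toy example: six points, one stem point and one leaf point mislabeled.
truth = ["stem", "stem", "leaf", "leaf", "leaf", "leaf"]
pred = ["stem", "leaf", "leaf", "leaf", "leaf", "stem"]

for name, value in mean_metrics(pred, truth, ["stem", "leaf"]).items():
    print(f"mean {name}: {value:.4f}")
```

In this toy case the stem class has one true positive, one false positive, and one false negative (IoU 1/3), while the leaf class has three, one, and one (IoU 3/5), which shows how a small class can pull the mean IoU down more than it pulls precision or recall.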

       
