ZENG An, LUO Lin, PAN Dan, et al. Multi-task segmentation network for the plant on 3D point cloud[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(12): 132-140. DOI: 10.11975/j.issn.1002-6819.202212059

    Multi-task segmentation network for the plant on 3D point cloud

Plant-part segmentation is one of the most important steps toward non-destructive, high-throughput, fully automated measurement in plant phenotyping. However, existing plant-part segmentation methods usually require threshold parameters to be set empirically, and there is strong demand for methods that perform semantic and instance segmentation simultaneously. In this study, a Multi-Task Segmentation Network for the Plant on 3D Point Cloud (MT-SegNet) was proposed and combined with a Multi-Value Conditional Random Field (MV-CRF) model to perform stem and leaf semantic segmentation and leaf instance segmentation simultaneously. Because aggregating neighborhood point features with max or average pooling can discard important information, a multi-head attentive pooling module was constructed in MT-SegNet using an attention mechanism, so that the network learned automatically which neighborhood point features were important, improving its segmentation performance. Specifically, MT-SegNet adopted a U-Net design: an encoder-decoder architecture with skip connections. It consisted of three main modules: multi-head attentive pooling, downsampling, and upsampling. Each encoding layer contained a multi-head attentive pooling module and a downsampling module. The point cloud was downsampled by a factor of four at each layer, so only 25% of the points were retained after each layer; the number of points therefore decreased gradually while the feature dimension of each layer increased. After the encoder, four decoder modules, each consisting of an upsampling module and a shared multilayer perceptron (MLP) layer, recovered the number of points. At each scale, a skip connection fused the high-level features from the decoder with the low-level features from the corresponding encoder scale.
MT-SegNet then split into two branches: one predicted the semantic class of each point, and the other embedded each point into a high-dimensional vector so that the points could be clustered into instances. Finally, the multiple tasks were jointly optimized using MV-CRF. Experimental results on the colored-leaf taro point-cloud dataset showed that, for stem and leaf semantic segmentation, the improved model achieved mean intersection over union (IoU), precision, recall, and F1 score of 84.54%, 93.64%, 91.39%, and 92.48%, respectively, and that, for leaf instance segmentation, the proposed method achieved mean precision, mean recall, mean coverage, and mean weighted coverage of 88.10%, 78.44%, 76.24%, and 76.93%, respectively. All these results were better than those of existing deep-learning networks such as PointNet and JSNet. Meanwhile, by jointly optimizing the multiple tasks learned by MT-SegNet, MV-CRF further improved its segmentation results. An ablation test was designed to verify the effectiveness of the modules in MT-SegNet for the plant segmentation tasks, namely the location-encoding module in downsampling, the multi-head attentive pooling module, and the residual modules. These modules improved the segmentation performance of the network. Adding the multi-head attentive pooling module improved segmentation performance significantly: the mean semantic segmentation metrics (IoU, precision, recall, and F1) each increased by about 2 percent, and the instance segmentation metrics (mean precision, mean recall, mean coverage, and mean weighted coverage) increased by 5 to 9 percent. The multi-head attentive pooling module captures the important features among neighborhood points according to the learned attention weights and aggregates them into global features, thus effectively improving the performance of both semantic and instance segmentation.
The improved model can also be applied to point-cloud segmentation tasks for plants similar to colored-leaf taro, particularly the automatic measurement of plant phenotypic parameters.
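The multi-head attentive pooling described above replaces max or average pooling over a point's neighbors with a learned, attention-weighted sum. The abstract gives no equations, so the following is a minimal NumPy sketch under stated assumptions: each head scores every neighbor feature channel with its own linear map, a softmax over the K neighbors turns the scores into weights (in the style of RandLA-Net's attentive pooling), the weighted features are summed, and the heads are concatenated and projected. The function names, weight shapes, and per-channel softmax are illustrative assumptions, not MT-SegNet's actual implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attentive_pooling(neigh_feats, head_weights, out_weight):
    """Pool the K neighbor features of each point with learned attention.

    neigh_feats:  (N, K, C) features of the K neighbors of each of N points
    head_weights: list of H arrays of shape (C, C), hypothetical per-head scoring maps
    out_weight:   (H * C, C_out) hypothetical projection that fuses the heads
    returns:      (N, C_out) one aggregated feature per point
    """
    pooled_heads = []
    for w in head_weights:
        scores = neigh_feats @ w                 # (N, K, C) per-channel scores
        attn = softmax(scores, axis=1)           # weights over the K neighbors sum to 1
        pooled_heads.append((attn * neigh_feats).sum(axis=1))  # (N, C) weighted sum
    return np.concatenate(pooled_heads, axis=-1) @ out_weight  # (N, C_out)


rng = np.random.default_rng(0)
N, K, C, H, C_out = 1024, 16, 32, 4, 64
feats = rng.standard_normal((N, K, C))
heads = [0.1 * rng.standard_normal((C, C)) for _ in range(H)]  # random stand-ins for learned weights
proj = 0.1 * rng.standard_normal((H * C, C_out))
out = multi_head_attentive_pooling(feats, heads, proj)
print(out.shape)  # (1024, 64)
```

For comparison, max pooling would be `neigh_feats.max(axis=1)`, which keeps one value per channel regardless of the other neighbors; the attention weights instead let every neighbor contribute in proportion to a learned score. In a trained network the head and projection weights would be learned rather than random.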
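The encoder-decoder layout can be illustrated with a toy NumPy sketch of the point-count schedule, the upsampling, and the skip-connection fusion. It assumes four encoder layers (mirroring the four decoder modules), random sampling for the 4x downsampling, nearest-neighbor copying for upsampling, and concatenation for the skip connection; the abstract does not specify any of these choices, and the per-layer attentive pooling and MLPs that would raise the feature dimension are omitted.

```python
import numpy as np


def random_downsample(xyz, feats, ratio=4, rng=None):
    """Keep 1/ratio of the points (25% when ratio=4); random sampling is an assumption."""
    rng = np.random.default_rng() if rng is None else rng
    n_keep = max(1, xyz.shape[0] // ratio)
    idx = rng.choice(xyz.shape[0], size=n_keep, replace=False)
    return xyz[idx], feats[idx]


def nearest_upsample(coarse_xyz, coarse_feats, dense_xyz):
    """Give each dense point the feature of its nearest coarse point."""
    # (N_dense, N_coarse) squared distances; fine for toy sizes, use a KD-tree for real clouds
    d2 = ((dense_xyz[:, None, :] - coarse_xyz[None, :, :]) ** 2).sum(axis=-1)
    return coarse_feats[d2.argmin(axis=1)]


def skip_fuse(decoder_feats, encoder_feats):
    """Skip connection: concatenate high-level (decoder) and low-level (encoder) features."""
    return np.concatenate([decoder_feats, encoder_feats], axis=-1)


rng = np.random.default_rng(0)
xyz, feats = rng.random((1024, 3)), rng.random((1024, 8))

# Encoder: four layers, each keeping 25% of the points (layer count is an assumption).
levels = [(xyz, feats)]
for _ in range(4):
    levels.append(random_downsample(*levels[-1], ratio=4, rng=rng))
print([lvl[0].shape[0] for lvl in levels])  # [1024, 256, 64, 16, 4]

# One decoder step: upsample the coarsest level and fuse it with the matching encoder level.
coarse_xyz, coarse_feats = levels[-1]
dense_xyz, dense_feats = levels[-2]
fused = skip_fuse(nearest_upsample(coarse_xyz, coarse_feats, dense_xyz), dense_feats)
print(fused.shape)  # (16, 16)
```

In a real network, the shared MLP that follows each fusion would map the concatenated features back to the decoder's width; here the feature dimension simply doubles.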
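The semantic-segmentation scores reported above (IoU, precision, recall, and F1, averaged over the stem and leaf classes) follow the standard per-class definitions, sketched here in plain Python. The integer labels 0 for stem and 1 for leaf are hypothetical, and averaging the per-class values with equal weight is an assumption about how the means were computed.

```python
def semantic_metrics(pred, gt, classes):
    """Per-class IoU, precision, recall and F1 from per-point labels, plus class means."""
    per_class = {}
    for c in classes:
        tp = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        fp = sum(1 for p, g in zip(pred, gt) if p == c and g != c)
        fn = sum(1 for p, g in zip(pred, gt) if p != c and g == c)
        iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = {"iou": iou, "precision": prec, "recall": rec, "f1": f1}
    # Unweighted mean over classes (assumed averaging scheme).
    means = {k: sum(m[k] for m in per_class.values()) / len(classes)
             for k in ("iou", "precision", "recall", "f1")}
    return per_class, means


STEM, LEAF = 0, 1  # hypothetical label encoding
gt = [STEM, STEM, STEM, LEAF, LEAF, LEAF, LEAF, LEAF]
pred = [STEM, STEM, LEAF, LEAF, LEAF, LEAF, LEAF, STEM]
per_class, means = semantic_metrics(pred, gt, classes=[STEM, LEAF])
print(round(means["iou"], 4))  # 0.5833
```

Because the mean F1 averages per-class F1 scores, it generally differs slightly from an F1 computed from the mean precision and mean recall.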
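The instance branch maps each point to an embedding vector so that points of the same leaf lie close together; instances are then recovered by clustering. The abstract does not name the clustering algorithm, so this sketch uses a simple greedy threshold grouping (each unassigned seed claims all unassigned points within a bandwidth) rather than, for example, mean-shift. Clustering only the points predicted as leaf, the label encoding, and the bandwidth value are illustrative assumptions.

```python
import numpy as np


def greedy_cluster(embeddings, bandwidth):
    """Assign instance ids by greedy threshold grouping in embedding space."""
    labels = np.full(embeddings.shape[0], -1, dtype=int)
    next_id = 0
    for seed in range(embeddings.shape[0]):
        if labels[seed] != -1:
            continue  # already claimed by an earlier seed
        dist = np.linalg.norm(embeddings - embeddings[seed], axis=1)
        members = (labels == -1) & (dist < bandwidth)  # always includes the seed itself
        labels[members] = next_id
        next_id += 1
    return labels


# Toy 2-D embeddings: two leaves plus one stem point (hypothetical values).
emb = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [2.0, 2.0], [2.1, 2.0],
                [5.0, 0.0]])
semantic = np.array([1, 1, 1, 1, 1, 0])  # 1 = leaf, 0 = stem (hypothetical encoding)
is_leaf = semantic == 1

instance = np.full(len(emb), -1)  # -1 marks points that belong to no leaf instance
instance[is_leaf] = greedy_cluster(emb[is_leaf], bandwidth=0.5)
print(instance.tolist())  # [0, 0, 0, 1, 1, -1]
```

The result depends on the seed order and the bandwidth; mean-shift is a common alternative for this step.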
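For leaf instances, the metrics reported above correspond to definitions used in related point-cloud instance-segmentation work such as ASIS and JSNet, sketched here in plain Python with each instance given as a set of point indices. Coverage (Cov) averages, over ground-truth instances, the best IoU with any predicted instance; weighted coverage (WCov) weights that average by ground-truth instance size; precision and recall count matches at an IoU threshold of 0.5. That the paper uses exactly these definitions and this threshold is an assumption; with a single instance class (leaf), the class mean equals the single-class value.

```python
def iou(a, b):
    """IoU of two instances given as sets of point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def instance_metrics(gt, pred, thresh=0.5):
    """Precision, recall, coverage and weighted coverage for one class.

    gt, pred: lists of sets of point indices (ground-truth / predicted instances);
    gt is assumed to be non-empty.
    """
    best_iou_per_gt = [max((iou(g, p) for p in pred), default=0.0) for g in gt]
    total_gt_points = sum(len(g) for g in gt)
    cov = sum(best_iou_per_gt) / len(gt)
    wcov = sum(len(g) / total_gt_points * b for g, b in zip(gt, best_iou_per_gt))
    # A prediction is a true positive if it overlaps some ground truth with IoU >= thresh.
    tp = sum(1 for p in pred if max((iou(g, p) for g in gt), default=0.0) >= thresh)
    precision = tp / len(pred) if pred else 0.0
    recall = sum(1 for b in best_iou_per_gt if b >= thresh) / len(gt)
    return {"precision": precision, "recall": recall, "cov": cov, "wcov": wcov}


gt = [{0, 1, 2, 3}, {4, 5}]          # two ground-truth leaves (toy data)
pred = [{0, 1, 2}, {3, 4, 5}, {6}]   # three predicted leaves (toy data)
m = instance_metrics(gt, pred)
print({k: round(v, 4) for k, v in m.items()})
# {'precision': 0.6667, 'recall': 1.0, 'cov': 0.7083, 'wcov': 0.7222}
```

In this toy case the spurious prediction `{6}` lowers only precision, while coverage and weighted coverage depend solely on how well each ground-truth leaf is matched.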