Abstract:
Single-tree segmentation contributes greatly to the extraction of forest structure parameters. LiDAR point clouds acquired by unmanned aerial vehicles (UAVs) are a commonly used data source for single-tree segmentation; however, the point cloud data are large and their processing is complex. This study aimed to accurately segment individual tree crowns from point cloud data using deep learning, with the goals of streamlining point cloud processing and improving the accuracy of single-tree segmentation in complex stands. Point cloud rasterization was combined with deep learning to segment individual tree crowns. Firstly, a D2000 UAV platform (Feima Robotics) equipped with a LiDAR sensor (D-LiDAR2000) was used to acquire point cloud data of a mixed coniferous and broadleaf forest. LiDAR360 software was then employed for preprocessing: point cloud denoising, ground point classification, and point cloud normalization. Subsequently, the plot point cloud was rasterized from a top-down view, and the maximum height, maximum intensity, and point density were calculated for each raster cell. These three values were mapped to the RGB channels of the corresponding raster pixels so that tree crowns appeared more clearly in the rasterized image. Secondly, based on the Mask R-CNN model within the Detectron2 framework, backbone networks with different numbers of layers and training iterations were compared to select the backbone with superior segmentation performance. Thirdly, the Global Context Network (GC Net) and attention mechanism modules were integrated into the ResNet backbone, and the simultaneous introduction of GC Net and an attention module was compared against each module alone to enhance the segmentation accuracy of the Mask R-CNN model. To validate the practicality of the improved Mask R-CNN model, its segmentation accuracy was compared with that of similar deep learning networks (U-Net and DeepLabv3+).
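The rasterization step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the cell size, the channel-to-attribute mapping (R = maximum height, G = maximum intensity, B = point density), and all function and variable names are assumptions made for clarity.

```python
import numpy as np

def rasterize_point_cloud(xyz, intensity, cell=0.2):
    """Rasterize a height-normalized point cloud into a 3-channel image.

    Assumed channel mapping (illustrative, following the description above):
      R = per-cell maximum height, G = per-cell maximum intensity,
      B = per-cell point density, each scaled to 0-255.
    """
    x, y, z = xyz[:, 0], xyz[:, 1], xyz[:, 2]
    col = ((x - x.min()) / cell).astype(int)
    row = ((y.max() - y) / cell).astype(int)  # top-down view, north up
    h, w = row.max() + 1, col.max() + 1

    max_h = np.zeros((h, w))
    max_i = np.zeros((h, w))
    dens = np.zeros((h, w))
    np.maximum.at(max_h, (row, col), z)          # per-cell maximum height
    np.maximum.at(max_i, (row, col), intensity)  # per-cell maximum intensity
    np.add.at(dens, (row, col), 1)               # per-cell point count

    def scale(a):
        # Linearly stretch a channel to the 0-255 range.
        rng = a.max() - a.min()
        if rng == 0:
            return np.zeros_like(a, dtype=np.uint8)
        return ((a - a.min()) / rng * 255).astype(np.uint8)

    return np.dstack([scale(max_h), scale(max_i), scale(dens)])
```

The resulting uint8 RGB image can then be annotated and fed to the segmentation network like any ordinary photograph.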
Finally, the tree crown masks segmented by the improved Mask R-CNN were used to extract the point clouds of individual tree crowns. Segmentation of the test plot was compared and evaluated using the watershed algorithm, K-means clustering, and the improved Mask R-CNN. Among the three Mask R-CNN backbone networks, the R50-FPN-3X network saved training time and computational resources compared with the R101-FPN-3X network, and its average accuracy of 76.72% was 1.01 percentage points higher than that of the R50-FPN-1X network. With the R50-FPN-3X backbone, introducing the Squeeze-and-Excitation (SE), Coordinate Attention (CA), and Convolutional Block Attention Module (CBAM) mechanisms increased the average accuracy by 1.41, 2.14, and 4.65 percentage points over the original model, respectively. Integrating the GC Net module into the ResNet backbone yielded a model accuracy of 80.35%, an improvement of 3.63 percentage points over the original model. Introducing the CBAM and GC Net modules together achieved the highest accuracy of 82.91%, an increase of 1.54 and 2.56 percentage points over the two modules used alone, respectively. The improved Mask R-CNN model achieved an average accuracy 7.27 percentage points higher than the U-Net model and 4.62 percentage points higher than the DeepLabv3+ model. In single-tree crown point cloud segmentation, the improved Mask R-CNN outperformed the watershed and K-means algorithms, attaining the highest recall, precision, and F-score of 81.19%, 78.85%, and 80.00%, respectively. The point cloud segmentation with the improved Mask R-CNN network demonstrated the robustness of tree crown segmentation in mixed coniferous and broadleaf forests, and point cloud data processing was effectively integrated with deep learning models.
The accuracy of single-tree crown segmentation was significantly enhanced, providing reliable foundational data and a technical reference for assessing forest resources, biomass, and carbon stocks.
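The mask-to-point-cloud step mentioned above (using a predicted crown mask to extract the crown's points) can be sketched as follows. This is an assumed illustration: the grid convention must match the one used during rasterization, and the function and parameter names are hypothetical, not taken from the paper.

```python
import numpy as np

def crop_crown_points(xyz, mask, x_min, y_max, cell=0.2):
    """Select the points that fall inside a predicted crown mask.

    `mask` is a boolean image from the segmentation network, aligned with
    the rasterized plot: mask[row, col] == True marks crown pixels.
    `x_min`, `y_max`, and `cell` must be the same values used when the
    plot was rasterized (illustrative convention, not the paper's code).
    """
    # Map each 3D point back to its raster cell.
    col = ((xyz[:, 0] - x_min) / cell).astype(int)
    row = ((y_max - xyz[:, 1]) / cell).astype(int)

    # Discard points that fall outside the mask image, then keep the
    # points whose cell is flagged as crown.
    h, w = mask.shape
    inside = (row >= 0) & (row < h) & (col >= 0) & (col < w)
    keep = np.zeros(len(xyz), dtype=bool)
    keep[inside] = mask[row[inside], col[inside]]
    return xyz[keep]
```

Running this once per predicted instance mask yields one crown point cloud per detected tree, which is the input for the subsequent structure parameter extraction.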