LU Jianqiang, CHANG Huhu, LAN Yubin, et al. Method for degraded grassland gap localization based on super-resolution reconstruction and Transformer[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(10): 203-212. DOI: 10.11975/j.issn.1002-6819.202401028

    Method for degraded grassland gap localization based on super-resolution reconstruction and Transformer

    • Abstract: Grassland restoration is a critical ecological engineering initiative in China. Drones now effectively support reseeding across large-scale and diversified terrains; however, gap localization accuracy remains limited and the workload during drone operations is high. This study proposes a novel approach to gap localization in degraded grassland, termed YOLOFG (YOLO for Gap), based on super-resolution reconstruction and Transformer technology. First, to address the limited quantity and scattered distribution of gap samples in grassland scenes, sample acquisition combined fieldwork with simulation, with drones collecting 132 and 282 aerial images from the two settings, respectively, to constitute a mixed gap dataset. Second, three data augmentation techniques were applied to enrich the mixed gap dataset and enhance model generalization: random angular rotation, exposure adjustment, and brightness modulation. The resulting 1 242 augmented gap images were annotated with the open-source labeling tool LabelImg, and the annotations were stored according to the Pascal VOC dataset format. Next, targeting the blurred textures in the unmanned aerial vehicle (UAV) aerial imagery, the YOLOv5s model was modified and trained on the dataset. Finally, the gap anchor-box information obtained from the recognition model, the imaging principles of the gimbal camera (which define the conversion relationships among the multiple coordinate systems involved), and the real-time attitude information of the UAV were combined to establish a geodetic coordinate resolution model for the aerial photography targets, enabling precise localization of the target gaps.
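The three augmentation operations named above (random angular rotation, exposure adjustment, brightness modulation) can be sketched as follows; the parameter ranges are illustrative assumptions, not values from the paper:

```python
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Apply the three augmentations named in the abstract.
    Ranges below are illustrative, not the paper's settings."""
    # Random angular rotation, restricted here to 90-degree multiples.
    img = np.rot90(img, k=int(rng.integers(0, 4)))
    # Exposure adjustment via random gamma correction.
    gamma = rng.uniform(0.7, 1.3)
    img = 255.0 * (img / 255.0) ** gamma
    # Brightness modulation via a random additive offset.
    img = img + rng.uniform(-30.0, 30.0)
    return np.clip(img, 0, 255).astype(np.uint8)
```

Each call produces one randomized variant; running it several times per source image yields the enlarged training set.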
Building on the YOLOv5s target detection framework, a cascaded module was introduced into the neck network to extract feature textures and generate high-resolution feature maps with enhanced textures, allowing the model to focus on intricate textures and substantially mitigating the effects of large scale variations and blurred textures in drone-captured gap images. Furthermore, ShuffleNetV2 was adopted as the backbone network of YOLOv5s to reduce the parameter count, and a highly interactive Transformer self-attention structure was integrated into the backbone. This captured more comprehensive mid-to-high-level visual semantic information from the drone imagery, strengthened information interaction among pixel blocks, and extracted more discriminative features among pixels, improving the capture precision for gap edge pixels. Experimental results demonstrated that the YOLOFG model achieved a mean average precision (mAP) of 96.57%, an improvement of 11.86, 9.65, and 6.82 percentage points over the YOLOv4, YOLOv7, and YOLOv8 models, respectively. The parameter count was approximately 6.24 M, with a model size of 13.9 MB. Compared with YOLOv5s, the model size remained comparable, yet precision improved by 3.84 percentage points and the parameter count was reduced by approximately 11.2%, facilitating deployment. With an inference speed of 39.6 frames per second (FPS), the improved model outperformed the baseline networks, including YOLOv5s, YOLOv8, and YOLOv4, demonstrating high efficiency and robustness. Finally, 54 target points were measured, with an average positioning error of 0.440 4 m between the target center points and the anchor boxes.
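The self-attention structure integrated into the backbone follows the standard scaled dot-product form, which can be sketched for a sequence of pixel-block tokens as below; a single head without an output projection is shown, and all dimensions are illustrative rather than the paper's configuration:

```python
import numpy as np

def self_attention(x: np.ndarray, wq: np.ndarray,
                   wk: np.ndarray, wv: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over pixel-block tokens.
    x: (n_tokens, d_model); wq/wk/wv: (d_model, d_head)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    # Pairwise interaction scores between every pair of tokens.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax rows
    return weights @ v  # (n_tokens, d_head)
```

Because every token attends to every other token, each output mixes information from all pixel blocks, which is the "information interaction among pixel blocks" the abstract refers to.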
In summary, the proposed method offers a better trade-off between speed and accuracy, enabling rapid and efficient identification of gap distributions across different plant communities. It also fully meets the precise localization requirements for drone-based reseeding of grassland gaps, providing robust technical support for subsequent vegetation restoration and reconstruction in degraded grasslands.
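The geodetic localization step can be illustrated with a deliberately simplified projection for a nadir-pointing gimbal camera over flat ground; the paper's full resolution model additionally folds in the UAV's real-time attitude, which is omitted here, and the field-of-view and altitude values are hypothetical:

```python
import math

def pixel_to_geodetic(px: float, py: float, img_w: int, img_h: int,
                      hfov_deg: float, alt_m: float,
                      lat_deg: float, lon_deg: float) -> tuple:
    """Project an image pixel to ground latitude/longitude.
    Simplifying assumptions: camera points straight down, ground is flat,
    and the image x-axis is aligned with east."""
    # Ground footprint width from the horizontal field of view.
    ground_w = 2.0 * alt_m * math.tan(math.radians(hfov_deg) / 2.0)
    m_per_px = ground_w / img_w
    # Offsets from the image centre, in metres.
    east = (px - img_w / 2.0) * m_per_px
    north = (img_h / 2.0 - py) * m_per_px  # image y grows downward
    # Small-offset metre-to-degree conversion.
    dlat = north / 111320.0
    dlon = east / (111320.0 * math.cos(math.radians(lat_deg)))
    return lat_deg + dlat, lon_deg + dlon
```

A detected anchor box's centre pixel, fed through such a model, yields the geodetic coordinates handed to the reseeding drone.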