Method for degraded grassland gap localization based on super-resolution reconstruction and Transformer

陆健强; 常虎虎; 兰玉彬; 王量; 罗浩轩; 黄捷伟; 袁家俊

doi:10.11975/j.issn.1002-6819.202401028

Abstract

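Abstract: UAV reseeding is one of the effective means of grassland restoration. To address the low efficiency and heavy workload caused by poor gap localization accuracy during UAV operations, this study proposes YOLOFG (YOLO for Gap), a method for localizing gaps in degraded grassland based on UAV image super-resolution reconstruction and Transformer. First, building on the YOLOv5s network framework, a cascaded feature texture selection module is designed in the model neck to strengthen the model's focus on feature texture details, addressing the large scale variation and blurred texture of UAV gap images. Second, the backbone network is built with ShuffleNetV2 and embedded with an information-interaction Transformer self-attention structure to extract more differentiated features between pixels, improving the model's ability to capture gap edge pixels accurately. Finally, based on the gap anchor box information, an imaging model relating UAV pose information to the spatial plane is established to achieve precise localization of target gaps. Experimental results show that the YOLOFG model achieves a mean average precision of 96.57%, which is 3.84 percentage points higher than the original YOLOv5s model; its parameter count is about 6.24 M, about 11.2% lower than the original model. Compared with the YOLOv4, YOLOv7, and YOLOv8 models, detection accuracy improves by 11.86, 9.65, and 6.82 percentage points, respectively. The average gap localization error is 0.4404 m, which meets the need for precise grassland gap localization in UAV operations and can provide technical support for vegetation restoration and reconstruction in degraded grassland.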
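The reported gains imply baseline figures that the abstract does not state directly. The following is a minimal arithmetic sketch rather than a result from the paper: the variable names are hypothetical, and the derived values are back-calculated from the reported figures, not reported by the authors.

```python
# Figures reported in the abstract.
yolofg_map = 96.57        # mean average precision, %
yolofg_params_m = 6.24    # parameter count in millions (approximate)
param_reduction = 0.112   # relative reduction vs. YOLOv5s (approximate)

# Reported mAP gains of YOLOFG over each model, in percentage points.
gains_pp = {"YOLOv5s": 3.84, "YOLOv4": 11.86, "YOLOv7": 9.65, "YOLOv8": 6.82}

# Back-calculated (not reported) mAP of each comparison model.
implied_map = {name: round(yolofg_map - gain, 2) for name, gain in gains_pp.items()}

# Back-calculated (not reported) YOLOv5s parameter count in millions:
# 6.24 M is described as about 11.2% below the baseline.
implied_baseline_params_m = yolofg_params_m / (1.0 - param_reduction)

for name, value in implied_map.items():
    print(f"{name}: implied mAP = {value:.2f}%")
print(f"YOLOv5s: implied parameters = {implied_baseline_params_m:.2f} M")
```

Because the parameter count and the reduction are both given as approximate, the implied baseline parameter count should be read as a rough estimate only.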

Abstract: Grassland restoration is a critical ecological engineering initiative in China, and drones are now used effectively for reseeding over large and varied terrain. However, low gap localization accuracy during drone operations leads to low efficiency and a heavy workload. This study proposes a novel approach for gap localization in degraded grassland, termed YOLOFG (YOLO for Gap), using super-resolution reconstruction and Transformer technology. Firstly, to address the limited number and wide distribution of gap samples in grassland scenes, drones were used to acquire samples through a combination of fieldwork and simulation. In this way, 132 and 282 aerial images were collected to form a mixed gap dataset. Secondly, three data augmentation techniques (random angular rotation, exposure adjustment, and brightness modulation) were applied to enrich the mixed gap dataset and improve the generalization of the model. A total of 1,242 augmented gap images were then annotated with the open-source labeling tool LabelImg, and the annotations were stored in the Pascal VOC dataset format. Subsequently, to address the blurred texture in the unmanned aerial vehicle (UAV) aerial photography dataset, the YOLOv5s model was modified and trained on the data. Finally, the gap anchor box information output by the recognition model was combined with the imaging principles of the gimbal camera to derive the conversion relationships among multiple coordinate systems. Using the real-time attitude information of the UAV, a geodetic coordinate resolution model was established for the aerial photography targets, enabling precise localization of the target gaps.
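The step from an anchor box to a ground position can be illustrated with a pinhole-camera projection onto a flat ground plane. This is a minimal sketch under stated assumptions, not the authors' geodetic resolution model: it assumes a camera pointing straight down (gimbal pitch of -90 degrees, zero roll), flat terrain, a small-offset spherical Earth approximation, and hypothetical function names, intrinsics, and coordinates.

```python
import math
import numpy as np

EARTH_RADIUS_M = 6378137.0  # WGS-84 equatorial radius, used as a sphere radius

def box_center(x_min, y_min, x_max, y_max):
    """Pixel coordinates of the center of a detected box."""
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0

def camera_to_ned_rotation(yaw_rad):
    """Rotation from camera axes (x right, y down in the image, z along the
    optical axis) to local north-east-down axes.
    Assumption: the camera looks straight down and the image top points
    along the UAV heading; yaw is measured clockwise from north."""
    nadir = np.array([[0.0, -1.0, 0.0],
                      [1.0,  0.0, 0.0],
                      [0.0,  0.0, 1.0]])
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    yaw = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])
    return yaw @ nadir

def pixel_to_ground_offset(u, v, fx, fy, cx, cy, rotation, height_m):
    """Intersect the viewing ray of pixel (u, v) with a flat ground plane
    height_m below the camera; return (north_m, east_m) from the UAV."""
    ray_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])
    ray_ned = rotation @ ray_cam
    if ray_ned[2] <= 0:
        raise ValueError("viewing ray does not reach the ground plane")
    scale = height_m / ray_ned[2]
    return float(scale * ray_ned[0]), float(scale * ray_ned[1])

def offset_to_lat_lon(lat_deg, lon_deg, north_m, east_m):
    """Shift a latitude/longitude by small metric offsets (spherical Earth)."""
    dlat = north_m / EARTH_RADIUS_M
    dlon = east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_deg)))
    return lat_deg + math.degrees(dlat), lon_deg + math.degrees(dlon)

# Hypothetical example: a box above the image center, UAV heading north,
# 30 m above flat ground, with made-up intrinsics and UAV position.
u, v = box_center(900, 400, 1020, 480)
rotation = camera_to_ned_rotation(yaw_rad=0.0)
north_m, east_m = pixel_to_ground_offset(u, v, 1000.0, 1000.0, 960.0, 540.0,
                                         rotation, height_m=30.0)
gap_lat, gap_lon = offset_to_lat_lon(43.0, 116.0, north_m, east_m)
print(f"offset: {north_m:.2f} m north, {east_m:.2f} m east")
```

A full model would also account for gimbal pitch and roll, lens distortion, and terrain relief, none of which are covered by this sketch.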
Building on the YOLOv5s object detection framework, a cascaded feature texture extraction module was introduced into the neck network. It generates high-resolution feature maps with enhanced textures, enabling the model to focus on fine feature textures and mitigating the large scale variation and blurred texture of drone-captured gap images. Furthermore, ShuffleNetV2 was adopted as the backbone network of YOLOv5s to reduce the number of model parameters, and a highly interactive Transformer self-attention structure was integrated into the backbone. This structure captures more comprehensive mid- to high-level visual semantic information from the drone imagery and strengthens information interaction among pixel blocks, so that more differentiated features are extracted among pixels and gap edge pixels are captured more precisely. The experimental results showed that the YOLOFG model achieved a mean average precision (mAP) of 96.57%, which is 11.86, 9.65, and 6.82 percentage points higher than the YOLOv4, YOLOv7, and YOLOv8 models, respectively. The parameter count was approximately 6.24 M, with a model size of 13.9 MB. Compared with YOLOv5s, the model size remained comparable, the precision improved by 3.84 percentage points, and the parameter count fell by approximately 11.2%, which facilitates deployment. With an inference speed of 39.6 FPS (frames per second), the improved model outperformed the baseline networks, including YOLOv5s, YOLOv8, and YOLOv4, demonstrating high efficiency and robustness. Finally, 54 target points were measured, and the average positioning error between the center points of the targets and of the anchor boxes was 0.4404 m.
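The idea of information interaction among pixel blocks can be illustrated with generic scaled dot-product self-attention over patch tokens. This is a minimal sketch, not the paper's Transformer structure: the function names, the token and feature dimensions, and the random projection matrices standing in for learned weights are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.
    tokens: (num_tokens, dim) array, one row per pixel block.
    Returns the attended tokens and the attention weight matrix."""
    q = tokens @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    # Each row of `scores` compares one block with every other block.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
num_patches, dim = 16, 8  # e.g. a 4 x 4 grid of feature-map patches
tokens = rng.normal(size=(num_patches, dim))
# Random matrices stand in for learned query, key, and value projections.
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

output, weights = self_attention(tokens, w_q, w_k, w_v)
print(output.shape, weights.shape)  # (16, 8) (16, 16)
```

Each output row is a weighted mixture of all patch values, which is the mechanism that lets every pixel block draw on information from every other block.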
In summary, the proposed method offers a better trade-off between speed and accuracy, enabling rapid and efficient identification of gap distributions across different communities. It also meets the precise localization requirements of drone-based reseeding of grassland gaps, and can provide technical support for subsequent vegetation restoration and reconstruction in degraded grasslands.
