Fine-grained bird detection in wetland scenes based on semi-supervised CST

ZHAO Yue; XU Shanshan; HAN Qiaoling; LIU Weiping; ZHENG Yili; ZHAO Yandong; TANG Yanling

doi:10.11975/j.issn.1002-6819.202410077

Detecting fine-grained bird images in wetland using semi-supervised learning with CST module

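Abstract

Abstract: Fine-grained bird detection suffers from high data-annotation costs, and in wetlands the large number of bird species and the complexity of real-world scenes further lower detection accuracy. To address these problems, this study proposes a semi-supervised, CST-based algorithm for fine-grained bird detection in wetland scenes (semi-supervised bird detection with CNN and Swin Transformer, SSBY-CST). First, using images collected by 14 monitoring stations across different wetland scenes in Beijing, a dataset covering 17 bird species was constructed, providing reliable data support for model robustness. Second, a single-stage semi-supervised learning framework based on pseudo-labeling was proposed: a teacher-student model built on the YOLOv5 backbone makes efficient use of unlabeled data to improve detection performance, and during training a dual-threshold pseudo-label assignment strategy replaces the conventional single-threshold assignment to optimize the unsupervised loss. Third, a dual-channel convolution module, CST, combining a CNN with the Swin Transformer was designed to better distinguish different bird species from the wetland background. Experimental results show that with only 100 labeled images, SSBY-CST reaches a precision of 77.5% and an mAP@0.5 of 58.2% when detecting 17 bird species in complex environments, 17.4 and 15.5 percentage points higher than a relatively advanced YOLO model of the same period, a substantial gain obtained with very few annotations. For the black stork and the Siberian gull, mAP@0.5 reaches 95.7% and 94.5%, 24.9 and 14.3 percentage points above the baseline. Ablation experiments on the dual-threshold pseudo-label assignment and the CST module further confirm the effectiveness of both designs. The framework uses unlabeled samples to improve fine-grained bird detection in complex environments with very few labeled images, strengthening the intelligent digital management of agroforestry ecosystems. This work extends semi-supervised learning to fine-grained bird detection and offers a technical route for bird detection in agroforestry and ecological settings.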
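In teacher-student frameworks of this kind, the student is commonly trained on labeled images plus the teacher's pseudo-labels, while the teacher's weights follow an exponential moving average (EMA) of the student's. The abstract does not say how SSBY-CST updates its teacher, so the sketch below assumes a Mean-Teacher-style EMA; plain Python lists stand in for network tensors, and the helper `ema_update` and the decay values are hypothetical.

```python
from typing import Dict, List

# Parameter name -> flat list of weights (stand-in for real tensors).
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> None:
    """Hypothetical helper: move each teacher weight toward the student's, in place.

    teacher <- decay * teacher + (1 - decay) * student
    (Assumption: Mean-Teacher-style EMA; the decay value is illustrative.)
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, s_vals in student.items():
        t_vals = teacher[name]
        for i, s in enumerate(s_vals):
            t_vals[i] = decay * t_vals[i] + (1.0 - decay) * s


# Toy usage: one "layer" with two weights; a small decay makes the step visible.
teacher = {"conv.weight": [0.0, 1.0]}
student = {"conv.weight": [1.0, 1.0]}
ema_update(teacher, student, decay=0.9)
print(teacher["conv.weight"])
```

With a decay close to 1 the teacher changes slowly, which smooths the pseudo-labels it produces for unlabeled images from one iteration to the next.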

Abstract: Fine-grained detection of bird species is limited by the high cost of annotating images, and detection accuracy is further reduced by the diversity of bird species and the complexity of real-world wetland environments. In this study, a semi-supervised, CST-based algorithm, termed SSBY-CST (semi-supervised bird detection with CNN and Swin Transformer), was proposed to detect birds at a fine-grained level in wetland scenes. The framework exploits unlabeled samples to improve fine-grained bird detection in complex environments when only a small amount of labeled data is available, supporting the intelligent digital management of agroforestry ecosystems. The core contributions are as follows. 1) Dataset construction and semi-supervised framework. Images were collected at 14 monitoring stations in Beijing, China, covering different wetland environments, and a dataset of 17 bird species was built from them. Its variability and diversity provide reliable data support for training a detection model that is robust under real-world conditions, particularly in wetland habitats where conditions are challenging and changeable. A single-stage semi-supervised learning framework based on pseudo-labeling was then proposed, in which a teacher-student model built on the YOLOv5 backbone makes efficient use of unlabeled data to improve detection performance. During training, a dual-threshold pseudo-label assignment replaced the conventional single-threshold assignment, optimizing the unsupervised loss function and reducing the influence of low-quality pseudo-labels. This raises overall accuracy while reducing the reliance on large amounts of annotated data.
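The abstract gives neither the two thresholds nor how each confidence band enters the unsupervised loss. The sketch below shows one common form of dual-threshold assignment under these assumptions: teacher boxes scoring at or above a high threshold become hard pseudo-labels, boxes between a low and a high threshold become "uncertain" labels that only receive a soft, score-valued objectness target, and boxes below the low threshold are treated as background. The thresholds 0.1 and 0.5, the `Box` type, and the functions `assign_pseudo_labels` and `objectness_targets` are hypothetical, not the paper's code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    score: float  # teacher confidence for the predicted class
    cls: int


def assign_pseudo_labels(
    preds: List[Box], tau_low: float = 0.1, tau_high: float = 0.5
) -> Tuple[List[Box], List[Box]]:
    """Hypothetical dual-threshold rule (not necessarily the paper's):

    score >= tau_high           -> reliable: class + box + objectness loss
    tau_low <= score < tau_high -> uncertain: objectness loss only, soft target
    score < tau_low             -> dropped (treated as background)
    """
    if not 0.0 <= tau_low < tau_high <= 1.0:
        raise ValueError("need 0 <= tau_low < tau_high <= 1")
    reliable, uncertain = [], []
    for b in preds:
        if b.score >= tau_high:
            reliable.append(b)
        elif b.score >= tau_low:
            uncertain.append(b)
    return reliable, uncertain


def objectness_targets(reliable: List[Box], uncertain: List[Box]) -> List[float]:
    """Hard target 1.0 for reliable boxes; the teacher score as a soft target otherwise."""
    return [1.0] * len(reliable) + [b.score for b in uncertain]


preds = [
    Box(10, 10, 50, 60, 0.92, cls=3),  # confident detection
    Box(70, 20, 90, 40, 0.31, cls=7),  # small or occluded bird, low confidence
    Box(0, 0, 5, 5, 0.04, cls=1),      # likely background noise
]
reliable, uncertain = assign_pseudo_labels(preds)
print(len(reliable), len(uncertain))  # 1 1
print(objectness_targets(reliable, uncertain))  # [1.0, 0.31]
```

With a single threshold, the 0.31 box would either be discarded (losing a hard-to-see bird) or kept as a full label (risking a wrong class); the middle band lets it contribute a weaker training signal instead.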
Combining labeled and unlabeled data improves the generalization and robustness of the model, particularly when labeled data are scarce. 2) Dual-channel convolution module (CST). A dual-channel convolution module, CST, was designed to better distinguish different bird species from wetland backgrounds. It integrates a convolutional neural network (CNN) branch for local features with a Swin Transformer branch for global features, so that the fine-grained details of bird species can be separated from complex wetland backgrounds. By fusing local and global features, the CST module captures diverse patterns in the input data. 3) Performance improvement. With only 100 labeled images, SSBY-CST achieved a precision of 77.5% and an mAP@0.5 of 58.2% when detecting 17 bird species in complex environments, improvements of 17.4 and 15.5 percentage points, respectively, over a relatively advanced contemporaneous YOLO model. This is a substantial performance gain obtained with limited annotated data. For individual species, the mAP@0.5 for the black stork and the Siberian gull reached 95.7% and 94.5%, respectively, 24.9 and 14.3 percentage points higher than the baseline. Semi-supervised learning therefore markedly improved fine-grained bird detection even when labeled data were limited. 4) Ablation study and effectiveness validation. An ablation study analyzed how the dual-threshold pseudo-label assignment and the CST module each affect model performance, and it validated the effectiveness of both improvements.
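The gains reported above are absolute differences in percentage points, not relative percentages. Taking them at face value, the comparison scores they imply can be recovered by subtraction, and the relative improvement computed from those; the snippet below does this with the figures stated in the abstract (the dictionary labels are ours, and the implied reference values are derived, not reported).

```python
# (SSBY-CST value in %, reported gain in percentage points), taken from the abstract.
reported = {
    "precision vs. YOLO": (77.5, 17.4),
    "mAP@0.5 vs. YOLO": (58.2, 15.5),
    "black stork mAP@0.5 vs. baseline": (95.7, 24.9),
    "Siberian gull mAP@0.5 vs. baseline": (94.5, 14.3),
}

implied = {}
for name, (ours, gain_pp) in reported.items():
    reference = round(ours - gain_pp, 1)      # implied comparison score, in %
    relative = 100.0 * gain_pp / reference    # relative gain over that score, in %
    implied[name] = (reference, round(relative, 1))
    print(f"{name}: reference {reference}%, +{gain_pp} pp = +{relative:.1f}% relative")
```

For example, the 17.4-point precision gain corresponds to an implied YOLO precision of 60.1% and a relative improvement of about 29%. The per-species "baseline" is not necessarily the same model as the YOLO comparison; the abstract does not specify.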
Specifically, the dual-threshold pseudo-labeling refined the assignment of pseudo-labels, allowing low-confidence targets to be detected correctly. The CST module extracted features from both local regions and the global context, which helped the model capture the relationship between bird species and their wetland surroundings. In conclusion, combining semi-supervised learning with self-attention mechanisms effectively improved fine-grained bird detection. The SSBY-CST framework is suited to bird detection in agroforestry and wetland environments where high-quality annotated data are scarce, and it can contribute to bird monitoring systems for the intelligent management of agroforestry ecosystems. Future work may integrate geographical location information to improve the monitoring of bird species in dynamic habitats, further increase the diversity of the dataset to better cover complex environments, and explore multi-modal data (such as images and geographical data) for more robust and scalable bird detection in support of ecological conservation.
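The abstract does not describe the internal layout of CST. As a rough, dependency-free illustration of the idea it names (a convolutional branch for local detail, a Swin-style branch that applies self-attention inside non-overlapping windows, and a fusion of the two), the sketch below works on a single-channel feature map stored as nested lists. The 3×3 kernel, the window size, the identity query/key/value projections, the additive fusion, and the names `conv3x3`, `window_attention` and `cst_block` are all assumptions; a real module would use learned multi-channel layers, shifted windows and patch merging (omitted here), and possibly a different fusion.

```python
import math
from typing import List

Grid = List[List[float]]


def conv3x3(x: Grid, k: Grid) -> Grid:
    """Local branch: 3x3 convolution (cross-correlation) with zero padding."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = s
    return out


def window_attention(x: Grid, win: int) -> Grid:
    """Context branch: single-head self-attention inside win x win windows.

    Each position is a 1-D token; queries, keys and values are the raw values
    (identity projections, an assumption), so weights are softmax(q * k).
    A window as large as the map gives fully global attention.
    """
    h, w = len(x), len(x[0])
    if h % win or w % win:
        raise ValueError("feature map size must be divisible by the window size")
    out = [[0.0] * w for _ in range(h)]
    for r0 in range(0, h, win):
        for c0 in range(0, w, win):
            cells = [(r, c) for r in range(r0, r0 + win) for c in range(c0, c0 + win)]
            for r, c in cells:
                q = x[r][c]
                scores = [q * x[rr][cc] for rr, cc in cells]
                m = max(scores)  # subtract the max for numerical stability
                weights = [math.exp(s - m) for s in scores]
                z = sum(weights)
                out[r][c] = sum(wt * x[rr][cc] for wt, (rr, cc) in zip(weights, cells)) / z
    return out


def cst_block(x: Grid, k: Grid, win: int = 4) -> Grid:
    """Hypothetical CST-style fusion: element-wise sum of the two branches."""
    local = conv3x3(x, k)
    context = window_attention(x, win)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(local, context)]


# Toy 4x4 feature map and an averaging kernel.
feat = [[1.0, 2.0, 0.0, 0.0],
        [3.0, 4.0, 0.0, 0.0],
        [0.0, 0.0, 5.0, 5.0],
        [0.0, 0.0, 5.0, 5.0]]
avg = [[1.0 / 9] * 3 for _ in range(3)]
fused = cst_block(feat, avg)
print(len(fused), len(fused[0]))  # 4 4
```

The convolution applies the same fixed weights everywhere, whereas the attention weights depend on the content of the window; summing the two is the simplest fusion, and concatenation followed by a 1×1 convolution would be an equally plausible choice.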
