ZHAO Yue, XU Shanshan, HAN Qiaoling, et al. Detecting fine-grained bird images in wetland using semi-supervised learning with CST module[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2025, 41(6): 185-194. DOI: 10.11975/j.issn.1002-6819.202410077

    Detecting fine-grained bird images in wetland using semi-supervised learning with CST module

    • Fine-grained detection of bird species has been constrained by the high cost of data annotation, and detection accuracy is further reduced by the diversity of bird species and the complex environments of wetlands. In this study, a semi-supervised algorithm, termed SSBY-CST (semi-supervised bird detection with CNN and Swin Transformer), was proposed to detect fine-grained bird images in wetland scenes. Unlabeled samples were utilized to enhance fine-grained detection performance in complex environments with minimal labeled data, supporting the intelligent digital management of agroforestry ecosystems. The core contributions of this research were as follows. 1) Data Collection and Dataset Construction. Images were collected from 14 monitoring stations in Beijing, China, and a dataset covering different wetland environments was constructed. The dataset included 17 species of birds and captured the variability and diversity needed to train a robust detection model under real-world conditions, particularly in wetland habitats where environmental conditions were challenging and variable. A single-stage semi-supervised learning framework based on pseudo-labeling was also proposed. A teacher-student model was constructed using the YOLOv5 backbone network so that unlabeled data could be utilized efficiently to improve detection performance. A dual-threshold pseudo-label assignment was introduced to replace the traditional single threshold during training, and the unsupervised loss function was optimized to reduce the impact of low-quality pseudo-labels. The overall accuracy of the model was improved while reliance on large amounts of annotated data was reduced.
Labeled and unlabeled data were combined to improve the generalization and robustness of the model, particularly when labeled data were scarce. 2) Dual-Channel Convolution Module (CST). A dual-channel convolution module (CST) was developed to distinguish between different bird species and wetland backgrounds. A convolutional neural network (CNN) was integrated with the Swin Transformer for local and global feature extraction, so that fine-grained details of bird species could be distinguished from the complex wetland background. By fusing local and global features, the CST module captured diverse patterns in the input data. 3) Performance Improvement. Experimental results demonstrated that the SSBY-CST algorithm achieved 77.5% precision and 58.2% mAP@0.5 for bird detection across 17 species in complex environmental settings with only 100 labeled images. These were improvements of 17.4 percentage points in precision and 15.5 percentage points in mAP@0.5 over the state-of-the-art YOLO models, a marked gain given the limited annotated data. Notably, the mAP@0.5 reached 95.7% and 94.5% for the Black Stork and Siberian Gull, respectively, which were 24.9 and 14.3 percentage points higher than the baseline. Semi-supervised learning thus significantly improved fine-grained bird detection even when labeled data were limited. 4) Ablation Study and Effectiveness Validation. An ablation study was conducted to analyze the effect of the dual-threshold pseudo-label assignment and the CST module on model performance, and the effectiveness of each improvement was validated.
Specifically, the dual-threshold pseudo-labeling technique refined the assignment of pseudo-labels and allowed low-confidence targets to be detected correctly. Additionally, the CST module extracted features from both local regions and global contexts, which helped capture the relationship between bird species and their wetland environments. In conclusion, combining semi-supervised learning with self-attention mechanisms was effective for detecting fine-grained bird species. The SSBY-CST framework was suitable for bird detection in agroforestry and wetland environments where high-quality annotated data were limited. This work can contribute to bird monitoring systems for the intelligent management of agroforestry ecosystems. Geographical location information can be integrated to improve the overall monitoring of bird species in dynamic habitats. In future work, the diversity of the dataset can be extended to refine detection in complex environments. Additionally, multi-modal data (including images and geographical data) can be explored for more robust and scalable bird detection in support of ecological conservation.
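
The abstract does not give the details of the dual-threshold pseudo-label assignment, so the following is only a minimal sketch of one common way such a scheme can be set up, not the authors' implementation. It assumes the teacher model emits detections with a confidence score, that detections above a high threshold are treated as reliable pseudo-labels with full loss weight, and that detections between the two thresholds are kept as uncertain pseudo-labels with a reduced weight instead of being discarded. The names `Detection` and `assign_pseudo_labels`, the threshold values 0.3 and 0.7, and the choice of the score itself as the reduced weight are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Detection:
    """One teacher prediction (hypothetical structure for illustration)."""
    box: Tuple[float, float, float, float]  # x1, y1, x2, y2
    label: str
    score: float  # teacher confidence in [0, 1]


def assign_pseudo_labels(
    detections: List[Detection],
    tau_low: float = 0.3,   # assumed value, not taken from the paper
    tau_high: float = 0.7,  # assumed value, not taken from the paper
) -> Tuple[List[Tuple[Detection, float]], List[Tuple[Detection, float]]]:
    """Split teacher detections into reliable and uncertain pseudo-labels.

    Each returned item is (detection, loss_weight). Detections below
    tau_low are dropped and treated as background.
    """
    if not 0.0 <= tau_low < tau_high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= tau_low < tau_high <= 1")

    reliable: List[Tuple[Detection, float]] = []
    uncertain: List[Tuple[Detection, float]] = []
    for det in detections:
        if det.score >= tau_high:
            # High confidence: use as a hard pseudo-label with full weight.
            reliable.append((det, 1.0))
        elif det.score >= tau_low:
            # Medium confidence: keep it, but down-weight its loss term so a
            # wrong pseudo-label does less damage (assumed weighting rule).
            uncertain.append((det, det.score))
        # Below tau_low: ignored entirely.
    return reliable, uncertain


if __name__ == "__main__":
    teacher_output = [
        Detection((10, 10, 50, 60), "black_stork", 0.92),
        Detection((70, 20, 120, 90), "gull", 0.45),
        Detection((5, 5, 8, 8), "gull", 0.10),
    ]
    reliable, uncertain = assign_pseudo_labels(teacher_output)
    print(len(reliable), "reliable,", len(uncertain), "uncertain")
```

Under these assumptions, a single-threshold scheme set at 0.7 would discard the second detection outright, whereas the dual-threshold split lets it still contribute a down-weighted term to the unsupervised loss.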