Cross-modal wolfberry pest retrieval based on image and text hash feature learning

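    • Abstract: Existing intelligent pest identification methods can only determine the pest type and cannot supply its detailed biological characteristics. To address this limitation, this study proposes a cross-modal wolfberry pest retrieval (CWPR) model that aims to accurately match pest images with their corresponding text descriptions. The model uses a two-tiered feature fusion method to deeply fuse Vision Transformer features with bidirectional text encoding features. It also introduces a label enhancement technique that incorporates species distribution information to learn an enhanced label matrix, which effectively alleviates the class imbalance in the pest data. Compared with a single-level fusion scheme, two-tiered feature fusion improved retrieval performance by 1.21 percentage points, and label enhancement raised performance by a further 0.8 percentage points on average. Compared with existing state-of-the-art cross-modal retrieval methods, CWPR achieved an average gain of 1.89 percentage points across the two cross-modal wolfberry pest retrieval tasks. The model offers high cross-modal retrieval accuracy and can provide strong technical support for effectively obtaining information about wolfberry pests.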

       

      Abstract: Wolfberry (Lycium barbarum), valued for its nutritional and medicinal properties, has seen rising demand, yet its cultivation is challenged by pest infestations that compromise crop health and yield. Traditional pest management, reliant on agricultural experts, faces scalability limitations, necessitating advanced artificial intelligence (AI) solutions. While existing intelligent pest identification systems have progressed in uni-modal visual classification, they often fail to deliver comprehensive pest profiles, including morphological characteristics, life cycle details, and science-based control strategies, thus limiting their practical utility. To address this, we propose the Cross-modal Wolfberry Pest Retrieval (CWPR) model, a novel framework that establishes precise semantic correspondences between pest images and textual descriptions to enable holistic pest intelligence retrieval. This technical approach not only preserves the species identification capability inherent in conventional classification methods but further advances to provide comprehensive biological characterization of pests, thereby achieving a technological shift from mere identification to holistic cognition. Concretely, the CWPR model employs a dual-branch architecture, utilizing a Vision Transformer (ViT) to extract high-dimensional visual features from pest images, capturing intricate morphological details, and Bidirectional Encoder Representations from Transformers (BERT) to generate contextual text embeddings from descriptions encompassing morphological characteristics, lifestyle habits, and prevention and control measures. A key innovation is the two-tiered feature fusion mechanism, which tackles cross-modal heterogeneity by employing learnable weighted projection matrices to align visual and textual features into a shared latent space, followed by orthogonal rotation optimization to minimize quantization loss while preserving modality-specific characteristics. 
To mitigate data imbalance, a pervasive issue in agricultural datasets, a frequency-aware label enhancement technique is introduced to reduce bias toward dominant pest species, ensuring equitable representation. The model is optimized using an iterative alternating optimization strategy, with hash mapping functions for each modality derived via Ridge regression, ensuring efficient convergence. The CWPR model was evaluated on a curated wolfberry pest dataset through two cross-modal retrieval tasks, image-to-text and text-to-image retrieval, using mean Average Precision (mAP) as the performance metric. The dataset was preprocessed, with images resized to 224×224 pixels and text tokenized using BERT’s tokenizer. Experimental results demonstrate the model’s superior performance: it achieved 96.49% mAP for image-to-text retrieval and 98.55% for text-to-image retrieval, surpassing state-of-the-art methods by an average of 1.89 percentage points across the two tasks. Ablation studies indicate that two-tiered feature fusion yields a gain of 1.21 percentage points over single-fusion schemes, while label enhancement adds a further 0.8 percentage points on average. These results highlight the model’s ability to integrate visual and textual modalities effectively, addressing cross-modal heterogeneity and data imbalance. By providing accurate and comprehensive pest information, including identification, life cycle details, and control strategies, the CWPR model supports informed decision-making in wolfberry pest management. Its scalability and robustness position it as a promising tool for precision agriculture, enhancing sustainable wolfberry production by enabling farmers to access actionable insights for pest prevention and control, ultimately contributing to improved crop health and yield in the face of growing global demand.
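
The abstract states that per-modality hash mapping functions are derived via Ridge regression and that retrieval is scored with mAP. The following is a minimal sketch of those two steps, not the authors' implementation. It assumes NumPy, codes in {-1, +1}, and ranking by Hamming distance. The function names are hypothetical, the features are synthetic stand-ins for ViT and BERT embeddings, and the target codes `B` are random class codewords used in place of the codes that the paper learns through alternating optimization.

```python
import numpy as np


def ridge_hash_projection(X, B, lam=1.0):
    """Closed-form ridge regression: W = (X^T X + lam * I)^{-1} X^T B."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ B)


def encode(X, W):
    """Project features and binarize them into {-1, +1} hash codes."""
    return np.where(X @ W >= 0, 1, -1).astype(np.int8)


def hamming_distance(Q, D):
    """Pairwise Hamming distance between two sets of {-1, +1} codes."""
    k = Q.shape[1]
    return (k - Q.astype(np.int32) @ D.astype(np.int32).T) // 2


def mean_average_precision(Q, D, q_labels, d_labels):
    """mAP over all queries; a database item is relevant if labels match."""
    dist = hamming_distance(Q, D)
    aps = []
    for i in range(Q.shape[0]):
        order = np.argsort(dist[i], kind="stable")
        relevant = (d_labels[order] == q_labels[i]).astype(np.float64)
        n_rel = relevant.sum()
        if n_rel == 0:
            continue  # skip queries with no relevant item in the database
        precision_at_k = np.cumsum(relevant) / np.arange(1, relevant.size + 1)
        aps.append((precision_at_k * relevant).sum() / n_rel)
    return float(np.mean(aps)) if aps else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_classes, n_per_class, code_len = 4, 30, 16
    labels = np.repeat(np.arange(n_classes), n_per_class)
    n = labels.size

    # Assumption: random class codewords stand in for the learned hash codes.
    codewords = rng.choice([-1, 1], size=(n_classes, code_len))
    B = codewords[labels]

    # Assumption: synthetic clustered features replace ViT / BERT embeddings.
    img_centers = rng.normal(size=(n_classes, 32))
    txt_centers = rng.normal(size=(n_classes, 24))
    X_img = img_centers[labels] + 0.1 * rng.normal(size=(n, 32))
    X_txt = txt_centers[labels] + 0.1 * rng.normal(size=(n, 24))

    # One ridge-regression hash function per modality, both targeting B.
    W_img = ridge_hash_projection(X_img, B)
    W_txt = ridge_hash_projection(X_txt, B)
    codes_img, codes_txt = encode(X_img, W_img), encode(X_txt, W_txt)

    print("image-to-text mAP:",
          mean_average_precision(codes_img, codes_txt, labels, labels))
    print("text-to-image mAP:",
          mean_average_precision(codes_txt, codes_img, labels, labels))
```

Because both modalities regress onto the same target codes, an image and a text of the same species tend to land on nearby binary codes, which is what makes Hamming ranking across modalities meaningful in this sketch.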

       
