    CHEN Xueshen, WU Changpeng, DANG Peina, et al. Recognizing weed in rice field using ViT-improved YOLOv7[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(10): 185-193. DOI: 10.11975/j.issn.1002-6819.202310004


    Recognizing weed in rice field using ViT-improved YOLOv7

    • Abstract: To address the unsatisfactory weed recognition in paddy fields under complex conditions such as light occlusion, interference from algae and duckweed, and the similar shape of rice leaf tips, this study proposed a weed recognition method based on combined deep learning. MSRCP (multi-scale retinex with color preservation) was introduced to enhance the images, raising their brightness and contrast; a ViT classification network was added to remove interfering background and improve the model's recognition of small weed targets in complex environments. In the YOLOv7 model, the backbone feature-extraction network was replaced with GhostNet and the CA (coordinate attention) mechanism was introduced, strengthening the backbone's ability to extract weed features while reducing the model's parameter count and computational cost. Ablation experiments showed that the improved YOLOv7 model reached a mean average precision (mAP) of 88.2%, 3.3 percentage points higher than the original YOLOv7, with the parameter count reduced by 10.43 M and the computation reduced by 66.54×10⁹ operations. With MSRCP image enhancement applied before recognition, the mAP of the improved YOLOv7 model rose by a further 2.6 percentage points, and by 5.3, 3.6, and 3.1 percentage points under light occlusion, algae and duckweed interference, and similar rice-leaf-tip shape, respectively. After the ViT classification network was added, the overall mAP rose by 4.4 percentage points over the original model, and by 6.2, 6.1, and 5.7 percentage points in the three complex environments. The ViT-improved YOLOv7 model reached an mAP of 92.6%, which is 11.6, 10.1, 5.0, 4.2, and 4.4 percentage points higher than YOLOv5s, YOLOXs, MobileNetV3-YOLOv7, YOLOv8, and the improved YOLOv7, respectively. The results can support accurate weed recognition in complex paddy-field environments.
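    The MSRCP enhancement step mentioned above can be illustrated concretely. The following Python sketch (NumPy + OpenCV) shows one common formulation of MSRCP: multi-scale retinex is computed on the intensity channel, and each pixel is then rescaled by a single per-pixel gain so the original channel ratios (the color) are preserved. The scale set, clipping percentiles, and function names are illustrative assumptions, not the paper's exact implementation.

    import cv2
    import numpy as np

    def single_scale_retinex(channel, sigma):
        # SSR: log of the pixel minus log of its large-scale Gaussian surround
        blurred = cv2.GaussianBlur(channel, (0, 0), sigma)
        return np.log1p(channel) - np.log1p(blurred)

    def msrcp(image, sigmas=(15, 80, 250), low_clip=0.01, high_clip=0.99):
        img = image.astype(np.float64) + 1.0          # avoid log(0)
        intensity = np.mean(img, axis=2)              # per-pixel mean of B, G, R
        # Multi-scale retinex: average SSR responses over several scales
        msr = np.mean([single_scale_retinex(intensity, s) for s in sigmas], axis=0)
        # Percentile stretch of the MSR output to [0, 255]
        lo, hi = np.quantile(msr, [low_clip, high_clip])
        enhanced = np.clip((msr - lo) / (hi - lo), 0.0, 1.0) * 255.0
        # Color preservation: amplify all three channels by the same per-pixel
        # gain so hue is unchanged, capped to keep the result within [0, 255]
        gain = np.minimum(255.0 / np.max(img, axis=2), (enhanced + 1.0) / intensity)
        return np.clip(img * gain[..., None], 0, 255).astype(np.uint8)

    Applied before detection, this kind of enhancement raises brightness and contrast in shaded regions, which is the role MSRCP plays in the pipeline described above.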

       

      Abstract: A weed recognition method based on combined deep learning was proposed to reduce the influence of complex rice-field conditions, such as light shading by rice plants, interference from algae and duckweed, and small weed targets. Data augmentation of the weed samples was used to improve model training and generalization and to reduce overfitting. MSRCP was introduced to enhance image quality in complex environments, so that weed targets could still be recognized in rice-field images whose contrast and clarity were lowered by light blockage from the rice plants. Front-end slicing and ViT (vision transformer) classification were performed on the high-definition images, avoiding the information loss that compressing images to the network input size would cause and retaining small targets, thereby improving the model's effectiveness in complex environments. The backbone feature-extraction network of the YOLOv7 model was replaced with the lightweight GhostNet, and the CA attention mechanism was embedded in it; this reduced the number of parameters and computations while strengthening feature extraction, serving both the accuracy and the real-time performance of weed recognition. Classifying tiles before detection avoided the small-target blurring and loss of effective information that image compression causes when the detection model is used alone. Experiments showed that expanding the weed dataset improved recognition: the mean average precision (mAP) on the test set after data augmentation was 84.9%, 10.8 percentage points higher than that of the model trained on the original dataset. The ViT classification network outperformed ResNet-50 and VGG in accuracy, recall, and detection speed: accuracy was higher by 7.9 and 7.5 percentage points, and recall by 7.1 and 5.3 percentage points, respectively. Comparative tests confirmed that the ViT network combined high classification accuracy with high speed. Ablation tests showed that the mAP of the improved YOLOv7 model was 88.2%, 3.3 percentage points higher than that of the original model, while the number of parameters and the amount of computation were reduced by 10.43 M and 66.54×10⁹ operations, respectively, giving both high speed and high accuracy. With MSRCP image enhancement before recognition, the mAP of the improved model increased by 2.6 percentage points, and by 5.3, 3.6, and 3.1 percentage points under light shading, algae and duckweed interference, and similar rice-leaf-tip shape, respectively. After the ViT classification network was added, the mAP increased further, by 4.4 percentage points overall compared with the original model, and by 6.2, 6.1, and 5.7 percentage points in the three complex environments. The mAP of the ViT-improved YOLOv7 model was 92.6%, which is 11.6, 10.1, 5.0, 4.2, and 4.4 percentage points higher than the YOLOv5s, YOLOXs, MobileNetV3-YOLOv7, YOLOv8, and improved YOLOv7 models, respectively. The better detection enables weed identification in complex rice-field environments.
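    The front-end slicing and ViT filtering stage can be pictured as follows. This PyTorch sketch tiles the high-definition image, lets a binary ViT classifier discard pure-background tiles (water surface, algae, duckweed), and runs the detector only on the retained tiles, mapping boxes back to full-image coordinates. The tile size, threshold, and the vit_classifier/detector interfaces are assumptions for illustration, not the paper's actual API.

    import torch
    from itertools import product

    def detect_weeds(image, vit_classifier, detector, tile=640, keep_thresh=0.5):
        # image: (C, H, W) tensor of the full-resolution field photograph.
        # vit_classifier: assumed to return a single foreground logit per tile.
        # detector: assumed to yield (x1, y1, x2, y2, score, cls) tuples per tile.
        _, h, w = image.shape
        boxes = []
        for y0, x0 in product(range(0, h, tile), range(0, w, tile)):
            patch = image[:, y0:y0 + tile, x0:x0 + tile]
            # Drop tiles the ViT classifies as background: this removes
            # algae/duckweed clutter before detection and keeps small weeds
            # at native resolution instead of compressing the whole image.
            if torch.sigmoid(vit_classifier(patch.unsqueeze(0))).item() < keep_thresh:
                continue
            for x1, y1, x2, y2, score, cls in detector(patch.unsqueeze(0)):
                # Map tile-local box coordinates back to the full image.
                boxes.append((x1 + x0, y1 + y0, x2 + x0, y2 + y0, score, cls))
        return boxes

    Filtering tiles before detection is what lets small targets reach the detector at native resolution, the property the abstract credits for the gains in the three complex environments.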

       
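    Finally, the two backbone building blocks named in both abstracts, GhostNet's ghost convolution and the CA (coordinate attention) mechanism, can be sketched in PyTorch as below. Class names, the ghost ratio, and the SiLU activation are illustrative assumptions rather than the paper's exact configuration.

    import torch
    import torch.nn as nn

    class GhostModule(nn.Module):
        # Ghost convolution: a small primary conv produces part of the output
        # channels; cheap depthwise convs generate the remaining "ghost" maps,
        # cutting parameters and FLOPs. Assumes out_ch is divisible by ratio.
        def __init__(self, in_ch, out_ch, ratio=2, dw_kernel=3):
            super().__init__()
            primary_ch = out_ch // ratio
            ghost_ch = out_ch - primary_ch
            self.primary = nn.Sequential(
                nn.Conv2d(in_ch, primary_ch, 1, bias=False),
                nn.BatchNorm2d(primary_ch), nn.SiLU())
            self.cheap = nn.Sequential(
                nn.Conv2d(primary_ch, ghost_ch, dw_kernel,
                          padding=dw_kernel // 2, groups=primary_ch, bias=False),
                nn.BatchNorm2d(ghost_ch), nn.SiLU())

        def forward(self, x):
            y = self.primary(x)
            return torch.cat([y, self.cheap(y)], dim=1)

    class CoordAttention(nn.Module):
        # CA attention: global pooling is factorized into height- and
        # width-direction strips so the gates keep positional information.
        def __init__(self, ch, reduction=32):
            super().__init__()
            mid = max(8, ch // reduction)
            self.shared = nn.Sequential(
                nn.Conv2d(ch, mid, 1, bias=False),
                nn.BatchNorm2d(mid), nn.SiLU())
            self.conv_h = nn.Conv2d(mid, ch, 1)
            self.conv_w = nn.Conv2d(mid, ch, 1)

        def forward(self, x):
            n, c, h, w = x.shape
            x_h = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
            x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
            y = self.shared(torch.cat([x_h, x_w], dim=2))          # joint 1x1 transform
            y_h, y_w = torch.split(y, [h, w], dim=2)
            a_h = torch.sigmoid(self.conv_h(y_h))                  # height gate
            a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # width gate
            return x * a_h * a_w

    # Example: a ghost block with coordinate attention, as a stand-in for an
    # ordinary convolution block in the detector backbone.
    block = nn.Sequential(GhostModule(32, 64), CoordAttention(64))
    print(block(torch.randn(1, 32, 64, 64)).shape)  # torch.Size([1, 64, 64, 64])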
