Recognizing weeds in rice fields using ViT-improved YOLOv7
Graphical Abstract
Abstract
A weed recognition method was proposed using combinatorial deep learning, in order to reduce the influence of confounding factors in paddy fields, such as light shading by rice plants, interference from rice-field algae, and small weed targets. Data augmentation of the weed samples was used to improve model training and generalization and to reduce overfitting. The MSRCP (Multi-Scale Retinex with Chromaticity Preservation) algorithm was introduced to enhance image quality in complex environments, enabling weed recognition in rice-field images whose contrast and clarity were degraded by light blockage from the rice plants. Front-end slicing and ViT (Vision Transformer) classification were performed on the high-definition images, so that the information loss caused by compressing images at the network input was avoided and small targets in the high-definition images were retained, improving the effectiveness of the model in complex environments. The backbone of the YOLOv7 model was replaced by the lightweight GhostNet, into which the CA (coordinate attention) mechanism was then embedded; the number of parameters and the amount of computation were thereby reduced while feature extraction was enhanced, for high accuracy and real-time weed recognition. Classification prior to target recognition avoided the small-target blurring and loss of effective information that image compression causes when only the recognition model is used. Experiments showed that expanding the weed dataset improved the recognition performance of the model. Ablation tests showed that the mean average precision on the test set after data augmentation was 84.9%, which was 10.8 percentage points higher than that of the model trained on the original dataset. The ViT classification network outperformed ResNet-50 and VGG in terms of precision, recall, and detection speed: the precision increased by 7.9 and 7.5 percentage points, respectively, and the recall increased by 7.1 and 5.3 percentage points, respectively.
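As a minimal illustration of the front-end slicing step described above, the sketch below splits a high-definition image into overlapping square tiles; each tile could then be passed to a ViT classifier before detection so small targets are not lost to whole-image downscaling. The tile size and overlap ratio here are assumed values, not the paper's settings, and `slice_image` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def _starts(length, tile, step):
    """Start offsets so tiles cover [0, length), with the last tile flush."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # ensure the border is covered
    return sorted(set(starts))

def slice_image(img, tile=640, overlap=0.2):
    """Split an H x W x C image into overlapping square crops.

    Returns a list of (crop, (y, x)) pairs, where (y, x) is the crop's
    origin in the full image, so tile-level results can be mapped back
    to full-image coordinates.
    """
    h, w = img.shape[:2]
    step = max(1, int(tile * (1 - overlap)))
    return [(img[y:y + tile, x:x + tile], (y, x))
            for y in _starts(h, tile, step)
            for x in _starts(w, tile, step)]
```

Because each crop carries its origin offset, detections on a tile can be translated back into the coordinate frame of the original high-definition image.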
Comparative tests showed that the ViT network also achieved high classification accuracy and speed. Ablation tests showed that the mean average precision of the improved YOLOv7 model was 88.2%, which was 3.3 percentage points higher than that of the original model, while the number of parameters and the amount of computation were reduced by 10.43 M and 66.54×10⁹ floating-point operations, respectively, indicating both high speed and high accuracy. With MSRCP image enhancement before recognition, the mean average precision of the improved model increased by a further 2.6 percentage points, and by 5.3, 3.6, and 3.1 percentage points under light shading, algae interference, and weeds shaped similarly to rice leaf tips, respectively. After the ViT classification network was added, the mean average precision increased further, by 4.3 percentage points over the original model, and by 6.2, 6.1, and 5.7 percentage points in the three complex environments, respectively. The mean average precision of the ViT-improved YOLOv7 model reached 92.6%, which was 11.6, 10.1, 5.0, 4.2, and 4.4 percentage points higher than that of the YOLOv5s, YOLOXs, MobileNetV3-YOLOv7, improved YOLOv7, and YOLOv8 models, respectively. The proposed model can therefore better support weed identification in complex rice-field environments.
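The MSRCP enhancement step is not specified in detail in the abstract; a common formulation of MSRCP (multi-scale retinex applied to the intensity channel, followed by a simple color balance and chromaticity-preserving channel scaling) can be sketched as follows. The Gaussian scales and clip quantiles are assumed values, not the paper's parameters, and this is a minimal sketch rather than the authors' implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def msrcp(img, sigmas=(15, 80, 250), low=0.01, high=0.99):
    """Sketch of Multi-Scale Retinex with Chromaticity Preservation.

    img: H x W x 3 uint8 RGB array. Returns an enhanced uint8 RGB array.
    """
    img = img.astype(np.float64) + 1.0            # offset to avoid log(0)
    intensity = img.mean(axis=2)
    # Multi-scale retinex on the intensity channel: average of
    # log(I) - log(Gaussian-blurred I) over several surround scales.
    msr = sum(np.log(intensity) - np.log(gaussian_filter(intensity, s))
              for s in sigmas) / len(sigmas)
    # Simplest color balance: clip the tails and stretch to [0, 255].
    lo, hi = np.quantile(msr, [low, high])
    msr = np.clip((msr - lo) / max(hi - lo, 1e-6), 0.0, 1.0) * 255.0
    # Chromaticity preservation: scale all three channels by the same
    # per-pixel gain, capped so no channel exceeds 255.
    amp = np.minimum(255.0 / img.max(axis=2), msr / intensity)
    return np.clip(img * amp[..., None], 0, 255).astype(np.uint8)
```

Scaling all three channels by one per-pixel gain is what preserves chromaticity: the ratios between R, G, and B at each pixel are unchanged while the intensity is lifted in shaded regions.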