Abstract:
Weeds in the field are a serious threat to agricultural production, and the rapid and accurate identification of weed species is a critical step toward automated weed control. However, previous weed recognition methods rely mainly on hand-designed features, such as shape and texture. Computer vision can be expected to underpin intelligent weed control equipment, and better performance can be obtained on samples with significant morphological differences; however, weeds and crops in field environments still cannot be identified accurately. Due to the particular demands of field weeding tasks, the feature representation capability for weed species must be improved while recognition remains real-time. In this study, a novel weed recognition method was proposed to balance high accuracy and real-time performance using an improved lightweight MobileViT network, combined with an Efficient Channel Attention (ECA) module that enhances the important parts of the features for better extraction. Firstly, the MobileViT network structure was selected as the lightweight Transformer-based model. Specifically, the MobileViT backbone replaces the self-attention layers of the ViT model with lightweight separable convolution layers, and optimization techniques such as depthwise separable convolution and a channel attention mechanism were introduced. Feature extraction was thus achieved at a high inference speed with a smaller model size and lower computational cost, establishing the core network for lightweight weed identification. Secondly, the ECA mechanism used learnable channel weights to calculate the importance of each channel, emphasizing the important channels in the input features; the important features were thereby enhanced to further improve the extraction capability from the down-sampled feature map. Finally, the feature extraction network combined MobileViT and MobileNet blocks so that the MobileViT modules simultaneously learned local and global semantic information. Only a small number of modules were combined to accurately capture the subtle differences between the different classes of weeds and crops, fully meeting the requirements of real-time performance and recognition accuracy. Comparative experiments were conducted to verify the effectiveness of the model, using images of corn seedlings and four types of companion weeds collected in a field environment. The results show that the recognition accuracy, precision, recall, and F1 score of the improved model were 99.61%, 99.60%, 99.58%, and 99.59%, respectively, outperforming lightweight convolutional neural networks (CNNs), such as ShuffleNet and MobileNet, and general CNNs, such as VGG-16, ResNet-50, and DenseNet-161. Furthermore, the inference time was only 83 ms for a single image, indicating that the recognition efficiency fully meets the real-time requirement for weed identification during weeding operations. Visualization also showed that the key features were effectively extracted from the weed images without interference from background areas. The new model can be expected to accurately and rapidly distinguish multiple morphologically similar weeds and crops in a field environment, providing a strong reference for the weed identification systems of intelligent weed control equipment.
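To make the channel attention step described above concrete, the following is a minimal PyTorch sketch of an Efficient Channel Attention (ECA) block attached to a depthwise separable convolution. It is an illustrative reconstruction, not the authors' implementation; the module names, kernel-size heuristic, channel counts, and activation choice are assumptions.

```python
# Illustrative sketch (assumed, not the paper's code): depthwise separable
# convolution followed by ECA channel reweighting.
import math
import torch
import torch.nn as nn


class ECA(nn.Module):
    """Efficient Channel Attention: per-channel weights from a 1D conv."""

    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive kernel size heuristic from the ECA paper, rounded to odd.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global spatial pooling
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) -> channel descriptor (N, C, 1, 1)
        y = self.avg_pool(x)
        # 1D conv across the channel dimension captures local cross-channel
        # interaction without dimensionality reduction.
        y = self.conv(y.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        # Sigmoid gives per-channel importance weights; rescale the input.
        return x * self.sigmoid(y)


class DepthwiseSeparableECA(nn.Module):
    """Depthwise separable convolution followed by ECA reweighting."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU()
        self.eca = ECA(out_ch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return self.eca(x)


if __name__ == "__main__":
    block = DepthwiseSeparableECA(32, 64)
    out = block(torch.randn(1, 32, 56, 56))
    print(out.shape)  # torch.Size([1, 64, 56, 56])
```

In this sketch the ECA weights act as the "learnable channel weights" mentioned in the abstract: the pooled descriptor is passed through a small 1D convolution and a sigmoid to emphasize informative channels of the down-sampled feature map before it is passed deeper into the network.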