Abstract:
Citrus psyllids are widely known as the insect vectors that transmit the devastating Huanglongbing (citrus greening) disease. It is therefore urgent to identify and monitor citrus psyllids in orchards in real time, so that specific measures can be taken for early prevention and control of the disease. However, field environments are unsuitable for deploying servers. In this study, a citrus psyllid identification model suitable for embedded systems was proposed based on YOLOv4-Tiny. A feature fusion network was developed to improve the accuracy of the model in recognizing citrus psyllids. A key path was added to reduce the loss of semantic information in the shallow network layers. At the same time, an output feature map was added that was downsampled by a factor of eight relative to the input image. For a 416 × 416 input image, the improved feature fusion network output feature maps at three scales, with sizes of 13 × 13, 26 × 26, and 52 × 52 pixels. Cross mini-batch normalization was used instead of batch normalization, because this normalization combines the output statistics of previous mini-batches when calculating the mean and standard deviation of the current mini-batch. The outputs of the convolutional layers were normalized to a distribution with a mean of 0 and a variance of 1. Learnable parameters were then used to linearly transform the standardized convolutional outputs. Owing to the accumulation of output statistics, the accuracy of the statistical estimates was improved, thereby improving the recognition accuracy of the model. The ability of the model to recognize occluded targets, particularly occluded citrus psyllids, was also improved by using mosaic data augmentation during training. Specifically, four images were randomly cropped from the training set and then stitched into a single image. An intersection-over-union (IoU) criterion was then used to filter out ambiguous target boxes in the images generated by mosaic data augmentation.
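The cross mini-batch normalization described above can be sketched as follows. This is a minimal illustration only: the window size `k` and the exact accumulation scheme are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

class CrossMiniBatchNorm:
    """Sketch of cross mini-batch normalization: statistics of the current
    mini-batch are combined with those of the previous mini-batches, and a
    learnable linear transform (gamma, beta) is applied to the normalized
    output. The window size k is an illustrative assumption."""

    def __init__(self, num_features, k=4, eps=1e-5):
        self.k = k
        self.eps = eps
        self.means = []  # per-batch means of the last k mini-batches
        self.vars = []   # per-batch variances of the last k mini-batches
        # learnable parameters of the linear transform y = gamma * x_hat + beta
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)

    def __call__(self, x):
        # x: (batch_size, num_features), flattened convolutional output
        self.means.append(x.mean(axis=0))
        self.vars.append(x.var(axis=0))
        if len(self.means) > self.k:  # keep only the last k mini-batches
            self.means.pop(0)
            self.vars.pop(0)
        mean = np.mean(self.means, axis=0)  # accumulated statistics
        var = np.mean(self.vars, axis=0)
        # normalize to a distribution with mean 0 and variance 1
        x_hat = (x - mean) / np.sqrt(var + self.eps)
        # learnable linear transform of the standardized output
        return self.gamma * x_hat + self.beta
```

Because the accumulated mean and variance are estimated from several mini-batches rather than one, the statistics are more stable, which is the effect the abstract attributes to the improved recognition accuracy.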
The improved mosaic data augmentation was used to simulate the occlusion of citrus psyllids, thereby weakening the dependence of the model on complete target features. A handheld camera was used to capture images of adult citrus psyllids in field environments. Data augmentation was then applied to obtain a dataset containing 21 410 images, which was divided into training, validation, and test sets in a ratio of 7:1:2. The individual improvements were further verified by experiments. Results showed that the improved feature fusion network, the introduction of cross mini-batch normalization, and the improved mosaic data augmentation greatly increased the average precision of the model on the test set. The differences between the proposed model and existing networks were also analyzed: the same training set was used to train YOLOv4, YOLOv4-Tiny, Faster R-CNN, and the proposed model. Comparative tests were then performed on the test set, where each model was evaluated in terms of average precision, inference speed, and model size. Specifically, the average precision of the proposed model was 96.16%, the inference speed on a Graphics Processing Unit (GPU) was 3.63 ms/frame, and the model size was 24.50 MB. Consequently, the proposed model can be expected to identify citrus psyllids accurately and quickly for early warning, and is suitable for deployment on embedded devices.
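The improved mosaic augmentation with IoU-based box filtering can be sketched as below. This is a simplified illustration under stated assumptions: each source image is assumed to already match the quadrant size, and the IoU threshold of 0.3 is illustrative, not a value reported in the paper.

```python
import numpy as np

def clip_iou(box, region):
    """IoU between an original box and its version clipped to `region`.
    Since the clipped box lies inside the original, IoU reduces to
    (visible area) / (original area)."""
    x1, y1 = max(box[0], region[0]), max(box[1], region[1])
    x2, y2 = min(box[2], region[2]), min(box[3], region[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0

def mosaic_stitch(imgs, boxes_per_img, size=416, iou_thresh=0.3):
    """Sketch of mosaic augmentation: place four crops into a 2x2 grid and
    keep a box only if its visible part has IoU >= iou_thresh with the
    original box, filtering out ambiguous (heavily truncated) targets.
    Assumes each image in `imgs` is at least (size//2, size//2)."""
    half = size // 2
    canvas = np.zeros((size, size, 3), dtype=np.uint8)
    offsets = [(0, 0), (half, 0), (0, half), (half, half)]
    kept_boxes = []
    for img, boxes, (ox, oy) in zip(imgs, boxes_per_img, offsets):
        crop = img[:half, :half]              # crop to quadrant size
        canvas[oy:oy + half, ox:ox + half] = crop
        quadrant = (0.0, 0.0, float(half), float(half))
        for b in boxes:
            if clip_iou(b, quadrant) >= iou_thresh:  # drop ambiguous boxes
                kept_boxes.append((max(b[0], 0) + ox, max(b[1], 0) + oy,
                                   min(b[2], half) + ox, min(b[3], half) + oy))
    return canvas, kept_boxes
```

Stitching four crops into one image exposes the model to partially visible targets, which simulates the occlusion of citrus psyllids described above, while the IoU filter removes boxes whose remaining visible fragment is too small to be a meaningful training signal.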