Abstract: Litchi is one of the characteristic fruits of Guangdong Province in South China. Because litchi is susceptible to weather, diseases, and insect pests, orchards have traditionally been inspected manually. The number of fruits on each tree is counted to guide subsequent agricultural operations, such as applying nutrient solution or removing insect pests. Low-altitude remote sensing with an unmanned aerial vehicle (UAV) is a promising alternative for observing litchi, as it is safe, efficient, inexpensive, and easy to operate. However, the litchi targets in UAV images are small and carry few features. In this study, an improved Single Shot MultiBox Detector (SSD) model based on multiple feature enhancement and feature fusion (MFEFF) was proposed to detect small litchi fruits. Firstly, 160 litchi images with a resolution of 4 000×3 000 pixels were collected in the orchard using the UAV. Since the targets measured between 20×20 and 30×30 pixels, a sliding window of 512 pixels was applied to the original images, yielding 3 590 images of 1 024×1 024 pixels in which the litchi features, such as size, were more prominent. Secondly, data augmentation, including image flipping and color space transformation, was applied to improve the robustness of the model. Thirdly, the last two convolution layers of VGG16 were removed to reduce unnecessary computation. The Receptive Field Block (RFB) was applied to the Conv8 and Conv9 layers, and the feature map of the Conv3_3 layer was added for feature extraction, expanding the receptive field so that more detailed litchi features could be captured. The Efficient Spatial Pyramid (ESP) block was also applied to enhance the shallow feature maps. 
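The cropping step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a 1 024-pixel square window moved with a 512-pixel stride, which is only one possible reading of the "512-pixel sliding window" and "1 024×1 024" figures above, and the function names are hypothetical. It does not attempt to reproduce the 3 590-image count, which may depend on details not given in the abstract.

```python
def tile_starts(length, window=1024, stride=512):
    """Start offsets of windows along one axis.

    Windows step by `stride`; if the last regular window stops short of
    the border, one extra window is placed flush with the border.
    """
    if length <= window:
        return [0]
    starts = list(range(0, length - window + 1, stride))
    if starts[-1] + window < length:
        starts.append(length - window)
    return starts


def tile_boxes(width, height, window=1024, stride=512):
    """Crop boxes as (left, top, right, bottom), clamped to the image."""
    return [
        (x, y, min(x + window, width), min(y + window, height))
        for y in tile_starts(height, window, stride)
        for x in tile_starts(width, window, stride)
    ]


# Image size taken from the abstract; window and stride are assumptions.
boxes = tile_boxes(4000, 3000)
print(len(boxes), "crops per image")
```

With an overlapping stride, a fruit cut by one window border appears whole in a neighboring window, which is the usual reason for choosing a stride smaller than the window.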
Finally, an improved Path Aggregation Network (IPANet) was used for multi-scale feature fusion of the Conv3_3, Conv4_3, and fc7 layers. A Squeeze-and-Excitation (SE) module was introduced in the first two feature layers to further improve the detection accuracy. This channel attention network uses the global information of the litchi images to selectively increase the weights of the channels that respond to litchi, while suppressing useless feature information, such as the green leaves in the background. At the same time, the size and number of anchors were adjusted to match the small litchi targets. The model was trained and tested on the 3 590 labeled images, each of which contained 3 to 114 litchi targets. Of these, 2 907 images were assigned to the training set, 324 to the validation set, and 359 to the test set. The evaluation indexes were the mean Average Precision (mAP), Recall (R), Precision (P), and F1-score (F1). The results showed that the RFB module significantly improved detection compared with the original model: the mAP, R, P, and F1 increased by 2.61, 0.48, 1.49, and 1 percentage points, respectively. IPANet performed better than the feature pyramid network (FPN), with the mAP increasing by 0.44 percentage points. The SE module outperformed both the Convolutional Block Attention Module (CBAM) and Efficient Channel Attention (ECA), scoring the best of the three attention modules. The ESP block was superior to atrous spatial pyramid pooling (ASPP), with the mAP, R, and P increasing by 2.51, 0.38, and 1.13 percentage points, respectively. Overall, compared with the SSD, YOLOv4-tiny, Faster R-CNN, and CenterNet models, the MFEFF-SSD improved the mAP by 30.62, 14.58, 44.46, and 15.93 percentage points, respectively. 
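For reference, the precision, recall, and F1 indexes used above follow the standard definitions, and a gain in "percentage points" is the plain difference between two percentages. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the counts are made-up values for illustration, not data from the study.

```python
def detection_scores(tp, fp, fn):
    """Return (precision, recall, f1) from matched detection counts.

    tp: detections matched to a ground-truth fruit
    fp: detections with no matching fruit
    fn: fruits with no matching detection
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def percentage_point_gain(baseline, improved):
    """Difference between two ratios in [0, 1], in percentage points."""
    return (improved - baseline) * 100.0


# Hypothetical counts, for illustration only.
p, r, f1 = detection_scores(tp=80, fp=20, fn=20)
print(f"P={p:.2%} R={r:.2%} F1={f1:.2%}")
```

Reporting differences in percentage points, rather than as relative percentages, keeps comparisons between models with very different baselines unambiguous.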
Therefore, the MFEFF-SSD model is expected to detect litchi in UAV images more accurately and effectively. The findings can also provide a strong reference for the detection of other small target fruits.