Recognizing apple targets before thinning using improved YOLOv7
-
Graphical Abstract
-
Abstract
Apple thinning is one of the most important steps in apple tree management. Experience shows that mechanical thinning can greatly reduce labor costs while retaining young apples, so replacing manual thinning with mechanical thinning is a promising trend in modern orchard management. Automated apple thinning usually requires target detection before thinning. However, young apples are small and closely grouped, their hue and texture resemble the background, and they are often blurred in the image or hidden under other apples or leaves in the bunch. In this study, an improved YOLOv7 deep learning network that integrates the Swin Transformer Block was designed for detecting small color targets at close range. Firstly, the Swin Transformer Block was added to the small target detection layer of the YOLOv7 model in order to retain more small-scale feature information and improve the network's accuracy on small targets. Secondly, the CIoU loss function in YOLOv7 was replaced by SIoU, which accounts for the difference in direction between the predicted box and the ground-truth box during training, in order to improve detection accuracy. Finally, heat maps of the detections were produced with the Grad-CAM approach, and the features were visualized to determine the model's regions of interest. The images were captured with two shooting methods (long-range and close-range), and after labeling the image dataset was divided into a close-range subset and a long-range subset. The test results showed that the improved YOLOv7 model achieved a mean average precision (mAP) of 96.1% on the close-range dataset, 0.7 percentage points higher than that of YOLOv7. The mAP on the long-range dataset was 93.6%, 3.5 percentage points higher than that of YOLOv7.
Furthermore, on the whole dataset the improved YOLOv7 reached an mAP of 95.2%, a detection accuracy of 92.7%, and a recall rate of 91.0%, and the model occupied 81 MB of memory. Compared with the YOLOv7 model, the mAP, accuracy, and recall rate were improved by 2.3, 0.9, and 1.3 percentage points, respectively. The detection heat maps of the improved YOLOv7 showed higher thermal values in the apple regions, indicating that more effective features were extracted. In addition, the contribution of the Swin Transformer Block to each target detection layer was evaluated by comparing how adding it to each of the three detection layers affected the ability to detect young apples. Adding the Swin Transformer Block to the small target detection layer considerably enhanced the detection performance of the improved model: the mAP, precision, and recall rate increased by 1.9, 0.3, and 1.6 percentage points, respectively. There was no significant difference in the performance of the improved model when the Swin Transformer Block was added to the large, medium, and small detection layers. When the Swin Transformer Block was introduced only to the small target detection layer, the model's memory use increased by 25 MB. In summary, the improved YOLOv7 network achieved better detection performance and stronger robustness for apples in the thinning period. These findings can provide technical support for apple growth monitoring and mechanical thinning.
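To make the role of the SIoU loss more concrete, the following is a minimal sketch of how such a loss can be computed for a single pair of boxes. It follows the commonly cited SIoU formulation (angle, distance, and shape costs added to the IoU term) and is not taken from this study's implementation. The function name `siou_loss`, the `(x1, y1, x2, y2)` box format, the shape exponent `theta=4.0`, and the stabilizer `eps` are all assumptions made for illustration.

```python
import math


def siou_loss(pred, target, theta=4.0, eps=1e-9):
    """Illustrative SIoU loss for two boxes given as (x1, y1, x2, y2).

    Hypothetical helper: theta and eps are assumed values, not settings
    reported in the study.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Plain IoU of the two boxes.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)

    # Offset between the two box centers.
    dx = (tx1 + tx2 - px1 - px2) / 2.0
    dy = (ty1 + ty2 - py1 - py2) / 2.0
    sigma = math.hypot(dx, dy) + eps

    # Angle cost: 0 when the centers are aligned with an axis,
    # 1 when the offset is at 45 degrees.
    sin_alpha = min(abs(dy) / sigma, 1.0)
    angle_cost = math.sin(2.0 * math.asin(sin_alpha))

    # Distance cost, modulated by the angle cost through gamma.
    gamma = 2.0 - angle_cost
    rho_x = (dx / (cw + eps)) ** 2
    rho_y = (dy / (ch + eps)) ** 2
    distance_cost = (1.0 - math.exp(-gamma * rho_x)) + (1.0 - math.exp(-gamma * rho_y))

    # Shape cost: penalizes mismatch in width and height.
    omega_w = abs(pw - tw) / (max(pw, tw) + eps)
    omega_h = abs(ph - th) / (max(ph, th) + eps)
    shape_cost = (1.0 - math.exp(-omega_w)) ** theta + (1.0 - math.exp(-omega_h)) ** theta

    return 1.0 - iou + (distance_cost + shape_cost) / 2.0


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 10.0, 10.0)
    print(siou_loss((0.0, 0.0, 10.0, 10.0), ground_truth))  # perfect match
    print(siou_loss((1.0, 0.0, 11.0, 10.0), ground_truth))  # small horizontal shift
    print(siou_loss((20.0, 20.0, 30.0, 30.0), ground_truth))  # no overlap, diagonal offset
```

In this sketch the angle term enters the loss only through `gamma`, which changes how strongly the center offset is penalized depending on its direction. This is one way to read the statement that SIoU considers the direction between the predicted box and the real box; a training implementation would normally operate on batched tensors rather than single tuples.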
-
-