Abstract
In recent years, China's apple industry has been leading modern agricultural production worldwide. In mechanized harvesting, the vision system is one of the most important components of an apple-picking robot, and its primary task is to recognize the fruit rapidly. However, several factors inevitably make fruit target recognition difficult, such as complex backgrounds, variable illumination, shadows cast by branches and leaves, and severe overlap among fruits in natural scenes. In this study, an improved target detection method for apples was proposed using an improved Fully Convolutional One-Stage Object Detection (FCOS) network, in order to rapidly identify and accurately locate fruits under complex natural conditions. Darknet19, with a smaller model size, was adopted as the backbone network, and the center-ness branch was introduced into the regression branch. At the same time, a loss function combining Generalized Intersection over Union (GIoU) loss and Focal loss was presented to enhance detection performance while reducing the error caused by the imbalance between positive and negative samples. At the beginning, a dataset of apple images was collected in the field under natural growth conditions and then augmented and labeled, after which image features were extracted by the Darknet backbone network. After that, the objects to be detected at different scales were assigned to different network layers for subsequent prediction. Finally, classification and regression were carried out to realize apple target detection. The specific identification steps were as follows. Firstly, several parameters were adjusted, including the brightness, contrast, hue, and saturation of the original images. Operations such as horizontal mirroring, color disturbance, and noise addition were then used to accelerate data augmentation.
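The augmentation steps described above (photometric jitter, horizontal mirroring, color disturbance, and noise addition) can be sketched as follows. The jitter ranges and noise level are illustrative assumptions, not the authors' actual settings; hue and saturation adjustment would additionally require an HSV conversion and is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Augment an RGB image given as a float array in [0, 1].

    Ranges below are illustrative assumptions only.
    """
    # Brightness jitter: scale all pixel values.
    image = image * rng.uniform(0.8, 1.2)
    # Contrast jitter: stretch values around the mid-gray level.
    image = (image - 0.5) * rng.uniform(0.8, 1.2) + 0.5
    # Color disturbance: perturb each channel independently.
    image = image * rng.uniform(0.9, 1.1, size=(1, 1, 3))
    # Horizontal mirror with probability 0.5.
    if rng.random() < 0.5:
        image = image[:, ::-1, :]
    # Additive Gaussian noise.
    image = image + rng.normal(0.0, 0.02, size=image.shape)
    return np.clip(image, 0.0, 1.0)

img = rng.random((4, 4, 3))
aug = augment(img)
print(aug.shape)  # (4, 4, 3)
```

Applying several such randomized transforms to each labeled image enlarges the training set without additional field collection, which is the usual motivation for this kind of pipeline.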
After that, a field experiment was conducted on FCOS networks with varying degrees of improvement: the network using only multi-scale training was denoted FCOS-A, the one whose loss function applied only Focal loss was denoted FCOS-B, the one using only GIoU loss as the bounding-box loss was denoted FCOS-C, and the one combining all of the above improvements was denoted the modified FCOS. Compared with the traditional FCOS network, the detection performances of FCOS-A, FCOS-B, FCOS-C, and the modified FCOS improved significantly, indicating that multi-scale training, together with the GIoU and Focal losses, greatly contributed to the better performance of the network. Both the FCOS and improved FCOS networks accurately identified apple targets in the double-fruit case. Similarly, the improved FCOS network also achieved better detection performance than the traditional FCOS network in the multi-fruit and dense-fruit cases. Since fruits in the natural growth environment often occluded each other or were occluded by branches and leaves, some contour information was lost during fruit detection, making the occluded parts difficult to detect. In addition, when a fruit was severely obscured by branches and leaves, the bounding box produced by the modified FCOS network was closer in size to the true outline of the fruit. In any case, the modified FCOS network achieved better detection and higher robustness than the traditional network under both front-lighting and backlighting conditions. A detection test of apple fruits was also carried out on a computer workstation under different lighting conditions, densities, and shading degrees. The precision of detection was 96.0%, and the mean Average Precision (mAP) was 96.3%, indicating higher detection accuracy and stronger robustness of the improved FCOS network for apple detection.
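The two loss terms named above can be sketched for a single box and a single classification score as follows. This is a minimal illustration of the standard GIoU and Focal loss definitions, not the authors' implementation; the `alpha` and `gamma` values are the commonly used defaults, assumed here.

```python
import math

def giou_loss(b1, b2):
    """GIoU loss (1 - GIoU) for axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle.
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    iou = inter / union
    # Smallest enclosing box, used by the GIoU penalty term.
    cx1, cy1 = min(b1[0], b2[0]), min(b1[1], b2[1])
    cx2, cy2 = max(b1[2], b2[2]), max(b1[3], b2[3])
    c = (cx2 - cx1) * (cy2 - cy1)
    giou = iou - (c - union) / c
    return 1.0 - giou

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for one predicted probability p and binary label y."""
    pt = p if y == 1 else 1.0 - p           # probability of the true class
    a = alpha if y == 1 else 1.0 - alpha    # class-balance weight
    return -a * (1.0 - pt) ** gamma * math.log(pt)
```

Unlike plain IoU loss, GIoU still yields a useful gradient when predicted and ground-truth boxes do not overlap, and the `(1 - pt) ** gamma` factor in Focal loss down-weights the many easy negatives, which is how the combined loss addresses the positive/negative sample imbalance mentioned above.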