Abstract: Fruit posture varies constantly in complex orchard environments. Some target fruits are similar in color to the background, and only a limited number of samples is available because some environmental data are difficult to collect. Together, these factors make accurate detection very challenging. Detection must nevertheless meet the strict accuracy and efficiency requirements of intelligent yield measurement and automatic harvesting. In this study, an efficient detection model based on an optimized transformer network was proposed for green target fruits. The model is suited to small samples and to varying light conditions and fruit postures. First, a convolutional neural network (CNN) extracted image features, which were fed into the transformer encoder after feature dimension reduction and positional encoding. The encoder output was then obtained through multi-head attention and a feed-forward network (FFN). Second, the transformer decoder processed its input with multi-head attention and an FFN, with positional encoding added at each step, and generated outputs of different sizes. The number of predicted bounding boxes was much larger than the number of actual objects, which kept the miss rate of green target fruit low after decoding. Finally, an FFN produced the detection predictions. Training a detection model requires sufficient samples to avoid overfitting and to generalize well. Bootstrapping was therefore introduced to resample the original data repeatedly, and the expanded dataset met the need for larger training samples to achieve higher detection accuracy. Transfer learning was adopted to significantly improve training efficiency and to accelerate network convergence.
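The step "feature dimension reduction and positional encoding" can be illustrated with a minimal sketch. It assumes a DETR-style design in which a 1x1 convolution reduces the CNN's channel count, the spatial map is flattened into tokens, and a positional encoding is added to each token. The abstract does not say which encoding was used. This sketch assumes the 1-D sine/cosine encoding of the original Transformer applied over flattened positions; DETR itself uses a 2-D variant. All function names are hypothetical.

```python
import math


def project_channels(feature_map, weight):
    """1x1 convolution: apply one linear map (d x C_in) at every spatial position.

    feature_map is a C_in x H x W nested list; the result is d x H x W.
    """
    c_in = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return [
        [[sum(row[c] * feature_map[c][i][j] for c in range(c_in)) for j in range(w)]
         for i in range(h)]
        for row in weight
    ]


def flatten_tokens(feature_map):
    """Turn a d x H x W map into H*W tokens of length d (row-major order)."""
    d = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    return [[feature_map[k][i][j] for k in range(d)] for i in range(h) for j in range(w)]


def sinusoidal_encoding(num_positions, d_model):
    """PE[p][2k] = sin(p / 10000^(2k/d)), PE[p][2k+1] = cos(p / 10000^(2k/d))."""
    if d_model % 2:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            row.extend([math.sin(angle), math.cos(angle)])
        table.append(row)
    return table


def encoder_input(feature_map, weight):
    """Reduce channels, flatten to tokens, then add the positional encoding."""
    tokens = flatten_tokens(project_channels(feature_map, weight))
    pe = sinusoidal_encoding(len(tokens), len(tokens[0]))
    return [[t + p for t, p in zip(tok, enc)] for tok, enc in zip(tokens, pe)]
```

For example, a 3-channel 2x2 feature map projected with a 4x3 weight matrix yields 4 tokens of length 4, one per spatial position, each carrying its position through the added encoding.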
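The abstract names the encoder's components (multi-head attention and an FFN) but not their exact arrangement. The plain-Python sketch below assumes the standard post-norm Transformer encoder layer used in DETR-style detectors. Residual multi-head self-attention is followed by a residual ReLU feed-forward network, each followed by layer normalization, and the positional encoding (argument `pos`) is added to queries and keys only. The weights are random placeholders from the hypothetical helper `random_params`, and dropout and learned normalization parameters are omitted.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (nested lists)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def layer_norm(rows, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def multi_head_attention(x, p, num_heads, pos=None):
    """Scaled dot-product self-attention split across num_heads heads."""
    qk_in = add(x, pos) if pos is not None else x  # pos only enters Q and K
    q, k, v = matmul(qk_in, p["wq"]), matmul(qk_in, p["wk"]), matmul(x, p["wv"])
    d_head = len(q[0]) // num_heads
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        qh, kh, vh = [r[cols] for r in q], [r[cols] for r in k], [r[cols] for r in v]
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head) for kj in kh]
                  for qi in qh]
        weights = [softmax(row) for row in scores]  # each row sums to 1
        heads.append(matmul(weights, vh))
    concat = [sum((head[i] for head in heads), []) for i in range(len(x))]
    return matmul(concat, p["wo"])


def feed_forward(x, p):
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, p["w1"])]  # ReLU
    return matmul(hidden, p["w2"])


def encoder_layer(x, p, num_heads, pos=None):
    """Post-norm layer: LayerNorm(x + MHA(x)), then LayerNorm(x + FFN(x))."""
    x = layer_norm(add(x, multi_head_attention(x, p, num_heads, pos)))
    return layer_norm(add(x, feed_forward(x, p)))


def random_params(d_model, d_ff, seed=0):
    """Hypothetical random weights; a trained model would learn these."""
    rng = random.Random(seed)

    def mat(r, c):
        return [[rng.gauss(0.0, 0.1) for _ in range(c)] for _ in range(r)]

    return {"wq": mat(d_model, d_model), "wk": mat(d_model, d_model),
            "wv": mat(d_model, d_model), "wo": mat(d_model, d_model),
            "w1": mat(d_model, d_ff), "w2": mat(d_ff, d_model)}
```

Each head attends over all tokens with weights that sum to 1. A full encoder stacks several such layers, so every image position can draw on context from the whole feature map.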
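Bootstrapping, as described above, draws samples with replacement from the original data. The sketch below shows one possible reading: each round draws as many items as the original set contains, and the rounds are concatenated into an expanded training list. The number of rounds, the seed, and the file names are hypothetical; the abstract does not give them.

```python
import random


def bootstrap_expand(samples, num_rounds, seed=None):
    """Expand a small dataset by bootstrap resampling.

    Each round draws len(samples) items uniformly with replacement from the
    original list, and the rounds are concatenated.
    """
    rng = random.Random(seed)
    expanded = []
    for _ in range(num_rounds):
        expanded.extend(rng.choices(samples, k=len(samples)))
    return expanded


# Hypothetical file names, for illustration only.
images = ["persimmon_01.jpg", "persimmon_02.jpg", "persimmon_03.jpg"]
train_set = bootstrap_expand(images, num_rounds=4, seed=42)
print(len(train_set))  # 12
```

Resampling repeats existing images rather than creating new visual content. It enlarges the training stream drawn from the original samples but does not add new information.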
Separate apple and persimmon datasets were built for model training. The experimental results show that transfer learning improved the training efficiency of the model by more than 13%. The results also illustrate feature transferability: the smaller the difference between the pre-training task and the target task, the greater the gain in detection speed and efficiency. With transfer learning, the model converged faster and was better suited to complex orchard environments. The proposed model can effectively detect green target fruit in complex orchards across multiple postures, illumination conditions, and scenes, indicating good generalization ability and robustness. Detection accuracies were 93.27% on green persimmons and 91.35% on green apples. The optimized model achieved the best performance in comparison with conventional models. These findings can provide a theoretical reference for detecting green fruits and vegetables in intelligent orchard yield measurement and automated harvesting.
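The abstract does not detail its transfer-learning setup. One common form, assumed here, reuses backbone weights pre-trained on a related task, keeps them frozen, and trains only the task-specific head on the new data, so far fewer parameters must be fitted. The toy below shows that pattern in plain Python with hypothetical weights and data. In a framework such as PyTorch, the same effect is usually obtained by setting `requires_grad = False` on the backbone parameters.

```python
def features(x, backbone):
    """Frozen 'backbone': a fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in backbone]


def mse(data, backbone, head):
    errs = [(sum(h * f for h, f in zip(head, features(x, backbone))) - y) ** 2
            for x, y in data]
    return sum(errs) / len(errs)


def fine_tune_head(data, backbone, head, lr=0.1, epochs=100):
    """SGD on squared error that updates only the head; backbone is never modified."""
    head = list(head)
    for _ in range(epochs):
        for x, y in data:
            f = features(x, backbone)
            err = sum(h * fi for h, fi in zip(head, f)) - y
            # gradient of 0.5 * err**2 with respect to head[j] is err * f[j]
            head = [h - lr * err * fi for h, fi in zip(head, f)]
    return head


# Hypothetical "pre-trained" backbone weights and a tiny target-task dataset.
pretrained_backbone = [[1.0, 0.0], [0.0, 1.0]]
target_data = [([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
new_head = fine_tune_head(target_data, pretrained_backbone, head=[0.0, 0.0])
```

Only `head` changes during training; `pretrained_backbone` is read but never written, which is what freezing means in this sketch.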