Efficient detection model of green target fruit based on optimized Transformer network

贾伟宽; 孟虎; 马晓慧; 赵艳娜; Ji Ze; 郑元杰

doi:10.11975/j.issn.1002-6819.2021.14.018

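Abstract: In orchard environments, detection of target fruit is easily affected by complex backgrounds, fruit posture, and color. To improve the accuracy and efficiency of green target fruit detection and to meet the requirements of intelligent orchard yield estimation and automated harvesting, this study proposes an efficient detection model for green target fruit that is suited to insufficient sample sizes and that addresses different illumination conditions and fruit postures. The model adopts an optimized Transformer structure: a convolutional neural network (CNN) first extracts image features; the features are then fed into an encoder-decoder to generate a set of predicted bounding boxes for the target fruit; finally, a feed-forward network (FFN) predicts the detection results. During training, resampling is introduced to expand the sample set and address the shortage of samples, and transfer learning is introduced to accelerate network convergence. Apple and persimmon datasets were built separately for model training. The experimental results show that training efficiency improved substantially after transfer learning; compared with popular object detection models, the optimized model achieved accuracies of 93.27% and 91.35% when detecting green persimmons and green apples, respectively. The method can provide a theoretical reference for detecting green targets in other fruits and vegetables.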

Abstract: In complex orchard environments, the posture of target fruit varies constantly, and some target fruits are the same color as the background. Data are also difficult to collect in some environments, so the number of samples is limited. Together these factors make accurate detection challenging, while intelligent yield measurement and automatic harvesting demand both high accuracy and high efficiency. In this study, an efficient detection model was proposed for green target fruit that is suitable for small sample sets under different light conditions and fruit postures. The model employs an optimized Transformer network. First, a convolutional neural network (CNN) was used to extract image features. After feature dimension reduction and positional encoding, the features were input to the Transformer encoder, where multi-head attention and a feed-forward network (FFN) produced the encoder output. Second, the Transformer decoder processed its input using multi-head attention and a feed-forward network, with positional encoding added at each stage. The decoder then generated outputs of different sizes. It predicted far more bounding boxes than there were actual objects, which kept the miss rate for green target fruit low. Finally, a feed-forward network (FFN) was used to predict the detection results. Training a detection model generally requires sufficient samples to avoid overfitting and to improve generalization. Bootstrapping was therefore introduced to resample the original data repeatedly, and the expanded dataset met the need for more samples to reach higher detection accuracy during training. Transfer learning was adopted to significantly improve the training efficiency of the model and to accelerate the convergence of the network.
The apple and persimmon datasets were built separately for model training. The experimental results show that the training efficiency of the model improved by more than 13% after transfer learning. Feature transferability improved as the difference between the pre-training task and the target task decreased, which increased the speed and efficiency of detection. With transfer learning, the model converged faster and was better suited to the complex orchard environment. The new model is expected to detect green target fruit effectively in complex orchard environments with varied postures, illumination, and scenes, indicating good generalization ability and robustness. The detection accuracies were 93.27% and 91.35% for green persimmons and green apples, respectively. The optimized model therefore performed better than the conventional detection models it was compared with. The findings can also provide a sound theoretical reference for detecting green fruits and vegetables in intelligent yield measurement and automated harvesting in orchards.
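
The abstract does not say how the bootstrapping step was implemented. As a minimal sketch only, assuming uniform sampling with replacement over a list of image file names, dataset expansion might look like the following. The function names `bootstrap_resample` and `expand_dataset`, the `factor` parameter, and the file names are hypothetical and are not taken from the paper.

```python
import random


def bootstrap_resample(samples, n_draws, seed=None):
    """Draw n_draws items from samples uniformly at random, with replacement."""
    if not samples:
        raise ValueError("samples must be non-empty")
    rng = random.Random(seed)  # local generator, so results are reproducible per seed
    return [rng.choice(samples) for _ in range(n_draws)]


def expand_dataset(samples, factor, seed=None):
    """Keep every original sample and append bootstrap draws until the
    dataset is `factor` times its original size (assumed expansion scheme)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    n_extra = len(samples) * (factor - 1)
    return list(samples) + bootstrap_resample(samples, n_extra, seed)


if __name__ == "__main__":
    # Hypothetical file names standing in for annotated orchard images.
    images = ["apple_001.jpg", "apple_002.jpg", "apple_003.jpg"]
    expanded = expand_dataset(images, factor=3, seed=0)
    print(len(expanded), "samples after expansion")
```

Keeping the originals and appending the resampled items guarantees that no image is lost from the training set. A pure bootstrap that draws every item at random would omit some originals by chance.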
