DUAN Yufei, SUN Jiwei, WANG Yanqing, ZHANG Sanqiang. Sorting camellia oleifera husk and seed using an improved convolutional neural network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2023, 39(3): 154-161. DOI: 10.11975/j.issn.1002-6819.202209096

    Sorting camellia oleifera husk and seed using an improved convolutional neural network

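    • Abstract: To further improve the efficiency of sorting Camellia oleifera husks and seeds, this study collected images of husks and seeds that had been preliminarily screened after hulling and built a husk-seed classification image dataset. With VGG16 as the base network, the model size was reduced by using depthwise separable convolution modules and by optimizing the number of neurons in the fully connected layers, and the network structure was optimized with a cross-layer feature fusion mechanism and the exponential linear unit (ELU) activation function, yielding a convolutional neural network model suited to sorting Camellia oleifera husks and seeds. The results showed that cross-layer feature fusion strengthened the ability of deep-layer features to express useful information and clearly improved accuracy over the model without fusion, and that three-stage fusion was generally better than one- or two-stage fusion. The ELU activation function accelerated convergence, alleviated gradient explosion, and improved robustness. Reducing the number of neurons in the fully connected layers to 128 compressed the model further while maintaining a good fit. The improved model achieved a validation accuracy of 98.78% on husk and seed image classification with a model size of only 8.41 MB; compared with the unmodified VGG16, accuracy was 0.84 percentage points higher and the model size was 519.38 MB smaller. The improved model also outperformed AlexNet, ResNet50, and MobileNet_V2. In the test experiment, the sorting accuracy reached 98.28% and the average detection time was 85.06 ms, meeting the requirements of fast online sorting of Camellia oleifera husks and seeds. The proposed model has high accuracy and strong generalization ability and can provide a theoretical reference for applying deep learning to real-time sorting of Camellia oleifera husks and seeds.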
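To give a feel for why depthwise separable convolution shrinks the model, the following minimal sketch compares weight counts for one layer. The layer width (128 input and 128 output channels) and the 3×3 kernel are illustrative assumptions, not values taken from the paper, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weights in a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (biases ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer, chosen only for illustration.
c_in, c_out = 128, 128
standard = standard_conv_params(c_in, c_out)
separable = depthwise_separable_params(c_in, c_out)
ratio = separable / standard   # equals 1 / c_out + 1 / k**2

print(f"standard: {standard}, separable: {separable}, ratio: {ratio:.4f}")
```

For this assumed layer the separable version keeps roughly one ninth of the weights, which is the general pattern for 3×3 kernels once the channel count is large.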

       

      Abstract: Sorting Camellia oleifera husks and seeds is one of the most important post-harvest procedures. Traditional image recognition for sorting has relied mainly on color and shape characteristics, but its accuracy and robustness cannot fully meet the demands of large-scale production in modern agriculture. In this study, an improved convolutional neural network (CNN) based on VGG16 was constructed for efficient and accurate sorting. Images of husks and seeds were acquired with an automatic system comprising a transmission device, an industrial camera (MV-CA013-20GC), and broad-area linear arrays, and image brightness was adjusted in real time. Data augmentation, including geometric transformation, Gaussian noise, and brightness adjustment, increased the number of images to 11 904, which significantly improved the generalization ability and robustness of the trained deep model. To avoid excessive memory and computing requirements in practical production, the model was simplified by reducing its large number of parameters. The number of convolution kernels in each convolutional layer was halved relative to the original VGG16, and every convolutional layer except the first and second was replaced by a depthwise separable convolution. The number of neurons in the fully connected layers was reduced to 128. Meanwhile, the Softmax output layer was changed to three classes: seed, husk, and empty groove. These changes reduced model complexity and increased operating speed. Cross-layer feature fusion was then used to ensure that low-level image information was fully expressed throughout the network. Three fusions were performed by feature map concatenation. In the first fusion, the feature map generated by the second convolutional layer served as the backbone feature map. 
After 2×2 max pooling, the backbone feature map was concatenated with the output of the fourth layer and fed into the fifth layer. In the second fusion, the backbone feature map crossed four layers, was concatenated with the output of the seventh layer after 4×4 max pooling, and was fed into the eighth layer. In the third fusion, the backbone feature map crossed seven layers, was concatenated with the output of the tenth layer after 8×8 max pooling, and was fed into the eleventh layer. The ReLU activation function was replaced by ELU throughout the network. The adaptive moment estimation (Adam) optimizer was selected for training, with an initial learning rate of 10⁻⁴. The batch size for training and validation was 16, and the number of epochs was set to 100. The training loss was calculated with the cross-entropy loss function. The results showed that accuracy with three cross-layer feature fusions was generally higher than with one or two fusions across all epochs, converging steadily to about 98.5%. Validation accuracy was very low for the model without feature fusion, but performance improved significantly after a single cross-layer feature fusion, with validation accuracy holding at about 97.5%. The model with the ELU activation function converged faster than the one with ReLU, and its training curve was more stable, owing to ELU's robustness in suppressing gradient explosion. Overfitting occurred only when the first and second fully connected layers had a large number of neurons, such as 4 096, 2 048, or 1 024. Conversely, the fit degraded significantly when the number of neurons was reduced too far, for example to 32. Weighing parameter count against accuracy, the optimal number of neurons was 128, which gave the highest validation accuracy and the smallest gap between training and validation. 
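The three fusion steps can be pictured as simple shape bookkeeping. The sketch below is only an illustration under stated assumptions: a 224×224 input, "same"-padded convolutions, 2×2 pooling after each VGG16 block, and channel widths equal to half of the standard VGG16 widths. None of these sizes is given in the abstract, and the names are hypothetical.

```python
# Assumed values, not stated in the abstract.
INPUT_SIZE = 224            # hypothetical input height and width
BACKBONE_CHANNELS = 32      # conv layer 2 output: 64 / 2
BACKBONE_SIZE = INPUT_SIZE  # conv layer 2 sits before the first pooling

# (layer whose output is fused, its assumed channel count,
#  max-pooling window applied to the backbone feature map)
FUSIONS = [
    (4, 64, 2),     # first fusion, result fed into layer 5
    (7, 128, 4),    # second fusion, result fed into layer 8
    (10, 256, 8),   # third fusion, result fed into layer 11
]


def fused_shape(target_channels, pool):
    """Shape (channels, height, width) after pooling the backbone map
    and concatenating it with the target layer along the channel axis."""
    size = BACKBONE_SIZE // pool
    channels = target_channels + BACKBONE_CHANNELS
    return channels, size, size


shapes = [fused_shape(channels, pool) for _, channels, pool in FUSIONS]

for (layer, _, pool), shape in zip(FUSIONS, shapes):
    print(f"layer {layer} + backbone pooled {pool}x{pool} -> {shape}")
```

The growing pooling window is what lets a single low-level feature map match the spatial size of progressively deeper layers before concatenation.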
The validation accuracy of the improved model was 98.78%, 0.84 percentage points higher than that of the unimproved model. It also outperformed AlexNet, VGG16, ResNet50, and MobileNet_V2. Moreover, the improved model required 8.41 MB of memory, 519.38 MB less than the initial model. The model was then used to predict another batch of samples (240 seeds, 195 husks, and 205 empty grooves), of which only 7 seeds and 4 husks were misclassified. The accuracy on this test set was 98.28%, and the average detection time per sample was 85.06 ms. Consequently, the improved model showed better stability and generalization performance and fully met the requirements of real-time sorting. The deep CNN is therefore expected to be a more applicable approach to sorting Camellia oleifera husks and seeds.
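The reported test accuracy can be checked directly from the sample and error counts in the abstract. In this short sketch, the zero error count for empty grooves is an inference from the statement that only seeds and husks were misclassified, and the throughput figure assumes samples are processed one at a time.

```python
# Counts taken from the abstract.
samples = {"seed": 240, "husk": 195, "empty groove": 205}
# Empty-groove errors are inferred to be zero (assumption).
errors = {"seed": 7, "husk": 4, "empty groove": 0}

total = sum(samples.values())
correct = total - sum(errors.values())
overall_accuracy = 100 * correct / total  # percent

# Per-class accuracy in percent.
per_class = {
    name: 100 * (samples[name] - errors[name]) / samples[name]
    for name in samples
}

mean_time_ms = 85.06
throughput = 1000 / mean_time_ms  # samples per second, sequential processing

print(f"overall accuracy: {overall_accuracy:.2f}%")
for name, accuracy in per_class.items():
    print(f"{name}: {accuracy:.2f}%")
print(f"throughput: {throughput:.2f} samples/s")
```

The overall figure agrees with the 98.28% reported above, and the per-class values show that all errors fall on seeds and husks.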

       
