Abstract
The sorting of Camellia oleifera husk and seed is one of the most important procedures after harvesting. Traditional image recognition has widely used color and shape characteristics for sorting; however, its accuracy and robustness cannot fully meet the demands of large-scale production in modern agriculture. In this study, a modified model based on an improved convolutional neural network (CNN), VGG16, was constructed to realize efficient and accurate sorting. Photographs for sorting the Camellia oleifera husk and seed were obtained by an automatic system consisting of a transmission device, an industrial camera (MV-CA013-20GC), and broad-area linear arrays, and the image brightness was adjusted in real time. Data augmentation, including geometric transformation, Gaussian noise, and brightness adjustment, was used to increase the number of pictures to 11 904, which significantly improved the generalization ability and robustness of the trained deep model. The number of parameters was reduced to simplify the model for practical production and to avoid excessive memory and computing resource consumption. The number of convolution kernels in each convolutional layer was reduced to half of that in the initial VGG16, and the convolutional layers, except for the first and second, were replaced by depthwise separable convolutions. The number of neurons in the fully connected layers was reduced to 128. Meanwhile, the Softmax output layer was changed to three classes: seed, husk, and empty groove. These improvements reduced the complexity of the model for high operation speed. Cross-layer feature fusion was then utilized to ensure that the low-level image information was fully expressed throughout the neural network. Three fusion processes were performed using the feature-map concatenation method. In the first fusion, the feature map generated by the second convolutional layer served as the backbone feature map.
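The parameter saving from replacing a standard convolution with a depthwise separable one can be sketched with a simple weight count. The channel numbers below are illustrative assumptions (a mid-network layer at half the VGG16 width), not values reported in the study:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def sep_conv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution:
    one k x k depthwise filter per input channel,
    followed by a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

# Hypothetical layer: 3x3 kernels, 256 input and 256 output channels
# (half of the 512 in the corresponding initial VGG16 layer).
standard = conv_params(3, 256, 256)       # 589 824 weights
separable = sep_conv_params(3, 256, 256)  #  67 840 weights
print(standard, separable, round(separable / standard, 3))
```

For a 3×3 kernel the separable form needs roughly 1/9 of the weights of the standard one, which is the main source of the memory reduction described above.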
It was concatenated with the fourth layer after 2×2 max pooling and then input into the fifth layer. In the second fusion, the backbone feature map crossed four layers to be concatenated with the seventh layer after 4×4 max pooling, and was then input into the eighth layer. In the third fusion, the backbone feature map crossed seven layers to be concatenated with the tenth layer after 8×8 max pooling, and was then input into the eleventh layer. The ReLU activation function was replaced by ELU in the neural network. The adaptive moment estimation (Adam) optimizer was selected for model training, with an initial learning rate of 10⁻⁴. The batch size in the training and validation stages was 16, and the number of epochs was set to 100. The training loss was calculated by the cross-entropy loss function. The results showed that the accuracy with three cross-layer feature fusions was generally higher than that with one or two fusions throughout the epochs, converging steadily to about 98.5%. The validation accuracy of the model without feature fusion was very low, but the performance improved significantly after a single cross-layer feature fusion, with the validation accuracy maintained at about 97.5%. The model trained with the ELU activation function converged faster than that with ReLU, and its training curve was more stable, owing to the strong robustness against gradient explosion. Overfitting occurred only with large numbers of neurons in the first and second fully connected layers, such as 4 096, 2 048, or 1 024, while the fitting degree of the model decreased significantly when the number of neurons was reduced excessively, such as to 32. Considering both the number of parameters and the accuracy, the optimal number of neurons was 128, which achieved the highest validation accuracy and the smallest gap between training and validation.
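The three fusion steps above can be sketched as pooling the backbone feature map to each target resolution and concatenating along the channel axis. The spatial sizes and channel counts here are assumed for illustration only, since the abstract does not report the feature-map dimensions:

```python
import numpy as np

def max_pool(fmap, k):
    """Non-overlapping k x k max pooling on an (H, W, C) feature map."""
    h, w, c = fmap.shape
    return fmap.reshape(h // k, k, w // k, k, c).max(axis=(1, 3))

# Hypothetical backbone map from the second convolutional layer: 64x64x32.
backbone = np.random.rand(64, 64, 32)

# First fusion: 2x2 pooling aligns the backbone with the fourth layer (32x32).
layer4 = np.random.rand(32, 32, 64)
fused1 = np.concatenate([max_pool(backbone, 2), layer4], axis=-1)

# Second fusion: 4x4 pooling aligns it with the seventh layer (16x16).
layer7 = np.random.rand(16, 16, 128)
fused2 = np.concatenate([max_pool(backbone, 4), layer7], axis=-1)

# Third fusion: 8x8 pooling aligns it with the tenth layer (8x8).
layer10 = np.random.rand(8, 8, 256)
fused3 = np.concatenate([max_pool(backbone, 8), layer10], axis=-1)

print(fused1.shape, fused2.shape, fused3.shape)
```

Each fused map carries the backbone's low-level channels alongside the deeper layer's channels, so the early image information remains available to the later convolutional stages.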
The validation accuracy of the advanced model was 98.78%, an increase of 0.84 percentage points over the unimproved model, and it showed clear performance advantages over AlexNet, VGG16, ResNet50, and MobileNet_V2. Besides, the memory requirement of the advanced model was 8.41 MB, which was 519.38 MB less than that of the initial model. Another batch of samples (containing 240 seeds, 195 husks, and 205 empty grooves) was predicted by this model, and only 7 seeds and 4 husks were judged incorrectly. The accuracy on the testing set was 98.28%, and the detection time for one sample was 85.06 ms on average. Consequently, the advanced model presented better stability and generalization performance, fully meeting the requirements of real-time sorting. Meanwhile, the deep CNN can be expected to serve as a more applicable way for the sorting of Camellia oleifera husk and seed.