Potato detection in complex environment based on improved YoloV4 model

Zhang ZhaoGuo; Zhang Zhendong; Li Jianian; Wang Haiyi; Li Yanbin; Li Donghao

doi:10.11975/j.issn.1002-6819.2021.22.019

Zhang ZhaoGuo, Zhang Zhendong, Li Jianian, Wang Haiyi, Li Yanbin, Li Donghao. Potato detection in complex environment based on improved YoloV4 model[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(22): 170-178. DOI: 10.11975/j.issn.1002-6819.2021.22.019

Abstract

As the fourth largest food crop in China, the potato contributes substantially to national food security. However, relatively low harvesting efficiency and a low level of intelligent operation remain serious bottlenecks in the potato industry. It is therefore necessary to detect and evaluate the state of potatoes in real time during harvesting, particularly for grading and cleaning in a combine harvester. In this study, a machine learning model was proposed to quickly and accurately identify the number of potatoes and their damage under varied working conditions, including changes in light brightness, occlusion by soil and potatoes, machine vibration, and dust interference. A lightweight attention mechanism was introduced into the convolutional residual network. This attention mechanism, which acts through fully connected layers to assign a different weight to each channel, was then added to YoloV4. Because the potatoes were relatively consistent in size, the original K-means clustering was abandoned, and the three output layers of YoloV4 were merged into one large output layer. For feature extraction, CSPDarknet53 was replaced by the MobilenetV3 network structure, which uses an inverted residual structure with depthwise separable convolution blocks and linear bottlenecks. With the H-swish activation function used in place of the swish function, the amount of computation and the number of parameters were reduced to 1/4 of the original, which significantly improved the detection speed without lowering the recognition rate for potatoes. To improve the generalization ability of the trained model, the collected images were augmented by horizontal flipping, vertical flipping, mirroring, and adding noise. The data set comprised 1 296 high-quality images, 322 images of mechanically damaged potatoes, and 231 images with interference for comparison.
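To make the two MobilenetV3 ingredients named above more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It uses the standard definitions of swish and H-swish and the usual parameter-count formulas for a standard convolution versus a depthwise separable one (biases ignored). The kernel size and channel counts in the usage note are illustrative assumptions, and the resulting ratio applies to a single layer only; it is not the 1/4 figure reported for the whole network.

```python
import math


def swish(x: float) -> float:
    """Swish activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def relu6(x: float) -> float:
    """ReLU clipped to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_swish(x: float) -> float:
    """Hard swish: x * ReLU6(x + 3) / 6, a piecewise approximation of swish
    that avoids evaluating the exponential."""
    return x * relu6(x + 3.0) / 6.0


def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise kernel x kernel convolution followed by a
    1 x 1 pointwise convolution (no bias)."""
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative layer sizes (assumed, not taken from the paper).
    std = standard_conv_params(3, 128, 256)
    sep = depthwise_separable_params(3, 128, 256)
    print(std, sep, sep / std)
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, swish(x), h_swish(x))
```

For the assumed 3 x 3 layer with 128 input and 256 output channels, the separable form needs 33 920 weights against 294 912 for the standard convolution.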
The collected image data set was used to train the model on a workstation, and the loss values on the training set and test set were recorded. The trained network was then deployed on an embedded device for comparative and field tests. The evaluation indexes were the precision-recall curve, AP (average precision, the detection accuracy for one category), mAP (the mean of the AP values over all categories), and detection speed. Deep learning improved the recognition accuracy for potatoes compared with the traditional OpenCV-based method. MobilenetV3-YoloV4 also showed a higher recognition speed and excellent target extraction performance compared with YoloV4, YoloV3, VGG16, and the traditional OpenCV-based method. The average accuracy of potato recognition was 91.4%, indicating strong robustness in detecting normal and mechanically damaged potatoes in various environments. Performance was good at illumination angles of 30°, 45°, 60°, and 90°, and the transmission speed was 23.01 frames per second when the network model was deployed on embedded devices. A field experiment showed that MobilenetV3-YoloV4 could detect the potato flow in real time during actual harvesting, and the speed of the vertical annular separator was adjusted according to this flow. When too many potatoes were fed in, the separation speed was adjusted to avoid excessive accumulation of potatoes, which would otherwise raise the skin-breaking rate through linear scratching among potatoes and soil. When the feeding amount decreased, the rotating speed was adjusted to reduce damage caused by vibration of the device, which lowered energy consumption and reduced linear scratching between the potatoes and the grid. These findings can provide sound technical support for the intelligent cleaning and grading of potatoes in a combine harvester.
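The AP and mAP indexes mentioned above can be illustrated with a short sketch. The abstract does not state which AP protocol was used, so the code below assumes the common all-point interpolation of the precision-recall curve. The function names, the class labels "normal" and "damaged", and the sample detections are hypothetical and serve only as an example.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth box.
    num_ground_truth: number of ground-truth objects (must be > 0).
    """
    # Rank detections from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))

    # Make precision non-increasing with recall (precision envelope).
    for j in range(len(precisions) - 2, -1, -1):
        precisions[j] = max(precisions[j], precisions[j + 1])

    # Sum rectangle areas wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    # Hypothetical detections for two illustrative classes.
    ap_normal = average_precision([0.9, 0.8], [True, True], 2)
    ap_damaged = average_precision(
        [0.9, 0.8, 0.7, 0.6], [True, False, True, True], 4
    )
    print(ap_normal, ap_damaged)
    print(mean_average_precision({"normal": ap_normal, "damaged": ap_damaged}))
```

Deciding whether a detection counts as a true positive (typically by an overlap threshold against ground-truth boxes) is outside this sketch and is assumed to have been done beforehand.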
