Li Guojin, Huang Xiaojie, Li Xiuhua, Ai Jiaoyan. Detection model for wine grapes using MobileNetV2 lightweight network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2021, 37(17): 168-176. DOI: 10.11975/j.issn.1002-6819.2021.17.019

    Detection model for wine grapes using MobileNetV2 lightweight network

    • Efficient detection of grapes in images is one of the key technologies for automatic grape harvesting robots. In this study, a wine grape detection model (WGDM) based on a lightweight network was proposed to improve the speed and accuracy of grape detection in the field. Firstly, the MobileNetV2 lightweight network was adopted as the feature extraction network of the WGDM in place of DarkNet53 in the original YOLOv3, owing to its smaller size, faster speed, and higher accuracy in image recognition, which significantly increased the real-time detection speed. Secondly, the M-Res2Net module was added to the multi-scale detection part of YOLOv3, and some standard convolutional layers with 1×1 and 3×3 convolution kernels were removed, giving the improved model better multi-scale feature extraction and higher detection accuracy. Finally, a new location loss function was established using the balanced loss and the intersection over union (IoU) loss, while the classification and object losses stayed the same as in YOLO. As such, a better balance among the object, classification, and location losses was achieved during model training, thereby improving the precision of object location. Several detection models were trained under the same experimental conditions, including the proposed WGDM, Single Shot Detector (SSD), the original YOLOv3, YOLOv4, and Faster Regions with Convolutional Neural Network (Faster R-CNN). The available wine grape instance segmentation dataset (WGISD) was used, which includes 300 images of wine grapes and 300 annotation files with 4 432 objects. The input images were resized from the original resolution of 2 048×1 365 pixels or 2 048×1 536 pixels to 608×608 pixels. The experimental results showed that the proposed WGDM achieved an average accuracy of 81.20% on the test set of the wine grape image dataset.
The F1-score (a metric that balances the precision and recall of the model) of the proposed model reached 0.856 3, which was 0.056 3 higher than that of SSD, 0.005 4 higher than that of the original YOLOv3, 0.041 7 higher than that of YOLOv4, and 0.012 5 higher than that of Faster R-CNN. The network structure size of the proposed model was 44 MB, which was 50 MB smaller than that of SSD, 191 MB smaller than that of the original YOLOv3 or YOLOv4, and 83 MB smaller than that of Faster R-CNN. The average detection time for each grape image with the proposed model was 6.29 ms, which was 4.91 ms shorter than that of SSD, 7.75 ms shorter than that of the original YOLOv3, 14.84 ms shorter than that of YOLOv4, and 158.2 ms shorter than that of Faster R-CNN. Moreover, the number of floating-point operations (the sum of the number of multiplication operations and the number of addition operations) of the proposed model was only 10.14×10^9, which was 11.58% of that of SSD, 14.54% of that of the original YOLOv3, 16.05% of that of YOLOv4, and 5.48% to 15.33% of that of Faster R-CNN. Therefore, the proposed WGDM recognized and located grape fruits in the field faster and more accurately, providing a feasible approach to efficient visual detection for grape picking robots.
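
The two quantities the abstract relies on, the intersection over union used in the location loss and the F1-score used for evaluation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the (x1, y1, x2, y2) box format, and the plain `1 - IoU` form of the loss are assumptions made for illustration, and the balanced-loss term is omitted because the abstract does not define it.

```python
def box_iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_loss(pred, target):
    """Assumed basic form of an IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - box_iou(pred, target)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For example, `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` gives 1/7, since the boxes overlap in a unit square and cover 7 square units in total; the corresponding `iou_loss` is 6/7.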