Zhou Yuncheng, Xu Tongyu, Deng Hanbing, Miao Teng. Real-time recognition of main organs in tomato based on channel wise group convolutional network[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(10): 153-162. DOI: 10.11975/j.issn.1002-6819.2018.10.019

    Real-time recognition of main organs in tomato based on channel wise group convolutional network

    • Abstract: Real-time, accurate recognition of tomato organs is crucial for automated agricultural production tasks such as automatic harvesting and targeted drug application. Because fruit maturity and flower age differ and the color of tomato organs changes throughout the growth period, traditional image segmentation methods based on color space struggle to detect different tomato organs simultaneously. Meanwhile, feature extractors such as SIFT (scale-invariant feature transform) and Haar-like features do not run in real time, so they cannot be applied directly to the real-time detection of plant organs. Convolutional neural networks (CNNs) automatically extract both low-level image features and high-level image semantics, and they run in real time on GPU (graphics processing unit) devices. Therefore, inspired by Faster R-CNN and especially YOLOv2, we propose a CNN-based method for real-time recognition of the main organs of tomato, and design a corresponding recognition network model that predicts object boundary and type using only the feature map. Images of tomato flower, fruit and stem organs in various forms were collected in a greenhouse, and a tomato organ image data set was constructed, with the influence of illumination taken into account during acquisition. To select the base network structure for the recognition network, the performance and applicability of several typical CNN-based classification networks were analyzed in terms of model size, statistical separability, classification performance and computation speed. Drawing on the advantages of these typical networks, a channel wise group convolutional (CWGC) block and a corresponding classification network (CWGCNet) were designed.
A sample extension training method was presented to further improve the feature extraction ability of these classification networks. CWGCNet, Darknet-19 and Inception v2 were selected as candidate base networks for the recognition network. Two alternative sets of additional layers, a CWGC block with a dropout layer and 3 full convolution layers, were then each attached to the base networks to form the overall recognition architectures. All CNN-based classification and recognition networks were implemented in Python with the Microsoft Cognitive Toolkit (CNTK), and the experiments were run on a computer equipped with a Tesla K40c GPU. The results show that, compared with the typical CNN-based classification networks, CWGCNet combines high statistical separability of features with real-time performance. On the tomato organ image data set, sample extension training with Caltech256 significantly improves the feature extraction ability of the classification networks. Compared with the exponential function, a nonlinear scaling factor in sigmoid form makes the recognition networks easier to train. Compared with the 3 full convolution layers, using the CWGC block with dropout as the additional convolution layer on top of the CNN base network dramatically reduces the model size while significantly improving the recall rate, recognition speed and average precision (AP). The convolutional part of CWGCNet and the CWGC block with dropout were therefore used as the final structure of the recognition network. The final recognition network can identify tomato organs of different maturity and form, achieving APs of 96.52%, 97.85% and 82.62% for flower, fruit and stem, respectively. The growth stage and maturity of tomato organs have some influence on recognition accuracy; in particular, flowers in bloom, fully mature fruits and lower stems are recognized more accurately.
The final network can recall tomato organs of different forms, with recall rates for flower, fruit and stem reaching 77.39%, 69.33% and 64.23%, respectively. The recognition speed of the final network is 62 fps. Compared with YOLOv2, the recall rate is improved by 14.03 percentage points and AP by 2.51 percentage points.
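
The model-size reduction reported for the CWGC block is consistent with how grouped convolution behaves in general: splitting the channels of a layer into g groups divides its weight count by g. The sketch below illustrates only that general arithmetic. The abstract does not describe the internal layout of the CWGC block, so the helper `conv_weight_count`, the channel counts, the kernel size and the group number are all hypothetical values chosen for illustration, not the authors' configuration.

```python
def conv_weight_count(c_in, c_out, kernel, groups=1):
    """Number of weights in a 2-D convolution layer (bias terms ignored).

    With `groups` > 1, each output channel only sees c_in / groups input
    channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups != 0 or c_out % groups != 0:
        raise ValueError("channel counts must be divisible by groups")
    return kernel * kernel * (c_in // groups) * c_out


# Hypothetical layer sizes (assumptions, not taken from the paper).
C_IN, C_OUT, KERNEL, GROUPS = 1024, 1024, 3, 8

full_weights = conv_weight_count(C_IN, C_OUT, KERNEL)             # ordinary convolution
grouped_weights = conv_weight_count(C_IN, C_OUT, KERNEL, GROUPS)  # grouped convolution

print("full convolution weights:   ", full_weights)
print("grouped convolution weights:", grouped_weights)
print("reduction factor:           ", full_weights // grouped_weights)
```

With these assumed sizes the grouped layer needs one eighth of the weights of the ordinary layer. A real block would typically add further layers to mix information across groups, which this count does not include.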