Abstract:
Existing target detection methods adapt poorly to tilted targets, that is, objects that are not aligned vertically or horizontally, such as bananas in orchards, which typically grow at varying angles. Moreover, the large number of parameters and the high computational requirements of advanced models have hindered their deployment on embedded devices with limited computing resources, which are commonly used in agricultural automation. In this study, an improved YOLOv7 algorithm was proposed for banana target detection and localization with a rotational bounding box. The GSConv module was incorporated to reduce the computational complexity and the number of parameters in the model while maintaining high detection accuracy. Specifically, GSConv is a lightweight convolutional structure that preserves model efficiency, making the model more feasible for real-time applications on embedded platforms with constrained hardware. The banana target was first positioned using a rotational bounding box defined by a five-parameter representation. Tilted objects were handled better, because the rotational bounding box accurately represents the various angles at which bananas grow. Furthermore, the rotational bounding box was mapped into a two-dimensional Gaussian distribution, and the Kullback-Leibler (KL) divergence was employed as the loss function. As such, the difference between the two probability distributions (the predicted box and the ground-truth box) was calculated and optimized as the loss. The KL divergence-based approach also provided a more precise measure of the difference between the predicted and the actual bounding boxes, thus improving the localization accuracy. In addition, the Criminisi algorithm was integrated to address local holes in the depth camera output, where depth information was missing owing to factors such as light interference or occlusion.
The image inpainting filled in the missing information and corrected the positioning errors in the depth data, further enhancing the accuracy of three-dimensional localization in real-world orchard environments. Experimental results show that the improved model achieved significant improvements in detection speed and accuracy for banana targets. Specifically, the average detection accuracy reached 96.15%, which was 17.04% higher than that of the standard YOLOv7 model. Additionally, the detection frame rate of the improved model increased by approximately 40 frames per second, making it highly suitable for real-time applications in agricultural settings. Moreover, the prediction of the banana stem position was notably enhanced by the rotational bounding box. The mean positioning error was 7.02 mm and the mean error ratio was 0.65%, which were reduced by 24.3 mm and 1.96 percentage points, respectively, compared with the original YOLOv7. In conclusion, the improved model can offer an effective solution for the fast and accurate identification and localization of banana bunches and fruit stalks in complex orchard environments. High detection accuracy was achieved while significantly reducing the computational requirements, which particularly benefits real-time agricultural applications on embedded devices.
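
The mapping from a rotational bounding box to a two-dimensional Gaussian and the resulting KL divergence can be illustrated with a minimal sketch. Several points are assumptions that the abstract does not state: the five parameters are taken as (cx, cy, w, h, theta) with theta in radians, the covariance is taken as R diag(w²/4, h²/4) Rᵀ, and the divergence is computed from the predicted distribution to the ground-truth distribution. The function names `box_to_gaussian` and `kl_divergence` are hypothetical. Any normalization that turns the divergence into a bounded training loss is omitted, since the abstract does not specify one.

```python
import math


def box_to_gaussian(cx, cy, w, h, theta):
    """Map a five-parameter rotated box to a 2-D Gaussian (mean, covariance).

    Assumed convention: Sigma = R * diag(w^2/4, h^2/4) * R^T, where R is the
    rotation matrix for angle theta (radians). The covariance is returned as
    the three unique entries (sxx, sxy, syy) of the symmetric 2x2 matrix.
    """
    c, s = math.cos(theta), math.sin(theta)
    a, b = (w / 2.0) ** 2, (h / 2.0) ** 2  # variances along the box axes
    sxx = a * c * c + b * s * s
    sxy = (a - b) * c * s
    syy = a * s * s + b * c * c
    return (cx, cy), (sxx, sxy, syy)


def kl_divergence(pred_box, true_box):
    """KL divergence D(N_pred || N_true) between the two box Gaussians."""
    (mpx, mpy), (pxx, pxy, pyy) = box_to_gaussian(*pred_box)
    (mtx, mty), (txx, txy, tyy) = box_to_gaussian(*true_box)

    det_p = pxx * pyy - pxy * pxy
    det_t = txx * tyy - txy * txy

    # Inverse of the ground-truth covariance (closed form for a 2x2 matrix).
    ixx, ixy, iyy = tyy / det_t, -txy / det_t, txx / det_t

    # Trace term: tr(Sigma_true^-1 * Sigma_pred).
    trace = ixx * pxx + 2.0 * ixy * pxy + iyy * pyy

    # Mahalanobis term between the two box centers.
    dx, dy = mtx - mpx, mty - mpy
    maha = ixx * dx * dx + 2.0 * ixy * dx * dy + iyy * dy * dy

    return 0.5 * (trace + maha + math.log(det_t / det_p) - 2.0)


# Example: same size and angle, center shifted by one unit along x.
print(kl_divergence((0.0, 0.0, 4.0, 2.0, 0.0), (1.0, 0.0, 4.0, 2.0, 0.0)))  # 0.125
```

Under this convention, a box (w, h, theta) and the box (h, w, theta + π/2) map to the same Gaussian, so the divergence does not depend on which of the two equivalent parameterizations is used.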