Abstract
Accurate target localization contributes greatly to precision operations in agricultural scenes. Binocular stereo vision provides three-dimensional (3D) perception of the real world, and its application potential lies mainly in the 3D localization and point cloud reconstruction of targets in agricultural environments. This study reviews the latest research on binocular stereo vision and its applications in the agricultural field. Firstly, the pipeline of binocular stereo vision was summarized, including binocular camera calibration, epipolar rectification, and stereo matching. A binocular camera calculates the depth of targets from disparity data. The objective of stereo vision calibration is to determine the intrinsic and extrinsic parameters of the camera, thereby establishing a mapping between points in pixel coordinates and world coordinates; the calibration methods reviewed include reference calibration, active vision calibration, self-calibration, and neural network calibration. In epipolar rectification, epipolar constraints were employed to reduce the search space for matching points from two dimensions to one. Stereo matching calculates the disparity by matching points between the left and right images, using either feature-based or deep learning methods. Furthermore, methods were categorized as local, global, or semi-global according to the search range of matching pixels. The local method searches for matching points within a surrounding area, the global method minimizes a global energy function, and the semi-global method aggregates costs from multiple directions. In contrast, deep learning methods learn more complex features to enhance stereo matching. The network frameworks introduced include convolutional neural networks (CNNs), generative adversarial networks (GANs), transformers, neural architecture search (NAS), iterative optimization (IO), and graph neural networks (GNNs).
CNNs performed extensive convolution operations to compute matching costs with high accuracy, using convolutional encoders and decoders, hierarchical pyramids, and complex cost volumes. GANs synthesized data through adversarial generation in order to obtain realistic disparity in binocular datasets. Transformers used self-attention mechanisms to capture contextual information, addressing the limited receptive fields of CNNs. In NAS, stereo matching network architectures were constructed automatically while incorporating human prior knowledge, which removed the need for manual design. IO methods required neither cost volume construction nor cost aggregation, leading to significant resource savings for large disparity ranges. GNNs were used to model the complex relationships among features and then extract global information. Furthermore, the number and impact of publications were analyzed to examine how widely binocular stereo vision has been applied in agricultural research, and the latest applications were explored from the recent literature. For example, 3D localization of fruit targets using point cloud processing facilitated map navigation in practical operations. The technology also supported the 3D reconstruction of crops and the segmentation of individual organs for growth parameter measurement. Additionally, the identification of crop diseases or pests was combined with precision spraying by agricultural machinery. Finally, the challenges of applying binocular stereo vision to agriculture were summarized. High precision has been demonstrated in localization, measurement, and identification. However, some issues remain, such as model complexity, scene limitations, scarcity of datasets, and a lack of evaluation standards for stereo matching.
Future research should focus on algorithm design and optimization, intelligent assistance platforms, comprehensive datasets, and evaluation systems, in order to enhance the practicality and efficiency of binocular stereo vision systems in precision agriculture.
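The pipeline summarized above (a one-dimensional search along a rectified scanline, local matching within a surrounding window, then depth from disparity) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration rather than a method taken from the reviewed literature: the function names are hypothetical, the sum of absolute differences (SAD) is just one common local matching cost, and the focal length (800 px) and baseline (0.125 m) are made-up values. The depth step uses the standard rectified-stereo relation Z = f * B / d.

```python
def sad_disparity(left_row, right_row, x, half_window, max_disparity):
    """Local matching on one rectified scanline (hypothetical helper).

    Returns the disparity d in [0, max_disparity] that minimizes the sum of
    absolute differences between a window centered at x in the left row and
    a window centered at x - d in the right row.
    Assumes x + half_window is a valid index in both rows.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # the right-image window would leave the row
        cost = sum(
            abs(left_row[x + k] - right_row[x - d + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in meters from disparity in pixels: Z = f * B / d."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_px * baseline_m / disparity_px


# Toy scanlines: the pattern 9, 5, 7 appears 3 pixels further right in the left row.
right_row = [0, 0, 9, 5, 7, 0, 0, 0, 0, 0]
left_row = [0, 0, 0, 0, 0, 9, 5, 7, 0, 0]

d = sad_disparity(left_row, right_row, x=6, half_window=1, max_disparity=5)
z = disparity_to_depth(d, focal_px=800.0, baseline_m=0.125)
print(f"disparity = {d} px, depth = {z:.2f} m")
```

Because the images are assumed to be rectified, the search runs over a single row only, which is the two-dimensions-to-one reduction described above. Global and semi-global methods would replace the per-pixel minimum with an energy minimization or a multi-directional cost aggregation.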