Abstract:
Precision operations in agriculture depend on accurately locating targets within agricultural scenes. As an important means of obtaining three-dimensional (3D) perception of the real world, binocular stereo vision supports 3D localization and point cloud reconstruction of targets in agricultural environments and therefore has considerable application potential. This paper reviews binocular stereo vision technology and its applications in agriculture. First, we summarize the binocular stereo vision pipeline and review recent research advances along three technical threads: binocular camera calibration, epipolar rectification, and stereo matching. A binocular camera computes the depth of targets from disparity results. Stereo calibration determines the intrinsic and extrinsic parameters of the cameras, establishing a mapping between points in pixel coordinates and world coordinates; calibration approaches include reference calibration, active vision calibration, self-calibration, and neural network calibration. Epipolar rectification applies epipolar constraints to reduce the search space for matching points from two dimensions to one. Stereo matching calculates disparity by matching the left and right images, using either feature-based or deep learning methods. Feature-based methods can be further categorized into local, global, and semi-global methods, depending on the search range of matching pixels: local methods search for matching points within surrounding areas, global methods minimize a global energy function, and semi-global methods aggregate costs from multiple directions. In contrast, deep learning methods can learn more complex features to improve stereo matching results and are further categorized by network framework into Convolutional Neural Network (CNN), Generative Adversarial Network (GAN), and Transformer methods.
In addition, three prominent approaches are introduced: Neural Architecture Search (NAS), Iterative Optimization (IO), and Graph Neural Networks (GNN). CNNs perform extensive convolution operations to compute matching costs, with notable research directions aimed at improving accuracy, including convolutional encoders and decoders, hierarchical pyramids, and complex cost volumes. GANs synthesize data through adversarial generation, alleviating the difficulty of acquiring realistic disparity in binocular datasets. Transformer methods use self-attention mechanisms to capture contextual information, addressing the limited receptive fields of CNNs. NAS can automatically construct stereo matching network architectures by incorporating human prior knowledge, eliminating the need for manual design. IO methods do not require cost volume construction or cost aggregation, which saves significant resources and enables the processing of large disparity ranges. GNNs can model complex relationships between features and extract global information. Furthermore, the trend in the number of publications in recent years is analysed to examine how widely binocular stereo vision technology is applied in agricultural research, and recent literature is synthesized to explore the latest applications. The technology enables 3D localization of fruit targets and, based on point cloud processing, facilitates map navigation in practical operations. It also supports 3D reconstruction of crops and segmentation of individual organs for growth parameter measurement. Additionally, it aids in the identification of crop diseases and pests, which can be combined with precision spraying by agricultural machinery. Finally, we summarize the challenges of applying binocular stereo vision technology in agriculture.
Although the technology demonstrates high precision in localization, measurement, and identification, it still faces issues such as model complexity, scene limitations, scarcity of datasets, and a lack of evaluation standards for stereo matching. Future research on this technology for agricultural applications should focus on algorithm design and optimization, the establishment of intelligent assistance platforms, the construction of comprehensive datasets, and the improvement of evaluation systems, in order to further enhance the practicality and efficiency of binocular stereo vision systems in precision agriculture.
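
As a concrete illustration of two steps summarized above (the one-dimensional search that epipolar rectification makes possible, and the computation of depth from disparity), the following is a minimal sketch rather than any method from the reviewed literature. It assumes an already rectified image pair, uses a sum-of-absolute-differences (SAD) window cost as a stand-in for a local matching method, and applies the standard pinhole relation Z = f * B / d. The function names, parameters, and sample values are hypothetical and chosen only for illustration.

```python
def match_along_scanline(left_row, right_row, x, half_window, max_disparity):
    """Local matching on one rectified scanline (illustrative sketch).

    Returns the disparity d in [0, max_disparity] that minimizes the SAD cost
    between the window centred at x in the left row and the window centred at
    x - d in the right row. Assumes the window around x fits inside left_row.
    """
    if x - half_window < 0 or x + half_window >= len(left_row):
        raise ValueError("window around x does not fit inside the row")
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        xr = x - d  # after rectification the match lies on the same row
        if xr - half_window < 0:
            break  # candidate window would leave the right image
        cost = sum(
            abs(left_row[x + k] - right_row[xr + k])
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth Z = f * B / d for a rectified pair; None if d is not positive."""
    if disparity_px <= 0:
        return None
    return focal_length_px * baseline_m / disparity_px


# Hypothetical intensities: the left row is the right row shifted by 3 pixels.
right_row = [0, 0, 10, 50, 90, 40, 5, 0, 0, 0, 0, 0]
left_row = [0, 0, 0] + right_row[:-3]

d = match_along_scanline(left_row, right_row, x=7, half_window=2, max_disparity=6)
z = depth_from_disparity(d, focal_length_px=600.0, baseline_m=0.5)
```

Because the search runs only along a single row, the cost of finding a match grows with the disparity range rather than with the image area, which is the practical benefit of rectification. Global, semi-global, and learned methods replace the simple window cost and the independent per-pixel minimum used here.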