Abstract:
Three-dimensional (3D) reconstruction based on stereo vision must accurately characterize the complex, multi-scale phenotypic details of fruit trees. In this study, a 3D reconstruction method for fruit trees was proposed based on camera pose recovery and neural radiance field (NeRF) theory. An image acquisition scheme suited to standard orchard environments was designed: videos were captured around each fruit tree at multiple heights, and frames were then extracted from these videos as images. First, panoramic video was recorded while circling the fruit tree, and multi-angle images were obtained by frame extraction; a three-axis stabilized gimbal camera was used to record the videos at multiple heights. Second, Structure from Motion (SfM) was applied for sparse reconstruction to estimate the camera pose of each fruit tree image. Third, a neural radiance field was trained for the fruit tree. The posed images were pre-processed by ray casting, stratified sampling, and positional encoding, and a multi-layer perceptron (MLP) was then trained under volume rendering supervision. The converged radiance field represented the true morphology of the fruit tree. Finally, a 3D point cloud model of the fruit tree with high accuracy and detailed phenotypic features was exported. Experimental results show that the acquisition scheme was more efficient at obtaining multi-angle videos of the fruit tree. Combining hardware stabilization (the gimbal) with software stabilization of the digital images improved the stability of the video frames, yielding high-quality images after frame extraction. Image acquisition was also significantly faster than in traditional stereo-vision 3D reconstruction with handheld cameras.
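The pre-processing and training steps described above (ray casting, stratified sampling, positional encoding, and volume rendering) can be illustrated with a minimal, self-contained Python sketch of how a single ray is rendered. It assumes the common NeRF formulation: a ray r(t) = o + t·d, one uniform depth sample per bin for stratified sampling, sine/cosine positional encoding, and alpha compositing with α_i = 1 − exp(−σ_i·δ_i). This is not the authors' implementation. `toy_field` is a hypothetical analytic stand-in for the trained MLP, and all function names, sample counts, and frequency counts are illustrative assumptions.

```python
import math
import random


def positional_encoding(p, num_freqs):
    """Encode a 3D point as [x, y, z, sin(2^k*pi*c), cos(2^k*pi*c), ...].

    Each coordinate c gets num_freqs sine/cosine pairs; the raw coordinates
    are kept in front, as many NeRF implementations do (an assumption here).
    """
    features = list(p)
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        for c in p:
            features.append(math.sin(freq * c))
            features.append(math.cos(freq * c))
    return features


def stratified_samples(near, far, n_samples, rng):
    """Split [near, far) into equal bins and draw one uniform depth per bin."""
    step = (far - near) / n_samples
    return [near + (i + rng.random()) * step for i in range(n_samples)]


def toy_field(features):
    """Hypothetical stand-in for the trained MLP: a red sphere (r = 0.5) at the origin.

    A real NeRF MLP maps the encoded features (plus view direction) to
    (density, RGB); this toy only reads the raw coordinates in features[:3].
    """
    x, y, z = features[:3]
    if x * x + y * y + z * z <= 0.25:
        return 50.0, (1.0, 0.0, 0.0)
    return 0.0, (0.0, 0.0, 0.0)


def composite(ts, sigmas, colors, far_delta=1e10):
    """Alpha-composite the samples along one ray (volume rendering quadrature)."""
    deltas = [ts[i + 1] - ts[i] for i in range(len(ts) - 1)] + [far_delta]
    transmittance = 1.0  # T_i = product over j < i of (1 - alpha_j)
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for delta, sigma, color in zip(deltas, sigmas, colors):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        weights.append(w)
        for ch in range(3):
            rgb[ch] += w * color[ch]
        transmittance *= 1.0 - alpha
    return rgb, weights


def render_ray(origin, direction, near, far, n_samples, num_freqs, rng):
    """Cast r(t) = o + t*d, sample depths, query the field, and composite."""
    ts = stratified_samples(near, far, n_samples, rng)
    sigmas, colors = [], []
    for t in ts:
        point = [o + t * d for o, d in zip(origin, direction)]
        sigma, color = toy_field(positional_encoding(point, num_freqs))
        sigmas.append(sigma)
        colors.append(color)
    return composite(ts, sigmas, colors)


rng = random.Random(0)
rgb, weights = render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0), 0.0, 4.0, 64, 4, rng)
print([round(c, 3) for c in rgb], round(sum(weights), 3))
```

The ray passes through the dense sphere, so nearly all of its weight lands on the red samples and the rendered color is essentially pure red. During training, the rendered color of each ray would typically be compared with the observed pixel color (for example with a mean squared error loss) to update the MLP; this sketch shows only the forward rendering of one ray.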
In the Structure from Motion (SfM) sparse reconstruction stage, the average reprojection error was only 0.84766 pixels, and the mean track length of the 3D points was 5.85104. On average, approximately 1601.87 feature points were observed per image, and all 600 camera poses were recovered, a 100% success rate in camera pose recovery. In the NeRF scene training stage, the neural radiance field was trained for 30,000 steps in 1109.69 seconds, processing approximately 1.2×10^5 rays per second. The scene representation stabilized after 10,000 steps; the learning rate and training loss gradually converged, and the peak signal-to-noise ratio (PSNR) stabilized between 22 and 23 dB. The fruit tree point cloud accurately represented macroscopic structures at the plant scale, such as branches and canopies, as well as microscopic structures at the organ scale, including fruits, branches, leaves, and even petioles and leaf spots. The overall model achieved centimeter-level accuracy, with scale-consistency and color-consistency accuracies generally exceeding 97%. Specific traits such as diameter at breast height and fruit diameter reached millimeter-level precision, with scale-consistency errors exceeding 5% and color-consistency accuracy above 95%. Compared with traditional SfM-MVS (multi-view stereo), the reconstruction time was reduced by 39.50%, and the errors in tree height, crown width, diameter at breast height, and ground diameter were reduced by 77.06%, 83.61%, 45.47%, and 62.23%, respectively. Errors in hue, saturation, and brightness were also reduced by 20.88%, 99.85%, and 91.39%, respectively. The proposed method can therefore construct fruit tree point clouds with high accuracy and detailed phenotypic features, and the findings can serve as a useful reference for various applications in digital fruit tree systems.
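PSNR values such as the 22 to 23 dB reported above are conventionally computed from the mean squared error (MSE) between rendered and ground-truth pixels. The short sketch below assumes pixel values scaled to [0, 1], because the abstract does not state the scale. It also shows the usual relative-reduction formula, which is only an assumption about how percentages such as the 39.50% time reduction were derived. The function names are hypothetical.

```python
import math


def psnr_from_mse(mse, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def mse_from_psnr(psnr_db, max_val=1.0):
    """Invert the PSNR definition to recover the MSE it implies."""
    return max_val ** 2 / 10.0 ** (psnr_db / 10.0)


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Translate the reported PSNR band into pixel-level error (assumes MAX = 1.0).
for db in (22.0, 23.0):
    mse = mse_from_psnr(db)
    print(f"{db:.0f} dB -> MSE {mse:.5f}, RMSE {math.sqrt(mse):.4f}")
```

Under the [0, 1] assumption, 22 to 23 dB corresponds to a root-mean-square error of roughly 0.079 to 0.071, i.e. about 18 to 20 intensity levels on an 8-bit scale.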