基于相机位姿恢复与神经辐射场理论的果树三维重建方法

    Three-dimensional reconstruction of fruit trees using camera pose recovery and neural radiance field theory

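    • Summary: Traditional stereo-vision 3D reconstruction struggles to accurately characterize the multi-scale, complex phenotypic details of fruit trees. To address this, the study proposes a fruit tree 3D reconstruction method based on camera pose recovery and neural radiance field theory, and designs image acquisition equipment and an acquisition scheme suited to standard orchard environments. First, a panoramic video is shot around the fruit tree and multi-view images are obtained by frame extraction. Second, a structure-from-motion algorithm performs sparse reconstruction to compute the pose of each image. Then the neural radiance field of the fruit tree is trained: the posed multi-view images undergo ray casting with stratified sampling and positional encoding, are fed to a multi-layer perceptron, and volume rendering supervises training until a converged radiance field reflecting the true morphology of the tree is obtained. Finally, a realistic 3D point cloud model of the fruit tree with high accuracy and rich phenotypic detail is exported. Experiments show that the resulting point cloud accurately represents structures from the plant scale (macroscopic trunk, branches and canopy) to the organ scale (fruits, twigs, leaves, and even petioles and leaf spots). Overall accuracy reaches the centimeter level; parameters such as diameter at breast height and fruit diameter reach millimeter-level accuracy, with scale consistency error not exceeding 5%. Compared with traditional stereo-vision 3D reconstruction, reconstruction time is shortened by 39.50%, and the scale consistency errors of four tree-shape parameters (tree height, crown width, diameter at breast height and ground diameter) are reduced by 77.06%, 83.61%, 45.47% and 62.23%, respectively. The method can build fruit tree point cloud models with high accuracy and rich phenotypic detail, laying a foundation for applications of digital fruit tree technology.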
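The radiance field training step named above (stratified sampling along cast rays, positional encoding, and volume rendering) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the number of frequency bands, and the near/far bounds are assumptions chosen for illustration, and the multi-layer perceptron that would predict densities and colors is left out, so those values are passed in directly.

```python
import math
import random


def positional_encoding(x, num_freqs=4):
    """Encode a scalar coordinate as [sin(2^k*pi*x), cos(2^k*pi*x)] for k < num_freqs.

    num_freqs is an illustrative choice, not a value taken from the paper.
    """
    out = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * x
        out.extend([math.sin(angle), math.cos(angle)])
    return out


def stratified_samples(near, far, n_bins, rng):
    """Draw one random depth inside each of n_bins equal intervals of [near, far)."""
    width = (far - near) / n_bins
    return [near + (i + rng.random()) * width for i in range(n_bins)]


def composite(densities, colors, depths, far):
    """Numerical volume rendering along one ray.

    Pixel color = sum_i T_i * (1 - exp(-sigma_i * delta_i)) * c_i,
    where T_i is the transmittance accumulated before sample i and
    delta_i is the distance to the next sample (or to `far` for the last one).
    Returns the RGB pixel and the per-sample weights.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for i, (sigma, color) in enumerate(zip(densities, colors)):
        next_depth = depths[i + 1] if i + 1 < len(depths) else far
        delta = next_depth - depths[i]
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha
    return pixel, weights


if __name__ == "__main__":
    rng = random.Random(0)
    depths = stratified_samples(2.0, 6.0, 4, rng)
    # Hypothetical densities and colors standing in for MLP outputs.
    densities = [0.0, 0.5, 3.0, 0.2]
    colors = [[0.1, 0.4, 0.1]] * 4
    print(composite(densities, colors, depths, far=6.0))
```

During training, the composited pixel would be compared with the corresponding pixel of a posed photograph, and that difference is what supervises the network.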

       

      Abstract: Traditional stereo-vision three-dimensional (3D) reconstruction struggles to accurately characterize the multi-scale, complex phenotypic details of fruit trees. In this study, a 3D reconstruction method for fruit trees was proposed using camera pose recovery and neural radiance field (NeRF) theory, together with image acquisition equipment and a collection scheme suited to standard orchard environments. Firstly, a three-axis stabilized gimbal camera was used to capture panoramic videos around the fruit tree from multiple heights, and multi-view images were obtained by frame extraction. Secondly, Structure from Motion (SfM) was employed for sparse reconstruction to calculate the camera pose of each fruit tree image. Subsequently, the neural radiance field of the fruit tree was trained. The posed images were pre-processed by ray casting, stratified sampling, and positional encoding, and then fed into a multi-layer perceptron that was trained under volume rendering supervision, yielding a converged radiance field that represented the true morphology of the fruit tree. Finally, the 3D point cloud model of the fruit tree was exported with high accuracy and detailed phenotypic features. Experimental results indicate that the collection scheme was efficient in obtaining multi-view videos of the fruit tree. The stability of the video frames was enhanced by combining gimbal stabilization in hardware with digital image processing in software, resulting in high-quality images from frame extraction. Image acquisition was significantly faster than in traditional stereo-vision 3D reconstruction with handheld cameras. In the SfM sparse reconstruction stage, the average reprojection error was only 0.847 66 pixels, with a mean track length of 5.851 04 for the 3D points. The average number of feature points observed per image was approximately 1 601.87 over 600 camera poses, with a 100% success rate in camera pose recovery. In the NeRF scene training stage, training ran for 30 000 steps over 1 109.69 seconds, processing approximately 1.2×10⁵ rays per second. The scene representation stabilized after 10 000 steps. Quantities such as the learning rate and training loss gradually converged, and the PSNR stabilized between 22 and 23 dB. The fruit tree point cloud accurately represented macroscopic structures at the plant scale, such as branches and canopies, as well as microscopic structures at the organ scale, including fruits, twigs, leaves, and even petioles and leaf spots. The overall accuracy of the model reached the centimeter level, with scale consistency and color consistency accuracy generally exceeding 97%. Specific indicators, such as diameter at breast height and fruit diameter, reached millimeter-level accuracy, with scale consistency errors not exceeding 5% and color consistency accuracy above 95%. Compared with traditional SfM-MVS, the reconstruction time was reduced by 39.50%, and the errors in tree height, crown width, diameter at breast height, and ground diameter were reduced by 77.06%, 83.61%, 45.47%, and 62.23%, respectively. Errors in hue, saturation, and brightness were also reduced by 20.88%, 99.85%, and 91.39%, respectively. The method can construct point clouds of fruit trees with high accuracy and detailed phenotypic features. The findings can provide a reference for applications of digital fruit tree technology.
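To put the reported training figures in context, the sketch below relates PSNR to mean squared error and derives the average number of rays per training step from the reported throughput. It assumes pixel values normalized to [0, 1], which is a common convention but is not stated in the abstract, and all function names are illustrative.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences of pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def psnr(mse_value, max_value=1.0):
    """Peak signal-to-noise ratio in dB; max_value=1.0 assumes pixels in [0, 1]."""
    return 10.0 * math.log10(max_value ** 2 / mse_value)


def mse_for_psnr(psnr_db, max_value=1.0):
    """Invert the PSNR formula: the MSE that corresponds to a given PSNR."""
    return max_value ** 2 / 10.0 ** (psnr_db / 10.0)


def rays_per_step(rays_per_second, total_seconds, total_steps):
    """Average number of rays processed in one training step."""
    return rays_per_second * total_seconds / total_steps


if __name__ == "__main__":
    # MSE range implied by the reported 22 to 23 dB plateau (under the [0, 1] assumption).
    print(mse_for_psnr(22.0), mse_for_psnr(23.0))
    # Figures from the abstract: about 1.2e5 rays/s, 1 109.69 s, 30 000 steps.
    print(rays_per_step(1.2e5, 1109.69, 30000))
```

Under these assumptions, a PSNR of 22 to 23 dB corresponds to a per-pixel MSE of about 0.005 to 0.006, and the reported throughput works out to roughly 4.4×10³ rays per step on average.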

       
