Extraction of the single-tiller rice phenotypic parameters based on YOLOv5m and CBAM-CPN

    • Abstract: To rapidly obtain the morphological structure and phenotypic parameters of single-tiller rice plants, this study proposes a skeleton extraction and phenotypic parameter acquisition method that combines an object detection model with a keypoint detection model. The object detection model generates bounding boxes and categories for the spike, stem, and leaves; these outputs are fed to the keypoint detection model to locate the keypoints of each part; the keypoints are connected in order according to their semantic information to form the plant skeleton; and four phenotypic parameters (spike length, stem length, leaf length, and leaf-stem angle) are calculated from the keypoint coordinates. First, keypoint detection and object detection datasets of single-tiller rice were constructed. Second, the Faster R-CNN, YOLOv3, YOLOv5s, and YOLOv5m object detection models were trained and compared; YOLOv5m performed best, with a mean average precision (mAP) of 91.17%. Then, the cascaded pyramid network (CPN) from human pose estimation was applied to plant skeleton extraction and improved with the convolutional block attention module (CBAM). Compared with hourglass networks (HN), stacked hourglass networks (SHN), and CPN, the prediction accuracy of the CBAM-CPN model improved by 9.68, 8.83, and 1.06 percentage points, respectively, reaching 94.75%, and the root mean square errors (RMSE) of the four phenotypic parameters were 1.06 cm, 0.81 cm, 1.25 cm, and 2.94°. Finally, with YOLOv5m and CBAM-CPN combined for prediction, the RMSE of the four phenotypic parameters were 1.48 cm, 1.05 cm, 1.74 cm, and 2.39°, reductions of 1.65 cm, 3.43 cm, 2.65 cm, and 4.75° compared with the SHN model, and the generated skeleton largely fits the morphological structure of the single-tiller rice plant. The proposed method improves keypoint detection accuracy for single-tiller rice plants, obtains the plant skeleton and phenotypic parameters more accurately, and can help accelerate rice breeding and improvement.
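The four parameters above reduce to plain geometry on the predicted keypoint coordinates: the three lengths are polyline lengths over each part's ordered keypoints, and the leaf-stem angle is the angle at the junction between the two parts. A minimal sketch, assuming each part is predicted as an ordered list of keypoints and that pixel coordinates have already been calibrated to centimetres (the keypoint layout here is illustrative, not the paper's exact annotation scheme):

```python
import math

def polyline_length(points):
    """Length of a part as the sum of Euclidean distances between
    consecutive keypoints (spike, stem, or leaf)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def leaf_stem_angle(stem_point, junction, leaf_point):
    """Leaf-stem angle (degrees) at the junction keypoint, taken between
    the vectors junction->stem_point and junction->leaf_point."""
    v1 = (stem_point[0] - junction[0], stem_point[1] - junction[1])
    v2 = (leaf_point[0] - junction[0], leaf_point[1] - junction[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

# Hypothetical calibrated keypoints for one stem and one leaf.
stem = [(0.0, 0.0), (0.2, 10.0), (0.1, 20.0)]
leaf = [(0.1, 20.0), (4.0, 24.0), (8.0, 26.0)]
print(round(polyline_length(stem), 2))                       # stem length, cm
print(round(leaf_stem_angle(stem[0], stem[-1], leaf[1]), 1)) # angle, degrees
```

The RMSE values reported in the abstract are then computed between these derived parameters and manual measurements.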


      Abstract: Rice is one of the most essential grain crops in China, underpinning the national food supply. As living standards rise with the development of society, so do the demands on the taste and nutritional quality of rice. It is therefore necessary to accelerate rice breeding and improvement to ensure both quantity and quality. The plant skeleton and phenotypic parameters characterize the growth and health status of rice and can support such breeding and improvement. In this study, object detection and keypoint detection models were combined to extract the skeleton and phenotypic parameters, with images of single-tiller rice taken as the research object. The object detection model produced the bounding boxes and categories of spikes, stems, and leaves, and these detections were fed to the keypoint detection model to locate the keypoints of each part. The predicted keypoints were connected according to their semantic information to form the rice skeleton. Four phenotypic parameters, namely spike length, stem length, leaf length, and leaf-stem angle, were then calculated from the keypoint coordinates. Firstly, 1 081 RGB images of single-tiller rice were collected in total, and datasets of single-tiller rice were created for object detection and keypoint detection. Secondly, four current mainstream object detection models were trained, namely Faster R-CNN, YOLOv3, YOLOv5s, and YOLOv5m. YOLOv5m achieved the best detection, with a mean average precision (mAP) of 91.17%, exceeding Faster R-CNN, YOLOv3, and YOLOv5s by 49.55, 36.38, and 2.69 percentage points, respectively. The predicted bounding boxes and categories were drawn on the original images to inspect the model's predictions; the visualizations showed that YOLOv5m correctly detected the bounding boxes and categories of spikes, stems, and leaves in most cases. Then, the cascaded pyramid network (CPN), originally developed for human pose estimation, was applied to plant skeleton extraction.
Two attention mechanisms, squeeze-and-excitation networks (SENet) and the convolutional block attention module (CBAM), were integrated into the backbone to improve its feature extraction ability. Both SE-CPN and CBAM-CPN achieved higher keypoint prediction accuracy than CPN. CBAM-CPN achieved the highest accuracy, at 95.24%, 95.74%, and 93.27% for spikes, stems, and leaves, respectively, for an average of 94.75%. Compared with hourglass networks (HN), stacked hourglass networks (SHN), and the CPN model, the prediction accuracy of CBAM-CPN was improved by 9.68, 8.83, and 1.06 percentage points, respectively. The root mean square errors (RMSE) of the phenotypic parameters were 1.06 cm, 0.81 cm, 1.25 cm, and 2.94°. Lastly, when YOLOv5m and CBAM-CPN were combined, the RMSE of the four phenotypic parameters were 1.48 cm, 1.05 cm, 1.74 cm, and 2.39°, reductions of 1.65 cm, 3.43 cm, 2.65 cm, and 4.75°, respectively, compared with SHN. The improved model thus achieved better predictions, and the generated skeleton fits the morphological structure of single-tiller rice well. These results verify the feasibility of combining object detection and keypoint detection models to extract the skeleton and phenotypic parameters of single-tiller rice. In conclusion, the proposed method achieves higher detection accuracy on the keypoints of single-tiller rice plants and extracts the skeleton and phenotypic parameters more efficiently and accurately. The findings can provide a strong reference to accelerate the breeding and improvement of rice.
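As a rough illustration of what CBAM contributes to the CPN backbone, the sketch below implements its two sequential stages (channel attention, then spatial attention) in plain NumPy. This is a simplified, dependency-light stand-in, not the paper's implementation: the shared-MLP weights are random placeholders, the reduction ratio is arbitrary, and the 7x7 convolution of the real spatial stage is replaced by a conv-free combination of the channel-wise average and max maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """Channel attention: global average- and max-pooled descriptors pass
    through a shared two-layer MLP (w1, w2), are summed, and squashed to
    per-channel weights in (0, 1)."""
    avg = feat.mean(axis=(1, 2))                      # (C,)
    mx = feat.max(axis=(1, 2))                        # (C,)
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)      # ReLU hidden layer
    scale = sigmoid(mlp(avg) + mlp(mx))               # (C,)
    return feat * scale[:, None, None]

def spatial_attention(feat):
    """Spatial attention: channel-wise average and max maps combined into
    per-pixel weights. (The real CBAM stacks the two maps and applies a
    7x7 conv; the mean here is a conv-free stand-in for the sketch.)"""
    avg = feat.mean(axis=0)                           # (H, W)
    mx = feat.max(axis=0)                             # (H, W)
    scale = sigmoid((avg + mx) / 2.0)                 # (H, W)
    return feat * scale[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 16, 16))   # C x H x W feature map
w1 = rng.standard_normal((2, 8)) * 0.1    # C -> C/r, reduction ratio r = 4
w2 = rng.standard_normal((8, 2)) * 0.1    # C/r -> C
out = spatial_attention(channel_attention(feat, w1, w2))
print(out.shape)                          # same shape as the input
```

Because both attention stages only rescale the feature map by factors in (0, 1), the module can be dropped into an existing backbone without changing tensor shapes, which is what makes the CPN integration straightforward.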
