Abstract:
Rice is one of the most important grain crops in China and an essential guarantee of the food supply. As society develops and living standards improve, the demand for rice taste and nutritional quality keeps rising, so breeding and varietal improvement must be accelerated to ensure both the quantity and quality of rice. The plant skeleton and phenotypic parameters can characterize the growth and health status of rice and thereby support breeding and improvement. In this study, object detection and key point detection models were combined to extract the skeleton and phenotypic parameters of single-tiller rice, which was taken as the research object. The object detection model located the bounding boxes of spikes, stems, and leaves; the predicted key points were then connected according to their semantic information to form the rice skeleton; and four phenotypic parameters, namely spike length, stem length, leaf length, and leaf-stem angle, were calculated from the key point coordinates. Firstly, a total of 1,081 RGB images of single-tiller rice were collected, and datasets for object detection and key point detection were constructed. Secondly, four mainstream object detection models, namely Faster R-CNN, YOLOv3, YOLOv5s, and YOLOv5m, were trained. YOLOv5m performed best, with a mean average precision (mAP) of 91.17%, which was 49.55, 36.38, and 2.69 percentage points higher than that of Faster R-CNN, YOLOv3, and YOLOv5s, respectively. The predicted bounding boxes and categories were drawn on the original images to inspect the predictions, and the visualization showed that YOLOv5m correctly detected the bounding boxes and categories of spikes, stems, and leaves in most cases. Then, the cascaded pyramid network (CPN), originally developed for human pose estimation, was applied to plant skeleton extraction, and the attention mechanisms squeeze-and-excitation networks (SENet) and convolutional block attention module (CBAM) were integrated into its backbone to improve the feature extraction ability of the model. The key point prediction accuracies of SE-CPN and CBAM-CPN were both higher than that of CPN, and CBAM-CPN achieved the highest accuracy, reaching 95.24%, 95.74%, and 93.27% for the spike, stem, and leaf, respectively, with an average accuracy of 94.75%. The prediction accuracy of the CBAM-CPN model was 9.68, 8.83, and 1.06 percentage points higher than that of the hourglass network (HN), stacked hourglass network (SHN), and CPN models, respectively, and the root mean square errors (RMSE) of the four phenotypic parameters were 1.06 cm, 0.81 cm, 1.25 cm, and 2.94°. Lastly, when YOLOv5m and CBAM-CPN were combined, the RMSE of the four phenotypic parameters were 1.48 cm, 1.05 cm, 1.74 cm, and 2.39°, which were 1.65 cm, 3.43 cm, 2.65 cm, and 4.75° lower than those of SHN. The improved model therefore achieved better predictions, and the resulting skeleton fitted the morphological structure of single-tiller rice more closely, which further verified the feasibility of combining object detection and key point detection models to extract the skeleton and phenotypic parameters of single-tiller rice. In conclusion, the improved model achieved high detection accuracy for the key points of single-tiller rice plants.
The skeleton and phenotypic parameters of single-tiller rice were extracted efficiently and accurately, and the findings can provide a reference for accelerating rice breeding and improvement.
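The abstract reports that a CBAM block was integrated into the CPN backbone but does not give implementation details. The following is a minimal sketch of a standard CBAM block (channel attention followed by spatial attention) of the kind that could be inserted after a backbone stage; the channel count, reduction ratio, and insertion point are illustrative assumptions, not the configuration used in the paper.

```python
# Minimal CBAM sketch (channel attention then spatial attention).
# Sizes and reduction ratio are illustrative assumptions.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))               # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))                # global max pooling branch
        scale = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * scale


class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)                # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)                 # channel-wise max map
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale


class CBAM(nn.Module):
    """Channel attention followed by spatial attention."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.channel = ChannelAttention(channels, reduction)
        self.spatial = SpatialAttention()

    def forward(self, x):
        return self.spatial(self.channel(x))


# Example: refine a backbone feature map before it enters the keypoint head.
features = torch.randn(2, 256, 64, 48)
refined = CBAM(256)(features)    # output keeps the input shape (2, 256, 64, 48)
```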
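The abstract also states that the four phenotypic parameters were calculated from the key point coordinates but does not specify the formulas. The sketch below illustrates one plausible computation, assuming key points are predicted in order along each organ; the key point layout, the pixel-to-centimeter calibration factor, and the segments used for the leaf-stem angle are all assumptions for illustration.

```python
# Sketch of deriving spike length, stem length, leaf length, and leaf-stem angle
# from ordered key point coordinates. Layout and calibration are assumed.
import numpy as np


def polyline_length(points: np.ndarray, cm_per_pixel: float) -> float:
    """Sum of segment lengths along ordered key points, converted to cm."""
    diffs = np.diff(points, axis=0)
    return float(np.sum(np.linalg.norm(diffs, axis=1))) * cm_per_pixel


def leaf_stem_angle(stem_points: np.ndarray, leaf_points: np.ndarray) -> float:
    """Angle (degrees) between the stem direction and the leaf base direction."""
    stem_vec = stem_points[-1] - stem_points[0]          # stem base -> stem top
    leaf_vec = leaf_points[1] - leaf_points[0]           # leaf base -> next key point
    cos = np.dot(stem_vec, leaf_vec) / (
        np.linalg.norm(stem_vec) * np.linalg.norm(leaf_vec) + 1e-8
    )
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


# Hypothetical predictions in image coordinates (x, y), ordered along each organ.
spike = np.array([[410, 120], [420, 95], [435, 60]], dtype=float)
stem = np.array([[400, 600], [405, 350], [410, 130]], dtype=float)
leaf = np.array([[405, 340], [470, 300], [520, 280]], dtype=float)
cm_per_pixel = 0.05                                      # assumed calibration factor

print("spike length (cm):", polyline_length(spike, cm_per_pixel))
print("stem length (cm):", polyline_length(stem, cm_per_pixel))
print("leaf length (cm):", polyline_length(leaf, cm_per_pixel))
print("leaf-stem angle (deg):", leaf_stem_angle(stem, leaf))
```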