Extracting and predicting tomato root length phenotype using THz imaging and ensemble learning

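    • Abstract: To detect tomato root phenotypes, this study proposes a root detection technique based on THz (terahertz) imaging and ensemble learning. First, tomato roots were imaged with THz several times over 20 days of growth. Second, noisy data in the overlapping-root and main-root regions were removed from the THz pseudo-color maps of the roots after optimal reconstruction. Third, root length was calculated with the Rosenfeld thinning algorithm and a sliding-window traversal method. Finally, THz time-domain spectra and refractive index spectra were extracted from the effective root region, and tomato root length was predicted with a Stacking ensemble model. Root lengths calculated from THz images had small errors, with a mean relative error of only 4.16%. Between the root lengths predicted from THz time-domain data and the calculated root lengths, the maximum coefficient of determination was 0.999 and the minimum root-mean-square error was 0.743 cm. For predictions from refractive index spectra, the maximum coefficient of determination was 0.998 and the minimum root-mean-square error was 0.976 cm. The method not only calculates tomato root length accurately from THz images but also predicts the root length phenotype effectively from the THz spectra of tomato roots, providing a theoretical basis for root phenotype detection.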
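The abstract computes root length by traversing the Rosenfeld skeleton with a sliding window but does not spell out the counting rule. A common approach, and an assumption here rather than the authors' exact procedure, is to inspect each skeleton pixel's 3×3 neighborhood and add 1 pixel for every orthogonal link and √2 for every diagonal link. The sketch below assumes the thinning step has already produced a 1-pixel-wide, 8-connected binary skeleton; `skeleton_length` and its `pixel_size` scale factor (for converting pixels to cm) are hypothetical names.

```python
import math

# Forward-looking neighbor offsets inside a 3x3 window: (row step, col step, link length).
# Looking only "forward" (right, down, down-right, down-left) counts each pixel pair once.
_FORWARD_LINKS = [
    (0, 1, 1.0),            # right: orthogonal link
    (1, 0, 1.0),            # down: orthogonal link
    (1, 1, math.sqrt(2)),   # down-right: diagonal link
    (1, -1, math.sqrt(2)),  # down-left: diagonal link
]


def skeleton_length(skel, pixel_size=1.0):
    """Estimate the length of a 1-pixel-wide, 8-connected skeleton.

    skel: list of rows of 0/1 values (1 = skeleton pixel).
    pixel_size: physical size of one pixel (hypothetical parameter, e.g. cm per pixel).
    """
    rows, cols = len(skel), len(skel[0])

    def on(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(skel[r][c])

    total = 0.0
    for r in range(rows):
        for c in range(cols):
            if not on(r, c):
                continue
            for dr, dc, link in _FORWARD_LINKS:
                nr, nc = r + dr, c + dc
                if not on(nr, nc):
                    continue
                # Skip a diagonal link when the two pixels are already joined through a
                # shared orthogonal neighbor; otherwise corners would be over-counted.
                if dr and dc and (on(r, nc) or on(nr, c)):
                    continue
                total += link
    return total * pixel_size


if __name__ == "__main__":
    demo = [
        [1, 1, 1, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1],
    ]
    print(round(skeleton_length(demo), 3))  # 4.414 (three orthogonal links + one diagonal)
```

Length is measured between pixel centers, so a straight run of n pixels counts as n − 1 pixel steps; the corner rule keeps an L-shaped bend from being counted as both two orthogonal steps and one diagonal step.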

       

      Abstract: Rapid and non-destructive detection of tomato root phenotypes was realized here using THz (terahertz) imaging and ensemble learning. First, three groups of tomato seedlings were grown in three sandy soil substrates watered with nitrogen nutrient solutions of different concentrations, and 12 groups of tomato seedling roots were collected for THz imaging during 20 days of growth. Second, THz pseudo-color maps of the root system were generated after optimal reconstruction of the time-domain peaks. Because the color value of the pseudo-color map was directly related to the intensity of THz absorption by the root system, noisy data in the overlapping-root and main-root regions of the THz maps were removed according to these color values. Next, Rosenfeld thinning was used to obtain the skeleton map of the tomato root system, and root length was calculated from the skeleton pixels using a sliding-window traversal method. Finally, THz time-domain data and refractive index data were extracted from the effective feature region of the root system, and tomato root length was predicted with a Stacking ensemble model. The first layer of the Stacking model integrated four base models: GBDT (gradient boosting decision tree), XGBoost (eXtreme gradient boosting), CatBoost (categorical boosting), and AdaBoost (adaptive boosting). The second layer employed linear regression as the meta-learner to prevent overfitting, and the base models were trained with 5-fold cross-validation. Skeleton extraction showed that separating the RGB channels effectively removed overlapping roots and noisy spectral data, so that the root framework was fully displayed and the root length calculation error was reduced significantly.
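A minimal sketch of the two-layer Stacking structure described above, using scikit-learn's `StackingRegressor`. To keep it runnable with scikit-learn alone, only the GBDT and AdaBoost base models are included; XGBoost and CatBoost would be added through their own scikit-learn-compatible regressors (`xgboost.XGBRegressor`, `catboost.CatBoostRegressor`). Hyperparameters, the feature count, and the synthetic data are placeholders rather than values from the study, and `build_stacking_model` and `demo` are hypothetical names.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression


def build_stacking_model(random_state=0):
    """Two-layer Stacking regressor: boosting base models + linear-regression meta-learner."""
    base_models = [
        ("gbdt", GradientBoostingRegressor(random_state=random_state)),
        ("adaboost", AdaBoostRegressor(random_state=random_state)),
        # If installed, xgboost.XGBRegressor and catboost.CatBoostRegressor can be
        # appended here as ("xgboost", ...) and ("catboost", ...) entries.
    ]
    # cv=5: the meta-learner is fit on 5-fold out-of-fold predictions of the base models,
    # so it never sees base-model outputs on the samples those models were trained on.
    return StackingRegressor(
        estimators=base_models,
        final_estimator=LinearRegression(),
        cv=5,
    )


def demo(seed=0):
    """Fit on synthetic 'spectra' (random features, NOT real THz data) and predict."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(60, 20))  # 60 samples x 20 spectral features
    y = 15.0 + X[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.2, size=60)
    model = build_stacking_model(random_state=seed)
    model.fit(X[:48], y[:48])      # first 48 samples for training
    return model.predict(X[48:])   # remaining 12 samples for prediction


if __name__ == "__main__":
    print(demo())
```

With `cv=5`, `StackingRegressor` trains the linear meta-learner on out-of-fold predictions from the base models and then refits each base model on the full training set, which mirrors the 5-fold, linear-meta-learner design described in the abstract.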
The average relative error between the tomato root lengths calculated from the THz false-color images and those measured with ImageJ software was only 4.16%, and the coefficient of determination of the linear fit between the two was 0.967. Stacking models based on THz time-domain data and on refractive index data both predicted root length effectively and outperformed the individual base models. The best predictions of tomato root length were obtained from THz data after WD denoising. For THz time-domain data, the optimal coefficient of determination was 0.999 and the minimum root-mean-square error was 0.743 cm; for THz refractive index data, the optimal coefficient of determination was 0.998 and the root-mean-square error was 0.976 cm. Tomato root length can therefore be calculated accurately from THz images, and root length phenotypes can be predicted from THz spectra. These findings can provide a theoretical basis for the rapid and nondestructive detection of root system phenotypes. Further research is needed to nondestructively detect the complete set of root phenotype parameters and to characterize internal root properties using THz spectral data.
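The abstract reports mean relative error, coefficient of determination (R²), and root-mean-square error (RMSE) but not their formulas. The sketch below uses the conventional definitions, which is an assumption; in particular, the 0.967 value comes from a linear fit and may have been computed from a fitted regression line rather than directly between the two sets of lengths. The function names are hypothetical and the example lengths are made up for illustration.

```python
import math


def mean_relative_error(values, reference):
    """Mean relative error in percent: mean(|value - reference| / reference) * 100."""
    if len(values) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100.0 * sum(abs(v - r) / r for v, r in zip(values, reference)) / len(reference)


def rmse(predicted, observed):
    """Root-mean-square error, in the same unit as the inputs (e.g. cm)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def r_squared(predicted, observed):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for p, o in zip(predicted, observed))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical root lengths in cm, for illustration only (not data from the study).
    thz_lengths = [10.2, 15.1, 19.6]
    imagej_lengths = [10.0, 15.5, 20.0]
    print(f"MRE  = {mean_relative_error(thz_lengths, imagej_lengths):.2f} %")
    print(f"RMSE = {rmse(thz_lengths, imagej_lengths):.3f} cm")
    print(f"R^2  = {r_squared(thz_lengths, imagej_lengths):.3f}")
```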

       
