Effects of likelihood function form on the calibration of variety parameters in rice phenology models

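    • Summary: The traditional Gaussian likelihood function (GLF) cannot describe the heteroscedasticity of model residuals when the observations contain measurement error and the model's algorithmic structure is complex, which biases parameter calibration by the Markov chain Monte Carlo (MCMC) algorithm. To address this problem, a Gaussian likelihood function with coefficient of variation transformation (GLF-CV) and a Gaussian likelihood function with Box-Cox transformation (GLF-BC) were introduced to characterize the heteroscedasticity caused by the observations and by the model structure, and their parameter calibration performance and model uncertainty (uncertainty ratio, UR) were compared. The study used field cultivation trial data from three ecological sites: Xuehuanian (early-maturing) at Gaoyao from 2004 to 2009, Wuyujing 3 (medium-maturing) at Xinghua from 2001 to 2004, and Shanyou 63 (late-maturing) at Liu'an from 1991 to 2004. The RiceGrow and Oryza2000 phenology models were calibrated with the ensemble sampling for affine-invariant MCMC (EMCEE) algorithm, and the effects of GLF-CV, GLF-BC, and GLF on the calibration results were compared. The results show that: 1) Under the three likelihood functions, the root mean square error (RMSE) of the predictions ranged from 2.66 to 4.54 d for the RiceGrow phenology model and from 2.30 to 4.41 d for the Oryza2000 phenology model, indicating that all three likelihood functions are effective for parameter calibration. 2) In the RiceGrow phenology model, both the relative root mean square deviation (RRMSD) of the parameters and the prediction RMSE were smallest under GLF-BC for all three rice varieties. The prediction RMSE under GLF-BC was 0.09, 0.07, and 0.80 d smaller than under GLF-CV, and 1.21, 0.20, and 0.07 d smaller than under GLF, indicating that GLF-BC is well suited to the RiceGrow phenology model. 3) In the Oryza2000 phenology model, the smallest prediction RMSE for Xuehuanian, Wuyujing 3, and Shanyou 63 was obtained with GLF, GLF-BC, and GLF-CV, respectively, at 2.30, 4.17, and 3.50 d. The choice of likelihood function therefore depends on the main source of heteroscedasticity in the model residuals: GLF-CV performs best when the main source is the observation data, GLF-BC performs best when the main source is the model structure itself, and GLF can be used when the heteroscedasticity of the residuals is small.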

       

      Abstract: Parameter calibration with the traditional Gaussian likelihood function (GLF) is often biased when the observation data contain measurement errors and the model has a complex algorithmic structure. Under these conditions the GLF cannot capture the heteroscedasticity of the model residuals, which leads to inaccurate calibration with Markov chain Monte Carlo (MCMC). In this study, two modified likelihood functions were introduced: the Gaussian likelihood function with coefficient of variation transformation (GLF-CV) and the Gaussian likelihood function with Box-Cox transformation (GLF-BC). Both were designed to describe the heteroscedasticity induced by the observation data and by the model structure, giving a more accurate representation of the model residuals. Their performance was evaluated in terms of parameter calibration and of model uncertainty, measured by the uncertainty ratio (UR). Field experimental data were collected from three ecological sites: Gaoyao Xuehuanian (early-maturing) from 2004 to 2009, Xinghua Wuyujing 3 (mid-maturing) from 2001 to 2004, and Liu'an Shanyou 63 (late-maturing) from 1991 to 2004. The RiceGrow and Oryza2000 phenology models were selected as the test models, and their parameters were calibrated using the ensemble sampling for affine-invariant MCMC (EMCEE) algorithm under GLF-CV, GLF-BC, and GLF. The findings were as follows: 1) Under the three likelihood functions, the root mean square error (RMSE) of the predictions ranged from 2.66 to 4.54 days for the RiceGrow phenology model and from 2.30 to 4.41 days for the Oryza2000 phenology model, indicating that all three likelihood functions were effective for parameter calibration. 2) In the RiceGrow phenology model, GLF-BC gave the smallest relative root mean square deviation (RRMSD) of the parameters and the smallest prediction RMSE for the Gaoyao Xuehuanian, Xinghua Wuyujing 3, and Liu'an Shanyou 63 varieties. Specifically, the prediction RMSE with GLF-BC was 0.09, 0.07, and 0.80 days smaller than with GLF-CV, and 1.21, 0.20, and 0.07 days smaller than with GLF, indicating that GLF-BC was well adapted to the RiceGrow phenology model. 3) In the Oryza2000 phenology model, the likelihood function with the smallest prediction RMSE differed among varieties: GLF for Gaoyao Xuehuanian, GLF-BC for Xinghua Wuyujing 3, and GLF-CV for Liu'an Shanyou 63, with values of 2.30, 4.17, and 3.50 days, respectively. Consequently, the choice of likelihood function depended on the primary source of heteroscedasticity in the model residuals. GLF-CV performed best when the main source was the observation data, GLF-BC performed best when the main source was the model structure, and GLF was suitable when the heteroscedasticity of the model residuals was small.
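      The abstract names the three likelihood functions but does not give their formulas. The following is a minimal Python sketch of one common way to write them, not the authors' implementation. It assumes independent Gaussian residuals, a GLF-CV standard deviation proportional to each observation (sigma_i = cv * obs_i), and a GLF-BC in which observations and simulations are both Box-Cox transformed with a Jacobian correction. All function and parameter names (`loglik_glf`, `loglik_glf_cv`, `loglik_glf_bc`, `sigma`, `cv`, `lam`) are hypothetical, and the sample values are illustrative only.

```python
import math


def rmse(obs, sim):
    """Root mean square error between observed and simulated values (days)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def _gauss_loglik(residuals, sigmas):
    """Sum of independent Gaussian log-densities with per-point standard deviations."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - 0.5 * (r / s) ** 2
        for r, s in zip(residuals, sigmas)
    )


def loglik_glf(obs, sim, sigma):
    """GLF: one constant residual standard deviation (homoscedastic)."""
    residuals = [o - s for o, s in zip(obs, sim)]
    return _gauss_loglik(residuals, [sigma] * len(obs))


def loglik_glf_cv(obs, sim, cv):
    """GLF-CV (assumed form): standard deviation scales with the observation."""
    residuals = [o - s for o, s in zip(obs, sim)]
    sigmas = [cv * o for o in obs]  # requires positive observations
    return _gauss_loglik(residuals, sigmas)


def box_cox(y, lam):
    """Box-Cox transform of a positive value y."""
    return math.log(y) if lam == 0 else (y ** lam - 1.0) / lam


def loglik_glf_bc(obs, sim, sigma, lam):
    """GLF-BC (assumed form): Gaussian residuals in Box-Cox space."""
    residuals = [box_cox(o, lam) - box_cox(s, lam) for o, s in zip(obs, sim)]
    # Jacobian of the transform, so likelihoods stay comparable across lam values.
    jacobian = (lam - 1.0) * sum(math.log(o) for o in obs)
    return _gauss_loglik(residuals, [sigma] * len(obs)) + jacobian


if __name__ == "__main__":
    # Illustrative days-after-sowing values, not data from the study.
    observed = [62.0, 95.0, 130.0]
    simulated = [60.0, 97.0, 126.0]
    print("RMSE (d):", rmse(observed, simulated))
    print("GLF:", loglik_glf(observed, simulated, sigma=3.0))
    print("GLF-CV:", loglik_glf_cv(observed, simulated, cv=0.03))
    print("GLF-BC:", loglik_glf_bc(observed, simulated, sigma=0.05, lam=0.3))
```

      Each function returns a log-likelihood, so any of them could be handed to an MCMC sampler as the log-probability of a candidate parameter set (plus a log-prior). With `lam = 1` the Box-Cox form reduces to the plain GLF, which is a convenient sanity check.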

       
