Constructing ARformer: A deep learning model for the long-term gap-filling of carbon flux data

Qi Jiandong; Wu Peng; Zha Tianshan

doi:10.11975/j.issn.1002-6819.202407071

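Summary: To improve gap-filling accuracy for net ecosystem exchange (NEE) under long-term data gaps, and to provide key data for subsequent research on the Earth's carbon cycle, this study proposes Adapter-Reverseformer (ARformer), a model for filling long gaps in NEE records. First, to better capture the temporal features of NEE data, the roles of three modules (layer normalization, the feed-forward network, and self-attention) were redesigned and optimized, so that the model can use a longer look-back window and capture temporal features from longer time series. Second, a feature-fusion module, Attention-MLP, was designed to fit the immediate response of NEE to environmental factors and to improve gap-filling accuracy on NEE data from different land-use types. Experiments were conducted on data from 65 sites covering 10 land-use types in the global long-term flux observation network (FLUXNET). The results show that ARformer achieved R² between 0.762 and 0.913, with root mean square error (RMSE), mean absolute error (MAE), and bias of 0.668 to 2.724, 0.410 to 1.751, and −0.024 to 0.067 μmol/(m²·s), respectively. Across gap lengths and land-use types, its gap-filling accuracy was higher than that of marginal distribution sampling (MDS), random forest (RF), DLinear, PatchTST, and iTransformer. These results can serve as a reference for gap-filling NEE data in long-term gap scenarios.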

Abstract: Net ecosystem exchange (NEE) measurements are critical to understanding carbon flux in ecosystems, but gaps in the records are common because of harsh weather or sensor malfunction. Traditional interpolation methods struggle with long-term gaps, which leads to inaccuracies in carbon flux analysis. In this study, the Adapter-Reverseformer (ARformer) model was proposed to improve the accuracy of NEE gap-filling, especially over extended periods of data loss. The model integrates a multi-layer perceptron (MLP) with the Reverseformer module so that longer gaps can be filled using both environmental data and the temporal patterns in NEE. The design targets the non-linear relationships between NEE and environmental factors while also characterizing the time-dependent trend in the NEE data. The model was tested on the FLUXNET 2015 dataset, using half-hourly carbon flux data from 65 sites across 10 land-use types. Five artificial gap scenarios were generated by randomly removing data for continuous periods of 1, 7, 15, 30, and 90 days. ARformer was compared with marginal distribution sampling (MDS), random forest (RF), and three advanced deep learning models: DLinear, PatchTST, and iTransformer. ARformer consistently outperformed these baselines, especially on long-term gaps. Specifically, when the gap spanned 90 days, the performance of RF decreased significantly and MDS failed to produce reasonable estimates. In contrast, ARformer maintained high accuracy, with R² values ranging from 0.762 to 0.913. The root mean square error (RMSE) ranged from 0.668 to 2.724 μmol/(m²·s), the mean absolute error (MAE) from 0.410 to 1.751 μmol/(m²·s), and the bias from −0.024 to 0.067 μmol/(m²·s).
ARformer also performed best across different land-use types, including closed shrublands, deciduous broadleaf forests, evergreen broadleaf forests, evergreen needle-leaf forests, and mixed forests, which suggests that it captures the complex relationships between NEE and environmental drivers in these vegetation types more effectively. Furthermore, time-series deep learning models in general provided better gap-filling of long-term missing NEE data, with ARformer the most accurate among them. In conclusion, deep learning models, and ARformer in particular, were highly effective at filling gaps in NEE data across ecosystems. ARformer is recommended when data gaps extend beyond 30 days. Its accuracy is attributed to modeling both the temporal dependencies in NEE and the relationship between NEE and environmental factors. The resulting, more reliable NEE data can help clarify carbon flux dynamics across different ecosystems. ARformer thus offers an effective approach to long-term data gaps in NEE measurements.
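
The evaluation protocol described in the abstract (artificial contiguous gaps of 1, 7, 15, 30, and 90 days in half-hourly data, scored with R², RMSE, MAE, and bias) can be illustrated with a minimal sketch. This is not the authors' code. The function names `make_gap_mask` and `gap_metrics`, the synthetic NEE series, the mean-value placeholder filler, the definition of R² as the coefficient of determination, and the sign convention for bias (predicted minus observed) are all assumptions made for illustration.

```python
import numpy as np

STEPS_PER_DAY = 48  # half-hourly records, as stated in the abstract


def make_gap_mask(n_steps, gap_days, rng):
    """Return a boolean mask with one contiguous artificial gap set to True.

    Hypothetical helper: the paper does not specify how gap positions are drawn.
    """
    gap_len = gap_days * STEPS_PER_DAY
    if gap_len >= n_steps:
        raise ValueError("gap is longer than the series")
    start = rng.integers(0, n_steps - gap_len + 1)  # upper bound is exclusive
    mask = np.zeros(n_steps, dtype=bool)
    mask[start:start + gap_len] = True
    return mask


def gap_metrics(observed, predicted):
    """Compute R2, RMSE, MAE, and bias over the gap.

    Assumptions: R2 is the coefficient of determination, and bias is the
    mean of (predicted - observed).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return {
        "r2": float(1.0 - ss_res / ss_tot),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),
    }


rng = np.random.default_rng(0)
n_steps = 365 * STEPS_PER_DAY
t = np.arange(n_steps)
# Synthetic stand-in for NEE (not FLUXNET data): a diurnal cycle plus noise.
nee = -5.0 * np.sin(2 * np.pi * t / STEPS_PER_DAY) + rng.normal(0.0, 1.0, n_steps)

for gap_days in (1, 7, 15, 30, 90):
    mask = make_gap_mask(n_steps, gap_days, rng)
    # Placeholder gap-filler: the mean of the values outside the gap.
    filled = np.full(mask.sum(), nee[~mask].mean())
    print(gap_days, gap_metrics(nee[mask], filled))
```

In a real comparison, the placeholder filler would be replaced by each candidate method (MDS, RF, or a deep learning model), and the metrics would be aggregated per site and land-use type.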
