Abstract:
Net ecosystem exchange (NEE) measurements are critical for understanding ecosystem carbon fluxes. However, gaps in the records remain common because of harsh weather and sensor malfunctions, and some of these gaps extend over long periods. Traditional interpolation methods struggle with such long-term missing data, which leads to inaccurate carbon flux analyses. In this study, the Adapter-Reverseformer (ARformer) model was proposed to improve the accuracy of NEE gap-filling, especially for extended periods of data loss. By integrating a multi-layer perceptron (MLP) with the Reverseformer module, the model exploits both environmental data and the temporal patterns of NEE to fill longer gaps. It captures the nonlinear relationships between NEE and environmental factors as well as the time-dependent trends in the NEE series. The model was evaluated on the FLUXNET2015 dataset, using half-hourly carbon flux data from 65 sites spanning 10 land-use types. Five artificial gap scenarios were generated by randomly removing contiguous blocks of 1, 7, 15, 30, and 90 days. ARformer was compared with marginal distribution sampling (MDS), random forest (RF), and three advanced deep learning models: DLinear, PatchTST, and iTransformer. The results showed that ARformer consistently outperformed the baseline models, especially for long-term missing data. When gaps spanned 90 days, the performance of RF declined significantly and MDS failed to produce reasonable estimates. In contrast, ARformer maintained high accuracy. Its R² values ranged from 0.762 to 0.913, and its root mean square error (RMSE) ranged from 0.668 to 2.724 μmol/(m²·s). Its mean absolute error (MAE) ranged from 0.410 to 1.751 μmol/(m²·s), and its bias remained between -0.024 and 0.067 μmol/(m²·s). ARformer performed better than the other models across land-use types including closed shrublands, deciduous broadleaf forests, evergreen broadleaf forests, evergreen needleleaf forests, and mixed forests. This indicates that it captures the complex relationships between NEE and environmental drivers in these vegetation types more effectively. Furthermore, the time-series deep learning models generally interpolated long NEE gaps more accurately than MDS and RF, with ARformer the most accurate. In conclusion, deep learning models, particularly ARformer, were highly effective at filling gaps in NEE data across various ecosystems, and ARformer is recommended for gaps longer than 30 days. Its interpolation accuracy is attributed to its use of both temporal dependencies and the relationships between NEE and environmental factors. The resulting, more reliable NEE data can help clarify carbon flux dynamics across different ecosystems. ARformer thus offers an effective approach to filling long-term gaps in NEE measurements.
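The gap-scenario and scoring protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. The helper names `make_artificial_gap` and `gap_fill_metrics` are hypothetical. Each call removes a single contiguous gap at a random position in a half-hourly series (48 values per day). R² is computed as the coefficient of determination, although the study may instead use the squared correlation coefficient. Bias is taken as the mean of predicted minus observed NEE. The mean-fill in the demo is only a placeholder for a real gap-filling model such as MDS, RF, or ARformer.

```python
import numpy as np

# Assumption: FLUXNET2015 half-hourly resolution gives 48 records per day.
HALF_HOURS_PER_DAY = 48


def make_artificial_gap(nee, gap_days, rng):
    """Hypothetical helper: blank out one contiguous block of `gap_days` days.

    Returns the gapped copy (removed values set to NaN) and a boolean mask
    marking the removed positions. The gap start is drawn uniformly at random.
    """
    nee = np.asarray(nee, dtype=float)
    n_gap = gap_days * HALF_HOURS_PER_DAY
    if n_gap > nee.size:
        raise ValueError("gap is longer than the series")
    start = int(rng.integers(0, nee.size - n_gap + 1))  # upper bound is exclusive
    mask = np.zeros(nee.size, dtype=bool)
    mask[start:start + n_gap] = True
    gapped = nee.copy()
    gapped[mask] = np.nan
    return gapped, mask


def gap_fill_metrics(observed, predicted):
    """Hypothetical helper: R2, RMSE, MAE and bias of gap-filled values.

    Assumptions: R2 is the coefficient of determination and bias is the
    mean of (predicted - observed); units follow the inputs, e.g. umol/(m2 s).
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - observed
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return {
        "R2": float(1.0 - ss_res / ss_tot),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
        "bias": float(np.mean(err)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 365 * HALF_HOURS_PER_DAY
    t = np.arange(n)
    # Synthetic NEE with a diurnal cycle plus noise (illustrative data only).
    nee = -5.0 * np.sin(2 * np.pi * t / HALF_HOURS_PER_DAY) + rng.normal(0.0, 1.0, n)
    for days in (1, 7, 15, 30, 90):
        gapped, mask = make_artificial_gap(nee, days, rng)
        # Placeholder filler: mean of the remaining data (stand-in for a real model).
        filled = np.where(mask, np.nanmean(gapped), gapped)
        scores = gap_fill_metrics(nee[mask], filled[mask])
        print(days, {k: round(v, 3) for k, v in scores.items()})
```

The metrics are computed only on the artificially removed positions (`mask`), so they measure how well the gap itself is reconstructed. Real FLUXNET records often have gaps before any artificial ones are inserted. Those positions would also need to be excluded, since there is no observed value to compare against.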