Multi-temporal remote sensing based crop classification using a hybrid 3D-2D CNN model
Abstract
Reliable and accurate crop type classification is a key data source for agricultural monitoring and food security. Remote sensing can rapidly and accurately extract the planting areas and distribution of the main crops, which supports the optimization of crop spatial patterns, grain production, and management. However, identifying and mapping different crop types with high accuracy and efficiency remains extremely difficult, especially for traditional machine learning, because the spectral data of cropland in time-series remote sensing images are highly complex and heterogeneous. Three-dimensional convolutional neural networks (3D CNN) are well suited to the spatio-temporal information in time-series remote sensing imagery, but the high complexity of a 3D CNN model usually requires a large number of training samples. In this study, a novel hybrid classification model (called 3D-2D CNN) was proposed that integrates 3D CNN and two-dimensional convolutional neural networks (2D CNN) to balance accuracy, efficiency, and the cost of ground sample acquisition. The procedure was as follows. Spatio-temporal features were first extracted by multiple 3D convolutional layers; the output features were then compressed for spatial feature analysis in a 2D convolutional layer; finally, the high-level feature maps were flattened and passed to a fully connected layer to predict the category. Batch normalization was applied to the input of each layer to accelerate network convergence. In this way, the complex structure of the original 3D CNN was simplified, while the 3D-2D CNN retained its capacity for spatio-temporal feature extraction. Taking northern California, USA, as the study area, multi-temporal Landsat 8 images were used as the remote sensing data source to verify the model. Compared with natural images, Landsat images present specific characteristics.
The spectral and texture features of the same crop type vary greatly with imaging time and conditions. California agricultural survey data were used as the sampling data. The land plots in the study area were randomly divided into training, validation, and test regions in a stratified 2:2:6 ratio, from which the training and validation sample datasets were randomly selected. Because a CNN normally needs a large number of samples for training, and overfitting easily occurs when the training dataset is limited in practice, two smaller sample sets containing 50% and 25% of the training samples were also randomly selected to verify the feasibility of the model with few samples. The trained models were then used to predict the test region. The experimental results showed that, for the classification of 13 crop types, the 3D-2D CNN achieved an overall accuracy of 89.38%, a macro-average F1 score of 84.21%, and a Kappa coefficient of 0.881, outperforming the other deep learning models (3D CNN and 2D CNN) as well as traditional machine learning methods such as Support Vector Machines (SVM) and Random Forest (RF). The proposed 3D-2D CNN also achieved the best performance on the small training sets, with the highest recognition rate among the benchmark models. Meanwhile, the convergence time of the 3D-2D CNN was greatly reduced compared with the 3D CNN, owing to a significant reduction in the number of parameters. The temporal features of crops hidden in multi-temporal remote sensing imagery had a greater effect on CNN classification than the texture features. Consequently, the 3D-2D CNN model obtained the highest accuracy and strongest robustness, mainly because of its comprehensive use of spatial, temporal, and spectral features.
These findings provide an effective and novel solution for crop classification from multi-temporal remote sensing imagery.
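The layer sequence described in the abstract (3D convolutions, compression, one 2D convolution, flattening, fully connected classifier) can be illustrated by tracing tensor shapes and parameter counts. The following is a minimal sketch and not the authors' implementation: the input size (6 bands, 12 dates, 9 x 9 pixel patches), the filter counts, the 3 x 3 kernels, and the function name `trace_3d2d` are all hypothetical, and the "compression" step is assumed here to fold the time axis into the channel axis. Only the 13 output classes come from the abstract. Batch normalization parameters are left out for brevity.

```python
from math import prod


def conv_out(size, kernel, padding=0, stride=1):
    """Output length along one axis of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_params(in_channels, out_channels, kernel):
    """Trainable weights plus biases of a convolution layer."""
    return in_channels * out_channels * prod(kernel) + out_channels


def trace_3d2d(bands, steps, height, width, n_classes,
               filters_3d=(32, 64), filters_2d=128, k=3):
    """Return [(layer, output shape, parameter count)] for the hybrid sketch.

    All layer sizes are illustrative assumptions, not values from the paper.
    """
    layers = []
    channels, t, h, w = bands, steps, height, width

    # Stage 1: 3D convolutions slide over (time, height, width) jointly,
    # with the spectral bands acting as input channels.
    for i, out_channels in enumerate(filters_3d, start=1):
        params = conv_params(channels, out_channels, (k, k, k))
        t, h, w = conv_out(t, k), conv_out(h, k), conv_out(w, k)
        channels = out_channels
        layers.append((f"conv3d_{i}", (channels, t, h, w), params))

    # Stage 2 (assumed compression): fold the time axis into the channels.
    channels, t = channels * t, 1
    layers.append(("fold_time", (channels, h, w), 0))

    # Stage 3: one 2D convolution analyses the remaining spatial structure.
    params = conv_params(channels, filters_2d, (k, k))
    h, w = conv_out(h, k), conv_out(w, k)
    channels = filters_2d
    layers.append(("conv2d", (channels, h, w), params))

    # Stage 4: flatten the feature maps and classify with a dense layer.
    flat = channels * h * w
    layers.append(("flatten", (flat,), 0))
    layers.append(("dense", (n_classes,), flat * n_classes + n_classes))
    return layers


if __name__ == "__main__":
    # Hypothetical input: 6 bands, 12 dates, 9 x 9 patches, 13 crop classes.
    for name, shape, params in trace_3d2d(6, 12, 9, 9, 13):
        print(f"{name:10s} {str(shape):18s} {params:>8d}")
```

Under these assumed sizes the spatial footprint shrinks at every unpadded convolution, so the patch size limits how many layers can be stacked; with padding the shapes would stay constant and only the parameter counts would matter.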
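The three scores reported in the abstract (overall accuracy, macro-average F1, and the Kappa coefficient) can all be derived from a single confusion matrix. The sketch below uses the standard definitions; the function name `classification_scores` and the 3-class toy matrix are hypothetical and chosen only for illustration, and they are not data from the study.

```python
def classification_scores(confusion):
    """Overall accuracy, macro-average F1, and Cohen's kappa.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = [confusion[i][i] for i in range(n)]
    row_sums = [sum(row) for row in confusion]  # true counts per class
    col_sums = [sum(confusion[i][j] for i in range(n)) for j in range(n)]

    # Overall accuracy: share of samples on the diagonal.
    overall_accuracy = sum(diag) / total

    # Per-class F1 = 2TP / (2TP + FP + FN) = 2TP / (row sum + column sum);
    # the macro average weights every class equally.
    f1_scores = []
    for i in range(n):
        denom = row_sums[i] + col_sums[i]
        f1_scores.append(2 * diag[i] / denom if denom else 0.0)
    macro_f1 = sum(f1_scores) / n

    # Kappa corrects the accuracy for the agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall_accuracy - expected) / (1 - expected)
    return overall_accuracy, macro_f1, kappa


if __name__ == "__main__":
    # Toy 3-class confusion matrix (invented values, not study results).
    toy = [[8, 2, 0],
           [1, 7, 2],
           [0, 1, 9]]
    oa, f1, kappa = classification_scores(toy)
    print(f"OA={oa:.4f}  macro-F1={f1:.4f}  kappa={kappa:.4f}")
```

Because the macro average gives every class the same weight, it is typically lower than the overall accuracy when minority crop classes are harder to classify, which is consistent with the gap between the two scores reported above.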