Abstract:
Soluble solid content (SSC) has been widely used in recent years to predict apple grades, so a non-destructive model that maps near-infrared spectra to SSC is needed. However, a single classification model limits prediction accuracy and applicability, particularly because of its hard partition of the instance space. In this study, an evidence theory-based multi-model fusion was proposed to address this issue. Evidence theory provides a flexible framework for representing and reasoning with uncertainty, and it was adopted here to aggregate the predictions of two models. The mass function generation for the fusion model was also addressed, so that the classification knowledge carried by the predicted SSC was better represented and a more accurate classification was achieved. Several steps were first carried out to improve prediction performance, including information collection, data pre-processing, and spectral feature selection. Specifically, the near-infrared spectra of the apple instances were collected with the self-developed WY-6100 fruit online non-destructive testing system, while the SSC was measured by physicochemical methods. A total of 439 Red Fuji apple instances were collected from Yantai City, Shandong Province, China. The instances were randomly divided into training and test sets at a 7:3 ratio: 307 randomly selected instances were used for model training and the remaining 132 for performance testing. Several procedures were applied to pre-treat the near-infrared spectra and produce the model inputs. Anomalous instances were eliminated using Principal Component Analysis-Mahalanobis distance (PCA-MD). Savitzky-Golay convolution smoothing was used to remove the noise caused by the equipment, environment, and other external factors.
Baseline drift was reduced by the standard normal variate (SNV) transformation. Characteristic wavelengths were then extracted from the spectra by a genetic algorithm (GA) to preserve the most useful information. Partial least squares (PLS) and extreme learning machine (ELM) models were established, with experimental classification accuracies of 93.80% and 92.25%, respectively. The two prediction models were then fused by evidence theory. For the fusion model to achieve higher classification performance, the uncertainty of each prediction model had to be quantified with a mass function. A triangular mass function generation method was proposed that takes into account the distance between the predicted SSC value and the classification boundary. Misclassifications were attributed to apple instances whose predicted SSC values fell near a grade boundary. A novel strategy was therefore used to assign mass not only to the precise class but also to the set containing that class and its adjacent class. Once generated, the mass functions of the ELM and PLS models were combined by Dempster's rule of combination to obtain the fused prediction. The focal element with the maximum mass in the combined mass function was selected as the final apple grade. The experimental results showed that the triangular mass function generation was more reasonable than the hard partition, and the classification accuracy of the multi-model fusion reached 95.35%. In summary, the mass function generation was modified so that the evidence theory-based fusion model depicts uncertain classification information more precisely, yielding a more accurate and intuitive classification. Moreover, the improvement can also be applied to the fusion of other prediction models.
Compared with a single prediction model, better applicability was also obtained because the hard partition is avoided.
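
The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the grade boundaries (`BOUNDARIES`), the triangle half-width (`HALF_WIDTH`), and the exact triangular shape are hypothetical choices made only for illustration, since the abstract does not state them. The sketch assumes that a predicted SSC value far from every boundary puts all its mass on one grade, while a value near a boundary shifts mass linearly onto the set of the two adjacent grades. Two such mass functions are then combined with Dempster's rule, and the focal element with the largest mass is taken as the decision.

```python
from itertools import product

# Hypothetical values, not taken from the paper.
BOUNDARIES = [11.0, 13.0]   # SSC thresholds separating grades 0, 1, 2
HALF_WIDTH = 0.5            # half-width of the triangle around each boundary


def triangular_mass(ssc, boundaries=BOUNDARIES, half_width=HALF_WIDTH):
    """Map one predicted SSC value to a mass function {frozenset: mass}."""
    grade = sum(ssc >= b for b in boundaries)  # grade under a hard partition
    # Index of, and distance to, the nearest grade boundary.
    idx, dist = min(
        ((i, abs(ssc - b)) for i, b in enumerate(boundaries)),
        key=lambda pair: pair[1],
    )
    if dist >= half_width:
        return {frozenset([grade]): 1.0}
    # Mass on the two-grade set peaks at the boundary and falls off linearly.
    ambiguous = 1.0 - dist / half_width
    masses = {
        frozenset([grade]): 1.0 - ambiguous,
        frozenset([idx, idx + 1]): ambiguous,
    }
    return {k: v for k, v in masses.items() if v > 0.0}


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass falling on the empty set
    if not combined:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


def decide(mass):
    """Return the focal element with the maximum mass."""
    return max(mass, key=mass.get)


# Hypothetical predictions from two models for the same apple.
pls_mass = triangular_mass(13.4)   # mostly grade 2
elm_mass = triangular_mass(12.9)   # close to the 13.0 boundary
fused = dempster_combine(pls_mass, elm_mass)
print(sorted(decide(fused)))       # [2]
```

In this example a hard partition would give conflicting grades (2 for the first model, 1 for the second), whereas the fused mass function concentrates on grade 2 because the second prediction lies close to the boundary and so commits little mass to grade 1.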