Abstract:
This study aims to identify the origin of spearmint rapidly and accurately. A feature-wavelength extraction method was proposed that sequentially fuses Kernel Principal Component Analysis (KPCA), Kernel Locality Preserving Projections (KLPP), and the Wilks' Λ statistic, and it was used to discriminate among five origins of spearmint. Firstly, three-dimensional fluorescence data were collected from 300 spearmint samples from five locations. Rayleigh and Raman scattering interferences were removed from the original spectra by triangular interpolation, and Savitzky-Golay (SG) smoothing was then applied to suppress noise. Secondly, the preprocessed three-dimensional fluorescence spectrum matrix was merged into a single data matrix according to the emission wavelength. Four methods, KPCA, KPCA + KLPP, KPCA + Wilks' Λ, and KPCA + KLPP + Wilks' Λ, were used to extract the feature wavelengths from the preprocessed three-dimensional fluorescence spectrum matrix. KPCA and KPCA + Wilks' Λ extracted 40 feature emission wavelengths and 8 feature excitation wavelengths, whereas KPCA + KLPP and KPCA + KLPP + Wilks' Λ extracted 50 feature emission wavelengths and 10 feature excitation wavelengths. Thirdly, for each sample, the spectral values at the feature emission wavelengths were converted into a row vector following the order of the feature excitation wavelengths, which gave four 300-row feature-wavelength spectral matrices for the 300 samples. Fisher discriminant analysis (FDA) was then applied to each feature-wavelength spectral matrix to fuse the data into separable Fisher discriminant (FD) variables. The first four FD variables, with a cumulative discriminant ability of 99%, were selected as the input vectors of the discriminant model for spearmint origin. Finally, a support vector machine (SVM) was used to classify the samples from the four FD variables.
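The Savitzky-Golay smoothing step mentioned above can be illustrated with a minimal sketch. The abstract does not state the window width or polynomial order, so the 5-point quadratic window used here is an assumption, and the function name `savgol_smooth_5pt` is hypothetical. The two points at each end are left unsmoothed as a simplification.

```python
def savgol_smooth_5pt(values):
    """Smooth a 1-D intensity sequence with a 5-point quadratic
    Savitzky-Golay window (assumed settings, not taken from the study)."""
    # Standard convolution coefficients for a 5-point quadratic/cubic fit.
    coeffs = (-3.0, 12.0, 17.0, 12.0, -3.0)
    norm = 35.0

    smoothed = [float(v) for v in values]
    n = len(values)
    if n < 5:
        # Too short for the window: return the input unchanged.
        return smoothed

    # Interior points only; the first and last two points are kept as-is.
    for i in range(2, n - 2):
        window = values[i - 2 : i + 3]
        smoothed[i] = sum(c * v for c, v in zip(coeffs, window)) / norm
    return smoothed


if __name__ == "__main__":
    # Made-up emission scan containing a single noisy spike.
    scan = [0.0, 0.0, 0.0, 35.0, 0.0, 0.0, 0.0]
    print(savgol_smooth_5pt(scan))
```

Because the window fits a local quadratic, a signal that is itself quadratic passes through unchanged, which is why this filter tends to preserve peak shape better than a plain moving average.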
FDA + SVM identification models were built for the four feature-wavelength extraction methods. For KPCA, KPCA + KLPP, KPCA + Wilks' Λ, and KPCA + KLPP + Wilks' Λ, the correct identification rates were 92.00%, 97.78%, 95.11%, and 100% on the training set and 92.00%, 96.00%, 94.67%, and 100% on the test set, respectively. The results show that both KLPP and the Wilks' Λ statistic improved the performance of KPCA, and that their sequential fusion improved feature extraction further, removing redundant spectral information more effectively and yielding the highest correct identification rate. Feature-wavelength extraction by the sequential fusion of KPCA + KLPP + Wilks' Λ can therefore effectively reduce the redundancy of three-dimensional fluorescence spectral data while retaining the informative features of the original fluorescence data, and the combination of KPCA + KLPP + Wilks' Λ + FDA + SVM can be expected to identify the five origins of spearmint effectively. In addition, these findings can lay a foundation for the subsequent quantitative analysis of important components in spearmint by three-dimensional fluorescence spectroscopy.
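To make the role of the Wilks' Λ statistic in wavelength screening more concrete, the following is a minimal sketch. The abstract does not say how Λ is computed or thresholded, so the univariate form used here (within-group sum of squares divided by total sum of squares, evaluated one wavelength at a time) is an assumption. The names `wilks_lambda` and `rank_wavelengths` and the toy data are hypothetical.

```python
def wilks_lambda(values, labels):
    """Univariate Wilks' Lambda: SS_within / SS_total.
    Values near 0 mean the groups are well separated at this wavelength;
    values near 1 mean the wavelength carries little group information."""
    groups = {}
    for v, g in zip(values, labels):
        groups.setdefault(g, []).append(float(v))

    all_values = [v for vs in groups.values() for v in vs]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0.0:
        # Constant column: no discriminating information at all.
        return 1.0

    ss_within = 0.0
    for vs in groups.values():
        group_mean = sum(vs) / len(vs)
        ss_within += sum((v - group_mean) ** 2 for v in vs)
    return ss_within / ss_total


def rank_wavelengths(spectra, labels, wavelengths):
    """Order wavelengths from most to least discriminating (ascending Lambda).
    `spectra` is a list of sample rows, one intensity per wavelength."""
    scored = []
    for j, wl in enumerate(wavelengths):
        column = [row[j] for row in spectra]
        scored.append((wilks_lambda(column, labels), wl))
    return [wl for _, wl in sorted(scored)]


if __name__ == "__main__":
    # Made-up intensities for four samples from two origins at two wavelengths.
    spectra = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.1], [3.2, 0.9]]
    labels = ["A", "A", "B", "B"]
    print(rank_wavelengths(spectra, labels, [400, 420]))
```

In this toy case the first wavelength separates the two origins cleanly and the second does not, so a screening step of this kind would keep the first and discard the second as redundant.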