Abstract:
Selecting appropriate samples for building a Partial Least Squares (PLS) calibration model in Near Infrared (NIR) spectroscopy is both important and difficult. The author previously presented a sample selection method named SSGA (Sample Selection by Genetic Algorithms). Because computing on the raw spectrum matrix made the running time of SSGA too long, this paper proposes a new algorithm, PCA-SSGA, which replaces the raw spectrum matrix with the score matrix of Principal Component Analysis (PCA). The paper discusses the principal component analysis step of PCA-SSGA, the coding and decoding of chromosomes, the choice of objective function and fitness, the selection, crossover and mutation operators, and related issues. The PCA-SSGA software system was developed in Microsoft Visual C++. The optimal sample combination was found after 39,200 generations of evolution, using the dry protein content of 131 groups of wheat samples. As a result, the number of samples decreased from 131 to 70, and experiments using PLS with leave-one-out cross-validation (PLS-LOO-CV) showed that the coefficient of determination (R²) increased from 0.9477 to 0.9841 and the Root Mean Squared Error of Prediction of Cross Validation (RMSPCV) fell from 0.3938 to 0.1934. For a single generation, the running time of PCA-SSGA was approximately 1/2193 of that of SSGA, so the time needed for optimized sample selection was greatly shortened. With PCA-SSGA, the GA parameters can be adjusted conveniently and samples can be selected automatically. The results show that PCA-SSGA provides good technical support and improves prediction precision when optimizing NIR models for agricultural products.
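
As a rough illustration of the chromosome coding and the genetic operators named in the abstract, the following is a minimal sketch in Python (chosen for brevity; the original system was written in Visual C++). It assumes a binary chromosome with one gene per sample, single-point crossover, bit-flip mutation, tournament selection and elitism. The abstract does not specify any of these choices, so they are illustrative assumptions rather than the author's operators. The function `toy_fitness` and the `NOISY` set are hypothetical stand-ins: in PCA-SSGA the fitness would instead be derived from a PLS-LOO-CV evaluation on the PCA score matrix.

```python
import random

def decode(chromosome):
    """Map a binary chromosome to the indices of the selected samples."""
    return [i for i, gene in enumerate(chromosome) if gene == 1]

def crossover(parent_a, parent_b, rng):
    """Single-point crossover (assumed operator); returns two children."""
    point = rng.randrange(1, len(parent_a))
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def tournament(population, scores, rng, k=2):
    """Return the fittest of k randomly drawn individuals (assumed operator)."""
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def evolve(n_samples, fitness, generations=50, pop_size=20,
           mutation_rate=0.01, seed=0):
    """Run a simple elitist GA; returns (selected indices, best fitness)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_samples)]
                  for _ in range(pop_size)]
    best, best_score = None, float("-inf")
    for _ in range(generations):
        scores = [fitness(decode(c)) for c in population]
        for chrom, score in zip(population, scores):
            if score > best_score:
                best, best_score = chrom[:], score
        next_pop = [best[:]]  # elitism: carry the best individual forward
        while len(next_pop) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            for child in crossover(a, b, rng):
                next_pop.append(mutate(child, mutation_rate, rng))
        population = next_pop[:pop_size]
    return decode(best), best_score

# Hypothetical stand-in fitness: penalize two "noisy" samples and mildly
# reward keeping more samples. Not the PLS-LOO-CV fitness of the paper.
NOISY = {2, 5}

def toy_fitness(selected):
    return 0.01 * len(selected) - len(NOISY & set(selected))

if __name__ == "__main__":
    selected, score = evolve(10, toy_fitness)
    print("selected samples:", selected, "fitness:", score)
```

Swapping `toy_fitness` for a function that fits a PLS model on the selected rows of the PCA score matrix and returns a cross-validation based score would bring the sketch closer to the procedure the abstract describes.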