Sample Selection Methods for Building Calibration Models of Near Infrared Spectroscopy Based on Principal Component Analysis and Genetic Algorithms

Zhu Shiping (祝诗平)

Abstract

Abstract: Selecting appropriate samples for building a Partial Least Squares (PLS) calibration model in Near Infrared (NIR) spectroscopy is both important and difficult. The author previously presented a sample selection method based on genetic algorithms, named SSGA (Sample Selection by Genetic Algorithms). Because SSGA runs very slowly when it operates on the raw spectrum matrix, a new algorithm, PCA-SSGA, is proposed that replaces the raw spectrum matrix with the score matrix from Principal Component Analysis (PCA). The paper discusses the principal component decomposition used in PCA-SSGA, chromosome coding and decoding, the choice of objective function and fitness function, and the selection, crossover, and mutation operators. A PCA-SSGA software system was developed in Microsoft Visual C++. PCA-SSGA was applied to 131 wheat kernel samples with dry-basis protein content as the reference value. After 39200 generations of evolution, the best sample combination was found: the number of samples fell from 131 to 70, and under Partial Least Squares leave-one-out cross-validation (PLS-LOO-CV) the coefficient of determination (R2) rose from 0.9477 to 0.9841 while the root mean squared error of prediction of cross-validation (RMSPCV) fell from 0.3938 to 0.1934. One generation of PCA-SSGA took approximately 1/2193 of the time of one generation of SSGA, so the whole sample selection process was greatly shortened. The results show that PCA-SSGA allows the genetic algorithm parameters to be adjusted conveniently and flexibly and selects samples automatically, which provides good technical support for optimizing NIR models of agricultural products and further improving prediction accuracy.
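The ingredients listed in the abstract (binary chromosome coding and decoding, an objective function and a fitness function, and selection, crossover, and mutation operators) can be illustrated with a minimal Python sketch. It is not the author's Visual C++ implementation. Several simplifications are assumed for brevity: each sample is represented by a single made-up score value rather than a row of a PCA score matrix, a one-variable least-squares line stands in for the PLS model, and tournament selection, single-point crossover, bit-flip mutation, and elitism are illustrative operator choices. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random


def loo_cv_rmse(x, y):
    """Leave-one-out RMSE of a one-variable least-squares line.

    Assumption: this simple line is a stand-in for PLS-LOO-CV.
    """
    n = len(x)
    sq_err = 0.0
    for k in range(n):
        xs = x[:k] + x[k + 1:]          # training scores without sample k
        ys = y[:k] + y[k + 1:]          # training references without sample k
        mx = sum(xs) / (n - 1)
        my = sum(ys) / (n - 1)
        sxx = sum((a - mx) ** 2 for a in xs)
        if sxx == 0.0:                  # degenerate fit, treat as unusable
            return math.inf
        slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sxx
        pred = my + slope * (x[k] - mx)
        sq_err += (y[k] - pred) ** 2
    return math.sqrt(sq_err / n)


def decode(chrom):
    """Decode a 0/1 chromosome into the indices of the selected samples."""
    return [i for i, bit in enumerate(chrom) if bit]


def objective(chrom, scores, y, min_samples):
    """Cross-validation error of the selected subset (lower is better)."""
    idx = decode(chrom)
    if len(idx) < min_samples:          # reject subsets that are too small
        return math.inf
    return loo_cv_rmse([scores[i] for i in idx], [y[i] for i in idx])


def fitness(obj):
    """Map the objective to a fitness in [0, 1] (higher is better)."""
    return 1.0 / (1.0 + obj)


def run_ga(scores, y, min_samples, pop_size=30, generations=60,
           p_cross=0.8, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    n = len(scores)

    def score(c):
        return objective(c, scores, y, min_samples)

    # The all-ones chromosome (keep every sample) is seeded into the population.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    best = min(pop, key=score)
    for _ in range(generations):
        fits = [fitness(score(c)) for c in pop]

        def pick():
            # Binary tournament selection.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if fits[a] >= fits[b] else pop[b]

        children = [best[:]]            # elitism: carry the best over unchanged
        while len(children) < pop_size:
            c1, c2 = pick()[:], pick()[:]
            if rng.random() < p_cross:  # single-point crossover
                cut = rng.randrange(1, n)
                c1, c2 = c1[:cut] + c2[cut:], c2[:cut] + c1[cut:]
            for child in (c1, c2):      # bit-flip mutation
                children.append([bit ^ 1 if rng.random() < p_mut else bit
                                 for bit in child])
        pop = children[:pop_size]
        best = min(pop, key=score)
    return best, score(best)


def make_demo_data(n=20, seed=1):
    """Synthetic data: a linear trend plus three deliberately corrupted samples."""
    rng = random.Random(seed)
    scores = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [12.0 + 2.0 * t + rng.gauss(0.0, 0.05) for t in scores]
    for i in (3, 11, 17):
        y[i] += 1.5
    return scores, y


if __name__ == "__main__":
    scores, y = make_demo_data()
    best, best_rmse = run_ga(scores, y, min_samples=10)
    print("all samples :", loo_cv_rmse(scores, y))
    print("selected    :", decode(best), best_rmse)
```

Because the population is seeded with the keep-everything chromosome and the best individual is always carried over, the selected subset in this sketch can never have a higher cross-validation error than the full sample set. The `min_samples` floor (which should be at least 3 here) keeps the search from shrinking the calibration set to a trivially small one.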
