Clustering Method for Samples of Unknown Category Based on Near-Infrared Spectroscopy

    • Abstract: In near-infrared (NIR) spectroscopic analysis, when a large number of samples take part in modeling, the sample set needs to be divided into classes so as to narrow the range of spectral variation and improve the prediction accuracy of the NIR model. Taking 222 wheat samples from across China as an example, and with neither the component contents nor the class membership of the samples known, this paper classifies the sample set from its NIR spectral information using two probing-based clustering methods for samples of unknown category: the nearest neighbor rule method and the maximum-minimum distance algorithm. With the threshold T set to 1.9 for the nearest neighbor rule method, and with the threshold set to 1/2 of the maximum distance between samples for the maximum-minimum distance algorithm, the indicators of the models built per class were all better than those of the model built without classification. The classification process and results show that probing-based clustering of unknown-category samples does not require repeated training and does not require the number of classes to be fixed in advance. It does require a classification threshold, and the classification result changes with the threshold value. The study provides a reference method for classifying samples of unknown category during NIR modeling.
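The two probing-based methods named in the abstract can be illustrated with a minimal Python sketch. It follows the common textbook forms of the nearest neighbor rule and the maximum-minimum distance algorithm, not necessarily the exact procedure used in the paper. Several points are assumptions: distances are Euclidean, the first sample seeds the first cluster, and the "maximum distance" reference is taken as the distance from the first center to its farthest sample. The function names, the parameter `theta`, and the toy data are hypothetical. The value T = 1.9 depends on how the spectra are scaled and is reused here only for illustration.

```python
import numpy as np


def nearest_neighbor_rule(X, T):
    """Sequential threshold clustering (assumed Euclidean distance).

    A sample farther than T from every existing center starts a new
    cluster; otherwise it joins the cluster of the nearest center.
    """
    X = np.asarray(X, dtype=float)
    centers = [0]                          # first sample seeds cluster 0 (assumption)
    labels = np.zeros(len(X), dtype=int)
    for i in range(1, len(X)):
        d = np.linalg.norm(X[centers] - X[i], axis=1)
        k = int(np.argmin(d))
        if d[k] > T:                       # too far from all centers: new cluster
            centers.append(i)
            labels[i] = len(centers) - 1
        else:
            labels[i] = k
    return labels, centers


def max_min_distance(X, theta=0.5):
    """Maximum-minimum distance clustering (assumed Euclidean distance).

    New centers are added while the largest nearest-center distance
    exceeds theta times the reference distance; every sample is then
    assigned to its nearest center.
    """
    X = np.asarray(X, dtype=float)
    centers = [0]                          # first sample is the first center (assumption)
    d_min = np.linalg.norm(X - X[0], axis=1)   # distance of each sample to its nearest center
    ref = d_min.max()                      # reference "maximum distance" (assumption)
    while True:
        cand = int(np.argmax(d_min))       # sample farthest from all current centers
        if d_min[cand] <= theta * ref:
            break
        centers.append(cand)
        d_min = np.minimum(d_min, np.linalg.norm(X - X[cand], axis=1))
    d_all = np.linalg.norm(X[:, None, :] - X[centers][None, :, :], axis=2)
    return d_all.argmin(axis=1), centers


# Hypothetical toy data standing in for (preprocessed) spectra.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0]])
labels_nn, centers_nn = nearest_neighbor_rule(X, T=1.9)
labels_mm, centers_mm = max_min_distance(X, theta=0.5)
print(labels_nn, centers_nn)
print(labels_mm, centers_mm)
```

Neither function takes a number of classes as input. The cluster count follows from the threshold alone, which is why a different threshold yields a different partition. The nearest neighbor rule also depends on the order in which samples are presented, since the first sample and each later "far" sample become centers.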

       
