Application of full spectral matching algorithm in apple classification

Zhou Wanhuai (周万怀); Xie Lijuan (谢丽娟); Ying Yibin (应义斌)

doi:10.3969/j.issn.1002-6819.2013.19.035

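Summary: To further improve spectral matching accuracy, this study modifies the Jaccard similarity coefficient (JSC) and proposes a new method for computing spectral similarity. The spectra are binarized by means of their first derivative so that the modified algorithm can be applied to spectral matching. The effect of spectral resolution on the algorithm is also examined. Apples of four varieties (Aksu Red Fuji, Shandong Hongjiangjun, Shaanxi Red Fuji, and Shaanxi Golden Delicious) were used to validate the algorithm, and comparisons were made at seven resolution levels between 2 and 128 cm-1. The results show that the proposed algorithm achieved a correct classification rate of 94.5% and gave its best classification results at a resolution of 8 or 16 cm-1. Applying the JSC-based full spectral matching algorithm in spectral database systems should therefore help improve the accuracy of spectral queries.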

Abstract: A spectral database system (SDBS) can improve the usage efficiency and expand the application scope of spectra and their feature information, mainly spectral peak information. The spectral matching algorithm (SMA) plays a decisive role in an SDBS because it determines the similarity between the sample spectrum and the reference spectrum, and therefore the accuracy of a database query. Traditional full spectral matching algorithms compute the distance or similarity between spectra directly from spectral absorbance or reflectance, so they are vulnerable to noise. To achieve higher accuracy, this paper presents a full spectral matching algorithm based on the Jaccard similarity coefficient (JSC). The JSC measures the overlap between two objects A and B whose attributes each take the value 0 or 1. To satisfy this requirement, the first derivative of the raw spectrum is computed and then binarized: negative values of the first derivative are mapped to 0 and positive values to 1, where 0 means the raw spectrum is descending in the corresponding small region and 1 means it is ascending. Unlike common full spectral matching algorithms, the proposed algorithm calculates the similarity between spectra from the spectral waveform rather than directly from the absorbance or reflectance. The influence of absolute absorbance or reflectance intensity is therefore reduced and the influence of waveform similarity is enhanced. This means that which substances a sample contains matters more than how much of each it contains. In this way, the influence of noise, and of differences caused by different spectral collection areas on solid samples, is reduced to a low level.
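The binarization and overlap steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it uses the standard Jaccard coefficient, because the abstract does not give the details of the modified version, and the function names and the handling of zero differences are assumptions made here.

```python
from typing import List, Sequence


def binarize_first_derivative(spectrum: Sequence[float]) -> List[int]:
    """Encode each step of a spectrum as 1 (ascending) or 0 (not ascending).

    The first derivative is approximated by the difference between
    neighbouring points. Mapping a zero difference to 0 is an assumption;
    the abstract only covers negative and positive values.
    """
    return [1 if b - a > 0 else 0 for a, b in zip(spectrum, spectrum[1:])]


def jaccard_similarity(x: Sequence[int], y: Sequence[int]) -> float:
    """Standard Jaccard coefficient of two equal-length binary vectors."""
    if len(x) != len(y):
        raise ValueError("binary vectors must have the same length")
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Two all-zero vectors share the same (empty) set of ascending steps.
    return both / either if either else 1.0


def waveform_similarity(s1: Sequence[float], s2: Sequence[float]) -> float:
    """Similarity of two spectra based only on where they rise."""
    return jaccard_similarity(
        binarize_first_derivative(s1), binarize_first_derivative(s2)
    )


# Hypothetical absorbance values, chosen only for illustration.
sample = [0.10, 0.30, 0.20, 0.40, 0.35]
scaled = [2 * v + 1 for v in sample]  # same waveform, different intensity
print(binarize_first_derivative(sample))    # [1, 0, 1, 0]
print(waveform_similarity(sample, scaled))  # 1.0
```

Because only the direction of change is encoded, a spectrum and a scaled, offset copy of it yield the same binary pattern, which is the reduced dependence on absolute intensity that the abstract describes.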
Common full spectral matching algorithms were compared with the proposed algorithm. The proposed algorithm correctly classified 94.5% of the samples (4 apple varieties, 100 samples each), whereas the second highest classification accuracy, 73%, was obtained with the Euclidean distance (ED) method. This result indicates that the proposed algorithm is better suited to classifying different kinds of samples, and that it would help to narrow the database query scope, shorten query time, and improve query accuracy. By its principle, the algorithm is necessarily affected by the interval between the data points of the spectra, so the effect of spectral resolution on the algorithm was studied. In total, seven resolutions (2 to 128 cm-1) were tested. The proposed algorithm proved sensitive to spectral resolution, and the optimal resolution for near-infrared (NIR) spectra of apples was approximately 8 or 16 cm-1. The optimal resolution should therefore be determined first whenever the algorithm is used for spectral matching of new objects. In short, the proposed spectral matching algorithm can classify NIR spectra of solid samples with higher accuracy, and its application will help improve the accuracy of spectral database queries.
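The dependence on data-point spacing can be illustrated with a toy example. The sketch below is an assumption-laden illustration rather than the study's procedure: `decimate` simply keeps every n-th point as a crude stand-in for a coarser instrument resolution, and the values are made up.

```python
from typing import List, Sequence


def decimate(spectrum: Sequence[float], step: int) -> List[float]:
    """Keep every `step`-th point (a crude proxy for coarser resolution)."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(spectrum[::step])


def slope_signs(spectrum: Sequence[float]) -> List[int]:
    """1 where the spectrum rises to the next point, otherwise 0."""
    return [1 if b > a else 0 for a, b in zip(spectrum, spectrum[1:])]


# Hypothetical spectrum: a slowly rising trend with alternating noise.
noisy = [-0.02, 0.03, 0.00, 0.05, 0.02, 0.07, 0.04, 0.09, 0.06]

print(slope_signs(noisy))               # [1, 0, 1, 0, 1, 0, 1, 0]
print(slope_signs(decimate(noisy, 2)))  # [1, 1, 1, 1]
```

At the fine spacing the binary pattern follows the noise, while at the coarser spacing it follows the underlying trend. A spacing that is too coarse would in turn skip over genuine peaks, which is consistent with the reported optimum lying at an intermediate resolution.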
