Quality grading of field raw cotton based on image feature selection

Abstract: Following the written Chinese grading standard for raw cotton, 14 texture features describing color and impurity content and 16 shape features describing the size and geometric structure of the cotton locks were extracted in the HSI color space. This 30-dimensional feature set suffers from the curse of dimensionality, so its dimensionality must be reduced. Feature selection for raw-cotton quality grading is an NP-hard problem; a solution algorithm based on cross-validation, a hybrid Filter-Wrapper scheme, and heuristic search was therefore proposed. First, with optimal scalar feature combination and floating search as the heuristic search strategies, a Filter whose evaluation function is a class-separability criterion searched each training set of a 10-fold cross-validation for the optimal l-dimensional feature subset (l = 1, 2, 3, …, 30). Second, on the 10 training sets, a Wrapper whose evaluation function is the misclassification rate of a Bayes classifier selected the optimal feature-subset size from among these optimal l-dimensional subsets (l = 1, 2, 3, …, 30); the optimal size is the one at which the average misclassification rate over the 10 validation sets reaches its minimum. Finally, the average misclassification rate on the prediction set was verified at this optimal subset size. The results show that the 10 selected optimal feature subsets (one per fold) achieve an average recognition rate of 88.39% on the prediction set, and that the feature selection algorithm combining hybrid Filter-Wrapper selection with floating search is efficient and effective.
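
The texture features are extracted in the HSI color space, but the abstract neither gives the color conversion nor lists the individual features. As a minimal sketch, assuming the commonly used geometric RGB-to-HSI formulas with RGB components scaled to [0, 1], a per-pixel conversion could look like this; the function name `rgb_to_hsi` is hypothetical.

```python
import math


def rgb_to_hsi(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert one RGB pixel (components in [0, 1]) to (H in degrees, S, I).

    Assumes the standard geometric RGB-to-HSI formulas; S and I lie in [0, 1].
    """
    total = r + g + b
    intensity = total / 3.0
    # Saturation: 0 for black pixels, otherwise 1 - 3 * min(R, G, B) / (R + G + B).
    saturation = 0.0 if total == 0 else 1.0 - 3.0 * min(r, g, b) / total
    num = 0.5 * ((r - g) + (r - b))
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        # Achromatic (gray) pixel: hue is undefined, set to 0 by convention.
        hue = 0.0
    else:
        # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
        theta = math.degrees(math.acos(max(-1.0, min(1.0, num / den))))
        hue = theta if b <= g else 360.0 - theta
    return hue, saturation, intensity
```

Texture statistics (for example, per-channel means or variances over the cotton region) would then be computed on the resulting H, S and I planes; which 14 statistics the paper actually uses is not stated in the abstract.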

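The Filter stage scores candidate subsets with a class-separability criterion, which the abstract does not specify. One common choice, assumed here purely for illustration, is the scatter-matrix trace ratio J = tr(S_b) / tr(S_w), where S_b and S_w are the between-class and within-class scatter matrices and class priors are estimated from class frequencies. The function name `trace_ratio` is hypothetical.

```python
from collections import defaultdict
from typing import Sequence


def trace_ratio(X: Sequence[Sequence[float]], y: Sequence[int],
                subset: Sequence[int]) -> float:
    """Class-separability score J = tr(S_b) / tr(S_w) on the selected columns.

    A trace only uses the diagonal of each scatter matrix, so every selected
    feature adds its own between-class and within-class variance terms.
    Class priors are estimated as class frequencies.
    """
    n = len(X)
    groups: dict[int, list[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    between = within = 0.0
    for f in subset:
        overall_mean = sum(row[f] for row in X) / n
        for rows in groups.values():
            prior = len(rows) / n
            class_mean = sum(r[f] for r in rows) / len(rows)
            between += prior * (class_mean - overall_mean) ** 2
            within += prior * sum((r[f] - class_mean) ** 2 for r in rows) / len(rows)
    if within > 0:
        return between / within
    # Zero within-class scatter: perfectly compact classes (or an empty subset).
    return float("inf") if between > 0 else 0.0
```

A higher J means the grades lie farther apart relative to their spread. The variant tr(S_w^-1 S_b) also accounts for correlations between features but requires a matrix inverse.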

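The floating search used by the Filter can be sketched as sequential floating forward selection (SFFS): after each inclusion step, features are conditionally removed for as long as that yields a better subset than any previously found at the smaller size. This is a minimal sketch, assuming a generic criterion callable where larger is better (in the procedure described in the abstract it would be a class-separability score computed on one training fold); the names `sffs` and `Criterion` are hypothetical, and the optimal scalar feature combination strategy is not shown.

```python
from typing import Callable

# Hypothetical alias: maps a feature subset to a score (larger is better).
Criterion = Callable[[frozenset[int]], float]


def sffs(n_features: int, criterion: Criterion, max_size: int) -> dict[int, frozenset[int]]:
    """Sequential floating forward selection.

    Returns the best subset found for every size 1..max_size, i.e. one
    optimal l-dimensional subset per l.
    """
    if not 1 <= max_size <= n_features:
        raise ValueError("max_size must lie in [1, n_features]")
    best: dict[int, frozenset[int]] = {}
    best_score: dict[int, float] = {}

    def record(subset: frozenset[int], score: float) -> None:
        # Keep the subset if it is the best seen so far at its size.
        k = len(subset)
        if k not in best_score or score > best_score[k]:
            best[k], best_score[k] = subset, score

    current: frozenset[int] = frozenset()
    while len(current) < max_size:
        # Inclusion: add the single feature that raises the criterion most.
        current = max((current | {f} for f in range(n_features) if f not in current),
                      key=criterion)
        record(current, criterion(current))
        # Conditional exclusion: drop the least useful feature while the reduced
        # subset strictly beats the best subset already known at that size.
        while len(current) > 2:
            reduced = max((current - {f} for f in current), key=criterion)
            score = criterion(reduced)
            if score <= best_score[len(reduced)]:
                break
            current = reduced
            record(current, score)
    return best
```

Because the function returns the best subset for every size up to `max_size`, running it once per training fold with `max_size=30` would yield the optimal l-dimensional subsets (l = 1, …, 30) that the Wrapper stage compares. The exclusion step is what distinguishes SFFS from plain forward selection: when the best pair does not contain the best single feature, backtracking lets the search revise its earlier choice.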

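The Wrapper stage chooses the subset size l whose Filter-selected subsets give the lowest average misclassification rate over the 10 validation folds. The sketch below assumes a Gaussian naive Bayes classifier (the abstract only says "Bayes classifier", which may instead use full covariance matrices) and unshuffled, interleaved folds; all function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Sequence

Row = Sequence[float]


def kfold_indices(n: int, k: int) -> list[list[int]]:
    """Split sample indices 0..n-1 into k interleaved folds (no shuffling)."""
    return [list(range(i, n, k)) for i in range(k)]


def fit_gaussian_nb(X: Sequence[Row], y: Sequence[int], feats: Sequence[int]) -> dict:
    """Per-class log prior plus per-feature mean and variance (naive Bayes)."""
    groups: dict[int, list[Row]] = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        stats = []
        for f in feats:
            vals = [r[f] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-9  # variance floor
            stats.append((f, mu, var))
        model[label] = (math.log(len(rows) / len(X)), stats)
    return model


def predict(model: dict, row: Row) -> int:
    """Return the class with the largest Gaussian log posterior."""
    def log_posterior(label: int) -> float:
        log_prior, stats = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (row[f] - mu) ** 2 / (2 * var)
            for f, mu, var in stats)
    return max(model, key=log_posterior)


def select_subset_size(X, y, fold_subsets, k=10):
    """Wrapper stage: pick the subset size with the lowest mean validation error.

    fold_subsets[i][size] is the Filter's optimal subset of that size found on
    the training part of fold i. Returns (best_size, {size: mean error}).
    """
    if len(X) < k:
        raise ValueError("need at least k samples")
    folds = kfold_indices(len(X), k)
    sizes = sorted(fold_subsets[0])
    mean_err: dict[int, float] = {}
    for size in sizes:
        fold_errors = []
        for i, val_idx in enumerate(folds):
            held_out = set(val_idx)
            train_idx = [j for j in range(len(X)) if j not in held_out]
            model = fit_gaussian_nb([X[j] for j in train_idx], [y[j] for j in train_idx],
                                    sorted(fold_subsets[i][size]))
            wrong = sum(predict(model, X[j]) != y[j] for j in val_idx)
            fold_errors.append(wrong / len(val_idx))
        mean_err[size] = sum(fold_errors) / k
    # Ties go to the smaller size because sizes are scanned in ascending order.
    best_size = min(sizes, key=lambda s: mean_err[s])
    return best_size, mean_err
```

Here `fold_subsets` would hold, for each of the 10 training sets, the per-size subsets produced by the Filter stage (sizes 1 to 30 in the paper's setting). The final step in the abstract, evaluating the 10 subsets of the chosen size on the separate prediction set, is not shown.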