Detection of defective Arabica green coffee beans based on feature combination and SVM

赵玉清; 杨慧丽; 张悦; 杨颜凯; 杨毅; 赛敏

doi:10.11975/j.issn.1002-6819.2022.14.033

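Summary: Defective green coffee beans significantly affect the quality and pricing of commercial coffee beans, so sorting them out is an important step before roasting. At present, the detection, sorting, and removal of defective beans are mainly done by hand, which is time-consuming, laborious, and highly subjective. This study used machine vision to extract three categories of features from coffee beans (profile, color, and texture) and built feature sets from single categories and from combinations of categories. Grid search was used to determine the parameters of a support vector machine (SVM) classification model, k-fold cross-validation was used to compare SVM model performance, and the Pearson correlation coefficient was used for feature screening, in order to find a better feature combination for detecting defective green coffee beans. To demonstrate the effectiveness of the SVM detection model, random forests (RF), extremely randomized trees (ERT), logistic regression (LR), LightGBM, XGBoost, and CatBoost were used in comparison experiments on the better feature combination. The results showed that a combination of 14 features spanning the three categories of profile, color, and texture was the better feature combination. Its SVM detection model achieved an average accuracy, average precision, average recall, and average F1 score of 84.9%, 85.8%, 82.3%, and 84.0%, respectively, all clearly better than the models built on two-category combinations or single-category features. Compared with RF, ERT, LR, LightGBM, XGBoost, and CatBoost, the accuracy and F1 score of the SVM detection model were higher by 4.7 and 4.8, 3.4 and 4.0, 5.6 and 7.2, 3.0 and 3.0, 3.5 and 4.2, and 2.6 and 2.6 percentage points, respectively. The SVM model with the better feature combination covers a fairly comprehensive range of defect types with high recognition accuracy, and can be applied in practice to intelligent sorting equipment for Arabica green coffee beans.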
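The percentage-point gains reported above imply a score for each baseline model, obtained by subtracting the gain from the corresponding SVM score. The short sketch below works through that arithmetic and also checks that the reported F1 score is consistent with the reported precision and recall. The baseline values it prints are derived here from the stated figures; they are not quoted directly in the text, and the function names are illustrative.

```python
# Reported averages for the SVM model with the better feature combination (percent).
SVM_ACCURACY = 84.9
SVM_PRECISION = 85.8
SVM_RECALL = 82.3
SVM_F1 = 84.0

# Reported SVM gains over each baseline, in percentage points: (accuracy, F1).
GAINS = {
    "Random Forests": (4.7, 4.8),
    "Extremely Randomized Trees": (3.4, 4.0),
    "Logistic Regression": (5.6, 7.2),
    "LightGBM": (3.0, 3.0),
    "XGBoost": (3.5, 4.2),
    "CatBoost": (2.6, 2.6),
}


def implied_baselines(svm_acc, svm_f1, gains):
    """Subtract each reported gain from the SVM score to get the implied baseline score."""
    return {
        name: (round(svm_acc - d_acc, 1), round(svm_f1 - d_f1, 1))
        for name, (d_acc, d_f1) in gains.items()
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


baselines = implied_baselines(SVM_ACCURACY, SVM_F1, GAINS)
for name, (acc, f1) in baselines.items():
    print(f"{name}: implied accuracy {acc:.1f}%, implied F1 {f1:.1f}%")

# Harmonic mean of the averaged precision and recall, for comparison with the reported F1.
f1_check = f1_from(SVM_PRECISION, SVM_RECALL)
print(f"F1 from averaged precision and recall: {f1_check:.1f}%")
```

The harmonic mean of averaged precision and recall does not have to equal a per-fold F1 averaged across folds, so close agreement here is a sanity check rather than a proof.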

Abstract: Defective green coffee beans significantly affect the quality and pricing of commercial coffee beans. Even a small number of defective beans (such as black and sour beans) can directly lower the quality and commercial grade of green coffee. Accurate detection and removal of defective green beans before roasting is therefore a key step. However, detection and removal are currently done by hand, which is time-consuming, laborious, and subjective, and cannot keep up with large-scale production. Automatic detection of defective green coffee beans that is efficient, accurate, and based on unified quantitative standards is thus urgently needed in the production and circulation of coffee beans. Machine learning has been an active, cutting-edge field of artificial intelligence research in recent years, and machine vision provides non-contact, automatic inspection of the external quality of goods in real time, with high efficiency and low cost. Machine vision has therefore been widely used in agricultural product identification and defect detection. In this study, an accurate and rapid method was proposed for detecting defective Arabica green coffee beans using feature combination and a Support Vector Machine (SVM). A total of 2,873 Arabica green coffee beans, selected by globally certified coffee quality evaluators, served as the trial materials: 1,500 normal beans and 1,373 defective beans, with the defective beans divided into 11 defect categories. The images were then preprocessed so that feature values could be extracted accurately and detection made more efficient. Preprocessing comprised grayscale conversion, Gaussian blur, gamma transformation, Canny edge detection, edge enhancement, binarization, and morphological processing.
Machine vision was used to extract a total of 19 features in three categories: profile, texture, and color of the green coffee beans. Both single-category features and combinations of features from different categories were used in the experiments. The key parameters of the SVM model were determined by grid search, and the Pearson correlation coefficient was used to select features. The better feature combination was identified by comparing the performance indexes of the SVM models in k-fold cross-validation tests. The SVM detection model with the better feature combination was expected to identify more categories of defective beans with higher accuracy, making it suitable for practical application. To verify the SVM detection model, a comparison experiment was performed on the better feature combination against the Random Forests, Extremely Randomized Trees, Logistic Regression, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost) algorithms. Experimental results showed that the better feature combination comprised 14 features across all three categories: profile, color, and texture. The average accuracy, precision, recall, and F1-Score of the SVM detection model with the better feature combination were 84.9%, 85.8%, 82.3%, and 84.0%, respectively. This model outperformed the models using two-category feature combinations and markedly outperformed those using single-category features. Compared with Random Forests, Extremely Randomized Trees, Logistic Regression, LightGBM, XGBoost, and CatBoost, the average accuracy and F1-Score of the SVM detection model were higher by 4.7 and 4.8, 3.4 and 4.0, 5.6 and 7.2, 3.0 and 3.0, 3.5 and 4.2, and 2.6 and 2.6 percentage points, respectively.
The SVM detection model for defective Arabica green coffee beans using the three-category feature combination can detect a wider range of defect categories. With its higher detection accuracy, the model can be applied to intelligent sorting equipment for Arabica green coffee beans.
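The workflow described in the abstract (Pearson-based feature screening, grid search over SVM parameters, and k-fold cross-validation) might be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation. The data are synthetic stand-ins for the 19 extracted features, and the correlation threshold of 0.9, the RBF kernel, the parameter grid, the choice of k = 5, and the helper `drop_correlated` are all assumptions made for the example. The abstract does not say how the Pearson coefficient was applied, so the sketch assumes it was used to drop one feature from each highly correlated pair.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def drop_correlated(X, threshold=0.9):
    """Keep a column only if its |Pearson r| with every already-kept column is below threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


# Synthetic stand-in data: 18 generated features plus one near-duplicate of column 0,
# giving 19 columns in total (hypothetical, not the coffee bean measurements).
X, y = make_classification(
    n_samples=400, n_features=18, n_informative=8, n_redundant=4, random_state=0
)
rng = np.random.default_rng(0)
near_duplicate = X[:, [0]] + 0.01 * rng.standard_normal((X.shape[0], 1))
X = np.hstack([X, near_duplicate])

# Feature screening with the Pearson correlation coefficient (assumed usage).
keep = drop_correlated(X, threshold=0.9)
X_sel = X[:, keep]

# Grid search over SVM parameters with stratified k-fold cross-validation (k = 5 assumed).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
search.fit(X_sel, y)

# Average accuracy, precision, recall, and F1 across the folds for the tuned model.
metrics = ["accuracy", "precision", "recall", "f1"]
scores = cross_validate(search.best_estimator_, X_sel, y, cv=cv, scoring=metrics)
print("kept feature indices:", keep)
print("best parameters:", search.best_params_)
for metric in metrics:
    print(f"mean {metric}: {scores[f'test_{metric}'].mean():.3f}")
```

Because the same folds are used here both to tune and to score the model, the printed averages are somewhat optimistic. A nested cross-validation or a held-out test set would give a less biased estimate.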
