Abstract:
Defective green coffee beans significantly affect the quality and pricing of commercial coffee. Even a small number of defective beans (such as black and sour beans) can directly lower the quality and commercial grade of a batch of green coffee beans. Accurate detection and removal of defective green beans before roasting is therefore a key limiting step. However, current manual detection and removal is time-consuming, laborious, and subjective, and cannot fully meet the needs of large-scale production. Automatic detection of defective green coffee beans with high efficiency, high accuracy, and unified quantitative standards is therefore urgently needed in the production and circulation of coffee beans. Machine learning has been an active, cutting-edge field of artificial intelligence research in recent years, and machine vision provides non-contact, real-time, efficient, and low-cost automatic inspection of the external quality of goods. Machine vision has consequently been widely used in agricultural product identification and defect detection. In this study, an accurate and rapid method was proposed for detecting defective Arabica green coffee beans using feature combination and a Support Vector Machine (SVM). A total of 2 873 Arabica green coffee beans were first selected as test materials by globally certified coffee quality evaluators, comprising 1 500 normal beans and 1 373 defective beans; the defective beans were divided into 11 defect categories. The bean images were then preprocessed so that feature values could be extracted accurately and detection efficiency improved. The preprocessing steps included grayscale conversion, Gaussian blur, gamma transformation, Canny edge detection, edge enhancement, binarization, and morphological processing.
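As an illustration of the preprocessing chain named above, the following is a minimal NumPy sketch, not the procedure used in the study. It covers only grayscale conversion, Gaussian blur, gamma transformation, binarization, and a morphological opening; Canny edge detection and edge enhancement are omitted. The function names, the blur width `sigma`, the gamma value, the use of Otsu's method for binarization, the 3 x 3 structuring element, and the premise that the bean is brighter than its background are all assumptions made for this example.

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance-weighted grayscale of an H x W x 3 RGB array (values 0..255)."""
    return rgb.astype(np.float64) @ np.array([0.299, 0.587, 0.114])

def gaussian_blur(gray, sigma=1.0):
    """Separable Gaussian blur with edge-replicated padding (sigma is assumed)."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(gray, radius, mode="edge")
    # Filter along rows, then columns; "valid" trims the padding again.
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def gamma_transform(gray, gamma=0.8):
    """Power-law intensity mapping on a 0..255 image (gamma value is assumed)."""
    return 255.0 * (np.clip(gray, 0.0, 255.0) / 255.0) ** gamma

def otsu_cut(gray):
    """Otsu's method: intensities below the returned cut form the darker class."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    prob = hist / hist.sum()
    w0 = np.cumsum(prob)                   # weight of the darker class
    mu = np.cumsum(prob * np.arange(256))  # cumulative first moment
    denom = w0 * (1.0 - w0)
    between = np.zeros(256)
    valid = denom > 1e-12
    between[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / denom[valid]
    return float(np.argmax(between)) + 1.0

def _neighbourhoods(mask):
    """Stack the nine 3 x 3 neighbour shifts of a boolean mask."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    return np.stack([padded[i:i + h, j:j + w] for i in range(3) for j in range(3)])

def opening(mask):
    """Morphological opening: 3 x 3 erosion followed by 3 x 3 dilation."""
    eroded = _neighbourhoods(mask).all(axis=0)
    return _neighbourhoods(eroded).any(axis=0)

def preprocess(rgb, sigma=1.0, gamma=0.8):
    """Return a boolean foreground mask (assumes a bright bean on a dark background)."""
    corrected = gamma_transform(gaussian_blur(to_grayscale(rgb), sigma), gamma)
    return opening(corrected >= otsu_cut(corrected))

# Synthetic stand-in for a bean image: a bright disk on a dark background.
image = np.full((64, 64, 3), 30, dtype=np.uint8)
yy, xx = np.mgrid[:64, :64]
image[(yy - 32) ** 2 + (xx - 32) ** 2 <= 15 ** 2] = 200
mask = preprocess(image)
print("foreground pixels:", int(mask.sum()))
```

The resulting mask is the kind of binary silhouette from which profile features could be measured; in practice an image library such as OpenCV would normally replace these hand-written filters.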
Machine vision was used to extract a total of 19 features in three categories: the profile, texture, and color of the green coffee beans. Single-category features and combinations of features from different categories were both used in the experiments. The key parameters of the SVM model were determined by grid search, and the Pearson correlation coefficient was used to select features. The optimal feature combination was identified by comparing the performance indexes of the SVM model in K-fold cross-validation tests. The SVM detection model with the optimal feature combination was intended to identify more categories of defective beans with higher accuracy, making it better suited to practical application. To verify the SVM detection model, it was compared on the optimal feature combination with Random Forests, Extremely Randomized Trees, Logistic Regression, Light Gradient Boosting Machine (LightGBM), Extreme Gradient Boosting (XGBoost), and Categorical Boosting (CatBoost). Experimental results showed that the optimal feature combination was the three-category combination consisting of all profile, color, and texture features. The average accuracy, precision, recall, and F1-Score of the SVM detection model with the optimal feature combination were 84.9%, 85.8%, 82.3%, and 84.0%, respectively. With this combination, the SVM detection model performed better than with two-category feature combinations, and significantly better than with single-category features. Compared with Random Forests, Extremely Randomized Trees, Logistic Regression, LightGBM, XGBoost, and CatBoost, the average accuracy and F1-Score of the SVM detection model were higher by 4.7 and 4.8, 3.4 and 4.0, 5.6 and 7.2, 3.0 and 3.0, 3.5 and 4.2, and 2.6 and 2.6 percentage points, respectively.
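The model-selection procedure described above (Pearson-based feature selection, grid search over the SVM parameters, and K-fold evaluation) can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the study's implementation: the data are synthetic stand-ins with 19 features, and the RBF kernel, the parameter grid, the choice of K = 5, the correlation threshold of 0.95, and the rule of dropping one feature from each highly correlated pair are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical search grid; the study's actual ranges are not given.
PARAM_GRID = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}
METRICS = ["accuracy", "precision", "recall", "f1"]

def drop_correlated(X, threshold=0.95):
    """Indices of columns kept after dropping any column whose absolute
    Pearson correlation with an already kept column exceeds the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in keep):
            keep.append(j)
    return keep

def tune_and_evaluate(X, y, n_splits=5, seed=0):
    """Grid-search an RBF SVM, then report mean K-fold metrics for the best model."""
    pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, PARAM_GRID, cv=cv, scoring="accuracy")
    search.fit(X, y)
    # Reusing the tuning folds for evaluation gives a slightly optimistic estimate.
    scores = cross_validate(search.best_estimator_, X, y, cv=cv, scoring=METRICS)
    means = {m: float(scores[f"test_{m}"].mean()) for m in METRICS}
    return search.best_params_, means

# Synthetic stand-in for the 19 profile, texture, and color features.
X, y = make_classification(n_samples=400, n_features=19, n_informative=8,
                           n_redundant=4, random_state=0)
kept = drop_correlated(X)
best_params, means = tune_and_evaluate(X[:, kept], y)
print("features kept:", len(kept))
print("best parameters:", best_params)
print("mean K-fold metrics:", means)
```

Comparing single-category and multi-category feature sets would amount to calling `tune_and_evaluate` on different column subsets. As a consistency check on the reported figures, the harmonic mean of 85.8% precision and 82.3% recall is about 84.0%, which agrees with the reported F1-Score.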
The SVM detection model for defective Arabica green coffee beans based on the three-category feature combination can be expected to detect more defect categories. With its higher detection accuracy, the model can be applied to intelligent sorting equipment for Arabica green coffee beans.