Prediction of moisture content in cigar tobacco leaves during the drying process based on random forest feature selection
Abstract
Airing is one of the most important stages in the production of cigar leaves, and the appearance quality that the leaves develop during this stage reflects their intrinsic quality. For proper browning, the temperature and humidity inside the drying chamber can be adjusted in real time according to the moisture content of the leaves. At present, however, leaf moisture content is usually judged from manual experience, which is subjective and inaccurate. Computer vision has been increasingly applied in recent years to assess the quality of agricultural products because of its simplicity and high flexibility. Random forest (RF), a bagging-based ensemble machine learning model, can efficiently handle high-dimensional data with high precision and fast training and prediction. In this study, prediction models for the moisture content of cigar leaves were established using RF machine learning, with the "Yunxue-2" cigar tobacco variety as the research object. First, images of cigar leaves were collected during the airing process so that the key appearance features indicating leaf moisture content could be extracted. Color thresholding and Otsu segmentation were combined to obtain the leaf region of interest (ROI). Features were then extracted in four dimensions: color, contour, texture, and location. Correlation coefficient analysis was used to eliminate highly correlated features within each feature dimension and thus prevent "dimension explosion." Next, out-of-bag (OOB) data were used to compute the average decrease in the coefficient of determination (Decr²) for each feature, and the image features were ranked by importance accordingly. The prediction accuracy and runtime of the RF model were then compared for different numbers of features.
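The within-dimension correlation filter described above can be illustrated with a minimal sketch. It is not the authors' implementation: the greedy keep-first rule, the 0.9 threshold, and the feature names and values (`mean_R`, `mean_G`, `hue_std`) are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant feature: treat as uncorrelated
    return cov / (sx * sy)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold.

    `features` maps a feature name to its list of values (one value per image).
    The threshold and the greedy, order-dependent rule are assumptions.
    """
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical color-dimension features measured on five leaf images.
color = {
    "mean_R": [0.20, 0.35, 0.50, 0.65, 0.80],
    "mean_G": [0.21, 0.36, 0.49, 0.66, 0.79],  # almost a copy of mean_R
    "hue_std": [0.05, 0.02, 0.07, 0.01, 0.04],
}
selected = drop_correlated(color, threshold=0.9)
print(selected)
```

Running the filter separately on each dimension (color, contour, texture, location) matches the "within each feature dimension" wording; the surviving features would then be passed to the OOB-based importance ranking.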
Seven image features closely related to the moisture content of cigar tobacco leaves were selected as the optimal feature subset. The original and optimal feature sets were then used to evaluate RF, support vector regression (SVR), and back propagation neural network (BPNN) models, with a genetic algorithm (GA) used to optimize the hyperparameters of each model. Combining the three models with the two feature sets gave six model-feature schemes. Five-fold cross-validation was used to compare their prediction accuracy and generalization, and the six schemes were then verified on a test dataset from the drying process. The results showed that the combination of color, contour, texture, and location features of cigar tobacco leaf images effectively characterized the changes in leaf appearance caused by moisture loss. In five-fold cross-validation, SVR and BPNN performed better with the optimal feature subset than with the original feature set, whereas RF performed better with the original feature set, suggesting that RF is less affected by information redundancy in high-dimensional data. On the test set, the best performance was achieved by the GA-SVR model with the optimal feature subset, with an r² of 0.980, a mean squared error (MSE) of 0.001, and the shortest runtime (0.128 s). In summary, image features of cigar tobacco leaves can be used to accurately predict the moisture content of different parts throughout the drying process. These findings can provide a theoretical basis for the intelligent drying of cigar tobacco leaves.
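The evaluation protocol (five-fold cross-validation scored by r² and MSE) can be sketched as follows. This is an illustrative sketch only: a one-variable least-squares line stands in for the RF, SVR, and BPNN models, the data are synthetic, and the contiguous unshuffled folds and pooled out-of-fold scoring are assumptions rather than details taken from the study.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def fit_line(x, y):
    """Least-squares fit y ~ a*x + b (a stand-in model, not RF/SVR/BPNN)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    a = sum((v - mx) * (w - my) for v, w in zip(x, y)) / sxx
    return a, my - a * mx


def cross_validate(x, y, k=5):
    """Collect out-of-fold predictions, then score them once."""
    preds = [0.0] * len(x)
    for train, test in k_fold_indices(len(x), k):
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = a * x[i] + b
    return r2(y, preds), mse(y, preds)


# Synthetic data: one hypothetical image feature vs. moisture content.
x = [i / 19 for i in range(20)]
y = [0.85 - 0.6 * v + (0.01 if i % 2 else -0.01) for i, v in enumerate(x)]
score_r2, score_mse = cross_validate(x, y)
print(f"r2={score_r2:.3f}, MSE={score_mse:.5f}")
```

To compare model-feature schemes in this style, the same fold indices would be reused for every scheme so that differences in r² and MSE reflect the model and feature set rather than the data split.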