Prediction of moisture content in cigar tobacco leaves during the drying process based on random forest feature selection

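    • Summary: Manual judgment of the moisture content of cigar tobacco leaves during airing is highly subjective and inaccurate, and the appearance features that matter most for predicting it have not been clearly identified. To address these problems, this study predicted the moisture content of cigar tobacco leaves during airing using image feature extraction and machine learning. The cigar tobacco variety "Yunxue-2" was used as the test material. Four categories of features (color, contour, texture, and leaf position) were extracted from leaf images taken during airing, and an optimal subset of image features for moisture content prediction was selected. On this basis, random forest (RF), support vector regression (SVR), and back propagation neural network (BPNN) models were built, and a genetic algorithm (GA) was used to optimize the hyperparameters of each model. The original and optimal image feature sets were each fed into the three machine learning models, giving six model-feature combination schemes. The original dataset was partitioned according to airing stage, and predictions were made on the test set. The results showed that the GA-SVR model combined with the optimal image feature subset performed best on the test set, with a coefficient of determination (R²) of 0.980 and a mean square error (MSE) of 0.001, and had the shortest runtime (0.128 s). The results can provide a theoretical basis for intelligent control of the cigar tobacco leaf airing process.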
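The two evaluation metrics reported above, the coefficient of determination and the mean square error, can be computed as in the minimal sketch below. It uses the standard textbook definitions (R² = 1 - SS_res / SS_tot, and MSE as the mean of the squared errors). The function names and the sample values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mean_square_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MSE: mean of the squared differences between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical moisture contents (as fractions), invented for illustration only.
measured = [0.80, 0.60, 0.40, 0.20]
predicted = [0.78, 0.63, 0.41, 0.18]

print(f"MSE = {mean_square_error(measured, predicted):.5f}")
print(f"R^2 = {r_squared(measured, predicted):.3f}")
```

An R² close to 1 together with an MSE close to 0 indicates that the predicted moisture contents track the measured ones closely.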

       

      Abstract: Airing is one of the most important stages in the production of cigar tobacco leaves. It improves the appearance quality of the leaves, which in turn indicates their intrinsic quality. For the leaves to brown properly, the temperature and humidity inside the airing chamber should be adjusted in real time according to leaf moisture content. At present, however, leaf moisture content is usually judged from manual experience, which is subjective and inaccurate. Computer vision has been applied in recent years to assess the quality of agricultural products because of its simplicity and flexibility. Random forest (RF), a bagging-based ensemble learning method, handles high-dimensional variables efficiently, with high accuracy and fast training and prediction. In this study, models for predicting the moisture content of cigar tobacco leaves were established using RF-based feature selection and machine learning. The "Yunxue-2" cigar tobacco variety was taken as the research object. First, images of cigar leaves were collected during the airing process in order to identify the appearance features that are most informative about leaf moisture content. Color thresholding and OTSU segmentation were combined to obtain the leaf region of interest (ROI). Four categories of features were then extracted: color, contour, texture, and leaf position. Correlation coefficient analysis was used to eliminate highly correlated features within each category, in order to prevent "dimension explosion". The out-of-bag (OOB) data were then used to compute the average decrease in the coefficient of determination (Decr²) for each image feature, and the features were ranked by importance. The prediction accuracy and runtime of the RF model were compared for different numbers of features, and the seven image features most closely related to leaf moisture content were selected as the optimal feature subset. The original feature set and the optimal feature subset were then used to evaluate RF, support vector regression (SVR), and back propagation neural network (BPNN) models, with a genetic algorithm (GA) used to optimize the hyperparameters of each model. Combining the three models with the two feature sets gave six model-feature combination schemes. Five-fold cross-validation was used to compare their prediction accuracy and generalization, and the six schemes were then verified on a test set from the airing process. The results showed that the combination of color, contour, texture, and position features of cigar leaf images effectively characterized the changes in leaf appearance caused by moisture loss. In five-fold cross-validation, SVR and BPNN performed better with the optimal image feature subset than with the original feature set, whereas RF performed better with the original feature set, which suggests that RF can cope with the information redundancy of high-dimensional data. The best performance on the test set was achieved by the GA-SVR model combined with the optimal image feature subset, with an R² of 0.980 and an MSE of 0.001, as well as the shortest runtime (0.128 s). In summary, image features of cigar tobacco leaves can be used to accurately predict the moisture content of leaves from different positions throughout the airing process. The findings can provide a theoretical basis for the intelligent airing of cigar tobacco leaves.
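The step of eliminating highly correlated features within a feature category can be illustrated with the sketch below. The abstract does not state the correlation threshold or which member of a correlated pair was discarded, so the 0.9 cutoff, the greedy keep-first rule, the feature names, and the sample values are all assumptions made for illustration and are not the authors' procedure.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption of this sketch: a constant feature counts as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def drop_correlated(features: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Greedily keep a feature only if |r| with every already-kept feature is below threshold."""
    kept: List[str] = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical color features measured on six leaf images; the values are invented.
color_group = {
    "mean_R": [120.0, 115.0, 108.0, 99.0, 90.0, 84.0],
    "mean_G": [110.0, 104.0, 98.0, 88.0, 80.0, 73.0],  # nearly collinear with mean_R
    "hue_std": [5.0, 9.0, 4.0, 8.0, 6.0, 7.0],
}

print(drop_correlated(color_group))
```

In this toy group, `mean_G` is dropped because it is almost perfectly correlated with `mean_R`, while `hue_std` is kept. Applying the same filter separately to each feature category mirrors the "within each category" wording of the abstract; the surviving features would then go on to the RF importance ranking.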

       
