Abstract:
Monitoring SO2 in the brewing process is key to the informatization of the wine industry and to guaranteeing wine quality. The electronic nose is a new type of detection instrument that simulates the animal olfactory system. It collects information about test samples with a gas sensor array and, combined with an appropriate machine learning algorithm, performs sample identification and quantitative analysis. To address the time-consuming and complex nature of traditional detection techniques, this study established a method for detecting SO2 in wine based on electronic nose technology and tested six wines with different amounts of added SO2. Features (16 sensors × 4 features) were extracted from the response curves of the sensor array collected by the electronic nose system to form the original feature set. To improve the detection performance of the electronic nose, a gas sensor array optimization algorithm based on Dynamic Feature Importance - Recursive Sensor Elimination (DFI-RSE) was proposed. The Maximal Information Coefficient (MIC) was used as the measure of the relationship between variables, and Feature Importance (FI), Feature Redundancy (FR), Sensor Importance (SI) and Dynamic Feature Importance (DFI) were defined on this basis. The algorithm was designed as a two-step flow. First, sensors with SI greater than 1 were preselected to form a preliminarily optimized array, and the features in this array were sorted according to DFI. The contribution of each candidate feature to the target was continuously corrected according to its correlation with the features already selected; this corrected contribution is the DFI, so features with both high importance and low redundancy were selected. Second, Recursive Sensor Elimination (RSE) was applied to remove sensors with smaller SI from the subset until the coefficient of determination (R2) of the retained array was optimal.
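The two-step flow described above can be illustrated with a minimal sketch. The abstract does not give the formulas for FI, FR, SI or DFI, so everything here is an assumption made for illustration: the absolute Pearson correlation stands in for MIC, DFI is taken as FI minus the mean dependence on the features already selected, and the function names `dependence`, `rank_by_dfi` and `recursive_sensor_elimination` as well as the `score` callback (for example, a leave-one-out R2) are hypothetical.

```python
import math


def dependence(x, y):
    """Stand-in for MIC (assumption): absolute Pearson correlation in [0, 1]."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def rank_by_dfi(features, target):
    """Greedily order features by an assumed DFI = FI - mean redundancy.

    `features` maps a feature name to its list of values.
    """
    fi = {name: dependence(col, target) for name, col in features.items()}
    selected, remaining = [], set(features)

    def dfi(name):
        if not selected:
            return fi[name]
        # Assumed redundancy: mean dependence on already selected features.
        fr = sum(dependence(features[name], features[s]) for s in selected)
        return fi[name] - fr / len(selected)

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=dfi)
        selected.append(best)
        remaining.remove(best)
    return selected


def recursive_sensor_elimination(sensor_importance, score):
    """Drop the lowest-SI sensor step by step and keep the best-scoring array.

    `score` is a hypothetical callback returning a quality value such as R2
    for a given list of sensors.
    """
    kept = sorted(sensor_importance, key=sensor_importance.get, reverse=True)
    best_array, best_score = list(kept), score(kept)
    while len(kept) > 1:
        kept = kept[:-1]  # remove the sensor with the smallest SI
        current = score(kept)
        if current >= best_score:  # ties favor the smaller array
            best_array, best_score = list(kept), current
    return best_array, best_score
```

In this sketch a feature that duplicates an already selected one is pushed down the ranking even when its own importance is high, which is the behavior the DFI correction is meant to produce.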
To verify the performance of the array optimization algorithm, Partial Least Squares Regression (PLSR), Multi-Layer Perceptron (MLP), Support Vector Regression (SVR) and Bayesian Ridge Regression (BRR) were used to compare the detection ability of the array before and after optimization. Based on the leave-one-out test, the number of sensors in the optimized array was reduced from 16 to 8 (TGS2612, 4SO2-20, TGS2603, TGS2611-2, TGS2602, TGS2630, TGS2610-2, WSP7110-2), and the number of features was reduced by 59%. The R2 values of the PLSR, MLP, SVR and BRR regression models were all better than those of the original array, the Root Mean Square Error (RMSE) was between 11 and 12 mg/L, and the computation time was reduced by 0.29, 0.37, 0.28 and 0.06 s, respectively. Further verification was performed with data that had not been used in model training. On this test set, the R2 values obtained with the optimized array by the PLSR, MLP, SVR and BRR regression models were 0.9839, 0.9872, 0.9830 and 0.9840, respectively, and the RMSE values were 8.68, 7.73, 8.90 and 8.65 mg/L, respectively, which are better than or equivalent to those of the original array. The results show that the sensor array optimization algorithm based on DFI-RSE can effectively improve the detection performance of the electronic nose, and that the established wine SO2 detection model has practical value for the effective monitoring of SO2 in the brewing process.
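For readers unfamiliar with the evaluation metrics, the following is a minimal sketch of how R2 and RMSE can be computed from leave-one-out predictions. It is an illustration only: a one-variable least-squares line stands in for the PLSR, MLP, SVR and BRR models of the study, and the helper names `fit_line`, `leave_one_out_predictions`, `r2_score` and `rmse` are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def leave_one_out_predictions(xs, ys):
    """Predict each sample with a model trained on all the other samples."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    return preds


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error, in the same unit as the target (e.g. mg/L)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

Because every prediction comes from a model that never saw the predicted sample, the resulting R2 and RMSE estimate performance on unseen data rather than the fit to the training data.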