Abstract: To improve the ability of an electronic nose (E-nose) to discriminate among six kinds of vinegar, this paper proposes a multi-feature representation strategy for E-nose data of vinegar samples based on multivariate analysis. First, loadings analysis was applied to the initial feature matrix, which was composed of six kinds of features extracted from the E-nose data, in order to optimize the gas sensor array; 12 gas sensors were retained for subsequent analysis. To eliminate the correlation between the response signals of the gas sensors, principal component analysis (PCA) was applied to the feature matrix of the 12-sensor array, generating principal component (PC) variables from which the Wilks Λ-statistic was constructed. The Wilks Λ value of each PC variable was then obtained. As is well known, the smaller the value of Λ, the higher the separation ability of the corresponding PC variable; PC variables with larger Λ values should therefore be eliminated because of their lower separation ability. In short, the Wilks Λ-statistic was used to obtain a principal component sub-matrix that was beneficial to the identification of vinegar samples. Because each PC variable is a linear combination of all original feature variables, the contribution of each original feature variable to all retained PC variables can serve as a selection criterion. Accordingly, for each original feature variable, the sum of the absolute values of its combination coefficients was calculated from the principal component sub-matrix. These sums were sorted from largest to smallest; the greater the sum, the more likely the corresponding original feature variable was to be chosen.
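The selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a univariate Wilks Λ computed per PC variable as the ratio of within-class to total sum of squares, uses synthetic data in place of the E-nose feature matrix, and takes the median Λ as an arbitrary cut-off. The function names and the cut-off are hypothetical.

```python
import numpy as np

def pca_scores_and_loadings(X):
    """Standardize X, then return PC scores and loadings (features x PCs)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, Vt = np.linalg.svd(Xs, full_matrices=False)
    loadings = Vt.T                      # column j holds the coefficients of PC j
    return Xs @ loadings, loadings

def wilks_lambda_per_variable(scores, labels):
    """Univariate Wilks' Lambda per column: within-class SS / total SS."""
    total_ss = ((scores - scores.mean(axis=0)) ** 2).sum(axis=0)
    within_ss = np.zeros(scores.shape[1])
    for c in np.unique(labels):
        grp = scores[labels == c]
        within_ss += ((grp - grp.mean(axis=0)) ** 2).sum(axis=0)
    return within_ss / total_ss

def rank_features(loadings, keep_pcs):
    """Sum |coefficient| of each original feature over the kept PCs, sort descending."""
    contrib = np.abs(loadings[:, keep_pcs]).sum(axis=1)
    order = np.argsort(contrib)[::-1]
    return order, contrib

# Synthetic stand-in data (assumption): 3 classes, 20 samples each, 8 features.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 20)
X = rng.normal(size=(60, 8))
X[:, 0] += 3.0 * labels                  # make two features class-dependent
X[:, 1] -= 2.0 * labels

scores, loadings = pca_scores_and_loadings(X)
lam = wilks_lambda_per_variable(scores, labels)

# Hypothetical cut-off: keep PCs whose Lambda is at or below the median.
keep_pcs = np.flatnonzero(lam <= np.median(lam))
order, contrib = rank_features(loadings, keep_pcs)

print("Wilks Lambda per PC:", np.round(lam, 3))
print("Kept PCs:", keep_pcs)
print("Features ranked by summed |coefficient|:", order)
```

Features at the head of `order` contribute most to the retained PC variables and would be the first candidates for selection under this criterion.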
Meanwhile, by setting different threshold values for the sum of the absolute coefficient values of each original feature variable over all selected PC variables, different sets of original feature variables could be formed. With the help of Fisher discriminant analysis (FDA), the correct discrimination rates of the different sets were calculated and compared, and the optimal set of original feature variables was determined. The results showed that the representative feature variables for the gas sensors were markedly different from the initial ones. With the proposed feature selection strategy, 48 features were finally selected to characterize the E-nose signals of vinegar samples. To verify the effectiveness of the feature selection strategy and the rationality of the 48 selected characteristic parameters, FDA and a back propagation neural network (BPNN) were employed to discriminate the six kinds of vinegar samples; the correct discrimination rates of FDA and BPNN were over 93% and 98% on the training sets, and over 90% and 93% on the corresponding test sets, respectively. In addition, the Bhattacharyya distance was employed to further explain the separability of the six kinds of vinegar samples and to illustrate the reliability of the FDA and BPNN results. The proposed feature selection strategy is thus effective and feasible, and provides a new idea for multi-feature representation of E-nose data.
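As a rough illustration of how the Bhattacharyya distance can quantify separability between two classes, the sketch below uses the standard closed form for two multivariate Gaussian distributions. The Gaussian assumption, the function name, and the synthetic samples are assumptions made for this example; the abstract does not state how the distance was estimated.

```python
import numpy as np

def bhattacharyya_gaussian(A, B):
    """Bhattacharyya distance between two sample sets, each modeled as a Gaussian.

    A, B: arrays of shape (n_samples, n_features) with n_features >= 2.
    """
    mu_a, mu_b = A.mean(axis=0), B.mean(axis=0)
    cov_a = np.cov(A, rowvar=False)
    cov_b = np.cov(B, rowvar=False)
    cov = (cov_a + cov_b) / 2.0

    # Term 1: separation of the class means, scaled by the averaged covariance.
    diff = mu_a - mu_b
    term_mean = diff @ np.linalg.solve(cov, diff) / 8.0

    # Term 2: mismatch between the class covariances (log-determinants for stability).
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_b = np.linalg.slogdet(cov_b)
    term_cov = 0.5 * (logdet - 0.5 * (logdet_a + logdet_b))

    return term_mean + term_cov

# Synthetic stand-in classes (assumption): one close to A, one far from A.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
B_near = rng.normal(size=(50, 3)) + 0.5
B_far = rng.normal(size=(50, 3)) + 5.0

d_same = bhattacharyya_gaussian(A, A)
d_near = bhattacharyya_gaussian(A, B_near)
d_far = bhattacharyya_gaussian(A, B_far)
print(d_same, d_near, d_far)
```

A larger distance indicates less overlap between the two class distributions, so for six classes one would compute the distance for each of the 15 class pairs and look for the pairs with the smallest values.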