Evaluation of tea similarity based on deep metric learning

Song Yan (宋彦); Zhao Lei (赵磊); Ning Jingming (宁井铭); Dai Qianying (戴前颖); Cheng Fushou (程福寿)

doi:10.11975/j.issn.1002-6819.202207134

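Abstract: To objectively and quantitatively evaluate the similarity between trial blends and standard samples during Mee tea blending, this study proposes a similarity evaluation method based on deep metric learning. Standard samples of seven grades of Mee tea served as the training set, and test sets with different degrees of similarity were constructed by adding different proportions of semi-finished tea to the standard samples. Hyperspectral data of the tea samples were collected, and spectral and image features were extracted from them. Three data types (spectral data, image data, and fused spectral-image data) were used as model inputs. To construct the distance feature space, a deep feature extraction network based on triplet loss was proposed and a Center Anchor Triplet Loss function was designed. The distance between samples in the feature space characterizes their degree of similarity, which allows both qualitative judgment of similarity and quantitative measurement of similarity. The results show that fused spectral-image data combined with Center Anchor Triplet Loss achieved the highest accuracy: 98.89% for similarity judgment and 100% for similarity measurement. The model was evaluated on independent samples that were not used in training and still performed well, indicating good generalization ability. The results provide a theoretical basis for evaluating the similarity of Mee tea.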
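The triplet-loss idea behind the feature extraction network can be illustrated with a minimal sketch. The abstract does not give the formula for Center Anchor Triplet Loss, so the `center_anchor_triplet_loss` function below is only one plausible reading (the class centroid takes the place of the anchor) and should be treated as an assumption. The function names, the `margin` value, and the use of unsquared Euclidean distance are also illustrative choices, not details from the study.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(d(a, p) - d(a, n) + margin, 0)."""
    d_pos = math.dist(anchor, positive)   # distance to a same-class sample
    d_neg = math.dist(anchor, negative)   # distance to a different-class sample
    return max(d_pos - d_neg + margin, 0.0)


def centroid(vectors):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def center_anchor_triplet_loss(same_class, other_class, margin=1.0):
    """ASSUMED variant: the centroid of the same-class embeddings is the anchor.

    The loss is averaged over every (positive, negative) pair. This is a
    guess at the idea, not the formula used in the paper.
    """
    anchor = centroid(same_class)
    losses = [
        triplet_loss(anchor, positive, negative, margin)
        for positive in same_class
        for negative in other_class
    ]
    return sum(losses) / len(losses)
```

The loss is zero once every different-class sample lies at least `margin` farther from the anchor than the same-class samples do, which is the condition the training is meant to reach.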

Abstract: Blending is widely used in refined tea processing to stabilize product quality, and it allows export Mee tea to be standardized so that it meets national standards. During blending, experts first prepare a small trial sample according to historical formulations, adjusting the original proportions because the quality of the raw tea materials varies from year to year. Sensory evaluation is then used to judge the similarity between the trial sample and the standard sample. However, this procedure is highly subjective and lacks quantification, which hinders the standardization of tea quality. In this study, a method based on Deep Metric Learning (DML) was proposed to objectively and quantitatively evaluate the similarity between tea samples. Standard samples of seven grades of Mee tea were used as the training set. Different proportions of semi-finished tea were added to the standard samples to construct test sets with different degrees of similarity. The semi-finished teas included Fujian, Zhejiang Xikou tea and Huangshan fourth-grade broken tea. Hyperspectral data of the tea samples were then collected. A Region of Interest (ROI) was selected to obtain the spectral features of each sample, and the spectra were preprocessed by Multiplicative Scatter Correction (MSC). The principal component image with the highest contribution rate was selected using Principal Component Analysis (PCA), and texture features extracted from it with the Gray-Level Co-occurrence Matrix (GLCM) served as the image features. Spectral data, image data, and fused spectral-image data were used as model inputs. A deep feature extraction network based on triplet loss was constructed to obtain the distance feature space, with a Convolutional Neural Network (CNN) as the feature extractor. A Center Anchor Triplet Loss function was then proposed. The network learns a mapping from the original data space to the distance feature space.
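The MSC preprocessing step mentioned above can be sketched as follows. This is the textbook form of the correction: each spectrum is regressed against a reference spectrum and then rescaled with the fitted slope and intercept. The abstract does not say which reference was used, so taking the mean spectrum of the data set is an assumption, and the function names are illustrative.

```python
def mean_spectrum(spectra):
    """Band-wise mean over a list of equal-length spectra."""
    n = len(spectra)
    return [sum(band) / n for band in zip(*spectra)]


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def msc(spectra, reference=None):
    """Correct each spectrum as (spectrum - intercept) / slope.

    The reference defaults to the mean spectrum, which is an assumption
    about the setup rather than a detail taken from the study.
    """
    ref = reference if reference is not None else mean_spectrum(spectra)
    corrected = []
    for spectrum in spectra:
        slope, intercept = fit_line(ref, spectrum)
        corrected.append([(value - intercept) / slope for value in spectrum])
    return corrected
```

Spectra that differ from the reference only by an additive offset and a multiplicative scale collapse onto the reference after correction. Removing that scatter-related variation is the purpose of the step.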
In the distance feature space, samples of the same kind of tea are placed as close together as possible, whereas samples of different kinds are pushed as far apart as possible. After the model generated the standard-sample features in the distance feature space, the mean feature of each type of standard sample in the training set was taken as the benchmark. The deep metric learning model was then applied to the test set, and the Euclidean distance between each test-set feature and each benchmark feature was calculated. The similarity between tea samples was characterized by the distance between them in the feature space, so that both qualitative judgment of similarity and quantitative measurement of similarity were achieved. In addition to the input data type and the loss function, the output feature dimension of the network also affected the accuracy of the distance measurement. The network structure was therefore fine-tuned by varying the output dimension from 10 to 100 in steps of 10. The results show that, with an output dimension of 40, fused data combined with Center Anchor Triplet Loss achieved the highest accuracy of all tested combinations: 98.89% for similarity judgment and 100% for similarity measurement. The model was evaluated on independent samples that were not used in training and still performed well, indicating good generalization ability. The findings provide a theoretical basis and data support for the similarity evaluation of export tea.
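The benchmark-and-distance procedure described above can be sketched in a few lines. The embedding vectors are assumed to come from an already trained network, which is not shown here. The function names and the rule of picking the nearest benchmark are illustrative assumptions, not the exact decision rule of the study.

```python
import math


def class_benchmarks(features, labels):
    """Mean embedding per class, used as that class's benchmark feature."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feature)
            counts[label] = 0
        sums[label] = [a + b for a, b in zip(sums[label], feature)]
        counts[label] += 1
    return {label: [v / counts[label] for v in sums[label]] for label in sums}


def distances_to_benchmarks(feature, benchmarks):
    """Euclidean distance from one test embedding to every benchmark."""
    return {label: math.dist(feature, center) for label, center in benchmarks.items()}


def most_similar_grade(feature, benchmarks):
    """Label of the nearest benchmark (assumed decision rule)."""
    distances = distances_to_benchmarks(feature, benchmarks)
    return min(distances, key=distances.get)
```

In this reading, the nearest benchmark gives the qualitative similarity judgment, while the distance values give a quantitative measure in which a smaller distance means a more similar sample.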
