Abstract:
Tobacco leaves can be classified in detail by variety, stalk position, and place of production. Cut tobacco contains different components, including leaf shreds, stem shreds, and reconstituted tobacco leaf shreds, and in recent years there has been growing demand to classify and identify these components. This work uses terahertz time-domain spectroscopy (THz-TDS) to analyze the absorption coefficient and refractive index spectra of the three tobacco components over the range 0.35–1.50 THz. Low-variance filtering combined with principal component analysis (PCA) was applied to the spectral data for feature extraction and dimensionality reduction. Three classification models were then built on the absorption coefficient and refractive index spectra: a support vector machine (SVM), k-nearest neighbor (KNN), and bagged trees. The results show that models using the absorption coefficient spectrum achieved higher accuracy than those using the refractive index spectrum. Low-variance filtering combined with PCA feature extraction significantly improved classification accuracy, with the KNN model reaching 98.3%. In addition, successive projections algorithm (SPA) feature extraction on the frequency-domain spectrum, combined with the SVM model, gave a classification accuracy of about 90%. These results suggest that THz-TDS can be used to classify cut tobacco components, and they provide a useful reference for applying THz-TDS to the non-destructive detection of tobacco materials.
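The classification pipeline summarized above (low-variance filtering, then PCA, then a KNN classifier) can be sketched as follows. This is a minimal, dependency-free illustration on synthetic data, not the authors' implementation. The spectra, the class-specific slopes, the five constant "uninformative" bins, the variance threshold, the number of principal components, and k are all hypothetical choices. PCA is computed here by power iteration with deflation rather than by a library SVD.

```python
import math
import random
from collections import Counter


def low_variance_filter(X, threshold):
    """Return indices of columns (frequency bins) whose variance across samples exceeds threshold."""
    n = len(X)
    keep = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        if var > threshold:
            keep.append(j)
    return keep


def select(X, cols):
    """Keep only the given columns of every sample."""
    return [[row[j] for j in cols] for row in X]


def pca_fit(X, n_components, iters=300, seed=0):
    """PCA via power iteration with deflation on the sample covariance matrix."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient = eigenvalue; subtract lam * v v^T to expose the next component.
        lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
        components.append(v)
    return mean, components


def pca_transform(X, mean, components):
    """Project mean-centred samples onto the principal components."""
    return [[sum((row[j] - mean[j]) * c[j] for j in range(len(mean))) for c in components]
            for row in X]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    dists = sorted((math.dist(x, t), y) for t, y in zip(train_X, train_y))
    return Counter(y for _, y in dists[:k]).most_common(1)[0][0]


def make_spectra(rng, n_per_class, n_bins=40, n_flat=5):
    """Hypothetical absorption spectra over 0.35-1.50 THz: a class-specific slope plus
    Gaussian noise, followed by n_flat constant bins that carry no information."""
    freqs = [0.35 + i * (1.50 - 0.35) / (n_bins - 1) for i in range(n_bins)]
    slopes = {"leaf": 10.0, "stem": 20.0, "reconstituted": 30.0}  # hypothetical values
    X, y = [], []
    for label, slope in slopes.items():
        for _ in range(n_per_class):
            spectrum = [slope * f + rng.gauss(0, 1.0) for f in freqs]
            X.append(spectrum + [1.0] * n_flat)
            y.append(label)
    return X, y


rng = random.Random(42)
train_X, train_y = make_spectra(rng, 20)
test_X, test_y = make_spectra(rng, 10)

keep = low_variance_filter(train_X, threshold=1e-6)       # drops the constant bins
mean, comps = pca_fit(select(train_X, keep), n_components=2)
train_Z = pca_transform(select(train_X, keep), mean, comps)
test_Z = pca_transform(select(test_X, keep), mean, comps)
preds = [knn_predict(train_Z, train_y, z, k=3) for z in test_Z]
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(f"kept {len(keep)} of {len(train_X[0])} bins, test accuracy = {accuracy:.3f}")
```

The helpers are written out only to make each step explicit; in practice a library such as scikit-learn (`VarianceThreshold`, `PCA`, `KNeighborsClassifier`) would normally replace them. The accuracy on this synthetic data says nothing about the accuracy reported for real tobacco spectra.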