Real-time peanut pod recognition method based on the lightweight model PPINET

    • Abstract: Conventional CNN algorithms for peanut pod appearance recognition are memory- and compute-intensive, which makes them difficult to deploy on resource-constrained edge terminals. To address this, this study proposes an efficient peanut pod recognition model, PPINET (peanut pod identification network), designed to fit the resource constraints of embedded devices. The model combines depthwise separable convolutions with inverted residual structures to significantly reduce the number of parameters and the computational cost while preserving feature extraction capability, introduces an MQA (multi-query attention) module to strengthen the extraction of key features, and applies the TuNAS (easy-to-tune and scalable implementation of efficient neural architecture search with weight sharing) strategy to optimize the network structure so that it performs well on resource-constrained devices. In addition, knowledge distillation with a ResNet (residual neural network) teacher, combined with three-fold cross-validation training, is used to improve accuracy; the final model is quantized into the RKNN format and deployed with NPU acceleration on a Rockchip RK3588. The PPINET model is only 1.85 MB in size, with 0.49 M parameters and 0.30 GFLOPs. PPINET performs well on peanut pod classification, reaching an accuracy of 98.65% and an inference speed of 321 fps on the RK3588. With its high recognition accuracy and fast inference speed, the model enables real-time, accurate detection of peanut pods.
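
The building blocks named above (depthwise separable convolutions inside inverted residual structures, plus a lightweight multi-query attention module) can be sketched roughly as follows. This is a minimal PyTorch illustration rather than the authors' implementation: the module names, channel sizes, expansion ratio, and the exact form and placement of the MQA block are assumptions made for illustration only.

```python
# Minimal sketch of the kinds of blocks described in the abstract (assumed PyTorch).
# InvertedResidual follows the MobileNetV2-style expand -> depthwise -> project pattern;
# MultiQueryAttention is a simplified spatial attention with a shared key/value
# projection and several query heads. All names and sizes are illustrative.
import torch
import torch.nn as nn


class InvertedResidual(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1, expand=4):
        super().__init__()
        hidden = in_ch * expand
        self.use_res = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 pointwise expansion
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution (groups == channels)
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise projection, linear bottleneck (no activation)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_res else out


class MultiQueryAttention(nn.Module):
    """Simplified multi-query attention over spatial positions:
    several query heads share a single key/value projection."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = channels // num_heads
        self.q = nn.Conv2d(channels, channels, 1)             # per-head queries
        self.kv = nn.Conv2d(channels, 2 * self.head_dim, 1)   # shared key/value
        self.proj = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w
        q = self.q(x).reshape(b, self.num_heads, self.head_dim, n)
        k, v = self.kv(x).reshape(b, 2, self.head_dim, n).unbind(dim=1)
        # attention weights over spatial positions, shared keys across heads
        attn = torch.softmax(
            (q.transpose(-2, -1) @ k.unsqueeze(1)) / self.head_dim ** 0.5, dim=-1
        )
        out = attn @ v.unsqueeze(1).transpose(-2, -1)          # (b, heads, n, head_dim)
        out = out.transpose(-2, -1).reshape(b, c, h, w)
        return x + self.proj(out)
```

A PPINET-style backbone would then stack several such inverted residual blocks (with strides and channel widths chosen by the architecture search) and insert the attention module at one or more stages; the specific configuration used in the paper is not reproduced here.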

       

      Abstract: Conventional convolutional neural network (CNN) algorithms for recognizing peanut pod appearance suffer from high memory consumption and computational complexity, and are therefore difficult to deploy on resource-constrained edge devices. This study proposes an efficient, lightweight, real-time identification model, PPINET (peanut pod identification network), designed for embedded systems. The model achieves high accuracy and low latency while greatly reducing computational overhead, making it well suited to intelligent agricultural applications. The PPINET architecture combines depthwise separable convolutions with inverted residual blocks, which substantially reduces both the number of parameters and the floating-point operations (FLOPs) while still extracting discriminative features from peanut pod images; this lightweight backbone can be deployed efficiently on low-power edge devices. A lightweight attention module, the multi-query attention (MQA) mechanism, is incorporated to further enhance feature extraction and to strengthen the network's focus on key features, improving classification accuracy and robustness under variable conditions. TuNAS (an easy-to-tune and scalable implementation of efficient neural architecture search (NAS) with weight sharing) is adopted to adaptively optimize the model structure for deployment environments with strict resource constraints: a configurable convolutional unit, called Tun, composed of multiple configurable inverted residual blocks allows the architecture to be adjusted dynamically according to application needs, so that PPINET remains adaptable and efficient across hardware platforms. Several training strategies further improve performance. An image preprocessing pipeline based on contour extraction and precise cropping mitigates the distortion introduced by conventional image scaling, and the resulting higher-quality training samples improve generalization across different peanut pod appearances. A cosine annealing schedule dynamically adjusts the learning rate, accelerating convergence and helping to avoid suboptimal local minima. For deployment, the model is quantized into the RKNN format and accelerated at the hardware level on a Rockchip RK3588 platform equipped with a neural processing unit (NPU); quantization significantly reduces memory usage and inference latency while preserving classification performance. The final model is only 1.85 MB, with 0.49 million parameters and a computational complexity of just 0.30 GFLOPs, making it suitable for embedded environments. Experimental results demonstrate that PPINET achieves an outstanding classification accuracy of 98.65% on peanut pods. Comparative results further show that the MQA module improves recognition accuracy by up to 2.34% over conventional attention mechanisms such as SE (squeeze-and-excitation) and CBAM (convolutional block attention module), indicating a superior performance-to-efficiency trade-off. In deployment tests, PPINET reaches an inference speed of 321 frames per second (fps) on the RK3588 development board, significantly outperforming popular embedded platforms such as the Raspberry Pi 4B and Jetson Nano. This real-time capability supports automated peanut pod sorting, where rapid and accurate classification is essential for improving agricultural productivity; the compact architecture and hardware-accelerated design are well suited to smart farming applications that require low-power, real-time AI. In conclusion, PPINET resolves the technical bottlenecks of deploying CNN recognition models on embedded devices by combining a lightweight, efficient network, a scalable NAS framework, an effective attention mechanism, and hardware-friendly quantization, providing a practical and reliable solution for real-time agricultural product identification in edge AI applications for precision agriculture.
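
As a rough illustration of the deployment path described above (quantization to the RKNN format and NPU-accelerated inference on the RK3588), the sketch below uses the rknn-toolkit2 and RKNNLite Python APIs. It is not the authors' deployment code: the file names (ppinet.onnx, dataset.txt, pod.jpg), the input resolution, the normalization constants, and the exact API arguments are assumptions that may need adjusting to the toolkit version and the actual model export.

```python
# Hypothetical conversion script (runs on a host PC with rknn-toolkit2 installed).
# It converts an exported ONNX model to a quantized .rknn file targeting the RK3588.
from rknn.api import RKNN

rknn = RKNN()
# Preprocessing constants and target platform are illustrative placeholders.
rknn.config(mean_values=[[127.5, 127.5, 127.5]],
            std_values=[[127.5, 127.5, 127.5]],
            target_platform='rk3588')
rknn.load_onnx(model='ppinet.onnx')                       # hypothetical exported model
rknn.build(do_quantization=True, dataset='dataset.txt')   # list of calibration images
rknn.export_rknn('ppinet.rknn')
rknn.release()
```

On the RK3588 itself, the quantized model could then be loaded and run on the NPU through the lightweight runtime, roughly as follows.

```python
# Hypothetical on-device inference (runs on the RK3588 with rknnlite installed).
import cv2
import numpy as np
from rknnlite.api import RKNNLite

lite = RKNNLite()
lite.load_rknn('ppinet.rknn')
lite.init_runtime()                       # schedules the model onto the NPU
img = cv2.imread('pod.jpg')               # hypothetical sample image
img = cv2.cvtColor(cv2.resize(img, (224, 224)), cv2.COLOR_BGR2RGB)
outputs = lite.inference(inputs=[np.expand_dims(img, 0)])
print('predicted class:', int(np.argmax(outputs[0])))
```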

       
