Estimating residual bait density using hybrid dilated convolution and attention multi-scale network

张丽珍; 李延天; 李志坚; 孟雄栋; 张永琪; 吴迪

doi:10.11975/j.issn.1002-6819.202403053


Abstract

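Abstract: Timely and accurate estimation of the feed remaining in the bait tray is an important measure for improving aquaculture efficiency. To address the high model complexity and low counting accuracy of residual bait detection in shrimp farming, this study proposes a residual bait density estimation method based on a hybrid dilated convolution and attention multi-scale network (HAMNet). First, drawing on the multi-column architecture of MCNN (multi-column convolutional neural network), a parallel convolution block (PCB) is designed so that the network extracts residual bait features at multiple scales within a single-column architecture, which simplifies the network structure and reduces computation. To compensate for the slight loss of feature representation capability caused by this simplification, a hybrid dilated convolution block (HDCB) is introduced to avoid information loss and enlarge the receptive field, strengthening the model's ability to mine multi-scale residual bait information. Second, a channel attention mechanism (CAM) is embedded in the network; it uses the interdependence among channels to recalibrate the weights of useful feature information and highlight the difference between target and background. Finally, to address the poor density map quality caused by downsampling, learnable transposed convolutions are applied to recover the detail of the feature maps and thereby improve counting performance. The method was validated on residual bait images collected under bait tray conditions. Compared with the baseline MCNN, HAMNet reduced the mean absolute error, root mean square error, and computational cost by 44.4%, 40.8%, and 13.7%, respectively, with a parameter size of only 0.52 MB. Compared with the classical density estimation models CMTL (cascaded multi-task learning), SANet (scale aggregation network), and CSRNet (congested scene recognition network), the model kept the best balance across all performance metrics and showed a clear advantage. This study can serve as a reference for rapidly quantifying residual bait with artificial intelligence in aquaculture.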
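The claim that a hybrid dilated convolution block (HDCB) enlarges the receptive field while avoiding information loss can be illustrated with a small calculation. The sketch below is not the authors' implementation: the kernel size of 3 and the dilation rates `(1, 2, 3)` and `(2, 2, 2)` are assumptions chosen for illustration, since the abstract does not state the rates used in HAMNet. It compares the nominal receptive field of a stack of stride-1 dilated convolutions with the number of input positions that actually contribute to one output pixel.

```python
from itertools import product


def receptive_field(kernel_size, dilations):
    """Nominal receptive field (per axis) of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def covered_offsets(kernel_size, dilations):
    """1-D input offsets that can reach one output pixel through the stack.

    Assumes an odd kernel size so that the taps are symmetric around zero.
    """
    half = kernel_size // 2
    taps = [[k * d for k in range(-half, half + 1)] for d in dilations]
    return sorted({sum(combo) for combo in product(*taps)})


hybrid = (1, 2, 3)   # hypothetical mixed dilation rates
uniform = (2, 2, 2)  # the same rate repeated, for comparison

for name, rates in (("hybrid", hybrid), ("uniform", uniform)):
    rf = receptive_field(3, rates)
    used = len(covered_offsets(3, rates))
    print(f"{name}: receptive field {rf}, positions actually used {used}")
```

Under these assumed rates, both stacks span the same 13-pixel window, but the repeated rate only samples every second position, whereas the mixed rates touch every position in the window. This is the usual motivation for mixing dilation rates rather than repeating one.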

Abstract: Shrimp is among the most popular seafood products. In shrimp aquaculture, it is essential to estimate in a timely manner the amount of feed left in the bait tray after feeding, so that feeding strategies can be adjusted and bait costs reduced. Traditionally, residual bait is assessed by visual inspection by shrimp farmers. Neural networks and deep learning have recently been introduced to detect and count residual bait. However, such models are difficult to deploy on mobile devices, mainly because of low recognition accuracy and heavy computation. In this study, an improved model, the hybrid dilated convolution and attention multi-scale network (HAMNet), is proposed to estimate the density map of residual bait with high accuracy and low complexity. The HAMNet model consists of three components: a low-level feature extractor (LLFE), a high-level feature extractor (HLFE), and a density map restorer and generator (DMRG). These components serve as the front-end, middle-end, and back-end networks of the model, respectively. Firstly, inspired by the multi-column convolutional neural network (MCNN), a parallel convolution block (PCB) was designed in the front-end network to extract residual bait features at multiple scales within a single-column architecture. At the same time, a hybrid dilated convolution block (HDCB) was introduced into the middle-end network to expand the receptive field and further learn multi-scale features. Secondly, a channel attention mechanism (CAM) was embedded into the network to recalibrate the weights of useful feature information using the interdependence among channels, highlighting the difference between target and background. Finally, learnable transposed convolutional layers were applied in the back-end network to recover detailed information from the feature maps. This compensates for the loss of detail caused by downsampling in the front-end network, improving the quality of the density maps and reducing counting errors. The effectiveness of the improved model was validated using residual bait images collected under bait tray conditions, and the model was compared with classical density map estimation networks. Comparative experiments showed that HAMNet achieved the lowest mean absolute error (MAE) of 2.0, the lowest root mean square error (RMSE) of 2.9, and the fewest floating point operations (FLOPs) at 6.55 G on the residual bait dataset, with a parameter size of only 0.52 MB. HAMNet thus combined higher counting accuracy and stability with lower computational complexity. Compared with the baseline MCNN, the improved model achieved a 44.4% reduction in MAE, a 40.8% reduction in RMSE, and a 13.7% reduction in FLOPs. Compared with CMTL, SANet, and CSRNet, HAMNet achieved the best balance across all performance metrics. In summary, HAMNet outperformed the other models in overall performance, improving counting accuracy at a low computational cost. The findings can provide a reference for rapidly quantifying residual bait in shrimp aquaculture, and offer ideas for deploying residual bait detection models on platforms with limited computational power.
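The evaluation metrics quoted above follow standard definitions, which the minimal sketch below makes concrete. It assumes the usual density-estimation convention that the predicted count of an image is the sum of its density map. All numbers in it are toy values for illustration, not data from the study, and the function names are hypothetical.

```python
import math


def density_count(density_map):
    """Predicted object count: the sum of all density map values."""
    return sum(sum(row) for row in density_map)


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def rmse(pred_counts, true_counts):
    """Root mean square error between predicted and ground-truth counts."""
    squared = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(squared / len(true_counts))


def relative_reduction(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline


# Toy values only (not from the paper).
toy_map = [[0.5, 0.25], [0.25, 1.0]]
true_counts = [10, 20, 30]
pred_counts = [12, 19, 27]

print(density_count(toy_map))
print(mae(pred_counts, true_counts), rmse(pred_counts, true_counts))
print(relative_reduction(4.0, 3.0))
```

The reported reductions relative to MCNN (44.4% in MAE, 40.8% in RMSE, 13.7% in FLOPs) are percentages of the kind computed by `relative_reduction`, applied to the baseline and HAMNet values of each metric.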
