Optimization strategy for intra-provincial cooperative scheduling of harvesters based on deep reinforcement learning

李子康; 张璠; 滕桂法; 李政; 王梓怡; 马世纪

doi:10.11975/j.issn.1002-6819.202401145

Deep reinforcement learning-based optimization strategy for the cooperative scheduling of harvesters

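Summary: Scheduling multiple machines across multiple fields is currently inefficient and costly. To address this, the study builds a cooperative scheduling model whose objective is to minimize the cost of transferring harvesters between fields, and designs an inter-regional collaborative optimization scheduling algorithm based on deep reinforcement learning (DRL-ICOSA). First, harvester scheduling is analyzed as a Markov decision process, attention-based policy and value networks are constructed, and dynamic Gaussian noise is introduced into the random sampling strategy to avoid local optima early in training and to improve the robustness of the network model. Next, the network model is trained with proximal policy optimization (PPO). Finally, DRL-ICOSA is evaluated on a test set to obtain an optimized harvester scheduling plan. Four combined scenarios were considered: effective operation durations of 40 and 24 h, with the agricultural machinery scheduling center located either at the center or at the edge of the operation area. In each scenario, scheduling strategies were computed and compared using DRL-ICOSA, a genetic algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA). The results show that, with the scheduling center at either the center or the edge of the area, DRL-ICOSA reduced the average scheduling cost relative to GA, PSO, and SA by at least 13.9% for an effective operation duration of 40 h and by at least 11.5% for 24 h. For operation durations of 40 or 24 h, DRL-ICOSA reduced the average scheduling cost relative to GA, PSO, and SA by at least 12.3% when the scheduling center was at the center of the area and by at least 11.5% when it was at the edge. Therefore, in all four scenarios (effective operation duration of 40 or 24 h, scheduling center at the center or edge of the area), DRL-ICOSA obtained the lowest scheduling cost of the four algorithms. These results can provide a scientific and practical scheduling plan for cooperative harvester operations within a province.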
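The abstract says only that dynamic Gaussian noise is added to the random sampling strategy. Below is a minimal sketch under assumptions not stated in the source: the noise is added to the policy network's per-field logits, its standard deviation decays linearly with the training step, and fields that can no longer be chosen are masked out. `noise_std` and `sample_field` are hypothetical names, not the authors' code.

```python
import math
import random


def noise_std(step: int, total_steps: int, sigma0: float = 1.0, sigma_min: float = 0.0) -> float:
    """Hypothetical linear schedule: strong noise early in training, decaying to sigma_min."""
    frac = min(step / total_steps, 1.0)
    return sigma0 + (sigma_min - sigma0) * frac


def sample_field(logits, mask, step, total_steps, rng=random):
    """Sample the next field index from softmax(logits + Gaussian noise).

    logits: per-field scores from the policy network (assumed input).
    mask: True where the field may still be chosen (e.g. not yet harvested).
    """
    if not any(mask):
        raise ValueError("no feasible field to choose")
    sigma = noise_std(step, total_steps)
    noisy = [logit + rng.gauss(0.0, sigma) if ok else -math.inf
             for logit, ok in zip(logits, mask)]
    top = max(noisy)                                   # subtract max for numerical stability
    weights = [math.exp(x - top) for x in noisy]       # masked entries get exp(-inf) = 0.0
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Once `step` reaches `total_steps`, the noise vanishes and this reduces to ordinary softmax sampling; at test time the study instead selects actions greedily.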

Abstract: Agricultural machinery dispatching operations, an innovative model of socialized agricultural machinery services, have been widely implemented in county- and district-level administrative areas across the province. Because crops within the same province mature at similar times, demand for agricultural machinery is concentrated in peak operation periods. This produces a supply-demand imbalance in which some machinery owners have machines but no work, while some farmers have work but no machines. Without scientific and rational scheduling strategies, this imbalance lowers agricultural production efficiency and makes agricultural production more difficult and complex to organize. Many studies have applied traditional heuristic algorithms to optimize the scheduling of cooperative harvester operations, but problems such as low operating efficiency and high operational costs remain. To address these challenges, this study constructs a harvester cooperative scheduling model that minimizes the cost of transferring harvesters between fields and designs an inter-regional collaborative optimization scheduling algorithm based on deep reinforcement learning (DRL-ICOSA). The Markov decision process of cooperative harvester scheduling is first analyzed, and a deep reinforcement learning environment is constructed. Attention-based policy and value networks with an encoder-decoder architecture were designed so that the model automatically learns the characteristics of a complex environment, makes effective use of the raw data, and achieves better performance and generalization. Dynamic Gaussian noise was introduced into the random sampling strategy to keep the policy network from falling into local optima early in training and to improve the model's performance and robustness. The model was trained with the proximal policy optimization (PPO) algorithm.
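PPO, named above as the training algorithm, is commonly implemented through its clipped surrogate objective. A minimal sketch of that loss for one batch follows. The form of the advantage inputs (for example, the negative transfer cost minus the value network's estimate) and the clipping range `eps = 0.2` are assumptions, not values reported in the abstract, and `ppo_clip_loss` is a hypothetical name.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """Clipped surrogate loss of PPO (to be minimized), averaged over a batch.

    new_logp / old_logp: log-probabilities of the taken actions (e.g. chosen fields)
    under the current policy and under the policy that collected the data.
    advantages: advantage estimates, e.g. return minus the value network's prediction.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)                         # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)   # clip(r_t, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)          # pessimistic of the two surrogates
    return -total / len(advantages)                       # negate: optimizers minimize
```

In a full training loop this term is usually combined with a regression loss for the value network (and often an entropy bonus); those parts are omitted here.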
Finally, the trained model was validated on a farmland test set, and routes were decoded with a greedy action selection strategy, yielding an optimized cross-county harvester scheduling plan. To verify the algorithm's effectiveness, four combined operation scenarios were considered: effective operation durations of 40 and 24 h, each with the agricultural machinery scheduling center located either at the center or at the edge of the operation area. Scheduling strategies were computed and compared using DRL-ICOSA, a genetic algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA). The experimental results show that, with the scheduling center at either the center or the edge of the area, DRL-ICOSA reduces the average scheduling cost relative to GA, PSO, and SA by no less than 13.9% when the effective operation duration is 40 h and by no less than 11.5% when it is 24 h. When the operation duration is 40 or 24 h, DRL-ICOSA reduces the average scheduling cost relative to GA, PSO, and SA by no less than 12.3% with the scheduling center at the center of the area and by no less than 11.5% with the scheduling center at the edge. Therefore, in all four tested combinations of effective operation duration and scheduling-center location, DRL-ICOSA achieves the lowest scheduling cost of the four algorithms. In summary, this study provides a more scientific and reasonable scheduling solution for the complex problem of cooperative harvester scheduling.
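Greedy action selection at test time means taking the highest-scoring feasible action at every step instead of sampling. The sketch below assumes a single harvester, Euclidean transfer distances, and a route that starts and ends at the scheduling center. The hypothetical `score` function stands in for the trained policy network. The sketch does not model the operation-time limits or the multiple machines of the actual DRL-ICOSA model.

```python
import math
from typing import Callable, Sequence

Point = tuple[float, float]


def greedy_rollout(depot: Point, fields: Sequence[Point],
                   score: Callable[[Point, Point], float],
                   cost_per_km: float = 1.0) -> tuple[list[int], float]:
    """Decode a visiting order by always taking the highest-scoring unvisited field.

    score(current, candidate) stands in for the trained policy's output
    (hypothetical here); the route starts and ends at the depot (assumption).
    """
    current, route, cost = depot, [], 0.0
    remaining = set(range(len(fields)))
    while remaining:
        nxt = max(remaining, key=lambda i: score(current, fields[i]))  # argmax, no sampling
        cost += math.dist(current, fields[nxt]) * cost_per_km
        current = fields[nxt]
        route.append(nxt)
        remaining.remove(nxt)
    cost += math.dist(current, depot) * cost_per_km                    # return trip to depot
    return route, cost


def nearest(a: Point, b: Point) -> float:
    """Hypothetical stand-in policy: prefer the nearest field."""
    return -math.dist(a, b)
```

For example, with the depot at the origin and fields at (3, 0), (1, 0), and (2, 0), the nearest-field stand-in visits fields 1, 2, 0 for a total transfer distance of 6. A trained network would replace `nearest` with learned scores.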
The DRL-ICOSA algorithm is effective at reducing scheduling costs and shows considerable potential and application value for optimizing cooperative harvester scheduling. Compared with traditional heuristic algorithms, the proposed method is better suited to complex environments and generalizes better. It avoids the manual feature design required by traditional methods, reducing dependence on prior knowledge. The approach can help reduce resource waste and operating costs.
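The percentage figures above are relative cost reductions. The calculation can be sketched as follows, assuming the usual definition (baseline cost minus DRL-ICOSA cost, divided by baseline cost) and reading "no less than" as the minimum over the three baselines. The cost values in the usage line are purely illustrative and are not data from the study.

```python
def cost_reduction_pct(baseline_cost: float, drl_cost: float) -> float:
    """Relative cost reduction of DRL-ICOSA versus one baseline, in percent (assumed definition)."""
    if baseline_cost <= 0:
        raise ValueError("baseline cost must be positive")
    return (baseline_cost - drl_cost) / baseline_cost * 100.0


def min_reduction_pct(baselines: dict[str, float], drl_cost: float) -> tuple[str, float]:
    """Return the baseline with the smallest reduction, i.e. the 'no less than' bound."""
    worst = min(baselines, key=lambda name: cost_reduction_pct(baselines[name], drl_cost))
    return worst, cost_reduction_pct(baselines[worst], drl_cost)


# Illustrative numbers only (not results from the study).
name, pct = min_reduction_pct({"GA": 1000.0, "PSO": 1100.0, "SA": 1050.0}, drl_cost=860.0)
print(f"smallest reduction: {pct:.1f}% vs {name}")  # prints: smallest reduction: 14.0% vs GA
```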
