Abstract:
Agricultural machinery dispatching, an innovative model of socialized agricultural machinery services, has been widely implemented in county- and district-level administrative areas across the province. Because crops within the same province mature at similar times, demand for agricultural machinery is concentrated in peak operation periods, creating a supply-demand imbalance in which some machinery owners have no work while others have jobs but no machinery. Without scientific and rational scheduling strategies, this imbalance lowers agricultural production efficiency and makes production more difficult and complex. Many studies have applied traditional heuristic algorithms to optimize the scheduling of harvester cooperative operations, but low work efficiency and high operational costs remain. In response, this study constructs a harvester co-scheduling model that minimizes the transfer costs between fields and designs an inter-regional collaborative optimization scheduling algorithm based on deep reinforcement learning (DRL-ICOSA). The paper first formulates harvester collaborative scheduling as a Markov decision process and constructs a deep reinforcement learning environment. A policy network and a value network, both built on an attention-based encoder-decoder architecture, were designed so that the model can automatically learn the complexity of the environment, make effective use of raw data, and improve performance and generalization. Dynamic Gaussian noise was introduced into the random sampling strategy to keep the policy network from falling into local optima during the initial training stage and to improve the model's performance and robustness. The model was trained with the proximal policy optimization (PPO) algorithm.
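The noisy sampling step and the greedy selection used at test time can be illustrated with a minimal sketch. It is not the DRL-ICOSA implementation: the names `noise_std` and `select_field`, the linear decay schedule, and adding the noise to the decoder scores are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
import random


def noise_std(step, total_steps, sigma_start=1.0, sigma_end=0.0):
    """Noise scale for a given training step.

    Assumption: a linear decay from sigma_start to sigma_end. The schedule
    actually used by the authors is not given in the abstract.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return sigma_start + frac * (sigma_end - sigma_start)


def select_field(logits, visited, rng, sigma=0.0, greedy=False):
    """Pick the index of the next field among those not yet visited.

    logits  : one policy score per field (hypothetical decoder output)
    visited : one boolean per field, True if already scheduled
    sigma   : standard deviation of the Gaussian noise added to the scores
    greedy  : if True, return the highest-scoring unvisited field
    """
    candidates = [i for i, done in enumerate(visited) if not done]
    if not candidates:
        raise ValueError("all fields have already been visited")
    if greedy:
        # Evaluation: deterministic choice, no noise.
        return max(candidates, key=lambda i: logits[i])
    # Training: perturb the scores, then sample from a softmax over them.
    noisy = {i: logits[i] + rng.gauss(0.0, sigma) for i in candidates}
    top = max(noisy.values())  # subtract the maximum for numerical stability
    weights = [math.exp(noisy[i] - top) for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


rng = random.Random(0)
logits = [0.1, 2.0, 3.0]          # illustrative scores, not from the paper
visited = [False, False, True]    # field 2 has already been harvested
train_choice = select_field(logits, visited, rng, sigma=noise_std(0, 1000))
eval_choice = select_field(logits, visited, rng, greedy=True)
```

In this sketch the noise is largest at the start of training, which spreads the sampling over more fields, and fades to zero so that later sampling follows the learned policy alone.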
Finally, the trained model was validated on a farmland test set, and the optimal path was selected using a greedy action selection strategy, yielding an optimized solution for cross-county harvester scheduling. To verify the algorithm's effectiveness, four operation scenarios were considered, combining effective operation durations of 40 h and 24 h with the agricultural machinery scheduling center located at either the center or the edge of the operation area. The DRL-ICOSA algorithm, genetic algorithm (GA), particle swarm optimization (PSO), and simulated annealing (SA) were each used to compute scheduling strategies for comparative analysis. The experimental results indicate the following. When the effective operation duration is 40 h, with the scheduling center at either the center or the edge of the area, the DRL-ICOSA algorithm reduces the average scheduling cost by no less than 13.9% compared with the GA, PSO, and SA algorithms; when the effective operation duration is 24 h, the reduction is no less than 11.5%. When the scheduling center is located at the center of the area, with an operation duration of either 40 h or 24 h, the DRL-ICOSA algorithm reduces the average scheduling cost by no less than 12.3% compared with the GA, PSO, and SA algorithms; when the scheduling center is located at the edge of the area, the reduction is no less than 11.5%. Therefore, regardless of the effective operation duration or the location of the scheduling center, the DRL-ICOSA algorithm consistently achieves the lowest scheduling cost of the four algorithms. In summary, this study provides a more scientific and reasonable scheduling solution for the complex problem of collaborative harvester scheduling operations.
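One plausible reading of the "no less than" figures is sketched below. It assumes that each reduction is measured relative to the baseline algorithm's average cost and that the reported bound is the smallest reduction over the three baselines; the abstract does not state the formula, and the symbols $\bar{C}_b$ and $\Delta_b$ are introduced here only for illustration.

```latex
\Delta_b = \frac{\bar{C}_b - \bar{C}_{\mathrm{DRL\text{-}ICOSA}}}{\bar{C}_b} \times 100\%,
\qquad b \in \{\mathrm{GA}, \mathrm{PSO}, \mathrm{SA}\},
\qquad \min_{b} \Delta_b \ge 13.9\% \quad \text{(40 h scenarios)}
```

Under this reading, the 24 h, center, and edge results correspond to the same expression with bounds of 11.5%, 12.3%, and 11.5%, respectively.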
The DRL-ICOSA algorithm is effective in reducing scheduling costs and shows significant potential and application value for the optimization of collaborative harvester scheduling. Compared with traditional heuristic algorithms, the proposed method is better suited to complex environments and generalizes more strongly. It avoids the manual feature design steps of traditional methods, thereby reducing dependence on prior knowledge. The proposed approach can effectively reduce resource waste and cost expenditure.