    LI XiaoYu, ZHANG JunHua, GUO XiaoGuang, et al. Reinforcement learning-based optimization algorithm for energy management and path planning of greenhouse chassis[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(21): 1-9. DOI: 10.11975/j.issn.1002-6819.202405192


    Reinforcement learning-based optimization algorithm for energy management and path planning of greenhouse chassis

• Abstract: To address the shortened battery life and low utilization efficiency caused by ignoring ground roughness in conventional path planning for greenhouse chassis, this study investigated three reinforcement learning algorithms that integrate battery energy management with path planning. First, a graded pre-scoring reward model was constructed from prior knowledge, and the Manhattan distance was added to the reward function to improve battery life and utilization. Second, to overcome the low convergence efficiency and the tendency to fall into local optima of the conventional Q-learning algorithm, an adaptive variable-step-size optimization algorithm (Adaptive Multi-step Q-learning, AMQL) and an adaptive exploration-rate optimization algorithm (Adaptive ε-greedy Q-learning, AEQL) were proposed to improve its performance. In addition, to further improve feasibility, the AMQL and AEQL algorithms were fused into an adaptive multi-step and variable ε-greedy algorithm (Adaptive Multi-step And ε-greedy Q-learning, AMEQL), and the performance of AMQL and AMEQL relative to the conventional Q-learning algorithm was verified by simulation comparisons in three different ridge-row layouts. The simulation results show that, compared with the conventional Q-learning algorithm, AMQL reduced the average training time by 23.74%, the average number of iterations to convergence by 14.01%, the average number of path turning points by 54.29%, and the average number of post-convergence fluctuations by 18.01%; AMEQL reduced the average training time by 34.46%, the average number of iterations to convergence by 23.68%, the average number of path turning points by 63.13%, and the average number of post-convergence fluctuations by 15.62%. Over 400 iterations, after reaching the maximum reward, AMQL fluctuated on average 15 times per 100 iterations, while AMEQL fluctuated 14 times. The proposed algorithm can serve as a theoretical reference for the autonomous path planning of greenhouse chassis.
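To make the reward design described above concrete, the following minimal Python sketch combines a graded roughness pre-score with a Manhattan-distance shaping term. All thresholds, scores, and weights (the grade cut-offs, the goal and collision rewards, the 0.1 distance coefficient) are illustrative assumptions, not the paper's actual values.

```python
# Minimal sketch of a graded pre-scoring reward with a Manhattan-distance term.
# All thresholds, scores, and weights below are illustrative assumptions;
# the paper's actual grading levels and coefficients are not reproduced here.

def roughness_score(roughness: float) -> float:
    """Graded pre-scoring: rougher ground drains the battery faster,
    so it receives a lower (more negative) score."""
    if roughness < 0.2:      # smooth aisle (assumed threshold)
        return 0.0
    elif roughness < 0.5:    # moderately rough aisle (assumed threshold)
        return -1.0
    else:                    # very rough aisle
        return -3.0

def manhattan_distance(pos, goal) -> int:
    """Manhattan distance between the current grid cell and the target cell."""
    return abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])

def reward(pos, goal, roughness: float, reached: bool, collided: bool) -> float:
    """Combine goal/collision terms, the roughness pre-score, and a distance shaping term."""
    if collided:
        return -10.0         # assumed collision penalty
    if reached:
        return 50.0          # assumed goal reward
    # A smaller Manhattan distance gives a smaller penalty, which links the
    # remaining travel distance (and hence energy use) to the reward signal.
    return roughness_score(roughness) - 0.1 * manhattan_distance(pos, goal)
```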

       

Abstract: In greenhouse environments, variations in ground roughness significantly affect battery performance. This paper integrates battery energy management with path planning to address this challenge. The study focuses on the effects of ground roughness on the battery life and utilization efficiency of greenhouse vehicle platforms and constructs a graded pre-scoring model based on prior knowledge. Additionally, the Manhattan distance between the vehicle's current position and the target point is incorporated into the reinforcement learning reward function, linking travel distance with battery life so that both battery utilization efficiency and battery life are optimized during path planning.

To address the long iteration times, low convergence efficiency, susceptibility to local optima, and excessive path turns of the traditional Q-learning algorithm, this paper proposes an Adaptive Multi-step Q-learning algorithm (AMQL) with adaptive step sizes and an Adaptive ε-greedy Q-learning algorithm (AEQL) with an adaptive exploration rate.

The AMQL algorithm adjusts the step size based on a forward reward assessment: if the reward at the current position increases compared with the previous reward, the step size is increased; to avoid settling on a suboptimal path, the step size is gradually reduced as the current position approaches the endpoint. The AEQL algorithm adaptively adjusts the exploration rate ε according to the difference between adjacent reward values: ε increases when the adjacent reward value increases and decreases when it decreases.

Although AMQL improves convergence efficiency and iteration speed, the changing step size causes large fluctuations in the reward, lowering the algorithm's stability, and the multi-step length alone yields only a limited improvement in convergence efficiency and iteration speed. AEQL, in contrast, improves exploration efficiency and algorithm stability through dynamic adjustment, but its fluctuating rise during the initial training phase may lengthen training time. To resolve these issues, this paper combines AMQL and AEQL into an Adaptive Multi-step and ε-greedy Q-learning algorithm (AMEQL). AMEQL overcomes the shortcomings of both algorithms while combining their advantages, enabling faster selection of a better global path during path planning.

In the simulation study, a realistic greenhouse tomato scenario is first modeled. An IMU is then used to record changes in aisle roughness in real time, and these data are incorporated into the simulation model. Finally, over 300 rounds of simulation experiments, the traditional Q-learning algorithm, the AMQL algorithm, and the AMEQL algorithm are tested for path planning in single-row (30 m×20 m), double-row (50 m×50 m), and triple-row (70 m×50 m) environments. The simulation results show that, compared with the traditional Q-learning algorithm, the AMEQL algorithm reduces the average training time by 34.46%, the average number of iterations required for convergence by 23.68%, the number of path turns by 63.13%, and the average post-convergence fluctuation by 15.62%. Owing to its faster convergence, over 400 iterations the AMEQL algorithm averaged 14 fluctuations per 100 iterations after reaching the maximum reward, while the AMQL algorithm averaged 15. This algorithm can serve as a theoretical reference for the autonomous path planning of greenhouse platforms.
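The sketch below illustrates how the AMQL step-size rule and the AEQL exploration-rate rule described in the abstract could be fused within a single Q-learning episode. It is not the authors' implementation: the environment interface (`env.reset`, `env.step`, `env.manhattan_to_goal`), the Q-table layout, and all hyper-parameters and clipping bounds are assumptions made for illustration.

```python
import numpy as np

# Minimal sketch of the fused adaptive multi-step / adaptive epsilon-greedy update (AMEQL).
# Assumptions: states are integer indices, Q is a numpy array of shape (n_states, n_actions),
# and env.step(action) returns (next_state, reward, done). All constants are illustrative.

def ameql_episode(env, Q, alpha=0.1, gamma=0.9, eps=0.3, step=1,
                  eps_min=0.05, eps_max=0.9, step_max=3):
    state = env.reset()
    prev_reward = None
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < eps:
            action = np.random.randint(Q.shape[-1])
        else:
            action = int(np.argmax(Q[state]))

        # AMQL-style multi-step: repeat the chosen action `step` times (or until done)
        next_state, reward = state, 0.0
        for _ in range(step):
            next_state, r, done = env.step(action)
            reward += r
            if done:
                break

        # Q-learning update over the aggregated multi-step transition
        # (discounting within the stride is omitted in this sketch)
        Q[state][action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state][action])

        if prev_reward is not None:
            if reward > prev_reward:
                # reward improved: lengthen the stride (AMQL rule) and raise the
                # exploration rate (AEQL rule as described in the abstract)
                step = min(step + 1, step_max)
                eps = min(eps * 1.05, eps_max)
            else:
                # reward dropped: shorten the stride and reduce exploration
                step = max(step - 1, 1)
                eps = max(eps * 0.95, eps_min)

        # shrink the stride near the goal to avoid overshooting the optimal path
        if env.manhattan_to_goal(next_state) <= step:
            step = 1

        prev_reward, state = reward, next_state
    return Q, eps, step
```

In this sketch the two adaptation rules share the same reward-difference signal, which is the main point of the fusion: the multi-step mechanism accelerates progress along promising stretches, while the ε adjustment keeps exploration responsive without the large reward fluctuations attributed to step-size changes alone.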

       

