LI Xiaoyu, ZHANG Junhua, GUO Xiaoguang, et al. Reinforcement learning-based optimization algorithm for energy management and path planning of robot chassis[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2024, 40(21): 175-183. DOI: 10.11975/j.issn.1002-6819.202405192

    Reinforcement learning-based optimization algorithm for energy management and path planning of robot chassis

Abstract: Ground roughness can significantly affect battery performance in greenhouse environments. In this study, battery energy management was integrated with path planning to address this challenge, and the effects of ground roughness on the battery life and utilization efficiency of greenhouse vehicle platforms were systematically investigated. A graded pre-scoring model was constructed using prior knowledge. Additionally, the Manhattan distance between the vehicle's current position and the target point was incorporated into the reinforcement learning reward function, thereby linking travel distance with battery life so that both battery utilization efficiency and battery life were optimized during path planning. To overcome the shortcomings of traditional Q-learning, such as long iteration times, low convergence efficiency, susceptibility to local optima, and excessive path turns, an Adaptive Multi-step Q-learning algorithm (AMQL) with adaptive step sizes and an Adaptive ε-greedy Q-learning algorithm (AEQL) with an adaptive exploration rate were proposed. The AMQL algorithm adjusted the step size according to a forward reward assessment: if the reward at the current position increased relative to the previous reward, the step size increased, and as the current position approached the endpoint, the step size gradually decreased to prevent suboptimal paths. The AEQL algorithm adaptively adjusted the exploration rate ε using the difference between adjacent reward values: ε increased when the adjacent reward value increased and decreased when the reward value decreased. Although AMQL improved convergence efficiency and iteration speed, the variations in step size caused significant fluctuations in rewards, resulting in lower algorithm stability; moreover, the multi-step length alone had no outstanding impact on convergence efficiency or iteration speed. AEQL enhanced exploration efficiency and algorithm stability through dynamic adjustment, but its fluctuating rise during the initial training phase also increased the training time. Therefore, the AMQL and AEQL algorithms were combined into an Adaptive Multi-step and ε-greedy Q-learning algorithm (AMEQL) to ensure faster and more optimal global path selection during path planning. In the simulation, a realistic greenhouse tomato scenario was first modeled; an Inertial Measurement Unit (IMU) was then used to record changes in aisle roughness in real time, and these data were incorporated into the simulation model. Finally, 300 rounds of simulation experiments were carried out to test the traditional Q-learning, AMQL, and AMEQL algorithms for path planning in single-row (30 m×20 m), double-row (50 m×50 m), and triple-row (70 m×50 m) environments. Compared with traditional Q-learning, the AMEQL algorithm reduced the average training time by 34.46%, the average number of iterations required for convergence by 23.68%, the number of path turns by 63.13%, and the post-convergence average fluctuation by 15.62%. Owing to its higher convergence speed within 400 iterations, the AMEQL algorithm averaged 14 fluctuations per 100 iterations after reaching the maximum reward, while the AMQL algorithm averaged 15.
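The adaptive mechanisms described in the abstract can be illustrated with a short, hedged sketch. The Python code below is a hypothetical rendering of AMEQL-style updates on a tabular Q-table: the function names, parameter bounds, and increments (step_min, step_max, shrink_radius, delta, eps_min, eps_max, alpha, gamma) are assumptions made for illustration and are not taken from the paper.

import numpy as np

def adapt_step(step, reward, prev_reward, dist_to_goal,
               step_min=1, step_max=4, shrink_radius=5):
    # Adaptive multi-step rule (AMQL part): grow the step when the forward
    # reward assessment shows improvement, otherwise shrink it, and force a
    # small step near the endpoint to avoid overshooting the optimal path.
    if reward > prev_reward:
        step = min(step + 1, step_max)
    else:
        step = max(step - 1, step_min)
    if dist_to_goal <= shrink_radius:
        step = step_min
    return step

def adapt_epsilon(eps, reward, prev_reward,
                  delta=0.01, eps_min=0.05, eps_max=0.5):
    # Adaptive exploration rule (AEQL part): raise epsilon when the adjacent
    # reward value increases and lower it when the reward decreases.
    if reward > prev_reward:
        return min(eps + delta, eps_max)
    return max(eps - delta, eps_min)

def select_action(Q, state, eps):
    # Epsilon-greedy selection using the adaptively tuned exploration rate.
    if np.random.rand() < eps:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[state]))

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Standard Q-learning temporal-difference update used inside the loop.
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state])
                                 - Q[state, action])

In a full training loop, adapt_step and adapt_epsilon would be called once per transition, with the returned step size controlling how many grid cells the multi-step update spans and the returned ε feeding the next action selection.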
This algorithm can provide a theoretical reference for the autonomous path planning of greenhouse platforms.
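The reward design that links travel distance with battery use can likewise be sketched. The snippet below is an assumed shaping function, not the paper's actual reward: it penalizes the Manhattan distance to the target and a graded roughness pre-score, with placeholder weights w_d and w_r and placeholder terminal values.

def manhattan(p, q):
    # Manhattan distance between the vehicle's current position and the target.
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def shaped_reward(pos, goal, roughness_score, reached, collided,
                  w_d=0.1, w_r=0.5):
    # Illustrative reward: shorter paths and smoother aisles both score higher.
    if collided:
        return -10.0   # assumed obstacle penalty
    if reached:
        return 100.0   # assumed goal reward
    return -w_d * manhattan(pos, goal) - w_r * roughness_score

Here roughness_score stands in for the graded pre-score derived from prior knowledge and the IMU roughness measurements; how that score is actually graded is defined in the paper, not in this sketch.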