Reinforcement learning-based optimization algorithm for energy management and path planning of greenhouse chassis
Abstract
In greenhouse environments, variations in ground roughness significantly affect battery performance. This paper integrates battery energy management with path planning to address this challenge. The study focuses on the effects of ground roughness on the battery life and utilization efficiency of greenhouse vehicle platforms and constructs a graded pre-scoring model based on prior knowledge. In addition, the Manhattan distance between the vehicle's current position and the target point is incorporated into the reinforcement learning reward function, linking travel distance with battery life so that both battery utilization efficiency and battery life are optimized during path planning.

To address the long iteration times, low convergence efficiency, susceptibility to local optima, and excessive path turns of the traditional Q-learning algorithm, this paper proposes an Adaptive Multi-step Q-learning algorithm (AMQL) with an adaptive step size and an Adaptive ε-greedy Q-learning algorithm (AEQL) with an adaptive exploration rate. AMQL adjusts the step size based on a forward reward assessment: if the reward at the current position is higher than the previous reward, the step size is increased, and to avoid suboptimal paths the step size is gradually reduced as the current position approaches the endpoint. AEQL adaptively adjusts the exploration rate ε according to the difference between adjacent reward values: ε increases when the adjacent reward value rises and decreases when it falls.

Although AMQL improves convergence efficiency and iteration speed, the changing step size causes large reward fluctuations and thus lower algorithm stability, and the multi-step mechanism's gains in convergence efficiency and iteration speed are limited. AEQL, in contrast, improves exploration efficiency and algorithm stability through dynamic adjustment, but its fluctuating rise during the initial training phase may lengthen training time. To address these issues, this paper combines AMQL and AEQL into an Adaptive Multi-step and ε-greedy Q-learning algorithm (AMEQL), which resolves the shortcomings of both methods while combining their respective advantages, enabling faster and better global path selection during path planning.

For the simulation study, a realistic greenhouse tomato scenario is first modeled. An IMU is then used to record changes in aisle roughness in real time, and these data are incorporated into the simulation model. Finally, over 300 rounds of simulation experiments, the traditional Q-learning algorithm, the AMQL algorithm, and the AMEQL algorithm are tested for path planning in single-row (30 m × 20 m), double-row (50 m × 50 m), and triple-row (70 m × 50 m) environments. Simulation results show that, compared with the traditional Q-learning algorithm, AMEQL reduces average training time by 34.46%, decreases the average number of iterations required for convergence by 23.68%, reduces the number of path turns by 63.13%, and reduces post-convergence average fluctuation by 15.62%. Over 400 iterations, AMEQL fluctuated once every 7.12 iterations on average after reaching the maximum reward, whereas AMQL fluctuated once every 6.68 iterations on average.
The proposed algorithm can serve as a theoretical reference for autonomous path planning of greenhouse vehicle platforms.
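To illustrate how the Manhattan-distance term and the roughness-graded pre-score described above can enter the reward function, the following is a minimal sketch assuming a grid world with (row, column) positions; the weights w_dist and w_rough, the terminal reward values, and the function name are placeholder assumptions, not the paper's implementation.

```python
def shaped_reward(pos, goal, roughness_grade, reached, collided,
                  w_dist=0.1, w_rough=0.5):
    # Manhattan distance between the current position and the target point,
    # which links remaining travel distance to battery consumption.
    manhattan = abs(pos[0] - goal[0]) + abs(pos[1] - goal[1])
    if collided:
        return -10.0   # placeholder penalty for hitting an obstacle
    if reached:
        return 10.0    # placeholder reward for reaching the target point
    # Graded pre-score term: rougher aisle cells carry a larger battery cost.
    return -w_dist * manhattan - w_rough * roughness_grade
```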
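The adaptive step-size rule, the adaptive ε rule, and a multi-step Q-learning backup could be combined along the following lines. This is a minimal sketch under assumed hyper-parameters (n_min, n_max, shrink_dist, eps_min, eps_max, delta) and an assumed tabular Q array indexed by state and action; it is not the authors' implementation.

```python
import numpy as np

def adapt_step_size(n_step, reward, prev_reward, dist_to_goal,
                    n_min=1, n_max=5, shrink_dist=5):
    # AMQL rule (illustrative): grow the step size when the current reward
    # improves on the previous one, and shrink it as the agent nears the goal.
    if reward > prev_reward:
        n_step = min(n_step + 1, n_max)
    if dist_to_goal <= shrink_dist:
        n_step = max(n_step - 1, n_min)
    return n_step

def adapt_epsilon(epsilon, reward, prev_reward,
                  eps_min=0.05, eps_max=0.9, delta=0.02):
    # AEQL rule (illustrative): per the abstract, epsilon is raised when the
    # adjacent reward value increases and lowered when it decreases.
    if reward > prev_reward:
        epsilon = min(epsilon + delta, eps_max)
    elif reward < prev_reward:
        epsilon = max(epsilon - delta, eps_min)
    return epsilon

def n_step_backup(Q, transitions, gamma=0.9, alpha=0.1):
    # Multi-step Q-learning backup over the last n transitions, each stored
    # as (state, action, reward, next_state); states index rows of Q.
    s0, a0, _, _ = transitions[0]
    G = sum((gamma ** k) * r for k, (_, _, r, _) in enumerate(transitions))
    s_next = transitions[-1][3]
    G += (gamma ** len(transitions)) * np.max(Q[s_next])
    Q[s0, a0] += alpha * (G - Q[s0, a0])
    return Q
```

An AMEQL-style training loop would choose actions ε-greedily with the adapted ε, collect n_step transitions, apply the backup, and then update both the step size and ε from the latest pair of adjacent rewards.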