Reinforcement learning: DQN algorithm does not converge to a target value


I am working on a study project with stable-baselines3 and Webots. I let a vehicle drive around a simple circular course. The vehicle has several sensors: a camera, distance sensors, the current speed, and the current position as coordinates. The camera data goes into the observation space; the rest of the data is used for the reward. I give a per-step reward for driving in the middle of the lane (measured with the distance sensors) and for the distance driven in that step, so I assume the model should learn to collect the maximum reward.

The attached TensorBoard plot ("Mean rewards for driving in a round course") shows the mean reward. My problem is that the reward does not converge to a maximum; the curve keeps rising and falling in different regions. Is there a known problem or behavior of DQN that explains exactly this? I have been looking for an explanation for a long time, but it makes no sense to me that the curve never flattens out and converges to a maximum. I don't think the code is needed here, because the learning process itself works; I am only failing to make it converge.
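For illustration, a minimal sketch of a reward with the shape described above might look like the following. The sensor variable names and the equal weighting of the two terms are my assumptions, not the project's actual code:

```python
# Hypothetical sketch of the per-step reward: a lane-centering term from the
# side distance sensors plus a progress term for the distance driven this step.
def compute_reward(left_dist: float, right_dist: float, step_distance: float) -> float:
    # Centering term: 1.0 when both side distances are equal (vehicle is
    # centered), approaching 0.0 as the vehicle drifts toward one side.
    centering = 1.0 - abs(left_dist - right_dist) / max(left_dist + right_dist, 1e-6)
    # Progress term: proportional to the range covered since the last step.
    progress = step_distance
    return centering + progress
```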

I tried raising the "target_update_interval" from 10,000 to 100,000 because I had read that a higher value makes the agent more robust, but it did not help (see the second TensorBoard plot of mean rewards). I also raised the number of training steps.
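For context, here is a minimal sketch of how that hyperparameter change would look with stable-baselines3's DQN. The environment class and the other hyperparameter values are assumptions, not the question's actual setup:

```python
from stable_baselines3 import DQN

# `WebotsDrivingEnv` is a hypothetical Gymnasium-compatible wrapper around
# the Webots simulation; the question's actual environment is not shown.
env = WebotsDrivingEnv()

model = DQN(
    "CnnPolicy",                      # camera images as observations
    env,
    target_update_interval=100_000,   # raised from 10_000, as described above
    exploration_fraction=0.3,         # assumption: decay epsilon more slowly
    learning_rate=1e-4,               # assumption
    buffer_size=100_000,              # assumption: replay buffer size
    verbose=1,
    tensorboard_log="./dqn_tensorboard/",
)
model.learn(total_timesteps=1_000_000)  # raised step count, as described above
```

Note that a larger `target_update_interval` means the target network is refreshed less often, which stabilizes the learning targets but can also slow down convergence, so raising it is not guaranteed to smooth the reward curve.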
