I'm working on an autonomous vehicle management project that uses a deep reinforcement learning model with a learned Q-function. We initially trained the model with 256 discrete actions for decision-making. However, as we increased the number of lanes, the action space grew to 65,536 actions, which greatly increased the model's complexity and resource requirements.
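To make the scale concrete, here is a rough sketch of how the final layer of a Q-network grows when it emits one Q-value per action. The hidden width of 512 and the helper name `output_layer_params` are assumptions for illustration only, not our actual architecture.

```python
def output_layer_params(hidden_width: int, num_actions: int) -> int:
    """Count weights plus biases of a dense layer that maps
    hidden features to one Q-value per discrete action."""
    return hidden_width * num_actions + num_actions


# Assumed hidden width, chosen only to illustrate the growth.
HIDDEN_WIDTH = 512

small = output_layer_params(HIDDEN_WIDTH, 256)     # original action space
large = output_layer_params(HIDDEN_WIDTH, 65536)   # action space after adding lanes

print(f"256 actions:    {small:,} output-layer parameters")
print(f"65,536 actions: {large:,} output-layer parameters")
print(f"growth factor:  {large // small}x")
```

Under these assumptions the output layer alone grows by a factor of 256, and every training step also has to compute a max over all 65,536 Q-values.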
My question is: are there ways to handle an action space this large and reduce the complexity of the Q-function?
I tried to address this by changing the model's output to predict the next state directly, but training diverged and the resulting solutions were suboptimal.