Gymnasium environment consisting of multiple environments


I'm using reinforcement learning to train an agent to choose the step size in gradient descent. I want to train the agent on different objective functions of the form x'Qx. I'm currently using the Gymnasium interface to define the environment. The problem is that this environment has to cover a large number of objective functions, one for each matrix Q.

Gymnasium has 'vectorized environments', which allow stacking multiple independent environments into a single environment. The problem is that the action and observation spaces need to be the same for all of them. This isn't the case for environments with different objective functions, because the maximum step size has to be limited to 2 / (max eigenvalue of Q), which differs from one Q to the next.
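To make the bound concrete, here is a minimal sketch that computes it with NumPy, assuming Q is symmetric positive definite. The 2 / λ_max limit matches the update x ← x − α·Qx (the gradient of ½·x'Qx); if the gradient 2Qx of x'Qx is used directly, the limit becomes 1 / λ_max. The helper names `max_step_size` and `scale_action` are hypothetical. The second helper illustrates one possible workaround, not a confirmed solution: let the agent output a normalized fraction in [0, 1] and rescale it per objective, so the declared action space could stay identical across different Q.

```python
import numpy as np

def max_step_size(Q):
    """Upper bound 2 / lambda_max(Q), assuming Q is symmetric positive definite."""
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    lam_max = np.linalg.eigvalsh(Q)[-1]
    return 2.0 / lam_max

def scale_action(a, Q):
    """Map a normalized action a in [0, 1] to a step size in [0, 2 / lambda_max(Q)]."""
    return float(np.clip(a, 0.0, 1.0)) * max_step_size(Q)

# Two example objectives with different spectra (illustrative values only)
Q1 = np.diag([1.0, 4.0])
Q2 = np.diag([2.0, 10.0])

step1 = scale_action(0.5, Q1)  # half of the limit for Q1
step2 = scale_action(0.5, Q2)  # half of the limit for Q2
```

With this rescaling the same normalized action maps to a different physical step size for each Q, which is the part that would otherwise make the action spaces differ.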

My current solution is that the Q matrix changes once gradient descent has converged, and the environment flags 'done' when it has gone through all the different objective functions.
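A rough sketch of that approach, under several assumptions that are not in the description above: every Q is symmetric positive definite with the same dimension, the update is x ← x − α·Qx, the observation is the current gradient, the reward is the negative objective value, and the action is a normalized fraction of the per-objective limit. The class name `QuadraticStepSizeEnv` and its parameters are hypothetical. The class only mirrors Gymnasium's `reset`/`step` return signatures so the snippet stays self-contained; a real version would subclass `gymnasium.Env` and declare `action_space` and `observation_space`.

```python
import numpy as np

class QuadraticStepSizeEnv:
    """Hypothetical sketch: cycles through a list of Q matrices within one episode."""

    def __init__(self, Qs, tol=1e-6, max_iters=1000):
        self.Qs = [np.asarray(Q, dtype=float) for Q in Qs]
        self.tol = tol              # gradient-norm threshold treated as "converged"
        self.max_iters = max_iters  # safety cap per objective (an assumption)

    def _load(self, idx):
        # Switch to objective idx and restart gradient descent from a fixed point
        self.idx = idx
        self.Q = self.Qs[idx]
        self.max_step = 2.0 / np.linalg.eigvalsh(self.Q)[-1]
        self.x = np.ones(self.Q.shape[0])
        self.iters = 0

    def _grad(self):
        return self.Q @ self.x  # gradient of 0.5 * x'Qx for symmetric Q

    def reset(self, seed=None, options=None):
        self._load(0)
        return self._grad(), {"objective": self.idx}

    def step(self, action):
        # Normalized action in [0, 1], rescaled to this objective's limit
        alpha = float(np.clip(action, 0.0, 1.0)) * self.max_step
        self.x = self.x - alpha * self._grad()
        self.iters += 1
        reward = -0.5 * float(self.x @ self.Q @ self.x)
        terminated = False
        converged = np.linalg.norm(self._grad()) < self.tol
        if converged or self.iters >= self.max_iters:
            if self.idx + 1 < len(self.Qs):
                self._load(self.idx + 1)  # move on to the next objective
            else:
                terminated = True         # all objectives have been visited
        return self._grad(), reward, terminated, False, {"objective": self.idx}
```

An alternative design, also only a suggestion, would be to pick one Q per episode inside `reset` and end the episode on convergence, so that each episode corresponds to a single objective function.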

Is there a better way to implement the environment than my current solution?
