Restored policy gives actions that are out of bounds with RLlib


I’m running PPO with a custom env. Training runs fine; I verified that the actions predicted during training stay within the action space.

I save a checkpoint every iteration. But when I restore a policy with `Policy.from_checkpoint`, the actions it predicts range from -1 to 1, whereas the action space should be 0 to 30.

Is there any postprocessing or preprocessing I’m missing? If so, how can I find out which preprocessing RLlib applies?

I also tried exporting the policy to ONNX, but the result is the same.
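For reference, I suspect this is related to action normalization: RLlib's config has a `normalize_actions` setting (enabled by default for Box spaces), so the policy network itself outputs actions in [-1, 1] and the env runner unsquashes them to the env's bounds before stepping. If that's the cause, the raw actions from the restored policy would need the same linear unsquash. A minimal sketch of that mapping for my assumed `Box(0, 30)` space (RLlib ships a utility for this, `ray.rllib.utils.spaces.space_utils.unsquash_action`, which does the equivalent given the action space):

```python
def unsquash(a: float, low: float = 0.0, high: float = 30.0) -> float:
    """Map a normalized action in [-1, 1] linearly onto [low, high]."""
    return low + (high - low) * (a + 1.0) / 2.0

# Endpoints and midpoint of the normalized range map to the env bounds:
print(unsquash(-1.0))  # 0.0
print(unsquash(0.0))   # 15.0
print(unsquash(1.0))   # 30.0
```

If the restored policy's `config["normalize_actions"]` is True, applying this mapping to its outputs should recover env-scale actions, but I'd like to confirm that this is the intended workflow.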
