I’m running PPO with a custom env. Training runs fine; I checked that the actions predicted during training stay within the action space.
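For context, this is roughly how the action space is declared in my custom env (the class name, shapes, and observation space below are just illustrative placeholders, not my real env):

```python
import gymnasium as gym
import numpy as np

class MyCustomEnv(gym.Env):
    def __init__(self, config=None):
        super().__init__()
        # Actions are continuous values between 0 and 30.
        self.action_space = gym.spaces.Box(low=0.0, high=30.0, shape=(1,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        # Dummy dynamics, just for this sketch.
        return self.observation_space.sample(), 0.0, False, False, {}
```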
I save a checkpoint every iteration. But when I load one and restore the policy with `Policy.from_checkpoint`, the actions it predicts range from -1 to 1, whereas the action space should be between 0 and 30.
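Roughly what I’m doing when restoring the policy (the checkpoint path and the observation are placeholders):

```python
import numpy as np
from ray.rllib.policy.policy import Policy

# Load the single policy from a policy checkpoint directory (path is a placeholder).
policy = Policy.from_checkpoint("/path/to/checkpoint_000100/policies/default_policy")

# Placeholder observation matching my observation space.
obs = np.zeros(4, dtype=np.float32)

# compute_single_action returns (action, state_outs, extra_fetches).
action, _, _ = policy.compute_single_action(obs)
print(action)  # Comes back in [-1, 1] instead of [0, 30].
```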
Is there any postprocessing or preprocessing step I’m missing? If so, how can I find out which preprocessing RLlib applies?
I also tried exporting the policy to ONNX, but it’s the same thing.
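For completeness, this is a rough sketch of how I did the ONNX export (the checkpoint path, export directory, and opset number are just examples):

```python
from ray.rllib.policy.policy import Policy

policy = Policy.from_checkpoint("/path/to/checkpoint_000100/policies/default_policy")
# The onnx argument takes the opset version; 11 is just an example.
policy.export_model("./onnx_export", onnx=11)
```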