I’m running PPO with a custom env. Training runs fine; I checked that the actions predicted during training stay within the action space.
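For context, this is roughly how the action space is declared in my custom env (the class name, shapes, and observation space below are just illustrative placeholders, not my real env):

```python
import gymnasium as gym
import numpy as np

class MyCustomEnv(gym.Env):
    def __init__(self, config=None):
        super().__init__()
        # Actions are continuous values between 0 and 30.
        self.action_space = gym.spaces.Box(low=0.0, high=30.0, shape=(1,), dtype=np.float32)
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=(4,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        # Dummy dynamics, just for this sketch.
        return self.observation_space.sample(), 0.0, False, False, {}
```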
I save a checkpoint every iteration. But when I load one and restore the policy with `Policy.from_checkpoint`, the actions it predicts range from -1 to 1, whereas the action space should be between 0 and 30.
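Roughly what I’m doing when restoring the policy (the checkpoint path and the observation are placeholders):

```python
import numpy as np
from ray.rllib.policy.policy import Policy

# Load the single policy from a policy checkpoint directory (path is a placeholder).
policy = Policy.from_checkpoint("/path/to/checkpoint_000100/policies/default_policy")

# Placeholder observation matching my observation space.
obs = np.zeros(4, dtype=np.float32)

# compute_single_action returns (action, state_outs, extra_fetches).
action, _, _ = policy.compute_single_action(obs)
print(action)  # Comes back in [-1, 1] instead of [0, 30].
```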
Is there any postprocessing or preprocessing step I’m missing? If so, how can I find out which preprocessing RLlib applies?
I also tried exporting the policy to ONNX, but it’s the same thing.
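For completeness, this is a rough sketch of how I did the ONNX export (the checkpoint path, export directory, and opset number are just examples):

```python
from ray.rllib.policy.policy import Policy

policy = Policy.from_checkpoint("/path/to/checkpoint_000100/policies/default_policy")
# The onnx argument takes the opset version; 11 is just an example.
policy.export_model("./onnx_export", onnx=11)
```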