MlpPolicy only returns 1 and -1 with action space [-1, 1]

I am trying to use Stable Baselines to train PPO2 with an MlpPolicy. After 100k timesteps, the only action values I get are 1 and -1. The action space is restricted to [-1, 1] and I use the action directly as the control signal. Could the problem be that I use the action directly as the control?
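For reference, a minimal sketch of the setup described above, assuming Stable Baselines 2.x (TensorFlow 1.x); the asker's environment is not shown, so MountainCarContinuous-v0 is used only as a stand-in with the same Box(-1, 1) action space:

    # Minimal sketch of the described setup (assumed, not the asker's actual code).
    import gym
    from stable_baselines import PPO2
    from stable_baselines.common.policies import MlpPolicy

    env = gym.make("MountainCarContinuous-v0")  # action space Box(-1.0, 1.0, (1,))

    model = PPO2(MlpPolicy, env, verbose=1)
    model.learn(total_timesteps=100000)

    # Reproduce the symptom: inspect the actions of the trained policy.
    obs = env.reset()
    for _ in range(10):
        action, _states = model.predict(obs, deterministic=True)
        print(action)  # in the question, these values are stuck at -1 or 1
        obs, reward, done, info = env.step(action)
        if done:
            obs = env.reset()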
There is 1 answer below.
This could be a result of the Gaussian distribution PPO2 uses for continuous actions: the Gaussian is unbounded, so samples are clipped to the action space, and a large part of the probability mass ends up sitting exactly on the bounds -1 and 1. You could use a different algorithm that does not rely on a clipped Gaussian, or use PPO with another distribution (for example a Beta distribution, which is bounded by construction).

Check out the example here: https://github.com/hill-a/stable-baselines/issues/112 and this paper: https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf
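As a minimal sketch of the "different algorithm" route, assuming Stable Baselines 2.x: SAC squashes its Gaussian policy through tanh, so sampled actions respect the [-1, 1] bounds by construction instead of being clipped. MountainCarContinuous-v0 is again only a stand-in for your environment.

    # Sketch of switching to SAC, whose tanh-squashed policy avoids clipping.
    import gym
    from stable_baselines import SAC
    from stable_baselines.sac.policies import MlpPolicy as SacMlpPolicy

    env = gym.make("MountainCarContinuous-v0")  # action space Box(-1.0, 1.0, (1,))

    model = SAC(SacMlpPolicy, env, verbose=1)
    model.learn(total_timesteps=100000)

    obs = env.reset()
    action, _states = model.predict(obs, deterministic=True)
    print(action)  # squashed into (-1, 1) by tanh rather than clipped at the bounds

Implementing the Beta-distribution variant from the linked thesis would instead require a custom probability distribution class for PPO, which the GitHub issue above discusses.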