How to choose the reward function for the cart-pole inverted pendulum task


I am new to Python (or any programming language, for that matter). For months now I have been working on stabilising the inverted pendulum. I have everything working, but I am struggling to find the right reward function. So far, after research and trial and error, the best I could come up with is

R = (x_dot**2) + 0.001*(x**2) + 0.1*(theta**2)

But I don't reach stability, i.e. keeping theta = 0 for long enough.

Does anyone have an idea of the logic behind the ideal reward function?
Thank you.

There are 2 answers below.

I am working on the inverted pendulum too. I found the following reward function, which I am trying:

costs = angle_normalise(th)**2 + 0.1*thdot**2 + 0.001*(action**2)
# angle_normalise wraps the angle th into [-pi, pi]: ((th + pi) % (2*pi)) - pi
reward = -costs

but I still have a problem choosing the actions; maybe we can discuss it.


For just the balancing problem (not the swing-up), even a binary reward is enough. Something like

  • Always 0, then -1 when the pole falls. Or,
  • Always 1, then 0 when the pole falls.

Which one to use depends on the algorithm used, the discount factor, and the episode horizon. Either way, the task is easy and both will do the job.
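
A minimal sketch of those two variants, assuming the pole counts as "fallen" once |theta| exceeds some threshold or the cart leaves the track (the function names and thresholds below are illustrative, roughly matching Gym's CartPole defaults):

def balance_reward_v1(theta, x, theta_limit=0.21, x_limit=2.4):
    # 0 on every step, -1 once the pole falls or the cart leaves the track
    fallen = abs(theta) > theta_limit or abs(x) > x_limit
    return -1.0 if fallen else 0.0

def balance_reward_v2(theta, x, theta_limit=0.21, x_limit=2.4):
    # 1 on every step while balanced, 0 once the pole has fallen
    fallen = abs(theta) > theta_limit or abs(x) > x_limit
    return 0.0 if fallen else 1.0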

For the swing-up task (harder than just balancing, as the pole starts upside down and you need to swing it up by moving the cart), it is better to have a reward that depends on the state. Usually the simple cos(theta) is fine. You can also add a penalty on the angular velocity and on the action, in order to prefer slow, smooth trajectories, and a penalty if the cart goes out of the boundaries of the x coordinate.
A reward including all these terms would look like this:
reward = cos(theta) - 0.001*theta_d**2 - 0.0001*action**2 - 100*out_of_bound(x)
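
A runnable sketch of that reward, assuming out_of_bound is a simple indicator on the cart position (the function names and the track limit here are illustrative, not part of the original answer):

import numpy as np

def out_of_bound(x, x_limit=2.4):
    # 1.0 if the cart has left the allowed track, 0.0 otherwise
    return 1.0 if abs(x) > x_limit else 0.0

def swingup_reward(theta, theta_d, action, x):
    # cos(theta) is +1 with the pole upright and -1 when it hangs down;
    # the remaining terms penalise fast rotation, large actions and leaving the track
    return np.cos(theta) - 0.001*theta_d**2 - 0.0001*action**2 - 100.0*out_of_bound(x)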