I am new to Python, or to any programming language for that matter. For months now I have been working on stabilising an inverted pendulum. I have gotten everything else working, but I am struggling to find the right reward function. So far, after a lot of research and trial and error, the best I could come up with is
R=(x_dot**2)+0.001*(x**2)+0.1*(theta**2)
But the pendulum never stays stable (theta=0) for long enough.
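Standard reinforcement learning algorithms maximize the reward. Assuming yours does too, a sum of squared terms with positive weights, like R above, actually pays the agent more as the cart speeds up and the pole tilts away from theta=0. A common fix is to treat the quadratic terms as a cost and negate it, giving theta the largest weight. Another is a CartPole-style reward of +1 for every step the pole stays upright. The sketch below shows both. The function names, the weights, and the limits (12 degrees, 2.4 units of cart travel, borrowed from the classic CartPole setup) are assumptions to tune, not values from this post.

```python
import math

def reward_as_written(x, x_dot, theta):
    # The formula from the question: every term is a square with a positive
    # weight, so the reward GROWS as the cart moves and the pole falls.
    return x_dot**2 + 0.001 * x**2 + 0.1 * theta**2

def quadratic_cost_reward(x, x_dot, theta, theta_dot, u,
                          w_theta=1.0, w_theta_dot=0.1,
                          w_x=0.01, w_x_dot=0.01, w_u=0.001):
    # Hypothetical weights: theta is weighted most because keeping the pole
    # upright is the main goal; the other terms only discourage drifting,
    # fast motion and large control effort u. Tune these for your system.
    cost = (w_theta * theta**2 + w_theta_dot * theta_dot**2
            + w_x * x**2 + w_x_dot * x_dot**2 + w_u * u**2)
    # Negate the cost so that maximizing reward means minimizing deviation.
    return -cost

def alive_bonus_reward(theta, x, theta_limit=math.radians(12), x_limit=2.4):
    # CartPole-style alternative: +1 for every step the pole and cart stay
    # within the (assumed) limits, 0 otherwise; end the episode on 0.
    upright = abs(theta) < theta_limit and abs(x) < x_limit
    return 1.0 if upright else 0.0
```

With the original formula, a tilted pole (theta=0.5) scores higher than an upright one (theta=0). With the negated cost, the upright state scores 0 and the tilted one scores -0.25, so the agent is pushed toward theta=0.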
Does anyone have an idea of the logic behind an ideal reward function?
Thank you.
I am working on the inverted pendulum too. I found the following reward function, which I am trying,
but I still have a problem choosing the actions. Maybe we can discuss.
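One common way to choose among discrete actions while learning (for example, push left or push right) is epsilon-greedy selection. The sketch below assumes a value-based method such as Q-learning that keeps one estimated value per action. The function names, the two-action layout and the decay schedule are hypothetical and are not taken from either post.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # q_values: estimated return for each discrete action
    # (assumed layout, e.g. index 0 = push left, index 1 = push right).
    # With probability epsilon, explore by picking a random action;
    # otherwise exploit by picking the action with the highest estimate.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay_epsilon(epsilon, decay=0.995, minimum=0.01):
    # Shrink epsilon after each episode so the agent explores less over
    # time, but never stops exploring entirely.
    return max(minimum, epsilon * decay)
```

A typical loop starts with epsilon=1.0 (pure exploration), calls `epsilon_greedy` at every step, and calls `decay_epsilon` at the end of each episode.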