PPO2 reinforcement learning 'catastrophic forgetting'?

Asked by Lewis Liu on 05 November 2020 at 08:55

I'm implementing PPO2 reinforcement learning on my self-built tasks, and I keep running into situations where the agent seems nearly mature, then suddenly loses its performance catastrophically and can't hold it stable. I don't know what the right term for this is.

I'm just wondering what could cause such a catastrophic drop in performance. Any hints or tips?

Many thanks

(Figures: learningprocess1, learningprocess2, plots of the learning process)

Answered by Nico Bohlinger on 05 January 2021 at 21:06

I would guess that your reward function is not capped and can produce extremely large negative rewards in some edge cases.

Two things that help prevent this:

1. Limit (clip) the values your reward function can return.
2. Make sure you can handle situations where your learning environment becomes unstable, e.g. the process crashes, freezes, or hits a bug. For example, if you give your agent a negative reward when it falls (a robot learning to walk) and the environment fails to detect the fall because of some rare bug, your reward function keeps handing out negative rewards until the episode ends.
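Both suggestions can be sketched as a small environment wrapper. This is a minimal sketch, not the answerer's code: it assumes the classic Gym-style interface where `step(action)` returns `(obs, reward, done, info)`, and the names `SafeRewardWrapper`, `reward_limit`, `max_steps` and the toy `StuckEnv` are hypothetical. Rewards are clipped to a fixed range, and an episode is forcibly ended after `max_steps` so an undetected "fall" cannot keep pouring penalties into the rollout.

```python
from typing import Any, Dict, List, Tuple


class SafeRewardWrapper:
    """Hypothetical wrapper: clips rewards and caps episode length.

    Assumes the wrapped env uses the classic Gym-style API where
    step(action) returns (obs, reward, done, info).
    """

    def __init__(self, env: Any, reward_limit: float = 10.0, max_steps: int = 1000) -> None:
        self.env = env
        self.reward_limit = reward_limit  # rewards are clipped to [-reward_limit, reward_limit]
        self.max_steps = max_steps        # hard cap on episode length
        self.steps = 0

    def reset(self) -> Any:
        self.steps = 0
        return self.env.reset()

    def step(self, action: Any) -> Tuple[Any, float, bool, Dict[str, Any]]:
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        # Cap the reward so a single edge case cannot produce a huge penalty.
        clipped = max(-self.reward_limit, min(self.reward_limit, float(reward)))
        # Force the episode to end if it runs too long, e.g. a fall went undetected.
        if self.steps >= self.max_steps and not done:
            done = True
            info = dict(info, truncated=True)
        return obs, clipped, done, info


class StuckEnv:
    """Toy 'buggy' env: never reports done and always returns -1000."""

    def reset(self) -> int:
        return 0

    def step(self, action: Any) -> Tuple[int, float, bool, Dict[str, Any]]:
        return 0, -1000.0, False, {}


env = SafeRewardWrapper(StuckEnv(), reward_limit=10.0, max_steps=5)
env.reset()
rewards: List[float] = []
done = False
while not done:
    _, r, done, info = env.step(None)
    rewards.append(r)
print(rewards)  # [-10.0, -10.0, -10.0, -10.0, -10.0]
```

Here the buggy environment never terminates and returns -1000 on every step, yet the agent sees at most five rewards of -10. If you use Gym or Gymnasium, the built-in `TimeLimit` wrapper covers the step cap; note that Gymnasium's `step` returns five values (`terminated` and `truncated` separately), so the unpacking would need adjusting.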

Most of the time this is not a big deal, but if you are unlucky, your environment could even produce NaN values, and those would corrupt your network.
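One way to catch NaN values before they reach the network is to validate each transition as it comes out of the environment. This is a hedged sketch: `check_transition` is a hypothetical helper, and it assumes observations are flat sequences of floats.

```python
import math
from typing import Sequence


def check_transition(obs: Sequence[float], reward: float, step: int) -> None:
    """Raise ValueError if the env returned a NaN/inf reward or observation."""
    if not math.isfinite(reward):
        raise ValueError(f"non-finite reward {reward!r} at step {step}")
    bad = [i for i, x in enumerate(obs) if not math.isfinite(x)]
    if bad:
        raise ValueError(f"non-finite observation entries {bad} at step {step}")


check_transition([0.1, 0.2], 1.0, step=0)  # valid data passes silently
try:
    check_transition([float("nan"), 0.2], 1.0, step=1)
except ValueError as err:
    print(err)  # non-finite observation entries [0] at step 1
```

Raising an error rather than silently dropping the transition is a deliberate choice: skipping a step would break the time ordering that PPO's advantage estimates rely on, and a loud failure points you straight at the environment bug.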
