I'm implementing PPO2 reinforcement learning on my self-built tasks and keep running into situations where the agent seems nearly mature, then suddenly and catastrophically loses its performance and cannot hold a stable level afterwards. I don't know what the right word for this is.
I'm just wondering what could cause such a catastrophic drop in performance. Any hints or tips?
Many thanks
I would guess that your reward function is not capped and can produce extremely large negative rewards in some edge cases.
Two things to prevent this are:
Most of the time this is not that big of a deal, but if you are unlucky your environment could even produce NaN values, and those would corrupt your network.
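As a rough sketch of how one could guard against both problems (uncapped rewards and NaN values), something like the following might work. The names `sanitize_reward` and `SafeRewardEnv` and the bounds of -10 and 10 are my own assumptions for illustration, not part of PPO2 or any library, and the wrapper assumes `env.step(action)` returns a tuple whose second element is the scalar reward.

```python
import math


def sanitize_reward(reward, low=-10.0, high=10.0):
    """Clip a scalar reward to [low, high] and reject NaN/inf values.

    The default bounds are arbitrary placeholders; pick values that
    match the reward scale of your own task.
    """
    reward = float(reward)
    if not math.isfinite(reward):
        # Fail loudly instead of letting NaN/inf reach the network update.
        raise ValueError(f"non-finite reward from environment: {reward}")
    return max(low, min(high, reward))


class SafeRewardEnv:
    """Hypothetical wrapper around an environment object.

    Assumes env.step(action) returns (obs, reward, ...) and that
    env.reset(...) exists; everything else is passed through untouched.
    """

    def __init__(self, env, low=-10.0, high=10.0):
        self.env = env
        self.low = low
        self.high = high

    def reset(self, *args, **kwargs):
        return self.env.reset(*args, **kwargs)

    def step(self, action):
        obs, reward, *rest = self.env.step(action)
        return (obs, sanitize_reward(reward, self.low, self.high), *rest)
```

Raising an error on a non-finite reward is a deliberate choice here: it makes the faulty edge case in the environment visible right away rather than silently training on bad data.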