I am new to training reinforcement learning agents. I have read about the PPO algorithm and used the Stable Baselines library to train an agent with it. My question is: how do I evaluate a trained RL agent? For a regression or classification problem I have metrics such as r2_score or accuracy. Are there comparable metrics for RL, or how else do I test the agent and conclude whether it is trained well or badly?
Thanks
You can run your environment with a random policy, and then run the same environment with the same random seed using the trained PPO model. Comparing the accumulated rewards, ideally averaged over several episodes, gives you a first indication of how well the trained model performs.
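Here is a minimal sketch of that comparison using only the standard library. The `CorridorEnv` class, the `evaluate` helper, and both policies are hypothetical stand-ins I made up for illustration. With a real model you would replace `trained_policy` with a call to your model's `predict` method. If you are on Stable Baselines3, `evaluate_policy` from `stable_baselines3.common.evaluation` should give you a similar mean and standard deviation of episode returns.

```python
import random
import statistics


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to cell `length`."""

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps

    def reset(self, seed=None):
        # Seeding here makes the slip noise identical for every policy
        # evaluated with the same seed.
        self.rng = random.Random(seed)
        self.pos = 0
        self.t = 0
        return self.pos

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        move = 1 if action == 1 else -1
        # With 10% probability the move slips and is reversed.
        if self.rng.random() < 0.1:
            move = -move
        self.pos = max(0, self.pos + move)
        self.t += 1
        reached = self.pos >= self.length
        reward = 10.0 if reached else -1.0
        done = reached or self.t >= self.max_steps
        return self.pos, reward, done


def evaluate(policy, env, n_episodes=100, seed=0):
    """Return mean and standard deviation of the episode returns."""
    returns = []
    for episode in range(n_episodes):
        obs = env.reset(seed=seed + episode)
        done, total = False, 0.0
        while not done:
            obs, reward, done = env.step(policy(obs))
            total += reward
        returns.append(total)
    return statistics.mean(returns), statistics.stdev(returns)


action_rng = random.Random(123)


def random_policy(obs):
    # Baseline: pick an action uniformly at random.
    return action_rng.choice([0, 1])


def trained_policy(obs):
    # Stand-in for a trained agent; always moving right is the best
    # choice in this toy environment.
    return 1


random_mean, random_std = evaluate(random_policy, CorridorEnv())
trained_mean, trained_std = evaluate(trained_policy, CorridorEnv())
print(f"random : {random_mean:.2f} +/- {random_std:.2f}")
print(f"trained: {trained_mean:.2f} +/- {trained_std:.2f}")
```

Both policies are evaluated on the same episode seeds, so the difference in mean return comes from the policy rather than from lucky environment noise. Looking at the standard deviation as well tells you whether the gap is large compared to the episode-to-episode spread.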
Since you use PPO, you might also want to track how the gradients and the KL divergence between the old and the updated policy evolve during training, to see whether you have a well-defined threshold for accepting a gradient step. If very few gradient steps are accepted, you might want to adjust your hyperparameters.
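As a rough sketch of the KL check, assuming you can get the log-probabilities of the sampled actions under the old and the updated policy: the helper names `approx_kl` and `accepted_fraction`, the threshold `target_kl = 0.01`, and all numbers below are made up for illustration and are not output from a real training run. The estimator is one common sample-based approximation of the KL divergence, and your library may compute it differently. As far as I know, Stable Baselines3 logs a comparable value as `train/approx_kl` and lets you set a `target_kl` parameter on `PPO`.

```python
import math


def approx_kl(old_log_probs, new_log_probs):
    """Estimate KL(old || new) from log-probs of actions sampled
    under the old policy. Each term (e^x - 1 - x) is non-negative."""
    total = 0.0
    for old_lp, new_lp in zip(old_log_probs, new_log_probs):
        log_ratio = new_lp - old_lp
        total += (math.exp(log_ratio) - 1.0) - log_ratio
    return total / len(old_log_probs)


def accepted_fraction(kl_values, target_kl):
    """Fraction of updates whose estimated KL stays within the threshold."""
    return sum(kl <= target_kl for kl in kl_values) / len(kl_values)


# Illustrative log-probs for four sampled actions (made-up numbers).
old_lp = [math.log(p) for p in (0.50, 0.30, 0.60, 0.40)]
new_lp = [math.log(p) for p in (0.55, 0.25, 0.65, 0.35)]
print(f"approx KL for one update: {approx_kl(old_lp, new_lp):.4f}")

# Illustrative per-update KL estimates collected during training.
kl_history = [0.005, 0.02, 0.008, 0.05]
target_kl = 0.01  # assumed threshold
print(f"accepted fraction: {accepted_fraction(kl_history, target_kl):.2f}")
```

If the accepted fraction stays low throughout training, the policy is changing too much per update. Reducing the learning rate or the number of epochs per update is a usual first thing to try.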