Selection of minibatch size for PPO Reinforcement Learning algorithm


The PPO algorithm in deep reinforcement learning requires setting the minibatch size.

For "standard" deep learning, many references offer guidance on the optimal choice, and in general "small" minibatches (up to 32 samples) are recommended.

However, in the specific case of PPO things seem to be different. Even the original paper reports minibatch sizes that vary widely across tasks (from 64 to 4096 samples). I suspect the reason is that in PPO the data are not i.i.d. but correlated, since they all come from the most recent trajectories, so the choice of minibatch size must account for that.
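To make the setting concrete, here is a minimal sketch of how a typical PPO implementation splits a rollout into minibatches; all names and values (`rollout_size`, `minibatch_size`, `n_epochs`, the placeholder arrays) are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

# Illustrative hyperparameters (assumed, not from any specific PPO implementation)
rollout_size = 2048      # samples collected with the current policy before each update
minibatch_size = 64      # the hyperparameter this question is about
n_epochs = 10            # number of passes PPO makes over the same rollout

# Placeholder rollout buffer (observations, actions, advantages)
rollout = {
    "obs": np.zeros((rollout_size, 8)),
    "actions": np.zeros((rollout_size, 2)),
    "advantages": np.zeros(rollout_size),
}

for epoch in range(n_epochs):
    # Shuffling breaks up temporal correlation within the rollout,
    # but every minibatch still comes from the same recent trajectories.
    indices = np.random.permutation(rollout_size)
    for start in range(0, rollout_size, minibatch_size):
        mb_idx = indices[start:start + minibatch_size]
        mb_obs = rollout["obs"][mb_idx]
        mb_actions = rollout["actions"][mb_idx]
        mb_adv = rollout["advantages"][mb_idx]
        # ...compute the clipped surrogate loss on this minibatch and take one gradient step...
```

So the minibatch size controls how finely each fixed rollout is chopped up for the inner optimization loop, which is why it interacts with the rollout size rather than being a free-standing choice as in supervised learning.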

In any case, I could not find an official reference that gives a clear guideline for this. Should the minibatch size depend on the total number of samples available? When many samples are available, should I prefer large or small minibatches? Does it depend on whether the observation and action spaces are discrete or continuous?
