It is somewhat unclear to me how SB3 differentiates between timesteps and episodes.
In the learn function the only way to set the training length is the "total_timesteps" parameter, which SB3 generally defines as the total number of timesteps the agent will interact with the environment during training. What seems odd to me is that the training output does report the mean reward and mean episode length, but I do not know how to find out how many episodes occur during training or what the maximum number of timesteps allowed per episode is.
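
To make the question concrete, here is a minimal sketch of how I currently picture the relationship, which may well be wrong. It is a plain-Python toy loop, not SB3 code: the function `run_training_loop` and the parameters `max_episode_steps` and `done_prob` are names I made up for illustration. The assumption is that the timestep budget is shared across episodes, and that an episode ends either when the environment terminates or when a per-episode step cap is reached.

```python
import random


def run_training_loop(total_timesteps, max_episode_steps, done_prob=0.0, seed=0):
    """Toy stand-in for a training loop (hypothetical, not the SB3 implementation).

    Spends a fixed budget of timesteps and records the length of every
    episode that finishes within that budget.
    """
    rng = random.Random(seed)
    episode_lengths = []
    steps_in_episode = 0

    for _ in range(total_timesteps):
        steps_in_episode += 1

        # Assumption: the environment may end the episode on its own ...
        terminated = rng.random() < done_prob
        # ... or the episode is cut off once it hits the per-episode cap.
        truncated = steps_in_episode >= max_episode_steps

        if terminated or truncated:
            episode_lengths.append(steps_in_episode)
            steps_in_episode = 0  # the next step starts a new episode

    return episode_lengths


# With no early termination, every episode runs to the cap.
lengths = run_training_loop(total_timesteps=1000, max_episode_steps=200)
print("episodes:", len(lengths))
print("mean episode length:", sum(lengths) / len(lengths))
```

If this picture is right, the number of episodes is not something I set directly: it falls out of the timestep budget divided by however long the episodes turn out to be. What I am missing is where the equivalent of `max_episode_steps` is defined and where the episode count can be read off.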