I am training a reinforcement learning model on a custom environment and logging with Weights & Biases. Everything seems to log properly except the gradient and parameter histograms: no matter how frequently I log, the graphs are constant over all timesteps. Strangely, the x-axis also does not start at 0.
However, I expect them to look something like this, where the gradients vary over time:
My model learns, and its behavior changes greatly over time. I have tested many different hyperparameters (learning rate, number of epochs to backprop over each rollout, gradient clipping, and more) and trained for over 100,000 episodes comprising millions of total steps, so I don't believe the gradients are actually identical at every timestep for literally every layer.
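To double-check whether the gradients themselves really stay constant, I can also dump per-layer gradient norms manually with a small custom callback, bypassing the histogram logging entirely. Here is a rough sketch assuming SB3's BaseCallback API (the "grad_norm/" logging keys are just illustrative):

import wandb
from stable_baselines3.common.callbacks import BaseCallback

class GradNormCallback(BaseCallback):
    # Logs the norm of whatever gradient is currently stored on each
    # policy parameter, i.e. the one left over from the most recent
    # backward pass during training.
    def __init__(self, log_freq: int = 1000, verbose: int = 0):
        super().__init__(verbose)
        self.log_freq = log_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.log_freq == 0:
            norms = {
                f"grad_norm/{name}": param.grad.norm().item()
                for name, param in self.model.policy.named_parameters()
                if param.grad is not None
            }
            if norms:
                wandb.log(norms, step=self.num_timesteps)
        return True

If the norms logged this way do move while the histograms stay flat, the problem would seem to be with the histogram logging rather than the training itself.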
This should be the relevant part of my code:
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3.common.callbacks import CallbackList

config = {
    "total_timesteps": model_parameters["n_steps"] * num_cpu * 5,
    "log_interval": 1,
}
run = wandb.init(
    project="MyProject",
    sync_tensorboard=True,  # auto-upload sb3's tensorboard metrics
    save_code=True,  # optional
    name=run_name,  # optional
)
wandbCb = WandbCallback(
    gradient_save_freq=1,
    model_save_path=f"models/{run_name}",
    verbose=2,
)
RewardCb = RewardCallback(eval_freq=model_parameters["n_steps"] * num_cpu)
callbacks = CallbackList([
    wandbCb,
    RewardCb,
])
print("Learning...")
model.learn(
    total_timesteps=config["total_timesteps"],
    log_interval=config["log_interval"],
    progress_bar=True,
    callback=callbacks,
)
run.finish()
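For what it's worth, my understanding from the integration docs (possibly wrong) is that setting gradient_save_freq makes WandbCallback register hooks on the policy network via wandb.watch, so the histogram logging should be roughly equivalent to:

# What I believe WandbCallback does internally when gradient_save_freq > 0
# (an assumption based on the docs, not verified against the source).
# I think log_freq here counts backward passes, not environment steps.
wandb.watch(model.policy, log="all", log_freq=1)

If that's right, a histogram should be written after every gradient update, which makes the flat graphs even more confusing to me.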
Does anyone have insight into why these gradients don't change? Thank you.

