I am working on contextual bandits in TF-Agents, using the LinearUCB agent and the linear Thompson sampling agent.
I can get the actions, but I am not sure how to get the distributions (over actions) out of the agents for a given timestep.
I know LinearUCB is deterministic and hence has no distribution, but I couldn't get a distribution from Thompson sampling either, even with linearthompsonsamplingagent.policy.distribution(timestep). It says the distribution is deterministic, and the log_probability is blank.
Can someone please explain how to get the distributions out of it?
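
To make concrete what I mean by a distribution over actions, here is a minimal sketch in plain NumPy (not TF-Agents). It assumes each arm has a Gaussian posterior over its weight vector, and it estimates the action probabilities for one context by sampling from those posteriors many times and counting which arm wins. The function name estimate_action_probs and its arguments are hypothetical and not part of any library; this only illustrates the quantity I am after, not how the agent computes anything internally.

```python
import numpy as np


def estimate_action_probs(means, covs, context, num_samples=10_000, seed=0):
    """Monte Carlo estimate of P(action | context) under linear Thompson sampling.

    means:   (num_actions, dim) assumed posterior mean of each arm's weights
    covs:    (num_actions, dim, dim) assumed posterior covariance of each arm
    context: (dim,) feature vector for the current timestep
    """
    rng = np.random.default_rng(seed)
    num_actions = means.shape[0]
    sampled_rewards = np.empty((num_samples, num_actions))
    for a in range(num_actions):
        # Draw weight vectors from arm a's posterior and score the context.
        thetas = rng.multivariate_normal(means[a], covs[a], size=num_samples)
        sampled_rewards[:, a] = thetas @ context
    # Thompson sampling picks the arm with the highest sampled reward.
    chosen = sampled_rewards.argmax(axis=1)
    counts = np.bincount(chosen, minlength=num_actions)
    return counts / num_samples


# Hypothetical two-arm, two-feature example.
means = np.array([[1.0, 0.0], [0.0, 1.0]])
covs = np.stack([0.01 * np.eye(2), 0.01 * np.eye(2)])
probs = estimate_action_probs(means, covs, context=np.array([1.0, 0.0]))
print(probs)
```

Something equivalent to probs above, obtained from the policy itself for a given timestep, is what I am looking for.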