How to get probability vector for all actions in tf-agents?


I'm working on a multi-armed bandit problem using LinearUCBAgent and LinearThompsonSamplingAgent, but both return a single action per observation. What I need is a probability for every action, which I can use for ranking.


There is 1 solution below


You need to pass the emit_policy_info argument when defining the agent. Its value is a tuple of field names, and the field to request depends on the agent: predicted_rewards_sampled for LinearThompsonSamplingAgent and predicted_rewards_optimistic for LinearUCBAgent.

For example:

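from tf_agents.bandits.agents.linear_thompson_sampling_agent import LinearThompsonSamplingAgent

# time_step_spec and action_spec are the specs that describe your environment.
agent = LinearThompsonSamplingAgent(
    time_step_spec=time_step_spec,
    action_spec=action_spec,
    emit_policy_info=("predicted_rewards_sampled",),
)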
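Since the argument is described as a tuple, it may be worth keeping in mind that in Python parentheses alone do not create a one-element tuple; the trailing comma does. A minimal illustration in plain Python (the variable names are hypothetical and not part of TF-Agents):

```python
# Parentheses around a single string are just grouping: this is still a str.
not_a_tuple = ("predicted_rewards_sampled")

# The trailing comma is what makes a one-element tuple.
one_tuple = ("predicted_rewards_sampled",)

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple
```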

Then, during inference, you'll need to access those fields and normalize them (via softmax):

action_step = agent.collect_policy.action(observation_step)
scores = tf.nn.softmax(action_step.info.predicted_rewards_sampled)

Here tf comes from import tensorflow as tf, and observation_step is your observation array wrapped in a TimeStep (from tf_agents.trajectories.time_step import TimeStep).

A note of caution: these are NOT probabilities. They are normalized scores, similar to the normalized outputs of a fully connected layer.
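To make the normalize-and-rank step concrete without needing TF-Agents installed, here is a rough pure-Python sketch. The reward values are made up for illustration and are not real agent output, and softmax and rank_actions are hypothetical helper names. One thing it shows is that softmax preserves order, so ranking by the normalized scores gives the same ordering as ranking by the raw predicted rewards.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)  # subtract the max to avoid overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_actions(values):
    """Return action indices ordered from highest to lowest value."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Hypothetical predicted rewards for three actions (assumed values).
predicted_rewards = [0.2, 1.5, -0.3]

scores = softmax(predicted_rewards)   # normalized scores that sum to 1
ranking = rank_actions(scores)        # best action first

print(scores)
print(ranking)
```

If all you need is the ordering, sorting the raw predicted rewards directly should be enough; the softmax only matters when you want the scores on a common 0 to 1 scale.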