Merging and splitting time and action steps from TF-agents


I am trying to use TF-agents in a simple multi-agent, non-cooperative, parallel game. To simplify, I have two agents, both defined with TF-agents. I defined a custom gym environment that takes the combined actions of the agents as input and returns an observation. Each agent's policy should take only part of the full observation as input. So I need to do two things:

  • Split the time_step instance returned by the TF-agents environment wrapper to feed to the agents' policies independently.
  • Merge the action_step instances coming from the agents' policies to feed the environment.

If agent1_policy and agent2_policy are the two TF-agents policies and environment is a TF-agents environment, I would like to collect steps as follows:

from tf_agents.trajectories import trajectory

time_step = environment.current_time_step()

# Split the time_step to have partial observability
time_step1, time_step2 = split(time_step)

# Get action from each agent
action_step1 = agent1_policy.action(time_step1)
action_step2 = agent2_policy.action(time_step2)

# Merge the independent actions
action_merged = merge(action_step1, action_step2)

# Use the merged actions to have the next step
next_time_step = environment.step(action_merged)

# Split the next step too
next_time_step1, next_time_step2 = split(next_time_step)

# Build two distinct trajectories
traj1 = trajectory.from_transition(time_step1, action_step1, next_time_step1)
traj2 = trajectory.from_transition(time_step2, action_step2, next_time_step2)

traj1 and traj2 are then added to buffers that are used to train the two agents.

How should I define the functions merge and split in this example?

Answer by samarth.robo

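This can be done by defining the action_spec and observation_spec of the environment class as nested structures with one entry per agent. See this documentation for an example of producing an observation that is a dictionary of tensors. A similar approach can be used for accepting an action that is a dictionary or a tuple. With such specs, split amounts to selecting each agent's entry from the observation, and merge amounts to assembling the agents' actions into the structure declared by action_spec.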
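
A minimal sketch of what split and merge could look like under that approach. It assumes the environment's observation, reward, and action are all dicts keyed by agent; the keys "agent1" and "agent2" are hypothetical, as is the per-agent reward dict (the question does not say how rewards are reported). The functions rely only on the fact that TF-agents' TimeStep and PolicyStep are namedtuples, so the example uses stand-in namedtuples with the same field names to stay runnable without TF-agents installed.

```python
from collections import namedtuple

# Stand-ins with the same field names as tf_agents.trajectories.time_step.TimeStep
# and tf_agents.trajectories.policy_step.PolicyStep, so the sketch runs on its own.
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])
PolicyStep = namedtuple("PolicyStep", ["action", "state", "info"])

# Hypothetical keys; they must match the keys used in the environment's specs.
AGENT_KEYS = ("agent1", "agent2")


def split(time_step, keys=AGENT_KEYS):
    """Return one time step per agent, keeping only that agent's observation and reward.

    step_type and discount are shared by all agents and copied unchanged.
    """
    return tuple(
        time_step._replace(
            observation=time_step.observation[key],
            reward=time_step.reward[key],
        )
        for key in keys
    )


def merge(*action_steps, keys=AGENT_KEYS):
    """Combine the per-agent policy steps into the dict action the environment expects."""
    return {key: step.action for key, step in zip(keys, action_steps)}


# Usage with made-up values in place of real tensors.
time_step = TimeStep(
    step_type=1,
    reward={"agent1": 1.0, "agent2": -1.0},
    discount=1.0,
    observation={"agent1": [0.1, 0.2], "agent2": [0.3]},
)
time_step1, time_step2 = split(time_step)

action_step1 = PolicyStep(action=0, state=(), info=())
action_step2 = PolicyStep(action=1, state=(), info=())
action_merged = merge(action_step1, action_step2)
```

If the environment returns a single shared reward instead of a per-agent dict, the reward line in split can simply be dropped. Each agent would then be built with the sub-spec for its own key, so that its policy only ever sees its own part of the observation.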