Different observation spaces in Multi-Agent Reinforcement Learning using PettingZoo and SuperSuit


I'm trying to create a Multi-Agent Reinforcement Learning setup with two types of agents. Each type has its own observation and action space; specifically, the observations are images of two different sizes, one per agent type.

I am using 'PettingZoo' and 'SuperSuit' to wrap my custom gym environment and get vectorization and frame-stacking logic. In addition, I'm using a custom implementation of PPO derived from 'stable_baselines3' as the learning algorithm for both agent types.

Creation of my environment:

import supersuit as ss
from stable_baselines3.common.vec_env.vec_monitor import VecMonitor
from pettingzoo.utils.env import ParallelEnv

num_envs = 4
num_cpus = 4
env = ParallelEnv()  # placeholder that only shows the type of env; in practice this is my custom environment
env = ss.frame_stack_v1(env, 4)
env = ss.pettingzoo_env_to_vec_env_v1(env)
env = ss.concat_vec_envs_v1(env, num_vec_envs=num_envs, num_cpus=num_cpus, base_class="stable_baselines3")
env = VecMonitor(env)

The problem is that the environment has to expose, in one way or another, two different types of action and observation spaces. To my knowledge, 'pettingzoo' and 'supersuit' don't support this.

I solved the action space issue using the 'MultiDiscrete' class from gym, which is not the most elegant solution, but it works.
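A minimal sketch of the idea behind the 'MultiDiscrete' workaround, assuming one slot per agent type in a shared action vector. The sizes (5 and 3) and the names N_ACTIONS, SLOT and decode_action are placeholders for illustration only, and the sketch uses plain Python so it runs without gym installed:

```python
# Hypothetical sizes: type "a" agents choose among 5 actions, type "b" among 3.
# With gym, the shared space would be declared as gym.spaces.MultiDiscrete([5, 3]).
N_ACTIONS = {"a": 5, "b": 3}
SLOT = {"a": 0, "b": 1}  # index of each agent type's entry in the shared vector


def decode_action(agent_type, action_vec):
    """Pick the entry of the shared action vector that belongs to this agent type."""
    if len(action_vec) != len(N_ACTIONS):
        raise ValueError("expected one entry per agent type")
    action = int(action_vec[SLOT[agent_type]])
    if not 0 <= action < N_ACTIONS[agent_type]:
        raise ValueError(f"action {action} out of range for type {agent_type!r}")
    return action


# Every agent emits the full vector; the environment ignores the other slot.
print(decode_action("a", [4, 2]))  # type "a" reads slot 0
print(decode_action("b", [4, 2]))  # type "b" reads slot 1
```

The cost of this approach is that each policy also outputs an entry it never uses, which is why it does not feel elegant.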

The observation space issue is more complicated because I need the 'frame_stack' logic to apply to both observations, and I have no control over how that logic is applied.

I tried to solve that by using 'gym.spaces.Dict' for my observation space: a dictionary with one key-value pair for agent type 1 and a different key-value pair for the other agent type. However, supersuit doesn't support 'gym.spaces.Dict' in its 'frame_stack_v1' logic. So although this could work in theory, in practice I get a runtime error raised by the supersuit wrappers.
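To make the goal concrete, here is a rough sketch of the behaviour I'm after: frame stacking applied separately to each key of a dictionary observation. It is a hand-written illustration using only numpy, not SuperSuit's implementation. The class name DictFrameStack, the keys "type_1" and "type_2", and the image sizes are all made up for the example, and I'm assuming channel-last images stacked along the last axis:

```python
from collections import deque

import numpy as np


class DictFrameStack:
    """Keep the last `num_frames` frames per observation key, stacked on the channel axis."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.buffers = {}  # key -> deque holding the most recent frames

    def reset(self, obs):
        # Fill each buffer with the first frame so the output shape is fixed from the start.
        self.buffers = {
            key: deque([frame] * self.num_frames, maxlen=self.num_frames)
            for key, frame in obs.items()
        }
        return self._stacked()

    def step(self, obs):
        for key, frame in obs.items():
            self.buffers[key].append(frame)  # the oldest frame drops out automatically
        return self._stacked()

    def _stacked(self):
        # Concatenate oldest-to-newest along the last (channel) axis.
        return {key: np.concatenate(list(buf), axis=-1) for key, buf in self.buffers.items()}


# Hypothetical image sizes for the two agent types (height, width, channels).
obs = {
    "type_1": np.zeros((84, 84, 3), dtype=np.uint8),
    "type_2": np.zeros((32, 32, 1), dtype=np.uint8),
}

stacker = DictFrameStack(num_frames=4)
stacked = stacker.reset(obs)
print({key: value.shape for key, value in stacked.items()})
```

Something like this could presumably run inside the custom environment before observations are returned, instead of relying on 'frame_stack_v1', but the wrappers and the policy further down the pipeline would still have to accept the resulting dictionary.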

Does anybody have an idea of how to approach this, or can anyone point me to a source of information that could help?
