I am trying to understand multi-agent reinforcement learning (MARL) using Stable-Baselines3 (SB3) PPO and the PettingZoo Pistonball environment. Training Pistonball through the parallel interface works fine (see the enclosed code). Now I want to use the AEC environment directly, without converting it to the parallel interface, but every version I have tried is rejected by the PPO constructor, always with the same error: builtins.ValueError: The environment is of type , not a Gymnasium environment. In this case, we expect OpenAI Gym to be installed and the environment to be an OpenAI Gym environment.
The PettingZoo demo code that runs Pistonball with either the AEC or the parallel environment using random actions works fine. Does that not indicate that using AEC (without the parallel wrapper) should work? I have tested env = pistonball_v6.env(n_pistons=20,time_penalty=-0.1,... (and more), with all combinations of SuperSuit wrappers. num_cpus must be 1 since I have one GPU. Setup: Windows 10, PettingZoo 1.24.0.
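To make the AEC-vs-parallel distinction concrete, here is a minimal, self-contained sketch of the turn-based control flow that PettingZoo's AEC interface uses. ToyAECEnv is a hypothetical stand-in (the real class is pettingzoo.AECEnv); it only mimics the agent_iter()/last()/step() pattern from the random-action demo. The point is that AEC hands control to one agent at a time, whereas the parallel interface (and SB3's vectorised PPO, after pettingzoo_env_to_vec_env_v1) expects all agents to step simultaneously:

```python
# Hypothetical toy environment mimicking PettingZoo's AEC control flow.
# Agents act one at a time via agent_iter()/last()/step(), unlike the
# parallel interface where every agent steps at once.

class ToyAECEnv:
    def __init__(self, n_agents=3, max_cycles=2):
        self.agents = [f"piston_{i}" for i in range(n_agents)]
        self.max_cycles = max_cycles

    def reset(self):
        self._cycle = 0          # completed rounds over all agents
        self._idx = 0            # whose turn it is
        self.rewards = {a: 0.0 for a in self.agents}

    def agent_iter(self):
        # Yield the current agent until max_cycles full rounds are done.
        while self._cycle < self.max_cycles:
            yield self.agents[self._idx]

    def last(self):
        # (observation, reward, termination, truncation, info) for the
        # agent whose turn it currently is.
        agent = self.agents[self._idx]
        done = self._cycle >= self.max_cycles
        return 0.0, self.rewards[agent], done, False, {}

    def step(self, action):
        # Apply the current agent's action, then advance the turn.
        self.rewards[self.agents[self._idx]] += 1.0  # dummy reward
        self._idx += 1
        if self._idx == len(self.agents):
            self._idx = 0
            self._cycle += 1


env = ToyAECEnv()
env.reset()
turn_order = []
for agent in env.agent_iter():
    obs, reward, terminated, truncated, info = env.last()
    env.step(None)               # one action per agent, in turn
    turn_order.append(agent)

print(turn_order)
# → ['piston_0', 'piston_1', 'piston_2',
#    'piston_0', 'piston_1', 'piston_2']
```

This sequential hand-off is why an unconverted AEC env cannot be passed to PPO directly: PPO sees a single object that is neither a Gymnasium env nor a vectorised env.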
from stable_baselines3 import PPO
from stable_baselines3.ppo import CnnPolicy
from pettingzoo.butterfly import pistonball_v6
import supersuit as ss

MODEL_FILE = "c:/temp/policy_101"
TIME_STEP = 5_000_000

# Parallel Pistonball environment (this variant trains fine)
env = pistonball_v6.parallel_env(
    n_pistons=20, time_penalty=-0.1, continuous=True,
    random_drop=True, random_rotate=True, ball_mass=0.75,
    ball_friction=0.3, ball_elasticity=1.5, max_cycles=125,
    render_mode=None)

# Preprocessing: grayscale (blue channel), downscale, stack 3 frames
env = ss.color_reduction_v0(env, mode="B")
env = ss.resize_v1(env, x_size=84, y_size=84)
env = ss.frame_stack_v1(env, 3)

# Vectorize for SB3: each agent becomes one vec-env index,
# then concatenate 4 copies of the environment
env = ss.pettingzoo_env_to_vec_env_v1(env)
env = ss.concat_vec_envs_v1(env, 4, num_cpus=1,
                            base_class="stable_baselines3")

model = PPO(
    CnnPolicy, env, verbose=3, gamma=0.95, n_steps=256,
    ent_coef=0.0905168, learning_rate=0.00062211, vf_coef=0.042202,
    max_grad_norm=0.9, gae_lambda=0.99, n_epochs=5, clip_range=0.3,
    batch_size=256)

model.learn(total_timesteps=TIME_STEP)
model.save(MODEL_FILE)