IndexError while making predictions with model.predict using PPO and custom OpenAI Gym


I created a custom OpenAI Gym environment that has the following observation_space:

self.observation_space = spaces.Dict({
    'msecFromStart': spaces.Box(low=1, high=np.inf, shape=(1,), dtype=np.int64),
    'mStatus': spaces.Discrete(3),
    'selectionDone': spaces.Discrete(2),
})

I trained a very simple agent with the following:

from stable_baselines3 import PPO

# Create an instance of the custom environment
env = CustomEnv()
env.reset()
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=10000)

Training seems to work fine, but when I try to test the trained model with the following:

import time

# Test the trained model
num_episodes = 5  # You can adjust the number of episodes as needed
for episode in range(num_episodes):
    print(f"Testing Episode {episode + 1}")
    observation = env.reset()
    done = False
    while not done:
        # Use the trained model to predict the action
        action, _ = model.predict(observation)

        # Take the predicted action in the environment
        observation, reward, done, _ = env.step(action)

        # Optional: introduce a delay between steps
        time.sleep(0.1)

I get the following error:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

It seems to be related to how PPO handles the structure of the Dict observation space. Any help?
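One way to narrow this down is to check what is actually being passed to model.predict. A policy built for a Dict observation space expects a dict with the same keys as the space. Newer Gym releases (0.26 and later) and Gymnasium return (observation, info) from reset() and a 5-tuple from step(), so passing the raw return value of reset() to predict would hand it a tuple instead of a dict. Whether that is the cause here is an assumption, since the post does not show CustomEnv or the library versions. The sketch below is plain Python with hypothetical helper names (split_reset_result, split_step_result, check_dict_observation are not part of Gym or Stable-Baselines3) and simulated return values.

```python
def split_reset_result(result):
    """Return (observation, info) for either reset() convention."""
    # Newer API: reset() returns a 2-tuple whose second item is an info dict.
    # Heuristic only: a Tuple observation space of length 2 could be misread.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0], result[1]
    # Older API: reset() returns the observation alone.
    return result, {}


def split_step_result(result):
    """Return (observation, reward, done, info) for either step() convention."""
    if len(result) == 5:
        # Newer API: terminated and truncated are reported separately.
        obs, reward, terminated, truncated, info = result
        return obs, reward, bool(terminated or truncated), info
    obs, reward, done, info = result
    return obs, reward, bool(done), info


def check_dict_observation(obs, expected_keys):
    """Raise a readable error if obs is not a dict with the expected keys."""
    if not isinstance(obs, dict):
        raise TypeError(f"expected a dict observation, got {type(obs).__name__}")
    missing = sorted(set(expected_keys) - set(obs))
    if missing:
        raise KeyError(f"observation is missing keys: {missing}")
    return obs


# Keys taken from the observation_space in the question.
EXPECTED_KEYS = ("msecFromStart", "mStatus", "selectionDone")

# Simulated return values standing in for env.reset() (assumption: new-style API).
sample_obs = {"msecFromStart": [1], "mStatus": 0, "selectionDone": 0}
obs, info = split_reset_result((sample_obs, {}))
check_dict_observation(obs, EXPECTED_KEYS)
```

In the test loop this would look like observation, _ = split_reset_result(env.reset()), followed by check_dict_observation(observation, EXPECTED_KEYS) before each call to model.predict. If the check fails, the problem is in what the environment returns rather than in PPO itself.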
