I created a custom OpenAI Gym environment that has the following observation_space:
```python
self.observation_space = spaces.Dict({
    'msecFromStart': spaces.Box(low=1, high=np.inf, shape=(1,), dtype=np.int64),
    'mStatus': spaces.Discrete(3),
    'selectionDone': spaces.Discrete(2),
})
```
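For reproducibility, here is a stripped-down sketch of the environment; the real reward/termination logic is omitted, and the returned values are placeholders that only show the shapes and dtypes I hand back:

```python
import gym
import numpy as np
from gym import spaces

class CustomEnv(gym.Env):
    """Minimal sketch; the actual environment logic is omitted."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Dict({
            'msecFromStart': spaces.Box(low=1, high=np.inf, shape=(1,), dtype=np.int64),
            'mStatus': spaces.Discrete(3),
            'selectionDone': spaces.Discrete(2),
        })
        self.action_space = spaces.Discrete(2)  # placeholder action space

    def _get_obs(self):
        # Each entry must match its sub-space: Box entries are arrays
        # of the declared shape/dtype, Discrete entries are plain ints.
        return {
            'msecFromStart': np.array([1], dtype=np.int64),  # placeholder value
            'mStatus': 0,
            'selectionDone': 0,
        }

    def reset(self):
        return self._get_obs()

    def step(self, action):
        # Placeholder: real reward and termination logic omitted.
        return self._get_obs(), 0.0, True, {}
```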
I trained a very simple agent with the following:
```python
# Create an instance of your custom environment
env = CustomEnv()
env.reset()
model = PPO("MultiInputPolicy", env, verbose=1)
model.learn(total_timesteps=10000)
```
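In case it matters, Stable-Baselines3 ships an environment checker that validates that `reset()`/`step()` return observations matching the declared space; running it is a quick sanity check:

```python
from stable_baselines3.common.env_checker import check_env

# Warns/raises if observations don't match observation_space
# (wrong dtypes, wrong shapes, or missing dict keys).
check_env(CustomEnv(), warn=True)
```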
Everything seems to work fine, but I get an error when I try to test the trained model with the following:
```python
# Test the trained model
num_episodes = 5  # You can adjust the number of episodes as needed
for episode in range(num_episodes):
    print(f"Testing Episode {episode + 1}")
    observation = env.reset()
    done = False
    while not done:
        # Use the trained model to predict the action
        action, _ = model.predict(observation)
        # Take the predicted action in the environment
        observation, reward, done, _ = env.step(action)
        # Optional: introduce a delay between steps
        time.sleep(0.1)
```
The error is:

```
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
```
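One thing I have not ruled out: whether `env.reset()` returns a bare observation or an `(obs, info)` tuple, as it does in newer Gym versions; feeding a tuple into `model.predict` could plausibly trigger an indexing error like this. A quick check would be:

```python
observation = env.reset()
print(type(observation))  # a tuple here means the newer (obs, info) reset API

# If reset() follows the newer API, unpack before predicting:
if isinstance(observation, tuple):
    observation, _info = observation
action, _ = model.predict(observation)
```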
It seems to be related to how PPO works and the structure of the observation space... any help?