Gymnasium. Actions in continuous spaces


I am new to Gymnasium (gym) and RL, and there is one point I do not understand about how gym handles actions.

I've read that actions in a gym environment are integers, meaning that a single integer is passed to gym's “step” function:

observation, reward, terminated, truncated, info = env.step(action)

I understand that in discrete environments, each integer can represent a specific action, as in the case of the “Cart pole” or “Mountain car”. However, what happens in continuous environments?

In a continuous environment, like the ant or the humanoid, there is a list of actions to choose from, but each of these actions also has a range of values, and this, from my point of view, implies two values.

For example, in the ant environment there are 8 possible actions, e.g. action 0, “Torque applied on the rotor between the torso and back right hip”, has a [-1.0, 1.0] range. That is, there is one value that identifies the action and another that gives its magnitude.

So my question is: how does gym know the specific value of an action? In other words, shouldn't there be two values, one for the selected action and another for its magnitude?

HyeAnn On

You should first check how the fundamental spaces are defined in Gymnasium: Fundamental spaces

The action space of Ant is a Box, which "supports continuous (and discrete) vectors or matrices". In other words, an action for Ant is a vector of 8 values, one per joint torque. You can check with the code below:

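import gymnasium as gym

# Ant is a MuJoCo environment, so the MuJoCo dependencies must be installed
env = gym.make("Ant-v4")
action = env.action_space.sample()  # draw one random action from the Box space

print(type(action))
print(action)

>> <class 'numpy.ndarray'>
   [-0.92111397 -0.7149495  -0.6188663   0.8434039   0.75418323  0.63292015
     0.6150556  -0.3987422 ]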

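In this sample, the “Torque applied on the rotor between the torso and back right hip” is action[0], which is -0.92111397. So there is no separate "which action" value: the position in the vector identifies the joint, the number at that position is the magnitude, and all 8 torques are applied together in a single step.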