How to tell an agent that some actions in the action space are currently not available in gym?

  1. As the simplest case, I define the action space as spaces.Discrete(3), but sometimes action 0 is unavailable and the agent can only sample from 1 and 2. At other times 2 is unavailable, or both 1 and 2 are. How can I tell the agent that some choices are not available?

(Note: By unavailable, I mean that the action is impossible, will never happen, and that its result is undefined, rather than a bad choice that results in a negative reward.)

  2. In reality, I have MultiDiscrete action spaces, and some of the actions are sometimes unavailable (just as in question 1). Even worse, actions chosen from those spaces must satisfy some condition. For example, an action from a MultiDiscrete space made of two Discrete(2) spaces must satisfy f(a1, a2) <= 1, where a1 is sampled from the first Discrete(2) space and a2 from the second. But f here is a complex function, not something as simple as +, and it depends on the current state. In that case, how can I tell the agent that some choices are currently unavailable?

There is 1 best solution below


I'm not sure how you can specify that when constructing the action space, but you can sample actions conditionally. For your example 1, you can use a while loop to keep sampling from the action space and only return the result once the condition is satisfied.

from gym import spaces

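def conditional_action():
    # Build the space once instead of on every loop iteration.
    action_space = spaces.Discrete(3)
    while True:
        a = action_space.sample()
        # Action 0 is unavailable in this example, so draw again.
        if a != 0:
            return a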

print(conditional_action())
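
When the set of available actions is known for the current state, one alternative to retrying is to draw only from the allowed actions. Below is a minimal sketch using only the standard library. It assumes a hypothetical availability mask (one entry per action of Discrete(3)) that your environment would supply; `sample_available` is an illustrative name, not part of gym.

```python
import random

def sample_available(mask, rng=random):
    """Pick uniformly among the actions whose mask entry is truthy.

    `mask` is a hypothetical per-state availability list, one entry per action.
    """
    available = [action for action, ok in enumerate(mask) if ok]
    if not available:
        # A retry loop would spin forever here; fail loudly instead.
        raise ValueError("no action is available in this state")
    return rng.choice(available)

# Action 0 is unavailable, so only 1 or 2 can be returned.
print(sample_available([False, True, True]))
```

Unlike the while loop, this takes one draw regardless of how many actions are masked out, and it makes the "nothing is available" case explicit.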

Using the same logic, you can sample with conditions from other action spaces. For example, here is a MultiDiscrete action space where the sum of the sampled array must not exceed 6.

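def conditional_action():
    # Three sub-actions, each drawn from {0, ..., 5}.
    action_space = spaces.MultiDiscrete([6, 6, 6])
    while True:
        a = action_space.sample()
        # Reject samples whose components sum to more than 6.
        if sum(a) <= 6:
            return a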

print(conditional_action())
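
For the state-dependent case in the question, where a pair (a1, a2) is valid only if f(a1, a2) <= 1, a small action space can be enumerated so that only feasible pairs are ever sampled. This is a sketch under stated assumptions: `f` below is a made-up stand-in for the real constraint, `state` is a plain number, and `feasible_actions` and `sample_feasible` are hypothetical helper names rather than gym functions.

```python
import itertools
import random

def f(a1, a2, state):
    # Placeholder constraint for illustration only; the real f is problem-specific.
    return a1 + a2 + state

def feasible_actions(nvec, state, limit=1):
    """List every joint action of a MultiDiscrete(nvec)-shaped space with f(...) <= limit."""
    ranges = [range(n) for n in nvec]
    return [a for a in itertools.product(*ranges) if f(*a, state) <= limit]

def sample_feasible(nvec, state, rng=random):
    """Draw uniformly from the joint actions that satisfy the constraint."""
    candidates = feasible_actions(nvec, state)
    if not candidates:
        raise ValueError("no feasible action in this state")
    return rng.choice(candidates)

print(feasible_actions([2, 2], state=0))
print(sample_feasible([2, 2], state=0))
```

Enumeration grows with the product of the sub-space sizes, so for large MultiDiscrete spaces the rejection loop above may remain the more practical option.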