In the TensorFlow documentation for TF-Agents environments, there is an example of an environment for a simple (blackjack-inspired) card game.
The __init__ method looks like the following:
    class CardGameEnv(py_environment.PyEnvironment):

      def __init__(self):
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=1, name='action')
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(1,), dtype=np.int32, minimum=0, name='observation')
        self._state = 0
        self._episode_ended = False
The action spec allows only 0 (do not ask for a card) or 1 (ask for a card), so it makes sense that its shape is shape=() (it just needs an integer).
However, I don't quite understand why the observation spec has shape=(1,), given that it will just represent the sum of the cards in the current round (so also an integer).
What explains the difference in shapes?
At first I thought they were the same. To test them, I ran the following code in the W3 Schools Python "Try Editor" (I accessed it through this link):
The output I got was:
This leads me to conclude that shape=() is a simple integer, treated as a 0-D array, but shape=(1,) is a 1-D array that consists of a single integer. I hope this is accurate, as I'd like some confirmation myself.
I ran a second test to check this further:
The output was:
This seems to corroborate what I concluded at first, since arr1 is a 0-D array and arr3 is a 1-D array of 4 elements (as explained in the W3 Schools tutorial), while arr2 has the same kind of shape as arr3 but a different number of elements.
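For anyone who wants to reproduce the distinction, here is a minimal sketch using plain NumPy. It is not the snippet I ran above; the variable names (scalar_like, one_element) are made up for illustration.

```python
import numpy as np

# A 0-D array: a single integer with an empty shape tuple.
scalar_like = np.array(5, dtype=np.int32)

# A 1-D array that happens to hold exactly one integer.
one_element = np.array([5], dtype=np.int32)

print(scalar_like.ndim, scalar_like.shape)   # 0 ()
print(one_element.ndim, one_element.shape)   # 1 (1,)

# The 1-D array has to be indexed to get the value out;
# the 0-D array has no axis to index along.
print(one_element[0] == scalar_like)         # True
```

So shape=() and shape=(1,) both carry one integer, but only the second has an axis.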
As for why the action and the observation are represented as an integer and as an array of one element respectively, it is probably because TensorFlow works with tensors (n-dimensional arrays), and calculations might be easier when the observation is treated as an array.
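One way to picture why an array-shaped observation might be convenient is to stack several observations into a batch. This is only a sketch of my guess, using plain NumPy and made-up card totals; I am not claiming this is the reason the TF-Agents authors chose shape=(1,).

```python
import numpy as np

# Made-up card totals from three hypothetical rounds.
totals = (7, 15, 20)

# Observations shaped like shape=(1,) versus shape=().
obs_1d = [np.array([t], dtype=np.int32) for t in totals]
obs_0d = [np.array(t, dtype=np.int32) for t in totals]

# Stacking adds a leading batch axis in both cases.
batch_1d = np.stack(obs_1d)
batch_0d = np.stack(obs_0d)

print(batch_1d.shape)  # (3, 1): batch axis plus a feature axis
print(batch_0d.shape)  # (3,): batch axis only
```

With shape=(1,), the batch keeps a separate feature axis of length 1, which is the layout that layers expecting (batch, features) input usually want.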
The action is declared as an integer probably to ease the process flow inside the _step() function, as it would be a little more tedious to work with an array in the if/elif/else structure. There are other examples of action_specs with more elements and discrete/continuous values, so nothing else comes to mind.
I am not really sure all of this is right, but it seems a good point to at least start discussing.
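To make the if/elif/else point concrete, here is a simplified, hypothetical stand-in for a step function. It is not the _step() from the TF-Agents documentation: the names step and make_observation, the new_card parameter, and the >= 21 stopping rule are all assumptions chosen so the example runs with NumPy alone.

```python
import numpy as np

def make_observation(state):
    """Wrap the running card total in a 1-D array, matching shape=(1,)."""
    return np.array([state], dtype=np.int32)

def step(state, action, new_card):
    """Hypothetical step: action 1 adds new_card, action 0 ends the round."""
    if action == 1:
        state += new_card
        episode_ended = state >= 21
    elif action == 0:
        episode_ended = True
    else:
        raise ValueError("action should be 0 or 1")
    return state, episode_ended, make_observation(state)

# A 0-D action, matching shape=(), compares directly against 0 or 1.
action = np.array(1, dtype=np.int32)
state, ended, obs = step(0, action, new_card=7)

print(action.shape, obs.shape, obs)  # () (1,) [7]
```

The scalar action goes straight into the comparisons without indexing, while the state is wrapped into a one-element array only at the point where it is returned as an observation.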