No gradients for any variable during TF Agents training


I'm playing around with reinforcement learning and chose the 2048 game to start with. I have followed the TF-Agents tutorials and copied most of the code from the CartPole environment example and the REINFORCE agent tutorial.

In the tutorial they use an ActorDistributionNetwork that ships with TF Agents:

actor_net = actor_distribution_network.ActorDistributionNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=fc_layer_params)

This does not seem to suit my needs: the input is a (16, 18) tensor, a one-hot encoding of the 18 possible tile states on each of the 16 grid cells. The output should be a (4,) tensor, a softmax over the four move directions. In between I just want a few dense layers.
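
For concreteness, the specs of my environment look roughly like this (the exact dtypes and names may differ slightly from what is in my repository):

import numpy as np
from tf_agents.specs import array_spec

# 16 grid cells, each one-hot encoded over 18 possible tile values.
observation_spec = array_spec.BoundedArraySpec(
    shape=(16, 18), dtype=np.float32, minimum=0.0, maximum=1.0, name='observation')
# Four possible moves.
action_spec = array_spec.BoundedArraySpec(
    shape=(), dtype=np.int32, minimum=0, maximum=3, name='action')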

The agent is just copied from the tutorial:

optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)
train_step_counter = tf.compat.v2.Variable(0)
tf_agent = reinforce_agent.ReinforceAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    actor_network=actor_net,
    optimizer=optimizer,
    normalize_returns=True,
    use_advantage_loss=False,
    train_step_counter=train_step_counter)
tf_agent.initialize()
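
The replay buffer is not shown above; it follows the tutorial as well and looks roughly like this (max_length is just an illustrative value):

from tf_agents.replay_buffers import tf_uniform_replay_buffer

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=tf_agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=10000)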

And I have a training loop, copied from the tutorial as well:

for _ in tqdm.tqdm(range(num_iterations)):
    # Collect a few episodes using collect_policy and save to the replay buffer.
    collect_episode(
        train_env, tf_agent.collect_policy, collect_episodes_per_iteration, replay_buffer)
    
    # Use data from the buffer and update the agent's network.
    experience = replay_buffer.gather_all()
    train_loss = tf_agent.train(experience)
    replay_buffer.clear()
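
The collect_episode helper is essentially the one from the tutorial as well; roughly (the real code lives in training.py):

from tf_agents.trajectories import trajectory

def collect_episode(environment, policy, num_episodes, replay_buffer):
    episode_counter = 0
    environment.reset()
    while episode_counter < num_episodes:
        time_step = environment.current_time_step()
        action_step = policy.action(time_step)
        next_time_step = environment.step(action_step.action)
        traj = trajectory.from_transition(time_step, action_step, next_time_step)
        # Add the transition to the replay buffer.
        replay_buffer.add_batch(traj)
        if traj.is_boundary():
            episode_counter += 1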

With the given actor_net, training runs without errors, but the results are nonsense: the actor effectively follows a random policy, and the action output is a vector of four values that all hover around 0.5. Apparently there is no softmax at the end.

I have tried to replace the network with a simple stack of Keras layers, like so:

actor_net = tf_agents.networks.Sequential(
    layers=[
        # tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation=tf.keras.activations.relu),
        tf_agents.keras_layers.InnerReshape((16, 32), (16 * 32,)),
        tf.keras.layers.Dense(32, activation=tf.keras.activations.relu),
        tf.keras.layers.Dense(4, activation=tf.keras.activations.softmax),
    ],
    input_spec=train_env.observation_spec()
)

The InnerReshape is needed because during experience gathering (i.e. playing) the input shape is (B, 16, 18), whereas during training it is (B, T, 16, 18), where B is the batch size and T is the number of time steps in an episode. A plain Keras Reshape or Flatten layer would also try to flatten away the time axis, whose length varies from episode to episode because the game has no fixed duration; see the small check below.
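
Here is a quick sanity check of how InnerReshape behaves with and without a time axis (8 and 5 are arbitrary batch and time sizes):

import tensorflow as tf
from tf_agents.keras_layers import inner_reshape

# InnerReshape only rewrites the innermost dimensions, so any leading
# batch/time dimensions pass through untouched.
reshape = inner_reshape.InnerReshape((16, 32), (16 * 32,))
print(reshape(tf.zeros((8, 16, 32))).shape)     # (8, 512)    -- collection: (B, ...)
print(reshape(tf.zeros((8, 5, 16, 32))).shape)  # (8, 5, 512) -- training:   (B, T, ...)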

When I try to train this, I am told that there are no gradients provided for any variable:

ValueError: No gradients provided for any variable: ["<tf.Variable 'sequential/dense/kernel:0' shape=(18, 32) dtype=float32>", "<tf.Variable 'sequential/dense/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_1/kernel:0' shape=(512, 32) dtype=float32>", "<tf.Variable 'sequential/dense_1/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_2/kernel:0' shape=(32, 4) dtype=float32>", "<tf.Variable 'sequential/dense_2/bias:0' shape=(4,) dtype=float32>"].

The complete traceback:

Traceback (most recent call last):
  File "/home/mu/reinforcement-2048/main.py", line 3, in <module>
    ri2048.__main__.main()
  File "/home/mu/reinforcement-2048/ri2048/__main__.py", line 16, in main
    ri2048.training.make_agent()
  File "/home/mu/reinforcement-2048/ri2048/training.py", line 103, in make_agent
    train_loss = tf_agent.train(experience)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tf_agents/agents/tf_agent.py", line 519, in train
    experience=experience, weights=weights, **kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tf_agents/utils/common.py", line 185, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tf_agents/agents/reinforce/reinforce_agent.py", line 289, in _train
    grads_and_vars, global_step=self.train_step_counter)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 595, in apply_gradients
    ([str(v) for _, v, _ in converted_grads_and_vars],))
ValueError: No gradients provided for any variable: ["<tf.Variable 'sequential/dense/kernel:0' shape=(18, 32) dtype=float32>", "<tf.Variable 'sequential/dense/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_1/kernel:0' shape=(512, 32) dtype=float32>", "<tf.Variable 'sequential/dense_1/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_2/kernel:0' shape=(32, 4) dtype=float32>", "<tf.Variable 'sequential/dense_2/bias:0' shape=(4,) dtype=float32>"].

My whole code is on GitHub, mostly in the environment.py and training.py files.

I suppose that this is something minor. How can I obtain the gradients needed for training?
