I'm playing around with reinforcement learning and chose the 2048 game to start with. I have followed the guide for the TF-Agents package and copied most of the code from the cart pole environment and the REINFORCE agent tutorial.
In the tutorial they use an ActorDistributionNetwork that ships with TF-Agents:
actor_net = actor_distribution_network.ActorDistributionNetwork(
    train_env.observation_spec(),
    train_env.action_spec(),
    fc_layer_params=fc_layer_params)
This does not seem to suit my needs: the input is a (16, 18) tensor, a one-hot encoding of the 18 possible states on each of the 16 grid sites. The output should be a (4,) tensor, a softmax over the four move directions. In between I just want a few dense layers.
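For reference, the specs I intend my environment to expose look roughly like this (a sketch; the exact dtypes in my environment.py may differ):

import tensorflow as tf
from tf_agents.specs import tensor_spec

# One-hot board: 16 grid sites, 18 possible tile states each.
observation_spec = tf.TensorSpec(shape=(16, 18), dtype=tf.float32)
# A single categorical action over the four move directions.
action_spec = tensor_spec.BoundedTensorSpec(
    shape=(), dtype=tf.int32, minimum=0, maximum=3)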
The agent is just copied from the tutorial:
optimizer = tf.compat.v1.train.AdamOptimizer(learning_rate=learning_rate)
train_step_counter = tf.compat.v2.Variable(0)

tf_agent = reinforce_agent.ReinforceAgent(
    train_env.time_step_spec(),
    train_env.action_spec(),
    actor_network=actor_net,
    optimizer=optimizer,
    normalize_returns=True,
    use_advantage_loss=False,
    train_step_counter=train_step_counter)
tf_agent.initialize()
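For completeness, the replay buffer is the same TFUniformReplayBuffer as in the tutorial (a sketch; replay_buffer_capacity comes from my configuration):

from tf_agents.replay_buffers import tf_uniform_replay_buffer

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=tf_agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=replay_buffer_capacity)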
And I have a training loop, copied from the tutorial as well:
for _ in tqdm.tqdm(range(num_iterations)):
    # Collect a few episodes using collect_policy and save to the replay buffer.
    collect_episode(
        train_env, tf_agent.collect_policy, collect_episodes_per_iteration,
        replay_buffer)

    # Use data from the buffer and update the agent's network.
    experience = replay_buffer.gather_all()
    train_loss = tf_agent.train(experience)
    replay_buffer.clear()
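The collect_episode helper is essentially the one from the tutorial, reproduced here so the loop above is self-contained:

from tf_agents.trajectories import trajectory

def collect_episode(environment, policy, num_episodes, replay_buffer):
    episode_counter = 0
    environment.reset()
    while episode_counter < num_episodes:
        time_step = environment.current_time_step()
        action_step = policy.action(time_step)
        next_time_step = environment.step(action_step.action)
        traj = trajectory.from_transition(time_step, action_step, next_time_step)
        # Add the transition to the replay buffer.
        replay_buffer.add_batch(traj)
        # A boundary trajectory marks the end of an episode.
        if traj.is_boundary():
            episode_counter += 1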
With the given actor_net, the training runs fine, but the results are nonsense. The actor ends up with an essentially random policy: the action output is a vector of four elements, each around 0.5, so apparently there is no softmax at the end.
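This is roughly how I inspected the policy output (a sketch; I am not sure distribution() is the canonical way to probe it):

time_step = train_env.reset()
# Ask the policy for its action distribution instead of a sampled action.
dist_step = tf_agent.collect_policy.distribution(time_step)
print(dist_step.action.mean())  # roughly [0.5, 0.5, 0.5, 0.5] instead of a softmax over moves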
I have tried to replace the network with a simple stack of Keras layers, like so:
actor_net = tf_agents.networks.Sequential(
    layers=[
        # tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(32, activation=tf.keras.activations.relu),
        tf_agents.keras_layers.InnerReshape((16, 32), (16 * 32,)),
        tf.keras.layers.Dense(32, activation=tf.keras.activations.relu),
        tf.keras.layers.Dense(4, activation=tf.keras.activations.softmax),
    ],
    input_spec=train_env.observation_spec())
The InnerReshape is there because during experience gathering (playing) the input shape is always (B, 16, 18), whereas during training it is (B, T, 16, 18), where B is the batch size and T is the number of time steps taken in one episode. A plain Keras Reshape or Flatten layer would also try to flatten away the time axis, which has a varying number of elements because episodes have no fixed length.
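A quick shape-only sanity check of how I understand InnerReshape to behave (a sketch with dummy tensors):

import tensorflow as tf
import tf_agents.keras_layers

layer = tf_agents.keras_layers.InnerReshape((16, 32), (16 * 32,))

collect_batch = tf.zeros((8, 16, 32))    # (B, 16, 32) while playing
train_batch = tf.zeros((8, 50, 16, 32))  # (B, T, 16, 32) while training

print(layer(collect_batch).shape)  # (8, 512): only the inner (16, 32) is flattened
print(layer(train_batch).shape)    # (8, 50, 512): the time axis survives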
When I try to train this, I am told that there are no gradients provided for any variable:
ValueError: No gradients provided for any variable: ["<tf.Variable 'sequential/dense/kernel:0' shape=(18, 32) dtype=float32>", "<tf.Variable 'sequential/dense/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_1/kernel:0' shape=(512, 32) dtype=float32>", "<tf.Variable 'sequential/dense_1/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_2/kernel:0' shape=(32, 4) dtype=float32>", "<tf.Variable 'sequential/dense_2/bias:0' shape=(4,) dtype=float32>"].
The complete trace:
Traceback (most recent call last):
  File "/home/mu/reinforcement-2048/main.py", line 3, in <module>
    ri2048.__main__.main()
  File "/home/mu/reinforcement-2048/ri2048/__main__.py", line 16, in main
    ri2048.training.make_agent()
  File "/home/mu/reinforcement-2048/ri2048/training.py", line 103, in make_agent
    train_loss = tf_agent.train(experience)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 828, in __call__
    result = self._call(*args, **kwds)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 871, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 726, in _initialize
    *args, **kwds))
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 2969, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3361, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/function.py", line 3206, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 990, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 634, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tf_agents/agents/tf_agent.py", line 519, in train
    experience=experience, weights=weights, **kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tf_agents/utils/common.py", line 185, in with_check_resource_vars
    return fn(*fn_args, **fn_kwargs)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tf_agents/agents/reinforce/reinforce_agent.py", line 289, in _train
    grads_and_vars, global_step=self.train_step_counter)
  File "/home/mu/reinforcement-2048/venv/lib64/python3.7/site-packages/tensorflow/python/training/optimizer.py", line 595, in apply_gradients
    ([str(v) for _, v, _ in converted_grads_and_vars],))
ValueError: No gradients provided for any variable: ["<tf.Variable 'sequential/dense/kernel:0' shape=(18, 32) dtype=float32>", "<tf.Variable 'sequential/dense/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_1/kernel:0' shape=(512, 32) dtype=float32>", "<tf.Variable 'sequential/dense_1/bias:0' shape=(32,) dtype=float32>", "<tf.Variable 'sequential/dense_2/kernel:0' shape=(32, 4) dtype=float32>", "<tf.Variable 'sequential/dense_2/bias:0' shape=(4,) dtype=float32>"].
My whole code is on GitHub, mostly in the environment.py and training.py files.
I suppose that this is something minor. How can I obtain the gradients needed for training?