I've been using DeepMind's Acme library for some of my reinforcement learning projects, and I've come across an issue that I'm hoping someone can shed light on.
Libraries and Environment
- Python 3.9
- TensorFlow 2.8
- Acme by DeepMind
The Code
I am working with learning.py, where I've noticed that the variables q_tm1 and q_t are plain tensors:
q_tm1 = self._critic_network(o_tm1, transitions.action)
q_t = self._target_critic_network(o_t, self._target_policy_network(o_t))
These are later passed to a function losses.categorical in the distributional.py file, which is implemented as follows:
def categorical(q_tm1: networks.DiscreteValuedDistribution,
                r_t: tf.Tensor,
                d_t: tf.Tensor,
                q_t: networks.DiscreteValuedDistribution) -> tf.Tensor:
  z_t = tf.reshape(r_t, (-1, 1)) + tf.reshape(d_t, (-1, 1)) * q_t.values
  p_t = tf.nn.softmax(q_t.logits)
  ...
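To make the expected input concrete, here is a minimal, hypothetical stand-in for DiscreteValuedDistribution (not Acme's actual class), showing the two attributes the loss relies on, with a plain-Python softmax mirroring what tf.nn.softmax does on one row:

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class FakeDiscreteValuedDistribution:
    """Hypothetical stand-in for networks.DiscreteValuedDistribution.

    The real class represents a distribution over a fixed support;
    the loss above only touches these two attributes.
    """
    values: List[float]  # fixed support (atoms) of the distribution
    logits: List[float]  # unnormalised log-probabilities over the atoms


def softmax(logits: List[float]) -> List[float]:
    """Plain-Python softmax over a single row of logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# categorical() reads q_t.values and q_t.logits, so it needs an object
# exposing both attributes, not a raw tensor of Q-values.
q_t = FakeDiscreteValuedDistribution(values=[-1.0, 0.0, 1.0],
                                     logits=[0.1, 0.2, 0.3])
p_t = softmax(q_t.logits)
```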
The Problem
The function expects both q_tm1 and q_t to be instances of DiscreteValuedDistribution. However, they are tensors as generated in learning.py. Consequently, the program crashes when trying to access the values and logits attributes of q_t and q_tm1:
z_t = tf.reshape(r_t, (-1, 1)) + tf.reshape(d_t, (-1, 1)) * q_t.values
p_t = tf.nn.softmax(q_t.logits)
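The failure mode can be reproduced without TensorFlow at all: accessing .values on any object that doesn't define it raises an AttributeError, which is what happens when a plain tensor reaches categorical. A minimal illustration (FakeTensor is my own stand-in, not a real class):

```python
# Bare object standing in for the plain tensor returned by the critic
# network; like a plain tf.Tensor, it defines no `values` or `logits`.
class FakeTensor:
    pass


q_t = FakeTensor()

try:
    _ = q_t.values  # the same attribute access that crashes in categorical()
except AttributeError as e:
    error = str(e)
```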
Questions
Is this behavior intentional, or perhaps an oversight? If it's intentional, what would be the workaround to ensure proper functioning? If it's not intentional, how could one go about fixing it?
What I've Tried
Tensor to DiscreteValuedDistribution Conversion: My initial thought was to convert the q_tm1 and q_t tensors into instances of DiscreteValuedDistribution. However, this approach was impractical because I don't have access to the corresponding logits or probs, which are required to initialize a DiscreteValuedDistribution.
# The following wouldn't work without logits or probs
q_tm1_as_dvd = networks.DiscreteValuedDistribution(values=q_tm1, logits=???)
q_t_as_dvd = networks.DiscreteValuedDistribution(values=q_t, logits=???)
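Part of why this conversion seems fundamentally impractical: even if the tensor holds the distribution's expected value, the logits cannot be recovered from it, since many different distributions over the same support share the same mean. A small plain-Python illustration with hypothetical numbers:

```python
def mean(values, probs):
    """Expected value of a discrete distribution over a fixed support."""
    return sum(v * p for v, p in zip(values, probs))


values = [-1.0, 0.0, 1.0]

# Two very different distributions over the same support...
probs_a = [0.5, 0.0, 0.5]  # mass split between the extremes
probs_b = [0.0, 1.0, 0.0]  # all mass on the middle atom

# ...collapse to the same scalar mean, so a scalar Q-value tensor
# cannot be inverted back into logits/probs.
m_a = mean(values, probs_a)
m_b = mean(values, probs_b)
```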
Modifying categorical Implementation: Another potential fix was to alter the implementation of the losses.categorical function in distributional.py. However, this could create ripple effects in other parts of the codebase, including:
- Breaking other algorithms that may rely on this function.
- Introducing new bugs, especially if the original design was intentional.
- Invalidating existing tests written against the function's current behavior.
Given these constraints, I'm hesitant to make large-scale changes without a better understanding of the implications and design choices behind the existing code. Any help or insights are greatly appreciated!
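For completeness, one lower-risk direction I've considered (a sketch only; checked_categorical is my own name, not part of Acme's API): rather than editing losses.categorical itself, wrap it with a guard that fails fast with a clear message when handed a raw tensor, so the mismatch surfaces at the call site instead of as a cryptic AttributeError deep in the loss:

```python
def checked_categorical(q_tm1, r_t, d_t, q_t, categorical_fn):
    """Hypothetical wrapper: validate inputs, then delegate to the real
    loss function (passed in as categorical_fn) untouched."""
    for name, q in (("q_tm1", q_tm1), ("q_t", q_t)):
        # The loss only needs .values and .logits, so duck-type on those.
        if not (hasattr(q, "values") and hasattr(q, "logits")):
            raise TypeError(
                f"{name} must expose .values and .logits "
                f"(e.g. a DiscreteValuedDistribution), "
                f"got {type(q).__name__}")
    return categorical_fn(q_tm1, r_t, d_t, q_t)


# Tiny demo with stand-in objects (not Acme classes).
class FakeDist:
    values = [0.0]
    logits = [0.0]


result = checked_categorical(FakeDist(), None, None, FakeDist(),
                             lambda *args: "delegated")
```

This keeps the original function and its tests untouched while making the type requirement explicit, though it obviously doesn't answer whether the tensor/distribution mismatch is a bug upstream.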