RAM Usage keeps growing while training reinforcement learning agent


The other day I started training my Atari Breakout reinforcement learning agent, but after an hour and a half or so my screen started freezing and it became very difficult to interact with the computer via the mouse.

So I decided I'd rerun the program, but monitor the system components this time. One thing I noticed was that RAM usage would continue to grow the longer the program ran. My first suspect was the replay buffer, so I spent a considerable amount of time reducing its memory requirements, but I got the same result. To investigate further, I cut off any additions to the replay buffer after 50,000 experiences to see if RAM usage continued to grow; it did. I eventually narrowed it down to this section of code:

def get_gradients(self, target_q_values, importance, states, actions):
    with tf.GradientTape() as tape:
        q_values_current_state_dqn = self.dqn_architecture(states)
        one_hot_actions = tf.keras.utils.to_categorical(actions, self.num_legal_actions, dtype=np.float32) # e.g. [[0,0,1,0],[1,0,0,0],...]
        Q = tf.reduce_sum(tf.multiply(q_values_current_state_dqn, one_hot_actions), axis=1)
        error = Q - tf.cast(target_q_values, tf.float32)
        loss = tf.keras.losses.Huber()(target_q_values, Q)

        if self.use_prioritized_experience_replay:
            loss = tf.reduce_mean(loss * importance) # Gradient is scaled -> loss is lower at the beginning -> reduces bias against situations that are sampled more frequently

    dqn_architecture_gradients = tape.gradient(loss, self.dqn_architecture.trainable_variables) # Computes the gradient using operations recorded in the context of this tape
    self.dqn_architecture.optimizer.apply_gradients(zip(dqn_architecture_gradients, self.dqn_architecture.trainable_variables))
    return loss, error

It should be noted that I also saw the following show up in the logs:

2023-02-16 22:48:32,045 5 out of the last 5 calls to <function Agent.get_gradients at 0x7fb3ec66e830> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
2023-02-16 22:48:32,217 6 out of the last 6 calls to <function Agent.get_gradients at 0x7fb3ec66e830> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has reduce_retracing=True option that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for  more details.
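From what I understand of that warning, every call made with new Python values (rather than tensors), or with tensors of new shapes, builds and caches a brand new graph. A toy example (not my actual code) shows the behaviour:

import tensorflow as tf

@tf.function
def square(x):
    print("tracing")  # only runs while a new graph is being traced
    return x * x

square(2)               # traces
square(3)               # traces again: each new Python scalar builds a new graph
square(tf.constant(2))  # traces once for this dtype/shape
square(tf.constant(3))  # no retrace: the existing graph is reused

If get_gradients is retracing on every call like the log suggests, each trace presumably leaves another graph in memory, which would line up with the steady RAM growth.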

The get_gradients function outlined above is called from this function:

def train_network(self, batch_size, gamma, frame_number, priority_scale):
    importance = 0
    if self.use_prioritized_experience_replay:
        (states, actions, rewards, new_states, terminal_flags), importance, indices = self.replay_buffer.sample_buffer(self.batch_size, priority_scale)
        importance = importance ** (1-self.calculate_epsilon(frame_number)) # early in training: low frame number = high epsilon = low power = largely decreased importance; later in training importance is only slightly decreased. Increases the importance of newer frames
    else:
        states, actions, rewards, new_states, terminal_flags = self.replay_buffer.sample_buffer(self.batch_size, priority_scale)
    
    best_action_in_next_state_dqn = self.dqn_architecture.predict(new_states, verbose=0).argmax(axis=1)
    target_q_network_q_values = self.target_dqn_architecture.predict(new_states, verbose=0)
    optimal_q_value_in_next_state_target_dqn = target_q_network_q_values[range(batch_size), best_action_in_next_state_dqn]
    target_q_values = rewards + (gamma*optimal_q_value_in_next_state_target_dqn * (1-terminal_flags)) # makes 0 if terminal flag set
    # Calculate loss and perform gradient descent
    # TensorFlow "records" relevant operations executed inside the context of a tf.GradientTape onto a "tape". TensorFlow then uses that tape to compute the gradients of a "recorded" computation using reverse-mode differentiation.
    loss, error = self.get_gradients(target_q_values, importance, states, actions)
    
    if self.use_prioritized_experience_replay:
        self.replay_buffer.set_priorities(indices, error)
        
    return float(loss.numpy()), error
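As an aside on the tf.GradientTape comment above, the basic pattern it describes looks like this (a standalone toy example, not my training code):

import tensorflow as tf

x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x                     # operations on watched variables are recorded on the tape
dy_dx = tape.gradient(y, x)       # reverse-mode differentiation: dy/dx = 2x = 6.0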

And that train_network function is called from the following loop:

while frame_number < NUM_FRAMES_AGENT_TRAINED_OVER:
    breakout_environment.reset_env()
    episode_reward_sum = 0
    for _ in range(MAX_EPISODE_LENGTH):
        # Get action
        action = breakout_agent.take_action(frame_number, breakout_environment.state)

        # Take step
        frame, reward, terminal, life_lost = breakout_environment.step(action)
        frame_number += 1
        episode_reward_sum += reward

        # Add experience to replay memory (action, frame, reward, terminal, clip_reward)
        breakout_agent.add_experience_to_replay_buffer(action, frame[:, :, 0], reward, life_lost, CLIP_REWARD)

        # Train the network every 4 additions to the replay buffer
        if frame_number % UPDATE_FREQUENCY == 0 and breakout_agent.replay_buffer.total_indexes_written_to > REPLAY_BUFFER_START_SIZE:
            loss, _ = breakout_agent.train_network(BATCH_SIZE, DISCOUNT_FACTOR, frame_number, PRIORITY_SCALE) # batch_size, gamma, frame_number, priority_scale
            loss_list.append(loss)

        # Update target network
        if frame_number % TARGET_UPDATE_FREQ == 0 and frame_number > REPLAY_BUFFER_START_SIZE:
            breakout_agent.update_target_network()

        # Break the loop when the game is over
        if terminal:
            break
    rewards_list.append(episode_reward_sum)

Any help would be greatly appreciated.

EDIT: On further investigation, I found a question on Stack Overflow that stated: 'Passing python scalars or lists as arguments to tf.function will always build a new graph. To avoid this, pass numeric arguments as Tensors whenever possible.' So I need to convert the optimizer.apply_gradients arguments from Python lists to TensorFlow tensors or another TensorFlow data type. As the lists are of varying dimensionality with varying nesting depths, I can't use tf.convert_to_tensor or tf.ragged.constant.
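For reference, the kind of change the tf.function guide seems to point at is pinning the traced signature and passing tensors into get_gradients, roughly like this. This is only a sketch; the shapes and dtypes are assumptions based on my setup (batches of 84x84x4 stacked Atari frames), and I haven't confirmed yet that it stops the leak:

@tf.function(input_signature=[
    tf.TensorSpec(shape=[None], dtype=tf.float32),             # target_q_values
    tf.TensorSpec(shape=[None], dtype=tf.float32),             # importance (assumed to be per-sample weights)
    tf.TensorSpec(shape=[None, 84, 84, 4], dtype=tf.float32),  # states (assumed frame-stack shape)
    tf.TensorSpec(shape=[None], dtype=tf.int32),               # actions
])
def get_gradients(self, target_q_values, importance, states, actions):
    ...  # body unchanged

and at the call site in train_network:

loss, error = self.get_gradients(
    tf.convert_to_tensor(target_q_values, dtype=tf.float32),
    tf.convert_to_tensor(importance, dtype=tf.float32),
    tf.convert_to_tensor(states, dtype=tf.float32),
    tf.convert_to_tensor(actions, dtype=tf.int32),
)

With tensor inputs, tf.keras.utils.to_categorical would presumably also have to be swapped for tf.one_hot(actions, self.num_legal_actions) inside the traced function, since the former works on NumPy arrays. The warning above also mentions @tf.function(reduce_retracing=True) as a lighter-weight alternative.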
