Recommended way to use Gymnasium with neural networks to avoid overheads in model.fit and model.predict

I'm trying the Episodic Semi-Gradient Sarsa algorithm from Sutton & Barto, Chapter 10 (second edition), on the CartPole problem using Gymnasium. For function approximation I'm using a neural network in Keras. However, implementing the algorithm faithfully forces me to use a batch size of 1 for fit and predict, and this makes the code extremely slow. An alternative is to first run the code to collect data from Gymnasium, and then use the data to train the neural network offline. Is that recommended (it'd be offline but still on-policy, if I understand correctly)? Or is there some other standard way to use neural networks with Gymnasium without compromising performance?

Episodic Semi-Gradient Sarsa from Sutton & Barto

Outline of my current attempt:

import gymnasium as gym
from numpy.random import choice as random_choice, rand
from numpy import array, argmax

I wrote the algorithm as the following Python code:

env = gym.make('CartPole-v1')

for ep_idx in range(num_episodes):
    state, _ = env.reset()
    action = policy.take_action(state, qvalue, ep_idx)  # choose A from S
    terminated = truncated = False
    while not (terminated or truncated):
        # take A, observe R and S'
        state_, reward, terminated, truncated, _ = env.step(action)
        if terminated:
            # terminal transition: no successor state/action to bootstrap from
            qvalue.update(state, action, reward, None, None)
        else:
            action_ = policy.take_action(state_, qvalue, ep_idx)  # choose A' from S'
            qvalue.update(state, action, reward, state_, action_)
            state, action = state_, action_

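The pieces referenced above are wired up roughly like this (the hyperparameter values are placeholders, not tuned; the two classes are defined below):

discount = 0.99        # placeholder value
learning_rate = 1e-3   # placeholder value
num_episodes = 1000    # placeholder value
qvalue = QValueFunction(discount, learning_rate, env.action_space.n,
                        *env.observation_space.shape)
policy = EpsilonGreedyPolicy(0.1)  # constant epsilon; a callable also works
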
For function approximation, I decided to use Keras. This is implemented inside qvalue.update as follows:

class QValueFunction:
    def __init__(self, discount, learning_rate, num_actions, *state_vector_dim):
        ...  # not shown here for brevity
    def __call__(self, state, action=None):
        ...  # returns Q(state, action); with action=None, values for all actions
    def update(self, s, a, r, s_, a_):
        model = self._model  # instance of keras.models.Model
        gamma = self._discount  # float
        update_targets = self._update_targets  # pre-allocated (1, num_actions) numpy array
        q = self
        update_targets[:] = q(s, None)  # start from current predictions for all actions
        self._s[:] = s  # copy the state into a pre-allocated (1, state_dim) array
        s = self._s
        if s_ is None and a_ is None:
            update_targets[0, a] = r  # terminal transition: target is just the reward
        else:
            update_targets[0, a] = r + gamma * q(s_, a_)  # Sarsa target: R + gamma * Q(S', A')
        model.fit(s, update_targets, batch_size=1, verbose=0)
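
For reference, here is a sketch of a lower-overhead variant of update I'm considering: it calls the model directly instead of model.predict and uses model.train_on_batch instead of model.fit (both standard Keras APIs). I understand these avoid some per-call setup cost, but I haven't verified how much it actually helps:

# Hypothetical drop-in replacement for QValueFunction.update; attribute
# names (_model, _discount) are the same as above.
import numpy as np

def update(self, s, a, r, s_, a_):
    model = self._model
    gamma = self._discount
    x = np.asarray(s, dtype=np.float32).reshape(1, -1)
    # calling the model directly avoids model.predict's per-call overhead
    targets = np.array(model(x, training=False))
    if s_ is None and a_ is None:
        targets[0, a] = r  # terminal transition: target is just the reward
    else:
        x_ = np.asarray(s_, dtype=np.float32).reshape(1, -1)
        q_next = float(np.array(model(x_, training=False))[0, a_])
        targets[0, a] = r + gamma * q_next
    # train_on_batch runs a single gradient step without fit's setup cost
    model.train_on_batch(x, targets)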

And policy is an instance of EpsilonGreedyPolicy:

class EpsilonGreedyPolicy:
    def __init__(self, epsilon):
        # epsilon is either a float or a callable mapping episode number -> float
        self.eps = epsilon
    def take_action(self, state, qvalue, ep=None):
        num_actions = qvalue.num_actions
        if callable(self.eps):
            eps = self.eps(ep + 1)  # e.g. a decay schedule over episodes
        else:
            eps = self.eps
        if rand() < eps:
            return random_choice(num_actions)  # explore: uniform random action
        else:
            qvalues = qvalue(state)  # exploit: act greedily on current estimates
            return argmax(qvalues)
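
For example, the callable form supports a decaying schedule (this particular schedule is just an illustration):

policy = EpsilonGreedyPolicy(lambda ep: max(0.05, 1.0 / ep))  # decays toward 0.05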

The above code runs at about one episode per 10 seconds on my laptop (CPU only). To check how fast the code could run in principle, I used a random policy (eps=1) to generate data for 1000 episodes, producing 20000+ tuples of (s, a, r, s_, a_); this took only about 10 seconds. I then used this data to train the neural network separately, passing all of it at once to Keras's model.predict and model.fit, which took about 1 second per 10000 data points. In essence, running the code faithfully to the algorithm with a batch size of 1 for model.fit and model.predict takes on the order of 10000 seconds, while running it as (i) generate the data first, (ii) train the neural network afterwards takes tens to hundreds of seconds.
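
A sketch of that two-phase run looks like the following; the buffer layout and the placeholder discount are my own scaffolding, and model here refers to the same underlying Keras network as in QValueFunction:

# Phase (i): collect transitions with a random policy (eps = 1).
transitions = []  # (s, a, r, s_, a_); s_ and a_ are None at termination
for _ in range(1000):
    state, _ = env.reset()
    action = env.action_space.sample()
    terminated = truncated = False
    while not (terminated or truncated):
        state_, reward, terminated, truncated, _ = env.step(action)
        if terminated:
            transitions.append((state, action, reward, None, None))
        else:
            action_ = env.action_space.sample()
            transitions.append((state, action, reward, state_, action_))
            state, action = state_, action_

# Phase (ii): build all Sarsa targets with two vectorized predict calls,
# then train with a single fit call.
import numpy as np
gamma = 0.99  # placeholder discount
states = np.array([t[0] for t in transitions], dtype=np.float32)
next_states = np.array([t[3] if t[3] is not None else np.zeros_like(t[0])
                        for t in transitions], dtype=np.float32)
targets = model.predict(states, verbose=0)
next_q = model.predict(next_states, verbose=0)
for i, (s, a, r, s_, a_) in enumerate(transitions):
    targets[i, a] = r if s_ is None else r + gamma * next_q[i, a_]
model.fit(states, targets, batch_size=64, verbose=0)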

Is there any recommended way of using neural networks with Gymnasium to avoid such heavy overheads?
