So I have created a custom environment using OpenAI Gym. I'm closely following the keras-rl DQNAgent example for CartPole, which leads to the following implementation:
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

# env (my custom Gym environment) and ENV_NAME are defined earlier.
nb_actions = env.action_space.n
# Option 1 : Simple model
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
# Option 2 : Deeper model (from the keras-rl CartPole example)
#model = Sequential()
#model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
#model.add(Dense(16))
#model.add(Activation('relu'))
#model.add(Dense(16))
#model.add(Activation('relu'))
#model.add(Dense(16))
#model.add(Activation('relu'))
#model.add(Dense(nb_actions))
#model.add(Activation('linear'))
# Finally, we configure and compile our agent. You can use every built-in Keras optimizer and even the metrics!
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
# Okay, now it's time to learn something! Visualizing the training slows it down quite a lot, so it is disabled here. You can always safely abort the training prematurely using Ctrl + C.
dqn.fit(env, nb_steps=2500, visualize=False, verbose=2)
# After training is done, we save the final weights.
dqn.save_weights('dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)
# Finally, evaluate our algorithm for 10 episodes.
dqn.test(env, nb_episodes=10, visualize=False)
So everything looks as I would expect up until the dqn.test function call. Sample output from the dqn.fit is as follows:
... 1912/2500: episode: 8, duration: 1.713s, episode steps: 239, steps per second: 139, episode reward: -78.774, mean reward: -0.330 [-27928.576, 18038.443], mean action: 0.657 [0.000, 2.000], mean observation: 8825.907 [5947.400, 17211.920], loss: 7792970.500000, mean_absolute_error: 653.732361, mean_q: 1.000000
2151/2500: episode: 9, duration: 1.790s, episode steps: 239, steps per second: 134, episode reward: -23335.055, mean reward: -97.636 [-17918.534, 17819.400], mean action: 0.636 [0.000, 2.000], mean observation: 8825.907 [5947.400, 17211.920], loss: 8051206.500000, mean_absolute_error: 676.335266, mean_q: 1.000000
2390/2500: episode: 10, duration: 1.775s, episode steps: 239, steps per second: 135, episode reward: 16940.150, mean reward: 70.879 [-25552.948, 17819.400], mean action: 0.611 [0.000, 2.000], mean observation: 8825.907 [5947.400, 17211.920], loss: 8520963.000000, mean_absolute_error: 690.176819, mean_q: 1.000000
Since the episode rewards vary, it appears to me that the fitting is working as expected. But when dqn.test is run, it keeps generating the same output for every episode. With the data I'm using, negative rewards are bad and positive rewards are good.
Here is the result of the test method being run:
Testing for 10 episodes
- Episode 1: reward: -62996.100, steps: 239
- Episode 2: reward: -62996.100, steps: 239
- Episode 3: reward: -62996.100, steps: 239
- Episode 4: reward: -62996.100, steps: 239
- Episode 5: reward: -62996.100, steps: 239
- Episode 6: reward: -62996.100, steps: 239
- Episode 7: reward: -62996.100, steps: 239
- Episode 8: reward: -62996.100, steps: 239
- Episode 9: reward: -62996.100, steps: 239
- Episode 10: reward: -62996.100, steps: 239
This leads me to two questions:
1) Why is the reward identical for every test episode?
2) Why might the model be recommending a set of actions that lead to terrible rewards?
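Regarding 1), one thing I can think to check is whether the environment always resets to the same initial state: as far as I understand, dqn.test picks actions greedily by default, so a deterministic environment plus a greedy policy would replay the exact same trajectory every time. A rough sketch of that check (assuming the classic Gym step API that returns obs, reward, done, info; this is not the code I actually ran):

import numpy as np

# Compare two resets and two fixed-action rollouts; identical results would
# mean the env is deterministic, in which case a greedy test policy will
# reproduce the same episode over and over.
obs_a = env.reset()
obs_b = env.reset()
print('same initial observation:', np.allclose(obs_a, obs_b))

totals = []
for _ in range(2):
    env.reset()
    done = False
    total = 0.0
    while not done:
        obs, reward, done, info = env.step(0)  # same fixed action every step
        total += reward
    totals.append(total)
print('fixed-action episode rewards:', totals)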
I would check the env object and verify that its reward computation behaves the way you expect.
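For example, something like this (a quick sketch, assuming the classic Gym step API that returns obs, reward, done, info) would show whether the reward actually responds to different behaviour:

for episode in range(3):
    obs = env.reset()
    done = False
    total = 0.0
    while not done:
        # Take random actions purely to probe the reward signal.
        obs, reward, done, info = env.step(env.action_space.sample())
        total += reward
    print('random-policy episode reward:', total)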
I am wondering if the .fit call is failing to explore the state space for some reason.
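One way to rule that out would be to swap the Boltzmann policy for keras-rl's annealed epsilon-greedy policy during fit, so the agent starts out almost fully random. The values below are only a sketch, not something tuned for this environment:

from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

# Anneal eps from 1.0 down to 0.1 over training so early steps are mostly random.
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1.0,
                              value_min=0.1, value_test=0.05, nb_steps=2500)
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory,
               nb_steps_warmup=10, target_model_update=1e-2, policy=policy)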
I recently did an RL project (lunar lander) with OpenAI Gym and Keras, although I didn't use the DQNAgent or the other built-in keras-rl components; I simply built a simple feedforward network. Check this GitHub link and see if it's helpful: https://github.com/tianchuliang/techblog/tree/master/OpenAIGym