Deep Reinforcement Learning 1-step TD not converging


Is there some trick to getting 1-step TD (temporal difference) prediction to converge with a neural net? The network is a simple feed-forward network with ReLU activations. I've got the network working for Q-learning in the following way:

  import numpy as np

  gamma = 0.9
  # Predicted Q-values of the successor state under each of the three actions
  q0 = model.predict(X0[times+1])
  q1 = model.predict(X1[times+1])
  q2 = model.predict(X2[times+1])
  # Rewards are negative, so negate them into costs and take the min over actions
  q_Opt = np.min(np.concatenate((q0, q1, q2), axis=1), axis=1)
  target = -np.array(rewards)[times] + gamma * q_Opt

Here X0, X1, and X2 are the MNIST image features with actions 0, 1, and 2 concatenated onto them, respectively. This method converges; a fuller, self-contained sketch of this setup follows for reference.
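
A minimal self-contained version of the working setup looks roughly like this (the layer sizes, synthetic data, and the choice of X here are illustrative stand-ins rather than my exact code):

  import numpy as np
  from tensorflow import keras

  # Synthetic stand-ins for the real data: flattened MNIST pixels + 1 appended action value
  n_samples, n_pixels = 1000, 784
  features = np.random.rand(n_samples, n_pixels).astype("float32")
  rewards = -np.random.rand(n_samples)   # rewards are negative
  times = np.arange(n_samples - 1)       # step indices that have a successor state

  # The same state features with each of the three candidate actions appended
  X0 = np.concatenate([features, np.full((n_samples, 1), 0.0, dtype="float32")], axis=1)
  X1 = np.concatenate([features, np.full((n_samples, 1), 1.0, dtype="float32")], axis=1)
  X2 = np.concatenate([features, np.full((n_samples, 1), 2.0, dtype="float32")], axis=1)
  X = X0  # features of the actions actually taken (placeholder in this sketch)

  # Simple feed-forward ReLU network producing a scalar Q-value
  model = keras.Sequential([
      keras.layers.Dense(128, activation="relu", input_shape=(n_pixels + 1,)),
      keras.layers.Dense(128, activation="relu"),
      keras.layers.Dense(1),
  ])
  model.compile(optimizer="adam", loss="mse")

  gamma = 0.9
  # Bootstrapped Q-values of the successor states under each action
  q0 = model.predict(X0[times + 1])
  q1 = model.predict(X1[times + 1])
  q2 = model.predict(X2[times + 1])
  # Rewards are negative, so negate them into costs and take the min over actions
  q_Opt = np.min(np.concatenate((q0, q1, q2), axis=1), axis=1).reshape(-1, 1)
  target = -np.array(rewards)[times].reshape(-1, 1) + gamma * q_Opt

  model.fit(X[times], target, batch_size=128, epochs=10, verbose=1)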

What I'm trying instead, which doesn't converge:

  # 1-step TD(0) prediction: bootstrap from the predicted value of the next state
  v_hat_next = model.predict(X[times+1])
  target = -np.array(rewards)[times] + gamma * v_hat_next

  history = model.fit(X[times], target, batch_size=128, epochs=10, verbose=1)

This method doesn't converge at all and in fact gives identical state values for every state. Any idea what I'm doing wrong? Is there some trick to setting up the target? The target is supposed to be $R_{t+1} + \gamma \hat{v}(S_{t+1}, \mathbf{w})$, and I thought that's what I've done here.
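
To make the question concrete, here is how I understand the target is supposed to be built, continuing from the sketch above. The dones mask for episode boundaries is an assumption on my part (my data may not even have terminal states), and the reshapes are only there to keep the target the same shape as in the Q-learning case:

  # \hat{v}(S_{t+1}) from the current network, flattened from (n, 1) to (n,)
  v_hat_next = model.predict(X[times + 1]).reshape(-1)

  # Hypothetical terminal mask: 1.0 where S_{t+1} ends an episode, else 0.0.
  # The bootstrap term is dropped at episode boundaries.
  dones = np.zeros_like(v_hat_next)

  # 1-step TD(0) target: R_{t+1} + gamma * \hat{v}(S_{t+1}, w)
  target = (-np.array(rewards)[times] + gamma * (1.0 - dones) * v_hat_next).reshape(-1, 1)

  history = model.fit(X[times], target, batch_size=128, epochs=10, verbose=1)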
