Temporal Difference Learning and Back-propagation


I have read this page from Stanford - https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html. I am not able to understand how TD learning is used with neural networks. I am trying to make a checkers AI that uses TD learning, similar to what they implemented for backgammon. Please explain how TD back-propagation works.

I have already referred to this question - Neural Network and Temporal Difference Learning - but I am not able to understand the accepted answer. Please explain with a different approach if possible.


There is 1 answer below.


TD learning is not used in neural networks. Rather, neural networks are used in TD learning to store the value (or Q-value) function.

I think you are confusing backpropagation (a neural-network concept) with bootstrapping in RL. Bootstrapping combines recently observed information with previous estimates to produce new estimates.
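As a concrete (illustrative) example, the one-step TD update for a state value is $V(s) \leftarrow V(s) + \alpha\,[r + \gamma V(s') - V(s)]$: the bracketed TD error mixes the newly observed reward $r$ with the previous estimate $V(s')$ of the next state, which is the bootstrapping part.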

When the state space is large and it is not practical to store the value function in a table, a neural network is used as a function approximator for the value function.
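Here is a minimal sketch of that idea in Python/NumPy: a small one-hidden-layer network approximates V(s), and a semi-gradient TD(0) step nudges its weights toward the bootstrapped target via ordinary backpropagation of the value gradient. The feature size, layer width, and hyperparameters are made up for illustration and are not the TD-Gammon setup.

```python
import numpy as np

# Illustrative sizes/hyperparameters (not from the original post).
rng = np.random.default_rng(0)
n_features, n_hidden = 64, 32      # e.g. one feature per board square
W1 = rng.normal(scale=0.1, size=(n_hidden, n_features))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0
alpha, gamma = 0.01, 0.99

def value(s):
    """Forward pass: return V(s) and the hidden activations."""
    h = np.tanh(W1 @ s + b1)
    return W2 @ h + b2, h

def td0_update(s, r, s_next, done):
    """One semi-gradient TD(0) step: move V(s) toward r + gamma * V(s')."""
    global W1, b1, W2, b2
    v, h = value(s)
    v_next, _ = value(s_next)
    target = r if done else r + gamma * v_next   # bootstrapped target, treated as constant
    delta = target - v                           # TD error
    # Backpropagate the gradient of V(s) with respect to the weights.
    grad_W2 = h
    grad_b2 = 1.0
    grad_pre = W2 * (1.0 - h ** 2)               # through the tanh nonlinearity
    grad_W1 = np.outer(grad_pre, s)
    grad_b1 = grad_pre
    # Semi-gradient update: w <- w + alpha * delta * grad V(s).
    W2 += alpha * delta * grad_W2
    b2 += alpha * delta * grad_b2
    W1 += alpha * delta * grad_W1
    b1 += alpha * delta * grad_b1
    return delta
```

Note how backpropagation only appears as the machinery for computing the gradient of V(s); the TD part is the choice of target, r + gamma * V(s').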

The discussion of forward/backward views is about eligibility traces: the forward view bootstraps several steps ahead in time, which is not practical to compute online, so the backward view uses eligibility traces to leave a trail behind the agent and update past states instead, as in the sketch below.
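A minimal tabular sketch of that backward view (the state count and hyperparameters are illustrative, not tied to any particular game):

```python
import numpy as np

n_states = 100
V = np.zeros(n_states)       # value estimates
E = np.zeros(n_states)       # eligibility traces
alpha, gamma, lam = 0.1, 0.99, 0.8

def td_lambda_step(s, r, s_next, done):
    """One TD(lambda) step: the current TD error updates every traced state."""
    delta = r + (0.0 if done else gamma * V[s_next]) - V[s]   # TD error
    E[s] += 1.0                  # mark the current state as eligible
    V[:] += alpha * delta * E    # credit flows back along the trail
    E[:] *= gamma * lam          # traces decay over time
    if done:
        E[:] = 0.0               # reset traces at the end of an episode
```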

None of this should be confused with backpropagation in neural networks; they are unrelated mechanisms.