Gradient Temporal Difference Lambda without Function Approximation


Every formalization of GTD(λ) seems to define it in terms of function approximation, using a parameter vector θ and a secondary weight vector w.

I understand that gradient TD methods were largely motivated by their convergence guarantees with linear function approximation, but I would like to use GTD for its handling of importance sampling in the off-policy setting.

Is it possible to take advantage of GTD without function approximation? If so, how are the update equations formalized?

BEST ANSWER

I understand "without function approximation" to mean representing the value function V as a table. In that case, the tabular representation of V can itself be seen as a special case of linear function approximation.

For example, if we define the approximated value function as:

$$V_\theta(s) = \theta^\top \phi(s) = \sum_{i=1}^{n} \theta_i \, \phi_i(s)$$

Then, with a tabular representation, there are as many features as states: the feature vector φ(s) is one-hot, i.e., zero in every component except the one corresponding to s, which equals one, and the parameter vector θ simply stores one value per state. Therefore GTD, like other algorithms defined for linear function approximation, can be applied in the tabular case without any modification.
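
To make the reduction concrete, here is a sketch of the one-step case in the style of TDC/GTD(0), with importance-sampling ratios inserted for the off-policy setting; this is not the full GTD(λ) algorithm, and the step sizes $\alpha, \beta$ and ratio $\rho_t$ below are my notation, not something from the question. The linear updates are:

$$\delta_t = r_{t+1} + \gamma\, \theta_t^\top \phi(s_{t+1}) - \theta_t^\top \phi(s_t)$$
$$\theta_{t+1} = \theta_t + \alpha\, \rho_t \left( \delta_t\, \phi(s_t) - \gamma\, \phi(s_{t+1}) \big(\phi(s_t)^\top w_t\big) \right)$$
$$w_{t+1} = w_t + \beta\, \rho_t \left( \delta_t - \phi(s_t)^\top w_t \right) \phi(s_t)$$

With one-hot features every inner product picks out a single table entry, so θ becomes a value table V and w a second table of the same size. A minimal Python sketch under those assumptions (the state count, step sizes, and function name are made up for illustration):

```python
import numpy as np

# Tabular specialization of the one-step GTD(0)/TDC update.
# With one-hot features, theta is just a value table V and w is a
# second table of the same size holding the correction weights.

n_states = 10                          # assumed number of states
alpha, beta, gamma = 0.1, 0.01, 0.99   # assumed step sizes and discount

V = np.zeros(n_states)                 # theta: one value per state
w = np.zeros(n_states)                 # secondary weights: one per state

def gtd0_update(s, r, s_next, rho):
    """Apply one update for the transition (s, r, s_next), where
    rho = pi(a|s) / b(a|s) is the importance-sampling ratio."""
    delta = r + gamma * V[s_next] - V[s]      # TD error
    V[s]      += alpha * rho * delta          # main TD term at s
    V[s_next] -= alpha * rho * gamma * w[s]   # gradient-correction term at s'
    w[s]      += beta * rho * (delta - w[s])  # secondary weight update
```

Extending this to GTD(λ) adds eligibility traces, which are again just tables in the tabular case, but the structure of the updates stays the same.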