Gradient Temporal Difference Lambda without Function Approximation


Every formalization of GTD(λ) seems to define it in terms of function approximation, using a parameter vector θ and a secondary weight vector w.

I understand that gradient TD methods were largely motivated by their convergence guarantees with linear function approximation, but I would like to use GTD for its handling of importance sampling in the off-policy setting.

Is it possible to take advantage of GTD without function approximation? If so, how are the update equations formalized?

BEST ANSWER

I understand "without function approximation" to mean representing the value function V as a table. In that case, the tabular representation of V can itself be seen as a special case of linear function approximation.

For example, if we define the approximated value function as:

$$V_\theta(s) = \theta^\top \phi(s) = \sum_{i=1}^{n} \theta_i \, \phi_i(s)$$

Then, with a tabular representation, there are as many features as states: the feature vector φ(s) is one-hot, i.e., zero in every component except the one corresponding to s, which equals one, and the parameter vector θ simply stores one value per state. Therefore GTD, like other algorithms defined for linear function approximation, can be applied in the tabular case without any modification.
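
To make the reduction concrete, here is a sketch of the one-step case in the style of TDC/GTD(0), with importance-sampling ratios inserted for the off-policy setting; this is not the full GTD(λ) algorithm, and the step sizes $\alpha, \beta$ and ratio $\rho_t$ below are my notation, not something from the question. The linear updates are:

$$\delta_t = r_{t+1} + \gamma\, \theta_t^\top \phi(s_{t+1}) - \theta_t^\top \phi(s_t)$$
$$\theta_{t+1} = \theta_t + \alpha\, \rho_t \left( \delta_t\, \phi(s_t) - \gamma\, \phi(s_{t+1}) \big(\phi(s_t)^\top w_t\big) \right)$$
$$w_{t+1} = w_t + \beta\, \rho_t \left( \delta_t - \phi(s_t)^\top w_t \right) \phi(s_t)$$

With one-hot features every inner product picks out a single table entry, so θ becomes a value table V and w a second table of the same size. A minimal Python sketch under those assumptions (the state count, step sizes, and function name are made up for illustration):

```python
import numpy as np

# Tabular specialization of the one-step GTD(0)/TDC update.
# With one-hot features, theta is just a value table V and w is a
# second table of the same size holding the correction weights.

n_states = 10                          # assumed number of states
alpha, beta, gamma = 0.1, 0.01, 0.99   # assumed step sizes and discount

V = np.zeros(n_states)                 # theta: one value per state
w = np.zeros(n_states)                 # secondary weights: one per state

def gtd0_update(s, r, s_next, rho):
    """Apply one update for the transition (s, r, s_next), where
    rho = pi(a|s) / b(a|s) is the importance-sampling ratio."""
    delta = r + gamma * V[s_next] - V[s]      # TD error
    V[s]      += alpha * rho * delta          # main TD term at s
    V[s_next] -= alpha * rho * gamma * w[s]   # gradient-correction term at s'
    w[s]      += beta * rho * (delta - w[s])  # secondary weight update
```

Extending this to GTD(λ) adds eligibility traces, which are again just tables in the tabular case, but the structure of the updates stays the same.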