Why is Policy Iteration faster than Value Iteration?

2.7k Views Asked by shmi At 24 November 2019 at 23:33

We know that policy iteration gives us the policy directly and hence is faster. But can anyone explain it with some examples.

Original Q&A

There are 1 best solutions below

Pranshu Gupta On 24 November 2019 at 23:40 BEST ANSWER

The reason that policy iteration is faster because - one policy can be represented by an infinite number of value functions, so in policy iteration when you jump from one policy to another .. you have essentially jumped over infinite number of value functions.

For example:

p1 = [0, 1, 1]

is a policy for 3 states and 2 actions, where it chooses action 0 at state 0 and action 1 at states 1 and 2.

Now, let's consider two value functions:

v1 = [[0.9, 0.6], [0.6, 0.8], [0.8, 0.9]]

v2 = [[0.9, 0.6], [0.7, 0.8], [0.6, 0.9]]

Here, both v1 and v2 map to the same policy, so when you do policy iteration, it's like you don't care about these two as being different value functions because they map to the same policy. So when you update the policy you have essentially discarded a huge number of these value functions, each of which you might have iterated over (in worst case) when doing value iteration.

Why is Policy Iteration faster than Value Iteration?

There are 1 best solutions below

Related Questions in VALUE-ITERATION

Trending Questions

Popular # Hahtags

Popular Questions